[UsageCount] Usage count per result split by datasource #318

Merged
miriam.baglioni merged 4 commits from UsageStatsRecordDS into beta 2024-04-02 10:21:40 +02:00

This PR extends the code for ingesting the Usage Count at the result level. It splits the count for each result by the datasource contributing to that count. For each indicator, one unit is specified for each datasource contributing to that indicator's value. The datasource identifier is stored in the key element of the measure's unit, while the count for that datasource is stored in the value element.
So, given that the downloads and views for R1 come from three different datasources fake1, fake2, and fake3 as

datasource_fake_identifier_1 => downloads = 0, views = 5
datasource_fake_identifier_2 => downloads = 1, views = 1
datasource_fake_identifier_3 => downloads = 3, views = 9

we will get for R1 a measures element as

{
   "measures":[
      {
         "id":"downloads",
         "unit":[
            {
               "key":"10|datasource_fake_identifier_1",
               "value":"0",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_2",
               "value":"1",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_3",
               "value":"3",
               "dataInfo":{
                [...]
               }
            }
         ]
      },
      {
         "id":"views",
         "unit":[
            {
               "key":"10|datasource_fake_identifier_1",
               "value":"5",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_2",
               "value":"1",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_3",
               "value":"9",
               "dataInfo":{
                [...]
               }
            }
         ]
      }
   ]
}

The JSON above is the serialization of the internal model. Our idea is to produce the following XML snippet from it, to be included in the result-level information on the Solr records.

<measure id="downloads" count="0" datasource="datasource_fake_identifier_1" />
<measure id="views" count="5" datasource="datasource_fake_identifier_1" />

<measure id="downloads" count="1" datasource="datasource_fake_identifier_2" />
<measure id="views" count="1" datasource="datasource_fake_identifier_2" />

<measure id="downloads" count="3" datasource="datasource_fake_identifier_3" />
<measure id="views" count="9" datasource="datasource_fake_identifier_3" />

Would this serialisation be OK for the portal presentation requirements? The information missing here is the datasource name, but it is available in both the collectedfrom and hostedby elements.
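
For illustration only, the mapping described above can be sketched in a few lines of Python. This is not the code in this PR: the function names `build_measures` and `measures_to_xml` are hypothetical, the `dataInfo` content is left out because it is elided in the example, and dropping the `10|` prefix from the key when writing the `datasource` attribute is an assumption inferred from the XML snippet.

```python
from xml.sax.saxutils import quoteattr

# Per-datasource usage counts for result R1, taken from the example above.
usage_counts = {
    "datasource_fake_identifier_1": {"downloads": 0, "views": 5},
    "datasource_fake_identifier_2": {"downloads": 1, "views": 1},
    "datasource_fake_identifier_3": {"downloads": 3, "views": 9},
}


def build_measures(counts_by_datasource, prefix="10|"):
    """Emit one measure per indicator, with one unit per contributing datasource."""
    units_by_indicator = {}
    for datasource_id, counts in counts_by_datasource.items():
        for indicator, count in counts.items():
            # dataInfo is elided in the example, so it is omitted here as well.
            unit = {"key": prefix + datasource_id, "value": str(count)}
            units_by_indicator.setdefault(indicator, []).append(unit)
    return {
        "measures": [
            {"id": indicator, "unit": units}
            for indicator, units in units_by_indicator.items()
        ]
    }


def measures_to_xml(record, prefix="10|"):
    """Flatten the measures into <measure/> elements, grouped by datasource."""
    by_datasource = {}
    for measure in record["measures"]:
        for unit in measure["unit"]:
            key = unit["key"]
            # Assumption: the "10|" prefix is not part of the datasource attribute.
            datasource = key[len(prefix):] if key.startswith(prefix) else key
            by_datasource.setdefault(datasource, []).append(
                (measure["id"], unit["value"])
            )
    lines = []
    for datasource, entries in by_datasource.items():
        for measure_id, count in entries:
            lines.append(
                "<measure id=%s count=%s datasource=%s />"
                % (quoteattr(measure_id), quoteattr(count), quoteattr(datasource))
            )
    return lines


record = build_measures(usage_counts)
xml_lines = measures_to_xml(record)
print("\n".join(xml_lines))
```

Grouping by datasource before writing keeps the downloads and views of each datasource next to each other, as in the snippet above.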

alessia.bardi was assigned by miriam.baglioni 2023-06-30 18:53:29 +02:00
claudio.atzori was assigned by miriam.baglioni 2023-06-30 18:53:29 +02:00
miriam.baglioni added 1 commit 2023-06-30 18:53:30 +02:00
miriam.baglioni added 1 commit 2023-06-30 19:05:21 +02:00
claudio.atzori added this to the OpenAIRE project 2023-10-16 11:27:09 +02:00
claudio.atzori modified the project from OpenAIRE to OpenAIRE - DNet 2023-10-26 09:59:04 +02:00
Collaborator

Hello! The pull request looks very nice! I will plan to adjust the parsing code on the portal side. Please keep me posted when the update becomes available.

Collaborator

Hi again @miriam.baglioni and @claudio.atzori! I was just checking this again, and I am wondering if it is possible to include the datasource name together with the datasource identifier for display purposes.

Hi Konstantina! The solution that Miriam and I proposed was based on the assumption that we would not alter the model as it is currently defined. So we cannot add another field at the same level as the key element to store the datasource name, but we can agree on something dirtier, e.g. combining the datasource id and its name in the key field, concatenating the two strings with a separator character/sequence in between.

Would this be acceptable for you?

Collaborator

Hi Claudio! Apologies for the late response. It completely slipped my attention.
Yes, combining the datasource id and its name in the key field sounds fine. Maybe we could use the same format/hack we are applying for some of the refine filters: id||name.
What do you think?

Author
Member

Hi @konstantina.galouni, OK, I will use || as the separator.
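
A minimal sketch of the id||name convention agreed above, for illustration only. The helper names `combine_key` and `split_key` and the datasource name "Fake Repository One" are hypothetical, and whether the combined key keeps the `10|` prefix is an assumption.

```python
SEPARATOR = "||"


def combine_key(datasource_id, datasource_name):
    """Store both the identifier and the name in the unit's key field."""
    return datasource_id + SEPARATOR + datasource_name


def split_key(key):
    """Split at the first '||'; the name is empty if the separator is absent."""
    datasource_id, _, datasource_name = key.partition(SEPARATOR)
    return datasource_id, datasource_name


key = combine_key("10|datasource_fake_identifier_1", "Fake Repository One")
datasource_id, datasource_name = split_key(key)
print(key)
print(datasource_id, datasource_name)
```

Since the split happens at the first occurrence of the two-character separator, the single `|` in the `10|` prefix does not interfere, and a name that itself contains `||` stays intact after the first split.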

Collaborator

Great! Thank you very much, Miriam!

miriam.baglioni added 2 commits 2024-01-29 18:12:40 +01:00
miriam.baglioni changed title from WIP: [UsageCount] Usage count per result split by datasource to [UsageCount] Usage count per result split by datasource 2024-01-30 12:17:05 +01:00
Author
Member

As requested, the code was extended to include the name of the datasource in addition to its identifier.

miriam.baglioni merged commit 64cbd8abe9 into beta 2024-04-02 10:21:40 +02:00