[UsageCount] Usage count per result split by datasource #318

miriam.baglioni · 2023-06-30T18:53:29+02:00

miriam.baglioni commented

2023-06-30 18:53:29 +02:00

This PR extends the code that ingests the Usage Count at the level of the result. It splits the count for each result by the datasource contributing to that count. For each indicator, one unit is specified for each datasource contributing to that indicator's value. The datasource identifier is stored in the key element of the unit for the measure, and the count for that datasource is stored in the value element.
So, given that the downloads and views for R1 come from three different datasources, fake1, fake2, and fake3, as

datasource_fake_identifier_1 => downloads = 0, views = 5
datasource_fake_identifier_2 => downloads = 1, views = 1
datasource_fake_identifier_3 => downloads = 3, views = 9

we will get a measures element for R1 as

{
   "measures":[
      {
         "id":"downloads",
         "unit":[
            {
               "key":"10|datasource_fake_identifier_1",
               "value":"0",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_2",
               "value":"1",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_3",
               "value":"3",
               "dataInfo":{
                [...]
               }
            }
         ]
      },
      {
         "id":"views",
         "unit":[
            {
               "key":"10|datasource_fake_identifier_1",
               "value":"5",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_2",
               "value":"1",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_3",
               "value":"9",
               "dataInfo":{
                [...]
               }
            }
         ]
      }
   ]
}

The JSON above is the serialization of the internal model. Our idea is to produce the following XML snippet from it, to be included in the result-level information on the Solr records.

<measure id="downloads" count="0" datasource="datasource_fake_identifier_1" />
<measure id="views" count="5" datasource="datasource_fake_identifier_1" />

<measure id="downloads" count="1" datasource="datasource_fake_identifier_2" />
<measure id="views" count="1" datasource="datasource_fake_identifier_2" />

<measure id="downloads" count="3" datasource="datasource_fake_identifier_3" />
<measure id="views" count="9" datasource="datasource_fake_identifier_3" />

Would this serialisation be OK for the portal presentation requirements? The information missing here is the datasource name, but that information is available in both the collectedfrom and hostedby elements.
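
To make the proposed mapping concrete, here is a minimal Python sketch of how the measures JSON could be flattened into the `<measure>` elements above. It is only an illustration, not the code in this PR: the function name `measures_to_xml` is hypothetical, the `dataInfo` content is left out because it is elided in the example, and stripping the `10|` prefix from the key is an assumption based on the XML shown.

```python
from xml.sax.saxutils import quoteattr


def measures_to_xml(record):
    """Flatten the `measures` list into <measure/> lines, grouped by datasource."""
    by_datasource = {}  # a dict keeps the order in which datasources first appear
    for measure in record.get("measures", []):
        for unit in measure.get("unit", []):
            # Assumption: drop the "10|" prefix, since the XML in this PR
            # shows the bare datasource identifier.
            datasource = unit["key"].split("|", 1)[-1]
            by_datasource.setdefault(datasource, []).append(
                (measure["id"], unit["value"])
            )

    lines = []
    for datasource, counts in by_datasource.items():
        for measure_id, count in counts:
            # quoteattr escapes the value and adds the surrounding quotes
            lines.append(
                f"<measure id={quoteattr(measure_id)} "
                f"count={quoteattr(count)} "
                f"datasource={quoteattr(datasource)} />"
            )
    return lines


# Two of the three datasources from the example, without dataInfo
record = {
    "measures": [
        {"id": "downloads", "unit": [
            {"key": "10|datasource_fake_identifier_1", "value": "0"},
            {"key": "10|datasource_fake_identifier_2", "value": "1"},
        ]},
        {"id": "views", "unit": [
            {"key": "10|datasource_fake_identifier_1", "value": "5"},
            {"key": "10|datasource_fake_identifier_2", "value": "1"},
        ]},
    ]
}

for line in measures_to_xml(record):
    print(line)
```

Grouping by datasource first reproduces the ordering of the snippet above, where the downloads and views of one datasource appear next to each other.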

alessia.bardi was assigned by miriam.baglioni

2023-06-30 18:53:29 +02:00

claudio.atzori was assigned by miriam.baglioni

2023-06-30 18:53:29 +02:00

miriam.baglioni added 1 commit 2023-06-30 18:53:30 +02:00

55ea485783 [UsageCount] split the count for result at the level of the datasource. for each indicator one unit is specified for each datasource contrinuting to that indicator value. The datasource key is the value of the key element in the unit for the measure, while the count for that datasource is in the value

miriam.baglioni added 1 commit 2023-06-30 19:05:21 +02:00

4c9bc4c3a5 refactoring

claudio.atzori added this to the OpenAIRE project 2023-10-16 11:27:09 +02:00

claudio.atzori modified the project from OpenAIRE to OpenAIRE - DNet

2023-10-26 09:59:04 +02:00

konstantina.galouni commented

2023-10-26 13:01:57 +02:00

Hello! The pull request looks very nice! I plan to adjust the parsing code on the portal side. Please keep me posted when the update is available.

konstantina.galouni commented

2023-11-08 14:33:39 +01:00

Hi again @miriam.baglioni and @claudio.atzori! I was just checking this again and I am wondering whether it is possible to also include the datasource name together with the datasource identifier for display purposes.

claudio.atzori commented

2023-11-09 11:20:23 +01:00

> Hi again @miriam.baglioni and @claudio.atzori! I was just checking this again and i am wondering if it is possible to also include the data source name together with the data source identifier for display purposes.

Hi Konstantina! The solution that Miriam and I proposed was based on the assumption that we would not alter the model as it is currently defined. So we cannot add another field at the same level as the key element to store the datasource name, but we could agree on something dirtier, e.g. combining the datasource id and its name in the key field by concatenating the two strings with a separator character or sequence in between.

Would this be acceptable for you?

konstantina.galouni commented

2023-11-27 14:42:41 +01:00

Hi Claudio! Apologies for the late response. It completely slipped my attention.
Yes, combining the datasource id and its name in the key field sounds fine. Maybe we could use the same format/hack we are applying for some of the refine filters: id||name.
What do you think?

miriam.baglioni commented

2023-11-27 15:08:16 +01:00

Hi @konstantina.galouni, OK, I will use || as the separator.
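
For reference, a minimal Python sketch of the `id||name` convention agreed on here. The helper names `join_key` and `split_key`, the datasource name used in the example, and the choice to keep the `10|` prefix inside the id part are all assumptions made for illustration; the actual key layout is whatever the PR code produces.

```python
SEPARATOR = "||"


def join_key(datasource_id, name):
    """Combine the datasource id and its name into a single key value."""
    return f"{datasource_id}{SEPARATOR}{name}"


def split_key(key):
    """Split a key at the first '||'; name is None when the key has no separator."""
    datasource_id, sep, name = key.partition(SEPARATOR)
    return datasource_id, (name if sep else None)


# Hypothetical name, for illustration only
key = join_key("10|datasource_fake_identifier_1", "Fake Repository 1")
print(key)
print(split_key(key))
print(split_key("10|datasource_fake_identifier_1"))
```

Splitting at the first occurrence only means that a name which itself contains `||` stays intact, and keys written before the change (without a name) still parse.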

konstantina.galouni commented

2023-11-27 16:35:17 +01:00

Great! Thank you very much, Miriam!

miriam.baglioni added 2 commits 2024-01-29 18:12:40 +01:00

e9131f4e4a mergin with branch beta

a418dacb47 [UsageCount] code extention to include also the name of the datasource

miriam.baglioni changed title from ~~WIP: [UsageCount] Usage count per result split by datasource~~ to [UsageCount] Usage count per result split by datasource

2024-01-30 12:17:05 +01:00

miriam.baglioni commented

2024-01-30 12:18:29 +01:00

As requested, the code was extended to include the name of the datasource as well as its identifier.

miriam.baglioni merged commit 64cbd8abe9 into beta

2024-04-02 10:21:40 +02:00

miriam.baglioni referenced this issue from a commit

2024-04-02 10:21:40 +02:00

Merge pull request '[UsageCount] Usage count per result split by datasource' (#318) from UsageStatsRecordDS into beta

No reviewers