[UsageCount] Usage count per result split by datasource #318

Merged
miriam.baglioni merged 4 commits from UsageStatsRecordDS into beta 4 weeks ago

This PR extends the code that ingests the Usage Count at the level of the result. It splits the count for each result by the datasource contributing to that count. For each indicator, one unit is specified for each datasource contributing to that indicator's value. The datasource identifier is stored in the key element of the measure's unit, while the count for that datasource is stored in the value element.
So, given that the downloads and views for R1 come from three different datasources (fake1, fake2, and fake3) as follows:

datasource_fake_identifier_1 => downloads = 0, views = 5
datasource_fake_identifier_2 => downloads = 1, views = 1
datasource_fake_identifier_3 => downloads = 3, views = 9

we will get the following measures element for R1:

{
   "measures":[
      {
         "id":"downloads",
         "unit":[
            {
               "key":"10|datasource_fake_identifier_1",
               "value":"0",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_2",
               "value":"1",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_3",
               "value":"3",
               "dataInfo":{
                [...]
               }
            }
         ]
      },
      {
         "id":"views",
         "unit":[
            {
               "key":"10|datasource_fake_identifier_1",
               "value":"5",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_2",
               "value":"1",
               "dataInfo":{
                [...]
               }
            },
            {
               "key":"10|datasource_fake_identifier_3",
               "value":"9",
               "dataInfo":{
                [...]
               }
            }
         ]
      }
   ]
}

The JSON above is the serialization of the internal model. Our idea is to produce the following XML snippet from it and make it part of the result-level information on the Solr records.

<measure id="downloads" count="0" datasource="datasource_fake_identifier_1" />
<measure id="views" count="5" datasource="datasource_fake_identifier_1" />

<measure id="downloads" count="1" datasource="datasource_fake_identifier_2" />
<measure id="views" count="1" datasource="datasource_fake_identifier_2" />

<measure id="downloads" count="3" datasource="datasource_fake_identifier_3" />
<measure id="views" count="9" datasource="datasource_fake_identifier_3" />

Would this serialisation be OK for the portal presentation requirements? The information missing here is the datasource name, but it is available in both the collectedfrom and hostedby elements.
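
To make the proposed mapping concrete, here is a minimal Python sketch (not the actual dnet-hadoop code) that builds the measures structure from the example counts and renders the XML lines shown above. The function names `build_measures` and `measures_to_xml` are hypothetical, the `10|` prefix handling is inferred from the example keys only, and the `dataInfo` element is left out because its content is elided in the example.

```python
import json
from xml.sax.saxutils import quoteattr

# Example per-datasource counts for result R1, copied from the description.
usage_counts = {
    "datasource_fake_identifier_1": {"downloads": 0, "views": 5},
    "datasource_fake_identifier_2": {"downloads": 1, "views": 1},
    "datasource_fake_identifier_3": {"downloads": 3, "views": 9},
}

# Assumption: datasource keys carry the "10|" prefix seen in the example JSON.
DATASOURCE_PREFIX = "10|"


def build_measures(counts):
    """Group per-datasource counts into one measure per indicator.

    Each measure gets one unit per contributing datasource; dataInfo is
    omitted here because the example elides it.
    """
    units_by_indicator = {}
    for datasource_id, indicators in counts.items():
        for indicator, count in indicators.items():
            unit = {"key": DATASOURCE_PREFIX + datasource_id, "value": str(count)}
            units_by_indicator.setdefault(indicator, []).append(unit)
    return {
        "measures": [
            {"id": indicator, "unit": units}
            for indicator, units in units_by_indicator.items()
        ]
    }


def measures_to_xml(record):
    """Render one <measure/> line per (datasource, indicator) pair."""
    by_datasource = {}
    for measure in record["measures"]:
        for unit in measure["unit"]:
            key = unit["key"]
            # Strip the prefix (if present) to obtain the bare identifier.
            if key.startswith(DATASOURCE_PREFIX):
                key = key[len(DATASOURCE_PREFIX):]
            by_datasource.setdefault(key, []).append((measure["id"], unit["value"]))
    lines = []
    for datasource, entries in by_datasource.items():
        for measure_id, count in entries:
            lines.append(
                "<measure id=%s count=%s datasource=%s />"
                % (quoteattr(measure_id), quoteattr(count), quoteattr(datasource))
            )
    return lines


record = build_measures(usage_counts)
print(json.dumps(record, indent=3))
print("\n".join(measures_to_xml(record)))
```

Grouping the output by datasource keeps the downloads and views of each datasource next to each other, as in the snippet above; the actual serialisation order is up to the implementation.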

alessia.bardi was assigned by miriam.baglioni 10 months ago
claudio.atzori was assigned by miriam.baglioni 10 months ago
miriam.baglioni added 1 commit 10 months ago
miriam.baglioni added 1 commit 10 months ago
claudio.atzori added this to the OpenAIRE project 6 months ago
claudio.atzori modified the project from OpenAIRE to OpenAIRE - DNet 6 months ago

Hello! The pull request looks very nice! I plan to adjust the parsing code on the portal side. Please keep me posted on when the update will be available.

Hi again @miriam.baglioni and @claudio.atzori! I was just checking this again and I am wondering whether it is possible to also include the datasource name together with the datasource identifier for display purposes.

Hi Konstantina! The solution that Miriam and I proposed was based on the assumption that the model would not be altered from how it is currently defined. So we cannot add another field at the same level as the key element to store the datasource name, but we can agree on something dirtier, e.g. combining the datasource id and its name in the key field by concatenating the two strings with a separator character/sequence in between.

Would this be acceptable for you?

Hi Claudio! Apologies for the late response. It completely slipped my attention.
Yes, combining the datasource id and its name in the key field sounds fine. Maybe we could use the same format/hack we are applying for some of the refine filters: id||name.
What do you think?

Hi @konstantina.galouni, OK, I will use || as the separator.

Great! Thank you very much, Miriam!

miriam.baglioni added 2 commits 3 months ago
miriam.baglioni changed title from WIP: [UsageCount] Usage count per result split by datasource to [UsageCount] Usage count per result split by datasource 3 months ago

As requested, the code was extended to include the name of the datasource in addition to its identifier.
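
For readers on the consuming side, a minimal Python sketch of the agreed id||name convention follows. It only illustrates the separator idea discussed in this thread and is not the merged code: the helper names and the sample datasource name are hypothetical, and whether the id part keeps its `10|` prefix is an assumption based on the earlier example.

```python
# Separator agreed in this thread for combining datasource id and name.
SEPARATOR = "||"


def join_datasource_key(datasource_id, name):
    """Combine a datasource id and its name into a single key string."""
    return datasource_id + SEPARATOR + name


def split_datasource_key(key):
    """Split a key into (id, name).

    The split happens on the first occurrence of the separator, so a name
    that itself contains "||" stays intact. Keys without a separator (the
    older id-only format) yield None as the name.
    """
    datasource_id, separator, name = key.partition(SEPARATOR)
    return datasource_id, (name if separator else None)


# Hypothetical example values, for illustration only.
key = join_datasource_key("10|datasource_fake_identifier_1", "Fake Datasource One")
print(key)
print(split_datasource_key(key))
print(split_datasource_key("10|datasource_fake_identifier_1"))
```

Returning None for a missing name lets the portal fall back to the bare identifier when it encounters records produced before the name was added.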

miriam.baglioni merged commit 64cbd8abe9 into beta 4 weeks ago
Reference: D-Net/dnet-hadoop#318