Add missing enrichment files

This commit is contained in:
Serafeim Chatzopoulos 2022-12-02 12:26:21 +02:00
parent 01e4744550
commit eb2364b8f4
2 changed files with 92 additions and 0 deletions

View File

@ -0,0 +1,38 @@
# Bulk Tagging/Deduction
The Deduction process (also known as “bulk tagging”) enriches each record with new information that can be derived from the existing property values.
This process is used to associate results to community/research initiatives that are part of OpenAIRE.
As of November 2022, three procedures are in place to relate a research product to a research initiative, infrastructure (RI) or community (RC) based on:
* subjects: it is possible to specify a list of subjects that are relevant for the RC/RI. Every time one of the subjects is found among the subjects of a result, the result is linked to the RC/RI.
<p align="center">
<img loading="lazy" alt="Bulktagging Subject" src="/img/docs/enrichment/bulktagging_subject.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* data sources: it is possible to list a set of data sources relevant for the RC/RI. All the results collected from these data sources will be linked to the RC/RI
<p align="center">
<img loading="lazy" alt="Bulktagging Data source" src="/img/docs/enrichment/bulktagging_datasource.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
When only some results collected from a datasource are relevant for the RC/RI, it is possible to specify a set of selection constraints (SC) that have to be verified before linking the result to the
community. The selection constraint has the form <strong>SC = S1 or S2 or ... or Sn</strong>. The generic Si has the form <strong>Si = s<sub>i1</sub> and s<sub>i2</sub> and ...and s<sub>in</sub></strong> and each s<sub>ij</sub> is a condition on a specific field of the result. The set of fields that can be specified is <strong>F={title, author, contributor, description, orcid}</strong>,
while the set of condition can be among <strong>V={contains, equals, not_contains, not_equals, contains_ignorecase, equals_ignorecase, not_contains_ignorecase, not_equal_ignorecase}</strong>, and the value is free text.
A possible selection criteria can be: “All the products whose contributor contains DARIAH “
<p align="center">
<img loading="lazy" alt="Bulktagging Data source" src="/img/docs/enrichment/bulktagging_selconstraints.png" width="70%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* Zenodo community: it is possible to list a set of Zenodo communities relevant for the RC/RI. All the products collected from the listed Zenodo communities are linked to the RC/RI
<p align="center">
<img loading="lazy" alt="Bulktagging Zenodo Community" src="/img/docs/enrichment/bulktagging_zenodo.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
The list of subjects, Zenodo communities and data sources used to enrich the products are defined by the managers of the community gateway or infrastructure monitoring dashboard associated with the RC/RI.

View File

@ -0,0 +1,54 @@
# Propagation
This process enriches the graph by adding new links and/or new properties. The new information is added by exploiting existing semantic
relationships and values between the involved entities
As of November 2022, the following procedures are in place:
* Country propagation: updates the property “country” of a results. This happens when the result is collected from an institutional datasource or when the datasource hosting the result is inserted in a whitelist. For all the results whose hosting datasource verifies one of the conditions above, the country of the organization providing the datasource is added to the country of the result: e.g. publication collected from an institutional repository maintained by an italian university will be enriched with the property “country = IT”.
<p align="center">
<img loading="lazy" alt="Country Propagation" src="/img/docs/enrichment/propagation_country.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* Project propagation: adds a "isProducedBy" relationship (and its inverse) between a Project P and Result R1, if R1 has a strong semantic relationship with another Result R2 and P produces R2: e.g. publication linked to project P “is supplemented by” a dataset D. Dataset D will get the link to project P. The relationships considered for this procedure are “isSupplementedBy” and “isSupplementTo”.
<p align="center">
<img loading="lazy" alt="Project Propagation" src="/img/docs/enrichment/propagation_resulttoproject.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* Result to RC/RI through organization propagation. The manager of the RC/RI can specify a set of organizations whose product are relevant for the
community.
Each result having such a relation of affiliation with at least one organization relevant for the RC/RI will be linked to it.
<p align="center">
<img loading="lazy" alt="Result to community through organization propagation" src="/img/docs/enrichment/propagation_resulttocommunitythroughorganization.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* Result to RC/RI through semantic relation: extends the set of products linked to a RC/RI by exploiting strong semantic relationships between the results;
e.g. if a result R1 is associated to the community C and is supplemented by a result R2 then the result R2 will be linked to the community. The relationships considered for this procedure are “isSupplementedBy” and “supplements”.
<p align="center">
<img loading="lazy" alt="Result to community through semantic relation propagation" src="/img/docs/enrichment/propagation_resulttocommunitythroughsemrel.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* ORCID identifiers to result through semantic relation. This propagation enriches the results by adding ORCID identifiers to authors. The added ORCID will be marked as "potential" since they have been inserted through propagation.
The process considers the set of overlapping authors between results (R1 and R2) linked with a strong semantic relationship (IsSupplementedBy, IsSupplementTo).
For each author A in the overlapping set, if R1 provides the ORCID value for A and R2 does not, then the author A in R2 will be enriched with the information of the ORCID found in R1.
<p align="center">
<img loading="lazy" alt="Orcid propation through semantic relation" src="/img/docs/enrichment/propagation_orcid.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* affiliation to organization through institutional repository. This propagation adds one "hasAuthorInstitution" relationship (and its inverse)
between a Result R and Organization O,
if R was collected from a datasource D with type institutional repository, and D was provided by O.
<p align="center">
<img loading="lazy" alt="Affiliation propagation through institutional repository" src="/img/docs/enrichment/propagation_affiliationistrepo.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* affiliation to organization through semantic relation. This propagation adds one "hasAuthorInstitution" relationship (and its inverse) between a
Result R and an Organization O,
if R has an affiliation relation with an organization O1 that is in relation "isChildOf" with O.
<p align="center">
<img loading="lazy" alt="Affiliation propagation through semantic relation" src="/img/docs/enrichment/propagation_organizationsemrel.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
The algorithm exploits only the organization leaves that are in a "IsChildOf" relation with another organization. So far one single step is done
<p align="center">
<img loading="lazy" alt="propagation strategy" src="/img/docs/enrichment/organization_tree.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>