Merge with main

This commit is contained in:
Serafeim Chatzopoulos 2022-12-21 19:13:15 +02:00
parent fdc331641d
commit 79e3a5b563
39 changed files with 29 additions and 27 deletions

View File

Before

Width:  |  Height:  |  Size: 38 KiB

After

Width:  |  Height:  |  Size: 38 KiB

View File

Before

Width:  |  Height:  |  Size: 38 KiB

After

Width:  |  Height:  |  Size: 38 KiB

View File

Before

Width:  |  Height:  |  Size: 44 KiB

After

Width:  |  Height:  |  Size: 44 KiB

View File

Before

Width:  |  Height:  |  Size: 96 KiB

After

Width:  |  Height:  |  Size: 96 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 180 KiB

View File

Before

Width:  |  Height:  |  Size: 256 KiB

After

Width:  |  Height:  |  Size: 256 KiB

View File

Before

Width:  |  Height:  |  Size: 68 KiB

After

Width:  |  Height:  |  Size: 68 KiB

View File

Before

Width:  |  Height:  |  Size: 51 KiB

After

Width:  |  Height:  |  Size: 51 KiB

View File

Before

Width:  |  Height:  |  Size: 74 KiB

After

Width:  |  Height:  |  Size: 74 KiB

View File

Before

Width:  |  Height:  |  Size: 32 KiB

After

Width:  |  Height:  |  Size: 32 KiB

View File

Before

Width:  |  Height:  |  Size: 54 KiB

After

Width:  |  Height:  |  Size: 54 KiB

View File

Before

Width:  |  Height:  |  Size: 474 KiB

After

Width:  |  Height:  |  Size: 474 KiB

View File

Before

Width:  |  Height:  |  Size: 32 KiB

After

Width:  |  Height:  |  Size: 32 KiB

View File

Before

Width:  |  Height:  |  Size: 90 KiB

After

Width:  |  Height:  |  Size: 90 KiB

View File

Before

Width:  |  Height:  |  Size: 30 KiB

After

Width:  |  Height:  |  Size: 30 KiB

View File

Before

Width:  |  Height:  |  Size: 30 KiB

After

Width:  |  Height:  |  Size: 30 KiB

View File

Before

Width:  |  Height:  |  Size: 53 KiB

After

Width:  |  Height:  |  Size: 53 KiB

View File

Before

Width:  |  Height:  |  Size: 43 KiB

After

Width:  |  Height:  |  Size: 43 KiB

View File

Before

Width:  |  Height:  |  Size: 41 KiB

After

Width:  |  Height:  |  Size: 41 KiB

View File

Before

Width:  |  Height:  |  Size: 38 KiB

After

Width:  |  Height:  |  Size: 38 KiB

View File

Before

Width:  |  Height:  |  Size: 37 KiB

After

Width:  |  Height:  |  Size: 37 KiB

View File

Before

Width:  |  Height:  |  Size: 43 KiB

After

Width:  |  Height:  |  Size: 43 KiB

View File

@ -5,7 +5,7 @@ The OpenAIRE Graph comprises several types of [entities](../category/entities) a
The latest version of the JSON schema can be found on the [Downloads](../downloads/full-graph) section.
<p align="center">
<img loading="lazy" alt="Data model" src="/img/docs/data-model.png" width="80%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Data model" src={require('../assets/img/data-model.png').default} width="80%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
The figure above, presents the graph's data model.

View File

@ -14,7 +14,7 @@ The transformation process includes the application of cleaning functions whose
In addition, the OpenAIRE Graph is extended with other relevant scholarly communication sources that need special handling, either because they do not strictly follow the OpenAIRE Guidelines or due to the vast amount of data of data they offer (e.g. DOIBoost, that merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
<p align="center">
<img loading="lazy" alt="Aggregation" src="/img/docs/aggregation.png" width="65%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Aggregation" src={require('../../assets/img/aggregation.png').default} width="65%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
The OpenAIRE aggregation system collects information about objects of the research life-cycle compliant to the [OpenAIRE acquisition policy](https://www.openaire.eu/content-acquisition-policy) from [different types of data sources](https://explore.openaire.eu/search/find/dataproviders):

View File

@ -3,5 +3,6 @@
OpenAIRE collects metadata records from more than 70K scholarly communication sources from all over the world, including Open Access institutional repositories, data archives, journals. All the metadata records (i.e. descriptions of research products) are put together in a data lake, together with records from Crossref, Unpaywall, ORCID, Grid.ac, and information about projects provided by national and international funders. Dedicated inference algorithms applied to metadata and to the full-texts of Open Access publications enrich the content of the data lake with links between research results and projects, author affiliations, subject classification, links to entries from domain-specific databases. Duplicated organisations and results are identified and merged together to obtain an open, trusted, public resource enabling explorations of the scholarly communication landscape like never before.
<p align="center">
<img loading="lazy" alt="Data provision" src="/img/docs/architecture.png" width="120%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Data provision" src={require('../assets/img/architecture.png').default} width="100%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>

View File

@ -8,13 +8,13 @@ As of November 2022, three procedures are in place to relate a research product
* subjects: it is possible to specify a list of subjects that are relevant for the RC/RI. Every time one of the subjects is found among the subjects of a result, the result is linked to the RC/RI.
<p align="center">
<img loading="lazy" alt="Bulktagging Subject" src="/img/docs/enrichment/bulktagging_subject.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Bulktagging Subject" src={require('../../assets/img/enrichment/bulktagging_subject.png').default} width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* data sources: it is possible to list a set of data sources relevant for the RC/RI. All the results collected from these data sources will be linked to the RC/RI
<p align="center">
<img loading="lazy" alt="Bulktagging Data source" src="/img/docs/enrichment/bulktagging_datasource.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Bulktagging Data source" src={require('../../assets/img/enrichment/bulktagging_datasource.png').default} width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
When only some results collected from a datasource are relevant for the RC/RI, it is possible to specify a set of selection constraints (SC) that have to be verified before linking the result to the
@ -23,14 +23,14 @@ while the set of condition can be among <strong>V={contains, equals, not_contain
A possible selection criteria can be: “All the products whose contributor contains DARIAH “
<p align="center">
<img loading="lazy" alt="Bulktagging Data source" src="/img/docs/enrichment/bulktagging_selconstraints.png" width="70%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Bulktagging Data source" src={require('../../assets/img/enrichment/bulktagging_selconstraints.png').default} width="70%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* Zenodo community: it is possible to list a set of Zenodo communities relevant for the RC/RI. All the products collected from the listed Zenodo communities are linked to the RC/RI
<p align="center">
<img loading="lazy" alt="Bulktagging Zenodo Community" src="/img/docs/enrichment/bulktagging_zenodo.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Bulktagging Zenodo Community" src={require('../../assets/img/enrichment/bulktagging_zenodo.png').default} width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>

View File

@ -7,38 +7,39 @@ As of November 2022, the following procedures are in place:
* Country propagation: updates the property “country” of a results. This happens when the result is collected from an institutional datasource or when the datasource hosting the result is inserted in a whitelist. For all the results whose hosting datasource verifies one of the conditions above, the country of the organization providing the datasource is added to the country of the result: e.g. publication collected from an institutional repository maintained by an italian university will be enriched with the property “country = IT”.
<p align="center">
<img loading="lazy" alt="Country Propagation" src="/img/docs/enrichment/propagation_country.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Country Propagation" src={require('../../assets/img/enrichment/propagation_country.png').default} width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* Project propagation: adds a "isProducedBy" relationship (and its inverse) between a Project P and Result R1, if R1 has a strong semantic relationship with another Result R2 and P produces R2: e.g. publication linked to project P “is supplemented by” a dataset D. Dataset D will get the link to project P. The relationships considered for this procedure are “isSupplementedBy” and “isSupplementTo”.
<p align="center">
<img loading="lazy" alt="Project Propagation" src="/img/docs/enrichment/propagation_resulttoproject.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Project Propagation" src={require('../../assets/img/enrichment/propagation_resulttoproject.png').default} width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* Result to RC/RI through organization propagation. The manager of the RC/RI can specify a set of organizations whose product are relevant for the
community.
Each result having such a relation of affiliation with at least one organization relevant for the RC/RI will be linked to it.
<p align="center">
<img loading="lazy" alt="Result to community through organization propagation" src="/img/docs/enrichment/propagation_resulttocommunitythroughorganization.png" width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Result to community through organization propagation" src={require('../../assets/img/enrichment/propagation_resulttocommunitythroughorganization.png').default}
width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* Result to RC/RI through semantic relation: extends the set of products linked to a RC/RI by exploiting strong semantic relationships between the results;
e.g. if a result R1 is associated to the community C and is supplemented by a result R2 then the result R2 will be linked to the community. The relationships considered for this procedure are “isSupplementedBy” and “supplements”.
<p align="center">
<img loading="lazy" alt="Result to community through semantic relation propagation" src="/img/docs/enrichment/propagation_resulttocommunitythroughsemrel.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Result to community through semantic relation propagation" src={require('../../assets/img/enrichment/propagation_resulttocommunitythroughsemrel.png').default} width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* ORCID identifiers to result through semantic relation. This propagation enriches the results by adding ORCID identifiers to authors. The added ORCID will be marked as "potential" since they have been inserted through propagation.
The process considers the set of overlapping authors between results (R1 and R2) linked with a strong semantic relationship (IsSupplementedBy, IsSupplementTo).
For each author A in the overlapping set, if R1 provides the ORCID value for A and R2 does not, then the author A in R2 will be enriched with the information of the ORCID found in R1.
<p align="center">
<img loading="lazy" alt="Orcid propation through semantic relation" src="/img/docs/enrichment/propagation_orcid.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Orcid propation through semantic relation" src={require('../../assets/img/enrichment/propagation_orcid.png').default} width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* affiliation to organization through institutional repository. This propagation adds one "hasAuthorInstitution" relationship (and its inverse)
between a Result R and Organization O,
if R was collected from a datasource D with type institutional repository, and D was provided by O.
<p align="center">
<img loading="lazy" alt="Affiliation propagation through institutional repository" src="/img/docs/enrichment/propagation_affiliationistrepo.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Affiliation propagation through institutional repository" src={require('../../assets/img/enrichment/propagation_affiliationistrepo.png').default} width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
* affiliation to organization through semantic relation. This propagation adds one "hasAuthorInstitution" relationship (and its inverse) between a
@ -46,9 +47,9 @@ Result R and an Organization O,
if R has an affiliation relation with an organization O1 that is in relation "isChildOf" with O.
<p align="center">
<img loading="lazy" alt="Affiliation propagation through semantic relation" src="/img/docs/enrichment/propagation_organizationsemrel.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Affiliation propagation through semantic relation" src={require('../../assets/img/enrichment/propagation_organizationsemrel.png').default} width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
The algorithm exploits only the organization leaves that are in a "IsChildOf" relation with another organization. So far one single step is done
<p align="center">
<img loading="lazy" alt="propagation strategy" src="/img/docs/enrichment/organization_tree.png" width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="propagation strategy" src={require('../../assets/img/enrichment/organization_tree.png').default} width="40%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>

View File

@ -10,7 +10,7 @@ The deduplication process can be divided into three different phases:
* Duplicates grouping (transitive closure)
<p align="center">
<img loading="lazy" alt="Deduplication Workflow" src="/img/docs/deduplication-workflow.png" width="100%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Deduplication Workflow" src={require('../../assets/img/deduplication-workflow.png').default} width="85%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
### Candidate identification (clustering)

View File

@ -43,7 +43,7 @@ The comparison goes through the following decision tree:
5. *legalname check*: comparison of the normalized `legalnames` with the `Jaro-Winkler` distance to determine if it is higher than `0.9`. If so, a similarity relation is drawn. Otherwise, no similarity relation is drawn.
<p align="center">
<img loading="lazy" alt="Organization Decision Tree" src="/img/docs/decisiontree-organization.png" width="100%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Organization Decision Tree" src={require('../../assets/img/decisiontree-organization.png').default} width="100%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
[//]: # (Link to the image: https://docs.google.com/drawings/d/1YKInGGtHu09QG4pT2gRLEum4LxU82d4nKkvGNvRQmrg/edit?usp=sharing)

View File

@ -34,7 +34,7 @@ The comparison goes through different stages:
5. *strong check*: comparison composed by three substages involving the (i) comparison of the author list sizes and the version of the record to determine if they are coherent, (ii) comparison of the record titles with the Levenshtein distance to determine if it is higher than 0.99, (iii) "smart" comparison of the author lists to check if common authors are more than 60%.
<p align="center">
<img loading="lazy" alt="Publications Decision Tree" src="/img/docs/decisiontree-publication.png" width="100%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Publications Decision Tree" src={require('../../assets/img/decisiontree-publication.png').default} width="100%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
[//]: # (Link to the image: https://docs.google.com/drawings/d/19SIilTp1vukw6STMZuPMdc0pv0ODYCiOxP7OU3iPWK8/edit?usp=sharing)
@ -47,7 +47,7 @@ The comparison goes through different stages:
3. *strong check*: comparison of the record titles with Levenshtein distance. If the measure is above 0.99, then the similarity relation is drawn
<p align="center">
<img loading="lazy" alt="Software Decision Tree" src="/img/docs/decisiontree-software.png" width="85%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Software Decision Tree" src={require('../../assets/img/decisiontree-software.png').default} width="85%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
[//]: # (Link to the image: https://docs.google.com/drawings/d/19gd1-GTOEEo6awMObGRkYFhpAlO_38mfbDFFX0HAkuo/edit?usp=sharing)
@ -57,7 +57,7 @@ For each pair of datasets or other types of research products in a cluster the s
The decision tree is almost identical to the publication decision tree, with the only exception of the *instance type check* stage. Since such type of record does not have a relatable instance type, the check is not performed and the decision tree node is skipped.
<p align="center">
<img loading="lazy" alt="Dataset and Other types of research products Decision Tree" src="/img/docs/decisiontree-dataset-orp.png" width="90%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
<img loading="lazy" alt="Dataset and Other types of research products Decision Tree" src={require('../../assets/img/decisiontree-dataset-orp.png').default} width="90%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
[//]: # (Link to the image: https://docs.google.com/drawings/d/1uBa7Bw2KwBRDUYIfyRr_Keol7UOeyvMNN7MPXYLg4qw/edit?usp=sharing)

View File

@ -31,18 +31,18 @@ Also consider adding one of the following badges to your service with the approp
<div className="row">
<div className="col col--4 left-badge">
<a target="_blank" href={require('../assets/openaire-badge-1.zip').default} download>
<img loading="lazy" alt="Openaire badge" src={require('../assets/openaire-badge-1.png').default} className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module pagination-nav__link" style={{ paddingTop: '1.2em', paddingBottom: '1.2em'}} title="Click to download"/>
<a target="_blank" href={require('../assets/badges/openaire-badge-1.zip').default} download>
<img loading="lazy" alt="Openaire badge" src={require('../assets/badges/openaire-badge-1.png').default} className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module pagination-nav__link" style={{ paddingTop: '1.2em', paddingBottom: '1.2em'}} title="Click to download"/>
</a>
</div>
<div className="col col--4 mid-badge">
<a target="_blank" href={require('../assets/openaire-badge-2.zip').default} download>
<img loading="lazy" alt="Openaire badge" src={require('../assets/openaire-badge-2.png').default} className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module pagination-nav__link dark-badge" style={{ paddingTop: '1.2em', paddingBottom: '1.2em'}} title="Click to download"/>
<a target="_blank" href={require('../assets/badges/openaire-badge-2.zip').default} download>
<img loading="lazy" alt="Openaire badge" src={require('../assets/badges/openaire-badge-2.png').default} className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module pagination-nav__link dark-badge" style={{ paddingTop: '1.2em', paddingBottom: '1.2em'}} title="Click to download"/>
</a>
</div>
<div className="col col--4 right-badge">
<a target="_blank" href={require('../assets/openaire-badge-3.zip').default} download>
<img loading="lazy" alt="Openaire badge" src={require('../assets/openaire-badge-3.png').default} className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module pagination-nav__link" style={{ paddingTop: '1.2em', paddingBottom: '1.2em'}} title="Click to download"/>
<a target="_blank" href={require('../assets/badges/openaire-badge-3.zip').default} download>
<img loading="lazy" alt="Openaire badge" src={require('../assets/badges/openaire-badge-3.png').default} className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module pagination-nav__link" style={{ paddingTop: '1.2em', paddingBottom: '1.2em'}} title="Click to download"/>
</a>
</div>
</div>

View File

@ -20,7 +20,7 @@
Manola, Natalia and
Principe, Pedro},
title = {OpenAIRE Research Graph Dump},
month = jun,
month = Jun,
year = 2022,
note = {{A new version of this dataset is published every 6
months. The content available on the OpenAIRE

Binary file not shown.

Before

Width:  |  Height:  |  Size: 278 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 83 KiB