Update graph name, logo, and badges
Image changed: 38 KiB before, 70 KiB after
Image changed: 38 KiB before, 60 KiB after
Image changed: 44 KiB before, 72 KiB after
Image changed: 420 KiB before, 394 KiB after
|
@ -1,6 +1,6 @@
|
||||||
# Data model
|
# Data model
|
||||||
|
|
||||||
The OpenAIRE Research Graph comprises several types of [entities](../category/entities) and [relationships](./relationships) among them.
|
The OpenAIRE Graph comprises several types of [entities](../category/entities) and [relationships](./relationships) among them.
|
||||||
|
|
||||||
The latest version of the JSON schema can be found in the [Downloads](../downloads/full-graph) section.
|
The latest version of the JSON schema can be found in the [Downloads](../downloads/full-graph) section.
|
||||||
|
|
||||||
|
@ -20,6 +20,6 @@ responsible for operating data sources or consisting the affiliations of Product
|
||||||
|
|
||||||
:::note Further reading
|
:::note Further reading
|
||||||
|
|
||||||
A detailed report on the OpenAIRE Research Graph Data Model can be found on [Zenodo](https://zenodo.org/record/2643199).
|
A detailed report on the OpenAIRE Graph Data Model can be found on [Zenodo](https://zenodo.org/record/2643199).
|
||||||
:::
|
:::
|
||||||
|
|
||||||
|
|
|
@ -3,6 +3,6 @@
|
||||||
"position": 1,
|
"position": 1,
|
||||||
"link": {
|
"link": {
|
||||||
"type": "generated-index",
|
"type": "generated-index",
|
||||||
"description": "The main entities of the OpenAIRE Research Graph are listed below."
|
"description": "The main entities of the OpenAIRE Graph are listed below."
|
||||||
}
|
}
|
||||||
}
|
}
|
|
@ -1,6 +1,6 @@
|
||||||
# PIDs and identifiers
|
# PIDs and identifiers
|
||||||
|
|
||||||
One of the challenges towards the stability of the contents in the OpenAIRE Research Graph consists of making its identifiers and records stable over time.
|
One of the challenges towards the stability of the contents in the OpenAIRE Graph consists of making its identifiers and records stable over time.
|
||||||
The barriers to this scenario are many, as the Graph keeps a map of data sources that is subject to constant variations: records in repositories vary in content,
|
The barriers to this scenario are many, as the Graph keeps a map of data sources that is subject to constant variations: records in repositories vary in content,
|
||||||
original IDs, and PIDs, may disappear or reappear, and the same holds for the repository or the metadata collection it exposes.
|
original IDs, and PIDs, may disappear or reappear, and the same holds for the repository or the metadata collection it exposes.
|
||||||
Moreover, the mappings applied to the original contents may also change and improve over time to catch up with the changes in the input records.
|
Moreover, the mappings applied to the original contents may also change and improve over time to catch up with the changes in the input records.
|
||||||
|
|
|
@ -4,14 +4,14 @@ sidebar_position: 1
|
||||||
|
|
||||||
# Aggregation
|
# Aggregation
|
||||||
|
|
||||||
OpenAIRE materializes an open, participatory research graph (the OpenAIRE Research Graph) where products of the research life-cycle (e.g. scientific literature, research data, project, software) are semantically linked to each other and carry information about their access rights (i.e. if they are Open Access, Restricted, Embargoed, or Closed) and the sources from which they have been collected and where they are hosted. The OpenAIRE Research Graph is materialised via a set of autonomic, orchestrated workflows operating in a regimen of continuous data aggregation and integration. [1]
|
OpenAIRE materializes an open, participatory research graph (the OpenAIRE Graph) where products of the research life-cycle (e.g. scientific literature, research data, project, software) are semantically linked to each other and carry information about their access rights (i.e. if they are Open Access, Restricted, Embargoed, or Closed) and the sources from which they have been collected and where they are hosted. The OpenAIRE Graph is materialised via a set of autonomic, orchestrated workflows operating in a regimen of continuous data aggregation and integration. [1]
|
||||||
|
|
||||||
## What does OpenAIRE collect?
|
## What does OpenAIRE collect?
|
||||||
|
|
||||||
OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant with the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
|
OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant with the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
|
||||||
|
|
||||||
The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term.
|
The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term.
|
||||||
In addition, the OpenAIRE Research Graph is extended with other relevant scholarly communication sources that need special handling, either because they do not strictly follow the OpenAIRE Guidelines or due to the vast amount of data they offer (e.g. DOIBoost, which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
|
In addition, the OpenAIRE Graph is extended with other relevant scholarly communication sources that need special handling, either because they do not strictly follow the OpenAIRE Guidelines or due to the vast amount of data they offer (e.g. DOIBoost, which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
|
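The vocabulary-based cleaning described above can be sketched roughly as follows. This is a minimal illustration, not the OpenAIRE implementation: the `Term` class, the function names, the case-insensitive matching, the choice to return the term code, and the sample vocabulary entry are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """One controlled term: a code, a label, and its known synonyms (hypothetical structure)."""
    code: str
    label: str
    synonyms: frozenset = field(default_factory=frozenset)


def build_synonym_index(terms):
    """Map every lower-cased code, label, and synonym to its controlled term."""
    index = {}
    for term in terms:
        for key in (term.code, term.label, *term.synonyms):
            index[key.strip().lower()] = term
    return index


def clean_value(value, index):
    """Return the code of the matching term, or the value unchanged if nothing matches."""
    term = index.get(value.strip().lower())
    return term.code if term else value


# Hypothetical vocabulary entry, for illustration only.
access_rights = build_synonym_index([
    Term("OPEN", "Open Access", frozenset({"info:eu-repo/semantics/openAccess"})),
])
print(clean_value("info:eu-repo/semantics/openAccess", access_rights))
```

Values without a matching synonym are passed through untouched, so this sketch never discards input data.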
||||||
|
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<img loading="lazy" alt="Aggregation" src={require('../../assets/img/aggregation.png').default} width="65%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
|
<img loading="lazy" alt="Aggregation" src={require('../../assets/img/aggregation.png').default} width="65%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
|
||||||
|
@ -30,7 +30,7 @@ Relationships between objects are collected from the data sources, but also auto
|
||||||
|
|
||||||
## What kind of data sources are in OpenAIRE?
|
## What kind of data sources are in OpenAIRE?
|
||||||
|
|
||||||
Objects and relationships in the OpenAIRE Research Graph are extracted from information packages, i.e. metadata records, collected from data sources of the following kinds:
|
Objects and relationships in the OpenAIRE Graph are extracted from information packages, i.e. metadata records, collected from data sources of the following kinds:
|
||||||
|
|
||||||
- *Literature, Institutional and thematic repositories*: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. ArXiv, Europe PMC);
|
- *Literature, Institutional and thematic repositories*: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. ArXiv, Europe PMC);
|
||||||
- *Open Access Publishers and journals*: Information systems of open access publishers or related journals, which offer bibliographic metadata and PDFs of their published articles;
|
- *Open Access Publishers and journals*: Information systems of open access publishers or related journals, which offer bibliographic metadata and PDFs of their published articles;
|
||||||
|
|
|
@ -33,7 +33,7 @@ The metadata collection process identifies the most recent record date available
|
||||||
|
|
||||||
### Entity Mapping
|
### Entity Mapping
|
||||||
|
|
||||||
The table below describes the mapping from the XML baseline records to the OpenAIRE Research Graph dump format.
|
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
|
||||||
|
|
||||||
| OpenAIRE Result field path | Datacite record JSON path | # Notes |
|
| OpenAIRE Result field path | Datacite record JSON path | # Notes |
|
||||||
|--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
|
|
|
@ -68,7 +68,7 @@ Records in Crossref are ruled out according to the following criteria
|
||||||
|
|
||||||
Records with `type=dataset` are mapped into OpenAIRE results of type dataset. All others are mapped as OpenAIRE results of type publication.
|
Records with `type=dataset` are mapped into OpenAIRE results of type dataset. All others are mapped as OpenAIRE results of type publication.
|
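The type rule above amounts to a single conditional. The sketch below assumes each Crossref record is available as a dictionary with a `type` key; the function name is hypothetical.

```python
def openaire_result_type(crossref_record: dict) -> str:
    """Map a Crossref record to an OpenAIRE result type.

    Records with type "dataset" become datasets; every other record
    (including one without a type) becomes a publication.
    """
    return "dataset" if crossref_record.get("type") == "dataset" else "publication"


print(openaire_result_type({"type": "dataset"}))
print(openaire_result_type({"type": "journal-article"}))
```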
||||||
|
|
||||||
### Mapping Crossref properties into the OpenAIRE Research Graph
|
### Mapping Crossref properties into the OpenAIRE Graph
|
||||||
|
|
||||||
Properties in OpenAIRE results are set based on the logic described in the following table:
|
Properties in OpenAIRE results are set based on the logic described in the following table:
|
||||||
|
|
||||||
|
@ -222,7 +222,7 @@ Miriam will modify the process to ensure that:
|
||||||
* Only papers with DOI are considered
|
* Only papers with DOI are considered
|
||||||
* Since for the same DOI we have multiple versions of an item with different MAG PaperId, we only take one per DOI (the last one we process). We call this dataset `Papers_distinct`
|
* Since for the same DOI we have multiple versions of an item with different MAG PaperId, we only take one per DOI (the last one we process). We call this dataset `Papers_distinct`
|
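The two rules above (DOI required, last processed record wins) could look like this in a small in-memory sketch. The dictionary keys `PaperId` and `Doi` and the function name are assumptions for illustration; the real process runs on the full MAG tables and may differ, for example in how DOIs are normalised.

```python
def papers_distinct(papers):
    """Keep one record per DOI; a later record replaces an earlier one."""
    by_doi = {}
    for paper in papers:
        doi = paper.get("Doi")
        if not doi:
            continue  # only papers with a DOI are considered
        by_doi[doi] = paper  # the last record processed for a DOI wins
    return list(by_doi.values())


rows = [
    {"PaperId": 1, "Doi": "10.1/a"},
    {"PaperId": 2, "Doi": None},
    {"PaperId": 3, "Doi": "10.1/a"},
]
print(papers_distinct(rows))
```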
||||||
|
|
||||||
When mapping MAG records to the OpenAIRE Research Graph, we consider the following MAG tables:
|
When mapping MAG records to the OpenAIRE Graph, we consider the following MAG tables:
|
||||||
* `PaperAbstractsInvertedIndex`: for the paper abstracts
|
* `PaperAbstractsInvertedIndex`: for the paper abstracts
|
||||||
* `Authors`: for the authors. The MAG data is pre-processed by grouping authors by PaperId
|
* `Authors`: for the authors. The MAG data is pre-processed by grouping authors by PaperId
|
||||||
* `Affiliations` and `PaperAuthorAffiliations`: to generate links between publications and organisations
|
* `Affiliations` and `PaperAuthorAffiliations`: to generate links between publications and organisations
|
||||||
|
|
|
@ -69,7 +69,7 @@ curl -s "https://www.ebi.ac.uk/europepmc/webservices/rest/MED/33024307/datalinks
|
||||||
```
|
```
|
||||||
|
|
||||||
## Mapping
|
## Mapping
|
||||||
The table below describes the mapping from the EBI links records to the OpenAIRE Research Graph dump format.
|
The table below describes the mapping from the EBI links records to the OpenAIRE Graph dump format.
|
||||||
We filter all the target links with pid type **ena**, **pdb** or **uniprot**
|
We filter all the target links with pid type **ena**, **pdb** or **uniprot**
|
||||||
For each target we construct a Bioentity with the following mapping
|
For each target we construct a Bioentity with the following mapping
|
||||||
|
|
||||||
|
|
|
@ -12,7 +12,7 @@ Pubmed exposes an entry point FTP with all the updates for each one. [ftp baseli
|
||||||
|
|
||||||
## Entity Mapping
|
## Entity Mapping
|
||||||
|
|
||||||
The table below describes the mapping from the XML baseline records to the OpenAIRE Research Graph dump format.
|
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
|
||||||
|
|
||||||
| OpenAIRE Result field path | PubMed record field xpath | Notes |
|
| OpenAIRE Result field path | PubMed record field xpath | Notes |
|
||||||
|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
|
|
|
@ -14,7 +14,7 @@ The data curation activity is twofold, on one end pivots around the disambiguati
|
||||||
Duplicates among organizations are therefore managed through three different stages:
|
Duplicates among organizations are therefore managed through three different stages:
|
||||||
* *Creation of Suggestions*: executes an automatic workflow that performs the deduplication and prepares new suggestions to be processed by the curators;
|
* *Creation of Suggestions*: executes an automatic workflow that performs the deduplication and prepares new suggestions to be processed by the curators;
|
||||||
* *Curation*: manual editing of the organization records performed by the data curators;
|
* *Curation*: manual editing of the organization records performed by the data curators;
|
||||||
* *Creation of Representative Organizations*: executes an automatic workflow that creates curated organizations and exposes them on the OpenAIRE Research Graph by using the curators' feedback from the OpenOrgs underlying database.
|
* *Creation of Representative Organizations*: executes an automatic workflow that creates curated organizations and exposes them on the OpenAIRE Graph by using the curators' feedback from the OpenOrgs underlying database.
|
||||||
|
|
||||||
The next sections describe the above mentioned stages.
|
The next sections describe the above mentioned stages.
|
||||||
|
|
||||||
|
@ -61,7 +61,7 @@ Note that if a curator does not provide a feedback on a similarity relation sugg
|
||||||
|
|
||||||
### Creation of Representative Organizations
|
### Creation of Representative Organizations
|
||||||
|
|
||||||
This stage executes an automatic workflow that performs the *duplicates grouping* stage to create representative organizations and to update them on the OpenAIRE Research Graph. Such organizations are obtained via transitive closure, and the relations used come from the curators' feedback gathered on the OpenOrgs underlying Database.
|
This stage executes an automatic workflow that performs the *duplicates grouping* stage to create representative organizations and to update them on the OpenAIRE Graph. Such organizations are obtained via transitive closure, and the relations used come from the curators' feedback gathered on the OpenOrgs underlying Database.
|
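Grouping duplicates by transitive closure can be illustrated with a union-find structure: if A is similar to B and B is similar to C, all three end up in one group. This is only a sketch under the assumption that the curators' feedback is available as pairs of organization identifiers; the function name is hypothetical, and the actual workflow and the choice of the representative record are not shown here.

```python
def group_duplicates(similarity_pairs):
    """Return the groups of identifiers connected by the given similarity pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in similarity_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for org in list(parent):
        groups.setdefault(find(org), set()).add(org)
    return list(groups.values())


print(group_duplicates([("org1", "org2"), ("org2", "org3"), ("org4", "org5")]))
```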
||||||
|
|
||||||
#### Duplicates grouping (transitive closure)
|
#### Duplicates grouping (transitive closure)
|
||||||
|
|
||||||
|
|
|
@ -2,11 +2,11 @@
|
||||||
|
|
||||||
At the very end of the graph production workflow, a step is dedicated to perform certain finalisation operations, that we describe in this page,
|
At the very end of the graph production workflow, a step is dedicated to perform certain finalisation operations, that we describe in this page,
|
||||||
aiming to improve the overall quality of the data.
|
aiming to improve the overall quality of the data.
|
||||||
The output of this final step is the final version of the OpenAIRE Research Graph.
|
The output of this final step is the final version of the OpenAIRE Graph.
|
||||||
|
|
||||||
## Filtering
|
## Filtering
|
||||||
|
|
||||||
Bibliographic records that do not meet minimal requirements for being part of the OpenAIRE Research Graph are eliminated during this phase.
|
Bibliographic records that do not meet minimal requirements for being part of the OpenAIRE Graph are eliminated during this phase.
|
||||||
Currently, the only criterion applied horizontally to the entire graph aims at excluding scientific results whose title is not meaningful for citation purposes.
|
Currently, the only criterion applied horizontally to the entire graph aims at excluding scientific results whose title is not meaningful for citation purposes.
|
||||||
Then, different criteria are applied in the pre-processing of specific sub-collections:
|
Then, different criteria are applied in the pre-processing of specific sub-collections:
|
||||||
|
|
||||||
|
|
|
@ -1,16 +1,16 @@
|
||||||
# Indexing
|
# Indexing
|
||||||
|
|
||||||
The final version of the OpenAIRE Research Graph is indexed on a Solr server that is used by the OpenAIRE portals ([EXPLORE](https://explore.openaire.eu), [CONNECT](https://connect.openaire.eu), [PROVIDE](https://provide.openaire.eu)) and APIs, the latter adopted by several third-party applications and organizations, such as:
|
The final version of the OpenAIRE Graph is indexed on a Solr server that is used by the OpenAIRE portals ([EXPLORE](https://explore.openaire.eu), [CONNECT](https://connect.openaire.eu), [PROVIDE](https://provide.openaire.eu)) and APIs, the latter adopted by several third-party applications and organizations, such as:
|
||||||
|
|
||||||
* The OpenAIRE Graph APIs and Portals will offer to the EOSC (European Open Science Cloud) an Open Science Resource Catalogue, keeping an up to date map of all research results (publications, datasets, software), services, organizations, projects, funders in Europe and beyond.
|
* The OpenAIRE Graph APIs and Portals will offer to the EOSC (European Open Science Cloud) an Open Science Resource Catalogue, keeping an up to date map of all research results (publications, datasets, software), services, organizations, projects, funders in Europe and beyond.
|
||||||
|
|
||||||
* DSpace & EPrints repositories can install the OpenAIRE plugin to expose OpenAIRE-compliant metadata records via their OAI-PMH endpoint and offer to researchers the possibility to link their depositions to the funding project, by selecting it from the list of projects provided by OpenAIRE.
|
* DSpace & EPrints repositories can install the OpenAIRE plugin to expose OpenAIRE-compliant metadata records via their OAI-PMH endpoint and offer to researchers the possibility to link their depositions to the funding project, by selecting it from the list of projects provided by OpenAIRE.
|
||||||
|
|
||||||
* EC participant portal (Sygma - System for Grant Management) uses the OpenAIRE API in the “Continuous Reporting” section. Sygma automatically fetches from the OpenAIRE Search API the list of publications and datasets in the OpenAIRE Research Graph that are linked to the project. The user can select the research products from the list and easily compile the continuous reporting data of the project.
|
* EC participant portal (Sygma - System for Grant Management) uses the OpenAIRE API in the “Continuous Reporting” section. Sygma automatically fetches from the OpenAIRE Search API the list of publications and datasets in the OpenAIRE Graph that are linked to the project. The user can select the research products from the list and easily compile the continuous reporting data of the project.
|
||||||
|
|
||||||
* ScholExplorer is used by different players of the scholarly communication ecosystem. For example, [Elsevier](https://www.elsevier.com/authors/tools-and-resources/research-data/data-base-linking) uses its API to make the links between
|
* ScholExplorer is used by different players of the scholarly communication ecosystem. For example, [Elsevier](https://www.elsevier.com/authors/tools-and-resources/research-data/data-base-linking) uses its API to make the links between
|
||||||
publications and datasets automatically appear on ScienceDirect.
|
publications and datasets automatically appear on ScienceDirect.
|
||||||
ScholExplorer indexes the links among the four major types of research products (API v3) available in the OpenAIRE Research Graph and makes them available through an HTTP API that allows
|
ScholExplorer indexes the links among the four major types of research products (API v3) available in the OpenAIRE Graph and makes them available through an HTTP API that allows
|
||||||
them to be searched by the following criteria:
|
them to be searched by the following criteria:
|
||||||
* Links whose source object has a given PID or PID type;
|
* Links whose source object has a given PID or PID type;
|
||||||
* Links whose source object has been published by a given data source ("data source as publisher");
|
* Links whose source object has been published by a given data source ("data source as publisher");
|
||||||
|
|
|
@ -8,7 +8,7 @@ sidebar_position: 2
|
||||||
This version is not accompanied by public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you are interested in bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
This version is not accompanied by public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you are interested in bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
||||||
:::
|
:::
|
||||||
|
|
||||||
The large size of the OpenAIRE Research Graph is a major impediment for beginners to familiarise themselves with the underlying data model and explore its contents.
|
The large size of the OpenAIRE Graph is a major impediment for beginners to familiarise themselves with the underlying data model and explore its contents.
|
||||||
Working with the Graph in its full size typically requires access to a huge distributed computing infrastructure, which is not easily accessible to everyone.
|
Working with the Graph in its full size typically requires access to a huge distributed computing infrastructure, which is not easily accessible to everyone.
|
||||||
[The OpenAIRE Beginner’s Kit](https://doi.org/10.5281/zenodo.7490192) aims to address this issue. It consists of two components:
|
[The OpenAIRE Beginner’s Kit](https://doi.org/10.5281/zenodo.7490192) aims to address this issue. It consists of two components:
|
||||||
|
|
||||||
|
|
|
@ -8,7 +8,7 @@ sidebar_position: 1
|
||||||
This version is not accompanied by public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you are interested in bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
This version is not accompanied by public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you are interested in bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
||||||
:::
|
:::
|
||||||
|
|
||||||
You can download the full OpenAIRE Research Graph Dump as well as its schema from the following links:
|
You can download the full OpenAIRE Graph Dump as well as its schema from the following links:
|
||||||
|
|
||||||
Dataset: https://doi.org/10.5281/zenodo.3516917
|
Dataset: https://doi.org/10.5281/zenodo.3516917
|
||||||
|
|
||||||
|
@ -21,7 +21,7 @@ a tar archive containing gz files, each with one json per line.
|
||||||
|
|
||||||
## How to acknowledge this work
|
## How to acknowledge this work
|
||||||
|
|
||||||
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Research Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
|
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
|
||||||
|
|
||||||
:::note How to cite
|
:::note How to cite
|
||||||
|
|
||||||
|
|
|
@ -8,7 +8,7 @@ sidebar_position: 3
|
||||||
This version is not accompanied by public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you are interested in bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
This version is not accompanied by public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you are interested in bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
||||||
:::
|
:::
|
||||||
|
|
||||||
In order to facilitate users, different dumps are available under the Zenodo community called [OpenAIRE Research Graph](https://zenodo.org/communities/openaire-research-graph).
|
In order to facilitate users, different dumps are available under the Zenodo community called [OpenAIRE Graph](https://zenodo.org/communities/openaire-research-graph).
|
||||||
This page lists all alternative dumps currently available.
|
This page lists all alternative dumps currently available.
|
||||||
|
|
||||||
|
|
||||||
|
@ -31,7 +31,7 @@ The dump consists of a tar archive containing gzip files with one json per line.
|
||||||
|
|
||||||
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
|
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
|
||||||
It contains metadata records of research products (research literature, data, software, other types of research products) with funding
|
It contains metadata records of research products (research literature, data, software, other types of research products) with funding
|
||||||
information available in the OpenAIRE Research Graph. Records are grouped by funder in a dedicated archive file. Each tar archive contains
|
information available in the OpenAIRE Graph. Records are grouped by funder in a dedicated archive file. Each tar archive contains
|
||||||
gzip files, each with one json record per line. The model of this dump differs from the one of the whole graph.
|
gzip files, each with one json record per line. The model of this dump differs from the one of the whole graph.
|
||||||
Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dump.
|
Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dump.
|
||||||
|
|
||||||
|
@ -42,7 +42,7 @@ Please refer [here](#alternative-sub-graph-data-model) for details on the data m
|
||||||
Schema: https://doi.org/10.5281/zenodo.4238938
|
Schema: https://doi.org/10.5281/zenodo.4238938
|
||||||
|
|
||||||
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
|
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
|
||||||
It contains the metadata records of projects collected by OpenAIRE in a given time frame. Usually one deposition of collected projects is done for each release of the OpenAIRE Research Graph.
|
It contains the metadata records of projects collected by OpenAIRE in a given time frame. Usually one deposition of collected projects is done for each release of the OpenAIRE Graph.
|
||||||
The deposition is one tar archive containing gzip files, each with one json record per line.
|
The deposition is one tar archive containing gzip files, each with one json record per line.
|
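A dump with this layout (a tar archive of gzip files, one JSON record per line) can be read without unpacking it to disk. The sketch below uses only the Python standard library; the function name and the archive path in the usage example are hypothetical.

```python
import gzip
import json
import tarfile


def iter_records(tar_path):
    """Yield one parsed JSON record per line from every .gz member of a tar archive."""
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".gz")):
                continue  # skip directories and non-gzip members
            raw = archive.extractfile(member)
            with gzip.open(raw, "rt", encoding="utf-8") as lines:
                for line in lines:
                    if line.strip():
                        yield json.loads(line)
```

For example, `for record in iter_records("project.tar"): ...` would stream the records one at a time, which keeps memory use low for large dumps.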
||||||
|
|
||||||
## The dumps about research communities, initiatives and infrastructures
|
## The dumps about research communities, initiatives and infrastructures
|
||||||
|
@ -61,7 +61,7 @@ Please refer [here](#alternative-sub-graph-data-model) for details on the data m
|
||||||
|
|
||||||
## Alternative sub-graph data model
|
## Alternative sub-graph data model
|
||||||
|
|
||||||
It should be noted that the dumps for research communities, infrastructures, and products related to projects do not strictly follow the main data model of the OpenAIRE Research Graph. In particular, they differ in the following:
|
It should be noted that the dumps for research communities, infrastructures, and products related to projects do not strictly follow the main data model of the OpenAIRE Graph. In particular, they differ in the following:
|
||||||
|
|
||||||
* only research products are dumped (no relations and no entities other than results)
|
* only research products are dumped (no relations and no entities other than results)
|
||||||
* the dumped results are extended with information that can be inferred from the whole dump, namely:
|
* the dumped results are extended with information that can be inferred from the whole dump, namely:
|
||||||
|
|
|
@ -6,12 +6,12 @@ sidebar_position: 1
|
||||||
|
|
||||||
# Overview
|
# Overview
|
||||||
|
|
||||||
The OpenAIRE Research Graph is one of the largest open scholarly record collections worldwide, key in fostering Open Science and establishing its practices in the daily research activities.
|
The OpenAIRE Graph (formerly known as the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide, key in fostering Open Science and establishing its practices in the daily research activities.
|
||||||
Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back in the hands of the scientific community.
|
Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back in the hands of the scientific community.
|
||||||
|
|
||||||
Imagine a vast collection of research products all linked together, contextualised and openly available. For the past years OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organisations, funders, funding streams, projects, communities, and data sources.
|
Imagine a vast collection of research products all linked together, contextualised and openly available. For the past years OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organisations, funders, funding streams, projects, communities, and data sources.
|
||||||
|
|
||||||
As of today, the OpenAIRE Research Graph aggregates around 450Mi metadata records with links collected from 2K data sources trusted by scientists, including:
|
As of today, the OpenAIRE Graph aggregates around 450Mi metadata records with links collected from 2K data sources trusted by scientists, including:
|
||||||
|
|
||||||
* Open Access journals registered in DOAJ
|
* Open Access journals registered in DOAJ
|
||||||
* Crossref
|
* Crossref
|
||||||
|
|
|
@ -4,5 +4,5 @@ sidebar_position: 11
|
||||||
|
|
||||||
# License
|
# License
|
||||||
|
|
||||||
OpenAIRE Research Graph is available for download and re-use as CC-BY (due to some input sources whose license is CC-BY). Parts of the graphs can be re-used as CC-0.
|
OpenAIRE Graph is available for download and re-use as CC-BY (due to some input sources whose license is CC-BY). Parts of the graphs can be re-used as CC-0.
|
||||||
|
|
||||||
|
|
|
@ -4,7 +4,7 @@ sidebar_position: 7
|
||||||
|
|
||||||
# Relevant publications
|
# Relevant publications
|
||||||
|
|
||||||
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Research Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
|
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
|
||||||
|
|
||||||
:::note How to cite
|
:::note How to cite
|
||||||
|
|
||||||
|
|
|
@ -18,7 +18,7 @@ console.info(env.parsed);
|
||||||
|
|
||||||
/** @type {import('@docusaurus/types').Config} */
|
/** @type {import('@docusaurus/types').Config} */
|
||||||
const config = {
|
const config = {
|
||||||
title: 'OpenAIRE Research Graph Documentation',
|
title: 'OpenAIRE Graph Documentation',
|
||||||
tagline: 'Open Access Infrastructure for Research in Europe',
|
tagline: 'Open Access Infrastructure for Research in Europe',
|
||||||
url: process.env.URL,
|
url: process.env.URL,
|
||||||
baseUrl: process.env.BASE_URL, // serve the website at route
|
baseUrl: process.env.BASE_URL, // serve the website at route
|
||||||
|
|
|
@ -29,7 +29,7 @@ const sidebars = {
|
||||||
label: "Entities",
|
label: "Entities",
|
||||||
link: {
|
link: {
|
||||||
type: 'generated-index',
|
type: 'generated-index',
|
||||||
description: 'The main entities of the OpenAIRE Research Graph are listed below.'
|
description: 'The main entities of the OpenAIRE Graph are listed below.'
|
||||||
},
|
},
|
||||||
items: [
|
items: [
|
||||||
{ type: 'doc', id: 'data-model/entities/result' },
|
{ type: 'doc', id: 'data-model/entities/result' },
|
||||||
|
@ -101,7 +101,7 @@ const sidebars = {
|
||||||
label: "Enrichment by mining",
|
label: "Enrichment by mining",
|
||||||
link: {
|
link: {
|
||||||
type: 'generated-index',
|
type: 'generated-index',
|
||||||
description: 'The OpenAIRE Research Graph is enriched using the different Text and Data Mining (TDM) algorithms that are grouped in the following categories.'
|
description: 'The OpenAIRE Graph is enriched using the different Text and Data Mining (TDM) algorithms that are grouped in the following categories.'
|
||||||
},
|
},
|
||||||
items: [
|
items: [
|
||||||
{ type: 'doc', id: 'data-provision/enrichment-by-mining/affiliation_matching' },
|
{ type: 'doc', id: 'data-provision/enrichment-by-mining/affiliation_matching' },
|
||||||
|
@ -128,7 +128,7 @@ const sidebars = {
|
||||||
label: "Deduction & propagation",
|
label: "Deduction & propagation",
|
||||||
link: {
|
link: {
|
||||||
type: 'generated-index' ,
|
type: 'generated-index' ,
|
||||||
description: 'The OpenAIRE Research Graph is further enriched by the deduction and propagation processes described in this section.'
|
description: 'The OpenAIRE Graph is further enriched by the deduction and propagation processes described in this section.'
|
||||||
|
|
||||||
},
|
},
|
||||||
items: [
|
items: [
|
||||||
|
@ -141,7 +141,7 @@ const sidebars = {
|
||||||
label: "Indicators ingestion",
|
label: "Indicators ingestion",
|
||||||
link: {
|
link: {
|
||||||
type: 'generated-index' ,
|
type: 'generated-index' ,
|
||||||
description: 'In this step, the following types of indicators are ingested in the OpenAIRE Research Graph.'
|
description: 'In this step, the following types of indicators are ingested in the OpenAIRE Graph.'
|
||||||
|
|
||||||
},
|
},
|
||||||
items: [
|
items: [
|
||||||
|
|
Before Width: | Height: | Size: 7.5 KiB After Width: | Height: | Size: 16 KiB |
Before Width: | Height: | Size: 38 KiB After Width: | Height: | Size: 70 KiB |
Before Width: | Height: | Size: 38 KiB After Width: | Height: | Size: 60 KiB |
Before Width: | Height: | Size: 44 KiB After Width: | Height: | Size: 72 KiB |
Before Width: | Height: | Size: 420 KiB After Width: | Height: | Size: 394 KiB |
|
@ -1,6 +1,6 @@
|
||||||
# Data model
|
# Data model
|
||||||
|
|
||||||
The OpenAIRE Research Graph comprises several types of [entities](../category/entities) and [relationships](./relationships) among them.
|
The OpenAIRE Graph comprises several types of [entities](../category/entities) and [relationships](./relationships) among them.
|
||||||
|
|
||||||
The latest version of the JSON schema can be found on the [Downloads](../downloads/full-graph) section.
|
The latest version of the JSON schema can be found on the [Downloads](../downloads/full-graph) section.
|
||||||
|
|
||||||
|
@ -20,6 +20,6 @@ responsible for operating data sources or consisting the affiliations of Product
|
||||||
|
|
||||||
:::note Further reading
|
:::note Further reading
|
||||||
|
|
||||||
A detailed report on the OpenAIRE Research Graph Data Model can be found on [Zenodo](https://zenodo.org/record/2643199).
|
A detailed report on the OpenAIRE Graph Data Model can be found on [Zenodo](https://zenodo.org/record/2643199).
|
||||||
:::
|
:::
|
||||||
|
|
||||||
|
|
|
@ -3,6 +3,6 @@
|
||||||
"position": 1,
|
"position": 1,
|
||||||
"link": {
|
"link": {
|
||||||
"type": "generated-index",
|
"type": "generated-index",
|
||||||
"description": "The main entities of the OpenAIRE Research Graph are listed below."
|
"description": "The main entities of the OpenAIRE Graph are listed below."
|
||||||
}
|
}
|
||||||
}
|
}
|
|
@ -1,6 +1,6 @@
|
||||||
# PIDs and identifiers
|
# PIDs and identifiers
|
||||||
|
|
||||||
One of the challenges towards the stability of the contents in the OpenAIRE Research Graph consists of making its identifiers and records stable over time.
|
One of the challenges towards the stability of the contents in the OpenAIRE Graph consists of making its identifiers and records stable over time.
|
||||||
The barriers to this scenario are many, as the Graph keeps a map of data sources that is subject to constant variations: records in repositories vary in content,
|
The barriers to this scenario are many, as the Graph keeps a map of data sources that is subject to constant variations: records in repositories vary in content,
|
||||||
original IDs, and PIDs, may disappear or reappear, and the same holds for the repository or the metadata collection it exposes.
|
original IDs, and PIDs, may disappear or reappear, and the same holds for the repository or the metadata collection it exposes.
|
||||||
Moreover, the mappings applied to the original contents may also change and improve over time to catch up with the changes in the input records.
|
Moreover, the mappings applied to the original contents may also change and improve over time to catch up with the changes in the input records.
|
||||||
|
|
|
@ -4,14 +4,14 @@ sidebar_position: 1
|
||||||
|
|
||||||
# Aggregation
|
# Aggregation
|
||||||
|
|
||||||
OpenAIRE materializes an open, participatory research graph (the OpenAIRE Research Graph) where products of the research life-cycle (e.g. scientific literature, research data, project, software) are semantically linked to each other and carry information about their access rights (i.e. if they are Open Access, Restricted, Embargoed, or Closed) and the sources from which they have been collected and where they are hosted. The OpenAIRE Research Graph is materialised via a set of autonomic, orchestrated workflows operating in a regimen of continuous data aggregation and integration. [1]
|
OpenAIRE materializes an open, participatory research graph (the OpenAIRE Graph) where products of the research life-cycle (e.g. scientific literature, research data, project, software) are semantically linked to each other and carry information about their access rights (i.e. if they are Open Access, Restricted, Embargoed, or Closed) and the sources from which they have been collected and where they are hosted. The OpenAIRE Graph is materialised via a set of autonomic, orchestrated workflows operating in a regimen of continuous data aggregation and integration. [1]
|
||||||
|
|
||||||
## What does OpenAIRE collect?
|
## What does OpenAIRE collect?
|
||||||
|
|
||||||
OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant with the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
|
OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant with the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
|
||||||
|
|
||||||
The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term.
|
The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term.
|
||||||
In addition, the OpenAIRE Research Graph is extended with other relevant scholarly communication sources that need special handling, either because they do not strictly follow the OpenAIRE Guidelines or due to the vast amount of data they offer (e.g. DOIBoost, which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
|
In addition, the OpenAIRE Graph is extended with other relevant scholarly communication sources that need special handling, either because they do not strictly follow the OpenAIRE Guidelines or due to the vast amount of data they offer (e.g. DOIBoost, which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
|
||||||
|
|
||||||
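The vocabulary-based cleaning described above can be illustrated with a small Python sketch. It is not the OpenAIRE implementation: the `Term` class, the sample term, and the case-insensitive, whitespace-trimmed matching are assumptions made for the example. Only the shape of a term (one code, one label, a set of synonyms) and the synonym-to-term replacement come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Term:
    """A controlled term: one code, one label, and a set of synonyms."""
    code: str
    label: str
    synonyms: frozenset = field(default_factory=frozenset)


def build_synonym_index(terms: Iterable[Term]) -> Dict[str, Term]:
    """Map every code, label, and synonym to its controlled term."""
    index: Dict[str, Term] = {}
    for term in terms:
        for name in (term.code, term.label, *term.synonyms):
            # Assumption: matching ignores case and surrounding whitespace.
            index[name.strip().lower()] = term
    return index


def clean_value(value: str, index: Dict[str, Term]) -> Optional[Term]:
    """Return the controlled term for a raw field value, or None if unknown."""
    return index.get(value.strip().lower())


# Hypothetical vocabulary entry, used only for illustration.
access_rights = [
    Term("OPEN", "Open Access", frozenset({"info:eu-repo/semantics/openAccess"})),
]
index = build_synonym_index(access_rights)
term = clean_value("info:eu-repo/semantics/openAccess", index)
```

Building the index once and reusing it keeps each lookup constant-time, which matters when the same vocabulary is applied to many records.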
<p align="center">
|
<p align="center">
|
||||||
<img loading="lazy" alt="Aggregation" src={require('../../assets/img/aggregation.png').default} width="65%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
|
<img loading="lazy" alt="Aggregation" src={require('../../assets/img/aggregation.png').default} width="65%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
|
||||||
|
@ -30,7 +30,7 @@ Relationships between objects are collected from the data sources, but also auto
|
||||||
|
|
||||||
## What kind of data sources are in OpenAIRE?
|
## What kind of data sources are in OpenAIRE?
|
||||||
|
|
||||||
Objects and relationships in the OpenAIRE Research Graph are extracted from information packages, i.e. metadata records, collected from data sources of the following kinds:
|
Objects and relationships in the OpenAIRE Graph are extracted from information packages, i.e. metadata records, collected from data sources of the following kinds:
|
||||||
|
|
||||||
- *Literature, Institutional and thematic repositories*: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. ArXiv, Europe PMC);
|
- *Literature, Institutional and thematic repositories*: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. ArXiv, Europe PMC);
|
||||||
- *Open Access Publishers and journals*: Information systems of open access publishers or their journals, which offer bibliographic metadata and PDFs of their published articles;
|
- *Open Access Publishers and journals*: Information systems of open access publishers or their journals, which offer bibliographic metadata and PDFs of their published articles;
|
||||||
|
|
|
@ -33,7 +33,7 @@ The metadata collection process identifies the most recent record date available
|
||||||
|
|
||||||
### Entity Mapping
|
### Entity Mapping
|
||||||
|
|
||||||
The table below describes the mapping from the XML baseline records to the OpenAIRE Research Graph dump format.
|
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
|
||||||
|
|
||||||
| OpenAIRE Result field path | Datacite record JSON path | Notes |
|
| OpenAIRE Result field path | Datacite record JSON path | Notes |
|
||||||
|--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
|
|
|
@ -68,7 +68,7 @@ Records in Crossref are ruled out according to the following criteria
|
||||||
|
|
||||||
Records with `type=dataset` are mapped into OpenAIRE results of type dataset. All others are mapped as OpenAIRE results of type publication.
|
Records with `type=dataset` are mapped into OpenAIRE results of type dataset. All others are mapped as OpenAIRE results of type publication.
|
||||||
|
|
||||||
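The type rule stated above is small enough to express directly. The following Python sketch assumes the Crossref record is available as a dictionary with a top-level `type` key; the function name is hypothetical.

```python
def openaire_result_type(crossref_record: dict) -> str:
    """Map a Crossref record to an OpenAIRE result type.

    Records with type "dataset" become datasets; every other record,
    including one with no type at all, becomes a publication.
    """
    return "dataset" if crossref_record.get("type") == "dataset" else "publication"
```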
### Mapping Crossref properties into the OpenAIRE Research Graph
|
### Mapping Crossref properties into the OpenAIRE Graph
|
||||||
|
|
||||||
Properties in OpenAIRE results are set based on the logic described in the following table:
|
Properties in OpenAIRE results are set based on the logic described in the following table:
|
||||||
|
|
||||||
|
@ -222,7 +222,7 @@ Miriam will modify the process to ensure that:
|
||||||
* Only papers with DOI are considered
|
* Only papers with DOI are considered
|
||||||
* Since for the same DOI we have multiple versions of an item with different MAG PaperId, we only take one per DOI (the last one we process). We call this dataset `Papers_distinct`
|
* Since for the same DOI we have multiple versions of an item with different MAG PaperId, we only take one per DOI (the last one we process). We call this dataset `Papers_distinct`
|
||||||
|
|
||||||
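The two rules above (keep only papers with a DOI, and keep the last processed paper per DOI) can be sketched in a few lines of Python. This is an illustration on in-memory dictionaries, not the actual processing job. The `Doi` and `PaperId` keys follow the MAG `Papers` table, while lowercasing the DOI before comparison is an assumption of this sketch.

```python
from typing import Dict, Iterable, List


def papers_distinct(papers: Iterable[dict]) -> List[dict]:
    """Keep one paper per DOI: the last one encountered in the input."""
    last_by_doi: Dict[str, dict] = {}
    for paper in papers:
        doi = paper.get("Doi")
        if not doi:
            # Only papers with a DOI are considered.
            continue
        # Assumption: DOIs are compared case-insensitively.
        last_by_doi[doi.lower()] = paper
    return list(last_by_doi.values())
```

Since a later paper simply overwrites the earlier entry for the same DOI, the result depends on processing order, which matches the "last one we process" rule.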
When mapping MAG records to the OpenAIRE Research Graph, we consider the following MAG tables:
|
When mapping MAG records to the OpenAIRE Graph, we consider the following MAG tables:
|
||||||
* `PaperAbstractsInvertedIndex`: for the paper abstracts
|
* `PaperAbstractsInvertedIndex`: for the paper abstracts
|
||||||
* `Authors`: for the authors. The MAG data is pre-processed by grouping authors by PaperId
|
* `Authors`: for the authors. The MAG data is pre-processed by grouping authors by PaperId
|
||||||
* `Affiliations` and `PaperAuthorAffiliations`: to generate links between publications and organisations
|
* `Affiliations` and `PaperAuthorAffiliations`: to generate links between publications and organisations
|
||||||
|
|
|
@ -69,7 +69,7 @@ curl -s "https://www.ebi.ac.uk/europepmc/webservices/rest/MED/33024307/datalinks
|
||||||
```
|
```
|
||||||
|
|
||||||
## Mapping
|
## Mapping
|
||||||
The table below describes the mapping from the EBI links records to the OpenAIRE Research Graph dump format.
|
The table below describes the mapping from the EBI links records to the OpenAIRE Graph dump format.
|
||||||
We filter all the target links with pid type **ena**, **pdb** or **uniprot**
|
We filter all the target links with pid type **ena**, **pdb** or **uniprot**
|
||||||
For each target we construct a Bioentity with the following mapping
|
For each target we construct a Bioentity with the following mapping
|
||||||
|
|
||||||
|
|
|
@ -12,7 +12,7 @@ Pubmed exposes an entry point FTP with all the updates for each one. [ftp baseli
|
||||||
|
|
||||||
## Entity Mapping
|
## Entity Mapping
|
||||||
|
|
||||||
The table below describes the mapping from the XML baseline records to the OpenAIRE Research Graph dump format.
|
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
|
||||||
|
|
||||||
| OpenAIRE Result field path | PubMed record field xpath | Notes |
|
| OpenAIRE Result field path | PubMed record field xpath | Notes |
|
||||||
|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
|
|
|
@ -14,7 +14,7 @@ The data curation activity is twofold, on one end pivots around the disambiguati
|
||||||
Duplicates among organizations are therefore managed through three different stages:
|
Duplicates among organizations are therefore managed through three different stages:
|
||||||
* *Creation of Suggestions*: executes an automatic workflow that performs the deduplication and prepares new suggestions to be processed by the curators;
|
* *Creation of Suggestions*: executes an automatic workflow that performs the deduplication and prepares new suggestions to be processed by the curators;
|
||||||
* *Curation*: manual editing of the organization records performed by the data curators;
|
* *Curation*: manual editing of the organization records performed by the data curators;
|
||||||
* *Creation of Representative Organizations*: executes an automatic workflow that creates curated organizations and exposes them on the OpenAIRE Research Graph by using the curators' feedback from the OpenOrgs underlying database.
|
* *Creation of Representative Organizations*: executes an automatic workflow that creates curated organizations and exposes them on the OpenAIRE Graph by using the curators' feedback from the OpenOrgs underlying database.
|
||||||
|
|
||||||
The next sections describe the above mentioned stages.
|
The next sections describe the above mentioned stages.
|
||||||
|
|
||||||
|
@ -61,7 +61,7 @@ Note that if a curator does not provide a feedback on a similarity relation sugg
|
||||||
|
|
||||||
### Creation of Representative Organizations
|
### Creation of Representative Organizations
|
||||||
|
|
||||||
This stage executes an automatic workflow that runs the *duplicates grouping* step to create representative organizations and to update them on the OpenAIRE Research Graph. Such organizations are obtained via transitive closure, and the relations used come from the curators' feedback gathered in the OpenOrgs underlying database.
|
This stage executes an automatic workflow that runs the *duplicates grouping* step to create representative organizations and to update them on the OpenAIRE Graph. Such organizations are obtained via transitive closure, and the relations used come from the curators' feedback gathered in the OpenOrgs underlying database.
|
||||||
|
|
||||||
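To make the idea of grouping by transitive closure concrete, here is a minimal Python sketch based on a union-find structure. The text does not state which algorithm the workflow uses, so union-find is an assumption chosen for brevity; the function name and the organization identifiers in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def group_duplicates(
    similar_pairs: Iterable[Tuple[Hashable, Hashable]]
) -> List[Set[Hashable]]:
    """Group identifiers connected, directly or indirectly, by similarity pairs."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Register unseen nodes as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return list(groups.values())
```

For example, the pairs `("org1", "org2")` and `("org2", "org3")` place all three identifiers in one group even though `org1` and `org3` were never compared directly. Each resulting group would then correspond to one representative organization.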
#### Duplicates grouping (transitive closure)
|
#### Duplicates grouping (transitive closure)
|
||||||
|
|
||||||
|
|
|
@ -2,11 +2,11 @@
|
||||||
|
|
||||||
At the very end of the graph production workflow, a step is dedicated to performing certain finalisation operations, which we describe on this page,
|
At the very end of the graph production workflow, a step is dedicated to performing certain finalisation operations, which we describe on this page,
|
||||||
aiming to improve the overall quality of the data.
|
aiming to improve the overall quality of the data.
|
||||||
The output of this final step is the final version of the OpenAIRE Research Graph.
|
The output of this final step is the final version of the OpenAIRE Graph.
|
||||||
|
|
||||||
## Filtering
|
## Filtering
|
||||||
|
|
||||||
Bibliographic records that do not meet minimal requirements for being part of the OpenAIRE Research Graph are eliminated during this phase.
|
Bibliographic records that do not meet minimal requirements for being part of the OpenAIRE Graph are eliminated during this phase.
|
||||||
Currently, the only criterion applied horizontally to the entire graph aims at excluding scientific results whose title is not meaningful for citation purposes.
|
Currently, the only criterion applied horizontally to the entire graph aims at excluding scientific results whose title is not meaningful for citation purposes.
|
||||||
Then, different criteria are applied in the pre-processing of specific sub-collections:
|
Then, different criteria are applied in the pre-processing of specific sub-collections:
|
||||||
|
|
||||||
|
|
|
@ -1,16 +1,16 @@
|
||||||
# Indexing
|
# Indexing
|
||||||
|
|
||||||
The final version of the OpenAIRE Research Graph is indexed on a Solr server that is used by the OpenAIRE portals ([EXPLORE](https://explore.openaire.eu), [CONNECT](https://connect.openaire.eu), [PROVIDE](https://provide.openaire.eu)) and APIs, the latter adopted by several third-party applications and organizations, such as:
|
The final version of the OpenAIRE Graph is indexed on a Solr server that is used by the OpenAIRE portals ([EXPLORE](https://explore.openaire.eu), [CONNECT](https://connect.openaire.eu), [PROVIDE](https://provide.openaire.eu)) and APIs, the latter adopted by several third-party applications and organizations, such as:
|
||||||
|
|
||||||
* The OpenAIRE Graph APIs and Portals will offer to the EOSC (European Open Science Cloud) an Open Science Resource Catalogue, keeping an up to date map of all research results (publications, datasets, software), services, organizations, projects, funders in Europe and beyond.
|
* The OpenAIRE Graph APIs and Portals will offer to the EOSC (European Open Science Cloud) an Open Science Resource Catalogue, keeping an up to date map of all research results (publications, datasets, software), services, organizations, projects, funders in Europe and beyond.
|
||||||
|
|
||||||
* DSpace & EPrints repositories can install the OpenAIRE plugin to expose OpenAIRE compliant metadata records via their OAI-PMH endpoint and offer to researchers the possibility to link their depositions to the funding project, by selecting it from the list of projects provided by OpenAIRE.
|
* DSpace & EPrints repositories can install the OpenAIRE plugin to expose OpenAIRE compliant metadata records via their OAI-PMH endpoint and offer to researchers the possibility to link their depositions to the funding project, by selecting it from the list of projects provided by OpenAIRE.
|
||||||
|
|
||||||
* EC participant portal (Sygma - System for Grant Management) uses the OpenAIRE API in the “Continuous Reporting” section. Sygma automatically fetches from the OpenAIRE Search API the list of publications and datasets in the OpenAIRE Research Graph that are linked to the project. The user can select the research products from the list and easily compile the continuous reporting data of the project.
|
* EC participant portal (Sygma - System for Grant Management) uses the OpenAIRE API in the “Continuous Reporting” section. Sygma automatically fetches from the OpenAIRE Search API the list of publications and datasets in the OpenAIRE Graph that are linked to the project. The user can select the research products from the list and easily compile the continuous reporting data of the project.
|
||||||
|
|
||||||
* ScholExplorer is used by different players of the scholarly communication ecosystem. For example, [Elsevier](https://www.elsevier.com/authors/tools-and-resources/research-data/data-base-linking) uses its API to make the links between
|
* ScholExplorer is used by different players of the scholarly communication ecosystem. For example, [Elsevier](https://www.elsevier.com/authors/tools-and-resources/research-data/data-base-linking) uses its API to make the links between
|
||||||
publications and datasets automatically appear on ScienceDirect.
|
publications and datasets automatically appear on ScienceDirect.
|
||||||
ScholExplorer indexes the links among the four major types of research products (API v3) available in the OpenAIRE Research Graph and makes them available through an HTTP API that allows
|
ScholExplorer indexes the links among the four major types of research products (API v3) available in the OpenAIRE Graph and makes them available through an HTTP API that allows
|
||||||
to search them by the following criteria:
|
to search them by the following criteria:
|
||||||
* Links whose source object has a given PID or PID type;
|
* Links whose source object has a given PID or PID type;
|
||||||
* Links whose source object has been published by a given data source ("data source as publisher");
|
* Links whose source object has been published by a given data source ("data source as publisher");
|
||||||
|
|
|
@ -8,7 +8,7 @@ sidebar_position: 2
|
||||||
This version is not accompanied by public dump files; hence, the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., [OpenAIRE Explore](https://explore.openaire.eu/)). If you are interested in bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
This version is not accompanied by public dump files; hence, the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., [OpenAIRE Explore](https://explore.openaire.eu/)). If you are interested in bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
||||||
:::
|
:::
|
||||||
|
|
||||||
The large size of the OpenAIRE Research Graph is a major impediment for beginners to familiarise with the underlying data model and explore its contents.
|
The large size of the OpenAIRE Graph is a major impediment for beginners to familiarise with the underlying data model and explore its contents.
|
||||||
Working with the Graph in its full size typically requires access to a huge distributed computing infrastructure which cannot be easily accessible to everyone.
|
Working with the Graph in its full size typically requires access to a huge distributed computing infrastructure which cannot be easily accessible to everyone.
|
||||||
[The OpenAIRE Beginner’s Kit]( https://doi.org/10.5281/zenodo.7490192) aims to address this issue. It consists of two components:
|
[The OpenAIRE Beginner’s Kit]( https://doi.org/10.5281/zenodo.7490192) aims to address this issue. It consists of two components:
|
||||||
|
|
||||||
|
|
|
@ -8,7 +8,7 @@ sidebar_position: 1
|
||||||
This version is not accompanied with public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v.5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you would be interested to get bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
This version is not accompanied with public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v.5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you would be interested to get bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
||||||
:::
|
:::
|
||||||
|
|
||||||
You can download the full OpenAIRE Research Graph Dump as well as its schema from the following links:
|
You can download the full OpenAIRE Graph Dump as well as its schema from the following links:
|
||||||
|
|
||||||
Dataset: https://doi.org/10.5281/zenodo.3516917
|
Dataset: https://doi.org/10.5281/zenodo.3516917
|
||||||
|
|
||||||
|
@ -21,7 +21,7 @@ a tar archive containing gz files, each with one json per line.
|
||||||
|
|
||||||
## How to acknowledge this work
|
## How to acknowledge this work
|
||||||
|
|
||||||
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Research Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
|
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
|
||||||
|
|
||||||
:::note How to cite
|
:::note How to cite
|
||||||
|
|
||||||
|
|
|
@ -8,7 +8,7 @@ sidebar_position: 3
|
||||||
This version is not accompanied with public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v.5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you would be interested to get bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
This version is not accompanied with public dump files, hence the files in this section are based on [v5.0.0](/docs/5.0.0/) of the Graph. The data of v.5.1.0 are only exposed via the [OpenAIRE Graph API](https://graph.openaire.eu/develop/) and added-value services that are built on top of this version of the Graph (e.g., the [OpenAIRE Explore](https://explore.openaire.eu/)). If you would be interested to get bulk access to Graph v5.1.0 data, please contact us via our [helpdesk](https://graph.openaire.eu/support).
|
||||||
:::
|
:::
|
||||||
|
|
||||||
In order to facilitate users, different dumps are available under the Zenodo community called [OpenAIRE Research Graph](https://zenodo.org/communities/openaire-research-graph).
|
In order to facilitate users, different dumps are available under the Zenodo community called [OpenAIRE Graph](https://zenodo.org/communities/openaire-research-graph).
|
||||||
This page lists all alternative dumps currently available.
|
This page lists all alternative dumps currently available.
|
||||||
|
|
||||||
|
|
||||||
|
@ -31,7 +31,7 @@ The dump consists of a tar archive containing gzip files with one json per line.
|
||||||
|
|
||||||
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
|
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
|
||||||
It contains metadata records of research products (research literature, data, software, other types of research products) with funding
|
It contains metadata records of research products (research literature, data, software, other types of research products) with funding
|
||||||
information available in the OpenAIRE Research Graph. Records are grouped by funder in a dedicated archive file. Each tar archive contains
|
information available in the OpenAIRE Graph. Records are grouped by funder in a dedicated archive file. Each tar archive contains
|
||||||
gzip files, each with one json record per line. The model of this dump differs from the one of the whole graph.
|
gzip files, each with one json record per line. The model of this dump differs from the one of the whole graph.
|
||||||
Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dump.
|
Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dump.
|
||||||
|
|
||||||
|
@ -42,7 +42,7 @@ Please refer [here](#alternative-sub-graph-data-model) for details on the data m
|
||||||
Schema: https://doi.org/10.5281/zenodo.4238938
|
Schema: https://doi.org/10.5281/zenodo.4238938
|
||||||
|
|
||||||
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
|
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
|
||||||
It contains the metadata records of projects collected by OpenAIRE in a given time frame. Usually one deposition of collected projects is done for each release of the OpenAIRE Research Graph
|
It contains the metadata records of projects collected by OpenAIRE in a given time frame. Usually one deposition of collected projects is done for each release of the OpenAIRE Graph
|
||||||
The deposition is one tar archive containing gzip files, each with one json record per line.
|
The deposition is one tar archive containing gzip files, each with one json record per line.
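
For readers who want to inspect such a deposition, the layout described above (a tar archive of gzip files, one JSON record per line) can be streamed without unpacking it to disk. The following is only a minimal sketch: the function name `iter_dump_records`, the `.gz` suffix check, and the archive path in the usage comment are assumptions made for illustration, not part of the OpenAIRE documentation.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_dump_records(tar_path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per line from every gzip member of a tar archive."""
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Assumption: the gzip members carry a ".gz" suffix; skip everything else.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress the member on the fly and read it as UTF-8 text.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # ignore blank lines
                        yield json.loads(line)


# Hypothetical usage; "project.tar" stands in for a downloaded deposition:
# for record in iter_dump_records("project.tar"):
#     print(record.get("id"))
```

Because the function is a generator, only one record is held in memory at a time, which matters for archives of this size.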
|
||||||
|
|
||||||
## The dumps about research communities, initiatives and infrastructures
|
## The dumps about research communities, initiatives and infrastructures
|
||||||
|
@ -61,7 +61,7 @@ Please refer [here](#alternative-sub-graph-data-model) for details on the data m
|
||||||
|
|
||||||
## Alternative sub-graph data model
|
## Alternative sub-graph data model
|
||||||
|
|
||||||
It should be noted that the dumps for research communities, infrastructures, and products related to projects do not strictly follow the main data model of the OpenAIRE Research Graph. In particular, they differ in the following:
|
It should be noted that the dumps for research communities, infrastructures, and products related to projects do not strictly follow the main data model of the OpenAIRE Graph. In particular, they differ in the following:
|
||||||
|
|
||||||
* only research products are dumped (no relations, and entities different from results)
|
* only research products are dumped (no relations, and entities different from results)
|
||||||
* the dumped results are extended with information that can be inferred in the whole dump namely:
|
* the dumped results are extended with information that can be inferred in the whole dump namely:
|
||||||
|
|
|
@ -6,12 +6,12 @@ sidebar_position: 1
|
||||||
|
|
||||||
# Overview
|
# Overview
|
||||||
|
|
||||||
The OpenAIRE Research Graph is one of the largest open scholarly record collections worldwide, key in fostering Open Science and establishing its practices in the daily research activities.
|
The OpenAIRE Graph (formerly known as the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide, key in fostering Open Science and establishing its practices in the daily research activities.
|
||||||
Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back in the hands of the scientific community.
|
Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back in the hands of the scientific community.
|
||||||
|
|
||||||
Imagine a vast collection of research products all linked together, contextualised and openly available. For the past years OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organisations, funders, funding streams, projects, communities, and data sources.
|
Imagine a vast collection of research products all linked together, contextualised and openly available. For the past years OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organisations, funders, funding streams, projects, communities, and data sources.
|
||||||
|
|
||||||
As of today, the OpenAIRE Research Graph aggregates around 450Mi metadata records with links collecting from 2K data sources trusted by scientists, including:
|
As of today, the OpenAIRE Graph aggregates around 450Mi metadata records with links collecting from 2K data sources trusted by scientists, including:
|
||||||
|
|
||||||
* Open Access journals registered in DOAJ
|
* Open Access journals registered in DOAJ
|
||||||
* Crossref
|
* Crossref
|
||||||
|
|
|
@ -4,5 +4,5 @@ sidebar_position: 11
|
||||||
|
|
||||||
# License
|
# License
|
||||||
|
|
||||||
OpenAIRE Research Graph is available for download and re-use as CC-BY (due to some input sources whose license is CC-BY). Parts of the graphs can be re-used as CC-0.
|
OpenAIRE Graph is available for download and re-use as CC-BY (due to some input sources whose license is CC-BY). Parts of the graphs can be re-used as CC-0.
|
||||||
|
|
||||||
|
|
|
@ -4,7 +4,7 @@ sidebar_position: 7
|
||||||
|
|
||||||
# Relevant publications
|
# Relevant publications
|
||||||
|
|
||||||
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Research Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
|
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
|
||||||
|
|
||||||
:::note How to cite
|
:::note How to cite
|
||||||
|
|
||||||
|
|
|
@ -21,7 +21,7 @@
|
||||||
"label": "Entities",
|
"label": "Entities",
|
||||||
"link": {
|
"link": {
|
||||||
"type": "generated-index",
|
"type": "generated-index",
|
||||||
"description": "The main entities of the OpenAIRE Research Graph are listed below."
|
"description": "The main entities of the OpenAIRE Graph are listed below."
|
||||||
},
|
},
|
||||||
"items": [
|
"items": [
|
||||||
{
|
{
|
||||||
|
@ -142,7 +142,7 @@
|
||||||
"label": "Enrichment by mining",
|
"label": "Enrichment by mining",
|
||||||
"link": {
|
"link": {
|
||||||
"type": "generated-index",
|
"type": "generated-index",
|
||||||
"description": "The OpenAIRE Research Graph is enriched using the different Text and Data Mining (TDM) algorithms that are grouped in the following categories."
|
"description": "The OpenAIRE Graph is enriched using the different Text and Data Mining (TDM) algorithms that are grouped in the following categories."
|
||||||
},
|
},
|
||||||
"items": [
|
"items": [
|
||||||
{
|
{
|
||||||
|
@ -202,7 +202,7 @@
|
||||||
"label": "Deduction & propagation",
|
"label": "Deduction & propagation",
|
||||||
"link": {
|
"link": {
|
||||||
"type": "generated-index",
|
"type": "generated-index",
|
||||||
"description": "The OpenAIRE Research Graph is further enriched by the deduction and propagation processes described in this section."
|
"description": "The OpenAIRE Graph is further enriched by the deduction and propagation processes described in this section."
|
||||||
},
|
},
|
||||||
"items": [
|
"items": [
|
||||||
{
|
{
|
||||||
|
@ -220,7 +220,7 @@
|
||||||
"label": "Indicators ingestion",
|
"label": "Indicators ingestion",
|
||||||
"link": {
|
"link": {
|
||||||
"type": "generated-index",
|
"type": "generated-index",
|
||||||
"description": "In this step, the following types of indicators are ingested in the OpenAIRE Research Graph."
|
"description": "In this step, the following types of indicators are ingested in the OpenAIRE Graph."
|
||||||
},
|
},
|
||||||
"items": [
|
"items": [
|
||||||
{
|
{
|
||||||
|
|