Merge pull request '[Bulk Download] first versione of the documentation' (#19) from bulk_downloads into main

Reviewed-on: D-Net/openaire-graph-docs#19
This commit is contained in:
Serafeim Chatzopoulos 2022-12-08 19:30:26 +01:00
commit caa6f7d196
16 changed files with 675 additions and 21 deletions

Binary file not shown.


@ -2,7 +2,7 @@
 The OpenAIRE Graph comprises several types of [entities](../category/entities) and [relationships](./relationships) among them.
-The latest version of the JSON schema can be found on [Bulk downloads](../download).
+The latest version of the JSON schema can be found on the [Downloads](../downloads/full-graph) section.
 <p align="center">
 <img loading="lazy" alt="Data model" src="/img/docs/data-model.png" width="80%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>


@ -1,17 +0,0 @@
---
sidebar_position: 4
---
# Bulk downloads
In order to facilitate users, different dumps are available. All are available under the Zenodo community called [OpenAIRE Research Graph](https://zenodo.org/communities/openaire-research-graph).
Here we provide detailed documentation about the full dump:
* JSON dump: https://doi.org/10.5281/zenodo.3516917
* JSON schema: https://doi.org/10.5281/zenodo.4238938
:::note Tip!
For a visual and interactive overview of the JSON schema, we suggest to use a JSON schema viewer like [jsonschemaviewer](https://navneethg.github.io/jsonschemaviewer/) (you just need to copy the schema and then you can easily navigate through the nodes).
:::


@ -0,0 +1,30 @@
---
sidebar_position: 1
---
# CfHbKeyValue
Information about the sources from which the record has been collected.
### key
_Type: String &bull; Cardinality: ONE_
The OpenAIRE identifier of the data source.
```json
"key":"10|openaire____::081b82f96300b6a6e3d282bad31cb6e2"
```
### value
_Type: String &bull; Cardinality: ONE_
The name of the data source.
```json
"value":"Crossref"
```


@ -0,0 +1,37 @@
---
sidebar_position: 1
---
# CommunityInstance
It is a subclass of [Instance](../../data-model/entities/result#instance) extended with information regarding the collection and hosting source for this materialization of the result.
### hostedby
_Type: [CfHbKeyValue](./cfhb) &bull; Cardinality: ONE_
Information about the source from which the instance can be viewed or downloaded.
```json
"hostedby": {
"key": "10|issn___print::35ee75a5ad42581d604be113a8f56427",
"value": "New Phytologist"
},
```
### collectedfrom
_Type: [CfHbKeyValue](./cfhb) &bull; Cardinality: ONE_
Information about the source from which the record has been collected.
```json
"collectedfrom": {
"key": "10|openaire____::081b82f96300b6a6e3d282bad31cb6e2",
"value": "Crossref"
}
```


@ -0,0 +1,46 @@
---
sidebar_position: 1
---
# Context
Information about the research initiative/community (RI/RC) related to the result.
### code
_Type: String &bull; Cardinality: ONE_
Code identifying the RI/RC.
```json
"code":"sdsn-gr"
```
### label
_Type: String &bull; Cardinality: ONE_
Label of the RI/RC.
```json
"label":"SDSN - Greece"
```
### provenance
_Type: [Provenance](../../../data-model/entities/other#provenance-2) &bull; Cardinality: MANY_
The reason why this result is associated with the RI/RC.
```json
"provenance":[{
"provenance":"Inferred by OpenAIRE",
"trust":"0.9"
},
...
]
```


@ -0,0 +1,141 @@
---
sidebar_position: 1
---
# Extended Result
It is a subclass of [Result](../../../data-model/entities/result) extended with information regarding projects (and funders), research communities/infrastructure and related data sources.
### projects
_Type: [Project](project.md) &bull; Cardinality: MANY_
List of projects (i.e. grants) that (co-)funded the production of the research results.
```json
"projects": [
{
"id": "40|corda__h2020::94c4a066401e22002c4811a301bb4655",
"code": "727929",
"acronym": "TomRes",
"title": "A NOVEL AND INTEGRATED APPROACH TO INCREASE MULTIPLE AND COMBINED STRESS TOLERANCE IN PLANTS USING TOMATO AS A MODEL",
"funder": {
"shortName": "EC",
"name": "European Commission",
"jurisdiction": "EU",
"fundingStream": "H2020"
},
"provenance": {
"provenance": "Harvested",
"trust": "0.900000000000000022"
},
"validated": {
"validationDate": "2021-01-01",
"validatedByFunder": true
}
},
...
]
```
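As a rough illustration of how the `projects` field can be used, the sketch below counts how many results each funder has (co-)funded. It assumes every record is a Python dict parsed from one JSON line of the dump; the function name `count_results_per_funder` is hypothetical and not part of any OpenAIRE tooling.

```python
from collections import Counter
from typing import Iterable


def count_results_per_funder(records: Iterable[dict]) -> Counter:
    """Count results per funder short name (each result counted once per funder)."""
    counts: Counter = Counter()
    for record in records:
        # A result may list several projects from the same funder,
        # so collect the funder short names in a set first.
        funders = {
            (project.get("funder") or {}).get("shortName")
            for project in record.get("projects") or []
        }
        funders.discard(None)  # skip projects without funder information
        counts.update(funders)
    return counts
```

Counting through a set means a result funded by two projects of the same funder contributes one to that funder's total, which is usually what a per-funder summary needs.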
### context
_Type: [Context](./context) &bull; Cardinality: MANY_
References to the relevant research infrastructures, initiatives or communities (RI/RC) among those collaborating with OpenAIRE. The publicly visible ones are listed at https://connect.openaire.eu.
```json
"context":[
{
"code":"sdsn-gr",
"label":"SDSN - Greece",
"provenance":[
{
"provenance":"Inferred by OpenAIRE",
"trust":"0.9"
}
]
},
...
]
```
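Note that `trust` is serialized as a string in the example above, so it has to be converted before any numeric comparison. A minimal sketch, assuming a record parsed into a Python dict; the helper name `community_codes` and the threshold parameter are hypothetical:

```python
from typing import List


def community_codes(record: dict, min_trust: float = 0.0) -> List[str]:
    """Return the RI/RC codes whose association reaches the given trust."""
    codes = []
    for context in record.get("context") or []:
        # "trust" is a string such as "0.9"; convert it before comparing.
        trusts = [
            float(entry["trust"])
            for entry in context.get("provenance") or []
            if entry.get("trust") is not None
        ]
        # Contexts without any trust value are skipped in this sketch.
        if any(trust >= min_trust for trust in trusts):
            codes.append(context["code"])
    return codes
```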
### collectedfrom
_Type: [CfHbKeyValue](./cfhb) &bull; Cardinality: MANY_
Information about the sources from which the record has been collected.
```json
"collectedfrom":[
{
"key":"10|openaire____::081b82f96300b6a6e3d282bad31cb6e2",
"value":"Crossref"
},
...
]
```
### instance
_Type: [CommunityInstance](./communityInstance) &bull; Cardinality: MANY_
Information about the source from which the instance can be viewed or downloaded.
```json
"instance": [
{
"license": "http://doi.wiley.com/10.1002/tdm_license_1.1",
"accessright": {
"code": "c_16ec",
"label": "RESTRICTED",
"scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/",
"openAccessRoute": null
},
"type": "Article",
"url": [
"https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Fnph.15014",
"http://onlinelibrary.wiley.com/wol1/doi/10.1111/nph.15014/fullpdf",
"http://dx.doi.org/10.1111/nph.15014"
],
"publicationdate": "2018-02-09",
"refereed": "UNKNOWN",
"hostedby": {
"key": "10|issn___print::35ee75a5ad42581d604be113a8f56427",
"value": "New Phytologist"
},
"collectedfrom": {
"key": "10|openaire____::081b82f96300b6a6e3d282bad31cb6e2",
"value": "Crossref"
}
},
...
]
```
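Since each instance carries its own access right and URLs, a consumer will often want to group the URLs of a result by access-right label. The sketch below does that for a record parsed into a Python dict; the function name `urls_by_access_right` and the `"MISSING"` fallback label are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List


def urls_by_access_right(record: dict) -> Dict[str, List[str]]:
    """Group the URLs of all instances of a result by access-right label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for instance in record.get("instance") or []:
        access_right = instance.get("accessright") or {}
        # "MISSING" is a placeholder chosen here, not an OpenAIRE vocabulary term.
        label = access_right.get("label", "MISSING")
        grouped[label].extend(instance.get("url") or [])
    return dict(grouped)
```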


@ -0,0 +1,72 @@
---
sidebar_position: 1
---
# Funder
Information about the funder funding the project.
### fundingStream
_Type: String &bull; Cardinality: ONE_
Funding information for the project.
```json
"fundingStream": "H2020"
```
### jurisdiction
_Type: String &bull; Cardinality: ONE_
Geographical jurisdiction (e.g. EU for the European Commission, HR for the Croatian Science Foundation).
```json
"jurisdiction": "EU"
```
### name
_Type: String &bull; Cardinality: ONE_
The name of the funder.
```json
"name": "European Commission"
```
### shortName
_Type: String &bull; Cardinality: ONE_
The short name of the funder.
```json
"shortName": "EC"
```


@ -0,0 +1,134 @@
---
sidebar_position: 1
---
# Project
The information about the projects related to the result.
### id
_Type: String &bull; Cardinality: ONE_
Main entity identifier, created according to the [OpenAIRE entity identifier and PID mapping policy](../../data-model/pids-and-identifiers).
```json
"id": "40|corda__h2020::70ea22400fd890c5033cb31642c4ae68"
```
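Judging only from the examples on these pages, identifiers appear to have the shape `<prefix>|<namespace>::<local part>`, with the namespace padded with underscores. The sketch below splits an identifier along those separators; the field names, the `split_identifier` helper, and the stripping of trailing underscores are assumptions for illustration, and the linked policy remains the authoritative description.

```python
from typing import NamedTuple


class OpenAireId(NamedTuple):
    prefix: str      # e.g. "40" in the project example above
    namespace: str   # e.g. "corda__h2020"
    local_id: str    # the part after "::"


def split_identifier(identifier: str) -> OpenAireId:
    """Split an identifier on its '|' and '::' separators."""
    prefix, bar, rest = identifier.partition("|")
    namespace, colons, local_id = rest.partition("::")
    if not bar or not colons:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    # Trailing underscores look like padding in the examples (assumption).
    return OpenAireId(prefix, namespace.rstrip("_"), local_id)
```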
### code
_Type: String &bull; Cardinality: ONE_
The grant agreement code of the project.
```json
"code": "777541"
```
### acronym
_Type: String &bull; Cardinality: ONE_
Project's acronym.
```json
"acronym": "OpenAIRE-Advance"
```
### title
_Type: String &bull; Cardinality: ONE_
Project's title.
```json
"title": "OpenAIRE Advancing Open Scholarship"
```
### funder
_Type: [Funder](funder.md) &bull; Cardinality: ONE_
Information about the funder funding the project.
```json
"funder": {
"shortName": "EC",
"name": "European Commission",
"jurisdiction": "EU",
"fundingStream": "H2020"
}
```
### provenance
_Type: [Provenance](../../data-model/entities/other#provenance-2) &bull; Cardinality: ONE_
The reason why the project is associated with the result.
```json
"provenance": {
"provenance": "Harvested",
"trust": "0.900000000000000022"
}
```
### validated
_Type: [Validated](validated.md) &bull; Cardinality: ONE_
Specifies whether the association between the project and the result was validated.
```json
"validated": {
"validationDate": "2021-01-01",
"validatedByFunder": true
}
```


@ -0,0 +1,41 @@
---
sidebar_position: 1
---
# Validated
Information about the validation of the association between the result and the funding information.
### validationDate
_Type: String &bull; Cardinality: ONE_
The date when OpenAIRE collected the association between the funding and the result from an authoritative source (e.g. Sygma).
```json
"validationDate": "2021-01-01"
```
### validatedByFunder
_Type: Boolean &bull; Cardinality: ONE_
Specifies if the validation comes from the funder.
```json
"validatedByFunder": true
```


@ -0,0 +1,6 @@
---
sidebar_position: 2
---
# Beginners kit


@ -0,0 +1,35 @@
---
sidebar_position: 1
---
# Full graph dump
You can download the full OpenAIRE Research Graph Dump as well as its schema from the following links:
Dataset: https://doi.org/10.5281/zenodo.3516917
Schema: https://doi.org/10.5281/zenodo.4238938
The schema used to dump this dataset mirrors the one described in the [Data Model](../data-model).
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
It is composed of several files, so that you can download only the parts you are interested in. The files are named after the entity they store (e.g. publication, dataset). Each file is at most 10GB and is
a tar archive containing gz files, each with one JSON record per line.
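The layout described above (a tar archive of gz files with one JSON record per line) can be read in a streaming fashion without unpacking anything to disk. The following is a minimal sketch using only the Python standard library; the function name `iter_records` is hypothetical, and it assumes that the compressed members carry a `.gz` file extension.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one dict per JSON line from every .gz member of a tar archive."""
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Assumption: the compressed parts end in ".gz".
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress on the fly and decode each line as one JSON record.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Because the function is a generator, a loop such as `for record in iter_records("publication_1.tar"): ...` (the file name is only an example) processes one record at a time and keeps memory use low even for a 10GB archive.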
## How to acknowledge this work
Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph dumps](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dump's Zenodo page or as provided below.
:::note How to cite
Manghi P., Atzori C., Bardi A., Baglioni M., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Mannocci A., Horst M., Czerniak A., Kiatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Ottonello E., Lempesis A., Ioannidis A., Manola N., Principe P. (2022). "OpenAIRE Research Graph Dump", *Dataset*, Zenodo. [doi:10.5281/zenodo.3516917](https://doi.org/10.5281/zenodo.3516917) ([BibTex](/bibtex/OpenAIRE_Research_Graph_dump.bib))
:::
Please also consider citing [other relevant research products](/publications#relevant-research-products) that can be of interest.
Also consider adding one of the following badges to your service with the appropriate link to [our website](https://graph.openaire.eu):
<p align="left" >
<a target="_blank" href={require('../assets/openaire-red-badge.png').default} download>
<img loading="lazy" alt="Openaire badge" src={require('../assets/openaire-red-badge.png').default} width="50%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module pagination-nav__link" title="Click to download"/>
</a>
</p>


@ -0,0 +1,18 @@
---
sidebar_position: 4
---
# Other related datasets
On this page, we list other related datasets; please refer to their respective schema definitions for the data model they follow.
## The dump of ScholeXplorer
Dataset: https://doi.org/10.5281/zenodo.6338616
Schema (Scholix version 3): https://doi.org/10.5281/zenodo.1120275
Schema (Scholix version 4): https://doi.org/10.5281/zenodo.6351557
This dataset is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
The dataset contains the GZ-compressed dump of the Scholix links exposed by the OpenAIRE ScholeXplorer service.


@ -0,0 +1,68 @@
---
sidebar_position: 3
---
# Sub-graph dumps
To make the data easier to use, several dumps are available in the Zenodo community called [OpenAIRE Research Graph](https://zenodo.org/communities/openaire-research-graph).
This page lists all alternative dumps currently available.
## The OpenAIRE COVID-19 dump
Dataset: https://doi.org/10.5281/zenodo.6638745
Schema: https://doi.org/10.5281/zenodo.6372977
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
It contains metadata records of publications, research data, software and projects on the topic of Coronavirus and COVID-19.
This dump is part of the activities of OpenAIRE to support the fight against COVID-19 together with the OpenAIRE COVID-19 Gateway.
The dump consists of a tar archive containing gzip files with one json per line. Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dump.
## The dump of funded products
Dataset: https://doi.org/10.5281/zenodo.6634431
Schema: https://doi.org/10.5281/zenodo.6372977
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
It contains metadata records of research products (research literature, data, software, other types of research products) with funding
information available in the OpenAIRE Graph. Records are grouped by funder in a dedicated archive file. Each tar archive contains
gzip files, each with one json record per line. The model of this dump differs from the one of the whole graph.
Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dump.
## The dump of delta projects
Dataset: https://doi.org/10.5281/zenodo.7119633
Schema: https://doi.org/10.5281/zenodo.4238938
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
It contains the metadata records of projects collected by OpenAIRE in a given time frame. Usually, one deposition of collected projects is made for each release of the OpenAIRE Graph.
The deposition is one tar archive containing gzip files, each with one json record per line.
## The dumps about research communities, initiatives and infrastructures
Dataset: https://doi.org/10.5281/zenodo.6638478
Schema: https://doi.org/10.5281/zenodo.6372977
This dataset is licensed under a Creative Commons Attribution 4.0 International License.
The dataset contains one file per community/initiative/infrastructure collaborating with OpenAIRE. Their community gateways are also available on
CONNECT. Each file is a tar archive containing gzip files with one json per line. Only the communities/research initiatives/infrastructures that are visible to everyone are dumped.
The model of this dump differs from the one of the whole graph.
Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dump.
---
## Alternative sub-graph data model
Note that the dumps for research communities, infrastructures, and products related to projects do not strictly follow the main data model of the OpenAIRE Graph. In particular, they differ as follows:
* only research products are dumped (no relations and no entities other than results)
* the dumped results are extended with information that can be inferred from the whole dump, namely:
  * funding information, if present
  * associated research communities/infrastructures
  * associated data sources
As a result, they contain just one entity type, the [Extended Result](alternative-model/extendedresult.md).


@ -51,9 +51,19 @@ const sidebars = {
 href: "https://graph.openaire.eu/develop/overview.html"
 },
 {
-type: 'doc',
-id: 'download'
-},
+type: 'category',
+label: "Downloads",
+link: {
+type: 'generated-index',
+description: 'All resources, available for download, are listed below.'
+},
+items: [
+{ type: 'doc', id: 'downloads/full-graph'},
+{ type: 'doc', id: 'downloads/beginners-kit' },
+{ type: 'doc', id: 'downloads/subgraphs' },
+{ type: 'doc', id: 'downloads/related-datasets' },
+]
+},
 {
 type: 'category',
 label: "Data provision",


@ -0,0 +1,33 @@
@dataset{manghi_paolo_2022_6616871,
author = {Manghi, Paolo and
Atzori, Claudio and
Bardi, Alessia and
Baglioni, Miriam and
Schirrwagen, Jochen and
Dimitropoulos, Harry and
La Bruzzo, Sandro and
Foufoulas, Ioannis and
Mannocci, Andrea and
Horst, Marek and
Czerniak, Andreas and
Kiatropoulou, Katerina and
Kokogiannaki, Argiro and
De Bonis, Michele and
Artini, Michele and
Ottonello, Enrico and
Lempesis, Antonis and
Ioannidis, Alexandros and
Manola, Natalia and
Principe, Pedro},
title = {OpenAIRE Research Graph Dump},
month = jun,
year = 2022,
note = {{A new version of this dataset is published every 6
months. The content available on the OpenAIRE
EXPLORE and CONNECT portals might be more up-to-
date with respect to the data you find here.}},
publisher = {Zenodo},
version = {4.1},
doi = {10.5281/zenodo.6616871},
url = {https://doi.org/10.5281/zenodo.6616871}
}