forked from D-Net/openaire-graph-docs
datacite tables and EBI texts
parent b5e23b9605
commit 77ad2700b2
|
@ -32,12 +32,11 @@ The metadata collection process identifies the most recent record date available
|
||||||
## Datacite Mapping
|
## Datacite Mapping
|
||||||
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
|
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
|
||||||
|
|
||||||
|
|
||||||
| OpenAIRE Result field path | Datacite record JSON path | # Notes |
|
| OpenAIRE Result field path | Datacite record JSON path | # Notes |
|
||||||
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
| `id` | `\attributes\doi` | id in the form `doi_________::md5(doi)` |
|
| `id` | `\attributes\doi` | id in the form `doi_________::md5(doi)` |
|
||||||
| <ul><li>`instance`</li> <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`attributes\types\schemaOrg`</li></ul> | Use the vocabulary **_dnet:publication_resource_** to find a synonym to one of these terms and get the `instance.type`. |
|
| <ul><li>`instance`</li> <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`attributes\types\schemaOrg`</li></ul> | Use the vocabulary **_dnet:publication_resource_** to find a synonym to one of these terms and get the `instance.type`. |
|
||||||
|`type` | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`attributes\types\schemaOrg`</li></ul> | Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: <ul><li>`publication`</li> <li>`dataset`</li> <li> `software`</li> <li>`otherresearchproduct`</li></ul> |
|
| `type` | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`attributes\types\schemaOrg`</li></ul> | Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: <ul><li>`publication`</li> <li>`dataset`</li> <li> `software`</li> <li>`otherresearchproduct`</li></ul> |
|
||||||
| `pid` | `\attributes\doi` | `scheme = doi` |
|
| `pid` | `\attributes\doi` | `scheme = doi` |
|
||||||
| `originalid` | `\attributes\doi` | |
|
| `originalid` | `\attributes\doi` | |
|
||||||
| `dateofcollection` | `attributes\updated` | the timestamp is expressed in milliseconds; it is converted to the `yyyy-MM-dd'T'HH:mm:ssZ` format |
|
| `dateofcollection` | `attributes\updated` | the timestamp is expressed in milliseconds; it is converted to the `yyyy-MM-dd'T'HH:mm:ssZ` format |
|
||||||
|
@ -60,18 +59,18 @@ The table below describes the mapping from the XML baseline records to the OpenA
|
||||||
| `publisher` | `\attributes\publisher` | |
|
| `publisher` | `\attributes\publisher` | |
|
||||||
| `language` | `\attributes\language` | cleaned by using vocabulary `dnet:languages` |
|
| `language` | `\attributes\language` | cleaned by using vocabulary `dnet:languages` |
|
||||||
| `publisher` | `\attributes\publisher` | |
|
| `publisher` | `\attributes\publisher` | |
|
||||||
| `instance.license` | `\attributes\rightsList` | if right value starts with http and matches a particular regex |
|
| `instance.license` | `\attributes\rightsList` | if the rights value starts with http and matches a particular regex |
|
||||||
| `instance.accessright` | `\attributes\rightsList` | <ul> <li>if not present :`unknown`</li><li>if datasource is _figshare_:`open`</li><li>If `embargo_date < today()`: _OPEN_ </li> </ul> |
|
| `instance.accessright` | `\attributes\rightsList` | <ul><li>if not present :`unknown`</li><li>if datasource is Figshare:`open`</li><li>If `embargo_date < today()`: OPEN</li></ul> |
|
||||||
|
|
||||||
|
|
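The `id` and `dateofcollection` rules in the table above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the prefix is copied from the table, hashing the DOI exactly as given (no lowercasing or trimming) is an assumption because the table does not say how the DOI is normalised, and rendering the date in UTC is also an assumption.

```python
import hashlib
from datetime import datetime, timezone

# Prefix copied verbatim from the mapping table.
DOI_PREFIX = "doi_________"


def openaire_id(doi: str) -> str:
    """Build an identifier of the form doi_________::md5(doi).

    Assumption: the DOI is hashed as given, encoded as UTF-8.
    """
    digest = hashlib.md5(doi.encode("utf-8")).hexdigest()
    return f"{DOI_PREFIX}::{digest}"


def date_of_collection(updated_ms: int) -> str:
    """Convert a millisecond epoch timestamp to yyyy-MM-dd'T'HH:mm:ssZ.

    Assumption: the value is rendered in UTC, so the zone part is +0000.
    """
    moment = datetime.fromtimestamp(updated_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S%z")
```

For instance, `date_of_collection(0)` yields `1970-01-01T00:00:00+0000`, and `openaire_id` always returns the prefix followed by `::` and a 32-character hexadecimal digest.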
||||||
### Mapping Relation
|
### Mapping Relation
|
||||||
|
|
||||||
|
|
||||||
| OpenAIRE Relation Semantic and inverse | Datacite record JSON path | Source/Target type | #Notes |
|
| OpenAIRE Relation Semantic and inverse | Datacite record JSON path | Source/Target type | #Notes |
|
||||||
|-------------------------------------------|-------------------------------|-------------------------------|---------|
|
|----------------------------------------|---------------------------------------|----------------------|---------------------------------------------------------------------------------------------------|
|
||||||
| `isProducedBy` |`attributes\fundingReferences` | `Result/Project`| we must check whether the funding reference matches this pattern `(info:eu-repo/grantagreement/ec/h2020/)(\d{6})(.*)`|
|
| `isProducedBy` | `attributes\fundingReferences` | `Result/Project` | we must check whether the funding reference matches this pattern `(info:eu-repo/grantagreement/ec/h2020/)(\d{6})(.*)` |
|
||||||
| `IsProvidedBy` | | `Result/DataSource` | Datasource is always Datacite|
|
| `IsProvidedBy` | | `Result/DataSource` | Datasource is always Datacite |
|
||||||
| `IsHostedBy` | `\attributes\relationships\client\id` | `Result/DataSource` |a curated map from clientId to Datasource is defined; if a match is found, a _hostedBy Relation_ is created |
|
| `IsHostedBy` | `\attributes\relationships\client\id` | `Result/DataSource` | a curated map from clientId to Datasource is defined; if a match is found, a _hostedBy Relation_ is created |
|
||||||
|
|
||||||
|
|
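The pattern check described for `isProducedBy` can be sketched as below. The regular expression is copied from the Notes column; everything else is an assumption made for illustration: the function name is hypothetical, and the value is lower-cased before matching only because the pattern itself is written in lower case. The table does not say which sub-field of `fundingReferences` is tested, so the function simply takes a string.

```python
import re
from typing import Optional

# Pattern copied from the Notes column of the relation table.
H2020_GRANT = re.compile(r"(info:eu-repo/grantagreement/ec/h2020/)(\d{6})(.*)")


def h2020_grant_code(funding_reference: str) -> Optional[str]:
    """Return the six-digit grant code, or None when the pattern does not match.

    Assumption: the value is trimmed and lower-cased before matching.
    """
    match = H2020_GRANT.match(funding_reference.strip().lower())
    return match.group(2) if match else None
```

With a made-up value such as `info:eu-repo/grantAgreement/EC/H2020/654321/EU//Example`, the function returns `654321`; a value that does not start with the H2020 prefix returns `None`, in which case no `isProducedBy` relation would be created.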
||||||
### Relation Resolution
|
### Relation Resolution
|
||||||
|
|
|
@ -2,13 +2,403 @@
|
||||||
|
|
||||||
This section describes the mapping implemented for [EMBL-EBI's Protein Data Bank in Europe](https://www.ebi.ac.uk/).
|
This section describes the mapping implemented for [EMBL-EBI's Protein Data Bank in Europe](https://www.ebi.ac.uk/).
|
||||||
|
|
||||||
The Europe PMC RESTful Web Service gives the [datalinks API](https://europepmc.org/RestfulWebService#!/Europe32PMC32Articles32RESTful32API)to retrieve data-literature links in Scholix format .
|
The Europe PMC RESTful Web Service gives the [datalinks API](https://europepmc.org/RestfulWebService#!/Europe32PMC32Articles32RESTful32API) to retrieve data-literature links in Scholix format.
|
||||||
|
|
||||||
## how data is collected
|
## How the data is collected
|
||||||
Starting from the Pubmed collection, we exploit this API to get all the related bioentities related to a Publication with a specific PubMed identifier.
|
|
||||||
|
|
||||||
Following this request: `https://www.ebi.ac.uk/europepmc/webservices/rest/MED/$PMID/datalinks?format=json` we store for each pubmedID the links related.
|
Starting from the PubMed collection, the API below is used to retrieve, for each PubMed identifier, the bioentities related to that publication.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
```commandline
|
||||||
|
curl -s "https://www.ebi.ac.uk/europepmc/webservices/rest/MED/33024307/datalinks?format=json" | jq '.'
|
||||||
|
{
|
||||||
|
"version": "6.8",
|
||||||
|
"hitCount": 9,
|
||||||
|
"request": {
|
||||||
|
"id": "33024307",
|
||||||
|
"source": "MED"
|
||||||
|
},
|
||||||
|
"dataLinkList": {
|
||||||
|
"Category": [
|
||||||
|
{
|
||||||
|
"Name": "Nucleotide Sequences",
|
||||||
|
"CategoryLinkCount": 5,
|
||||||
|
"Section": [
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"Tags": [
|
||||||
|
"supporting_data"
|
||||||
|
],
|
||||||
|
"SectionLinkCount": 5,
|
||||||
|
"Linklist": {
|
||||||
|
"Link": [
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"PublicationDate": "04-11-2022",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "References"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "MED"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "AY278488",
|
||||||
|
"IDScheme": "ENA",
|
||||||
|
"IDURL": "http://identifiers.org/ebi/ena.embl:AY278488"
|
||||||
|
},
|
||||||
|
"Title": "AY278488",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Frequency": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"PublicationDate": "04-11-2022",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "References"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "MED"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "MT121216",
|
||||||
|
"IDScheme": "ENA",
|
||||||
|
"IDURL": "http://identifiers.org/ebi/ena.embl:MT121216"
|
||||||
|
},
|
||||||
|
"Title": "MT121216",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Frequency": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"PublicationDate": "04-11-2022",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "References"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "MED"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "KF367457",
|
||||||
|
"IDScheme": "ENA",
|
||||||
|
"IDURL": "http://identifiers.org/ebi/ena.embl:KF367457"
|
||||||
|
},
|
||||||
|
"Title": "KF367457",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Frequency": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"PublicationDate": "04-11-2022",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "References"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "MED"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "MN996532",
|
||||||
|
"IDScheme": "ENA",
|
||||||
|
"IDURL": "http://identifiers.org/ebi/ena.embl:MN996532"
|
||||||
|
},
|
||||||
|
"Title": "MN996532",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Frequency": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"PublicationDate": "04-11-2022",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "References"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "MED"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "MT072864",
|
||||||
|
"IDScheme": "ENA",
|
||||||
|
"IDURL": "http://identifiers.org/ebi/ena.embl:MT072864"
|
||||||
|
},
|
||||||
|
"Title": "MT072864",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Frequency": 1
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"Name": "Protein Structures",
|
||||||
|
"NameLong": "Protein structures in PDBe",
|
||||||
|
"CategoryLinkCount": 2,
|
||||||
|
"Section": [
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"Tags": [
|
||||||
|
"supporting_data"
|
||||||
|
],
|
||||||
|
"SectionLinkCount": 2,
|
||||||
|
"Linklist": {
|
||||||
|
"Link": [
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"PublicationDate": "04-11-2022",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "References"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "MED"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "6VW1",
|
||||||
|
"IDScheme": "PDB",
|
||||||
|
"IDURL": "http://identifiers.org/pdbe/pdb:6VW1"
|
||||||
|
},
|
||||||
|
"Title": "6VW1",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Frequency": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ObtainedBy": "tm_accession",
|
||||||
|
"PublicationDate": "04-11-2022",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "References"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "MED"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "2AJF",
|
||||||
|
"IDScheme": "PDB",
|
||||||
|
"IDURL": "http://identifiers.org/pdbe/pdb:2AJF"
|
||||||
|
},
|
||||||
|
"Title": "2AJF",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Frequency": 1
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"Name": "Altmetric",
|
||||||
|
"CategoryLinkCount": 1,
|
||||||
|
"Section": [
|
||||||
|
{
|
||||||
|
"ObtainedBy": "ext_links",
|
||||||
|
"Tags": [
|
||||||
|
"altmetrics"
|
||||||
|
],
|
||||||
|
"SectionLinkCount": 1,
|
||||||
|
"Linklist": {
|
||||||
|
"Link": [
|
||||||
|
{
|
||||||
|
"ObtainedBy": "ext_links",
|
||||||
|
"PublicationDate": "15-10-2020",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "IsReferencedBy"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "PMID"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "https://www.altmetric.com/details/91880755",
|
||||||
|
"IDScheme": "URL",
|
||||||
|
"IDURL": "https://www.altmetric.com/details/91880755"
|
||||||
|
},
|
||||||
|
"Title": "Characteristics of SARS-CoV-2 and COVID-19",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "Altmetric"
|
||||||
|
},
|
||||||
|
"ImageURL": "https://api.altmetric.com/v1/donut/91880755_64.png"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"Name": "BioStudies: supplemental material and supporting data",
|
||||||
|
"CategoryLinkCount": 1,
|
||||||
|
"Section": [
|
||||||
|
{
|
||||||
|
"ObtainedBy": "ext_links",
|
||||||
|
"Tags": [
|
||||||
|
"supporting_data"
|
||||||
|
],
|
||||||
|
"SectionLinkCount": 1,
|
||||||
|
"Linklist": {
|
||||||
|
"Link": [
|
||||||
|
{
|
||||||
|
"ObtainedBy": "ext_links",
|
||||||
|
"PublicationDate": "11-03-2021",
|
||||||
|
"LinkProvider": {
|
||||||
|
"Name": "Europe PMC"
|
||||||
|
},
|
||||||
|
"RelationshipType": {
|
||||||
|
"Name": "IsReferencedBy"
|
||||||
|
},
|
||||||
|
"Source": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "literature"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "33024307",
|
||||||
|
"IDScheme": "PMID"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"Target": {
|
||||||
|
"Type": {
|
||||||
|
"Name": "dataset"
|
||||||
|
},
|
||||||
|
"Identifier": {
|
||||||
|
"ID": "http://www.ebi.ac.uk/biostudies/studies/S-EPMC7537588?xr=true",
|
||||||
|
"IDScheme": "URL",
|
||||||
|
"IDURL": "http://www.ebi.ac.uk/biostudies/studies/S-EPMC7537588?xr=true"
|
||||||
|
},
|
||||||
|
"Title": "Characteristics of SARS-CoV-2 and COVID-19.",
|
||||||
|
"Publisher": {
|
||||||
|
"Name": "BioStudies: supplemental material and supporting data"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
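The response above nests each link under `dataLinkList`, `Category`, `Section`, `Linklist` and `Link`. A minimal Python sketch of walking that structure is given below; the helper names are hypothetical, the key names are taken from the example response, and the code assumes every link carries `Source`, `Target` and `RelationshipType` as in that example. It works on an already parsed JSON document and performs no network access.

```python
from typing import Any, Dict, Iterator, Tuple


def datalinks_url(pmid: str) -> str:
    """Build the request URL used in the example for one PubMed identifier."""
    return (
        "https://www.ebi.ac.uk/europepmc/webservices/rest/MED/"
        f"{pmid}/datalinks?format=json"
    )


def iter_links(response: Dict[str, Any]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (source ID, relationship name, target ID scheme, target ID) per link."""
    categories = response.get("dataLinkList", {}).get("Category", [])
    for category in categories:
        for section in category.get("Section", []):
            for link in section.get("Linklist", {}).get("Link", []):
                source = link["Source"]["Identifier"]
                target = link["Target"]["Identifier"]
                yield (
                    source["ID"],
                    link["RelationshipType"]["Name"],
                    target["IDScheme"],
                    target["ID"],
                )


# A trimmed copy of the first link from the example response.
sample = {
    "dataLinkList": {
        "Category": [{
            "Name": "Nucleotide Sequences",
            "Section": [{
                "Linklist": {"Link": [{
                    "RelationshipType": {"Name": "References"},
                    "Source": {"Identifier": {"ID": "33024307", "IDScheme": "MED"}},
                    "Target": {"Identifier": {"ID": "AY278488", "IDScheme": "ENA"}},
                }]}
            }],
        }]
    }
}

links = list(iter_links(sample))
```

Flattening the response into tuples like these keeps the per-PMID storage step independent of the category and section grouping, which is only relevant for display on the Europe PMC side.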
|
||||||
|
|
||||||
## Mapping
|
## Mapping
|
||||||
The table below describes the mapping from the EBI links records to the OpenAIRE Graph dump format.
|
The table below describes the mapping from the EBI links records to the OpenAIRE Graph dump format.
|
||||||
|
|