forked from D-Net/openaire-graph-docs
reviewed Pubmed Mapping, added EBI page
parent
93bad11a04
commit
89cc05d25a
|
@ -35,7 +35,7 @@ The table below describes the mapping from the XML baseline records to the OpenA
|
||||||
|
|
||||||
| OpenAIRE Result field path | Datacite record JSON path | # Notes |
|
| OpenAIRE Result field path | Datacite record JSON path | # Notes |
|
||||||
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
| `id` | `\attributes\doi` | the identifier will be created by folloing the openaire PID generation policy |
|
| `id` | `\attributes\doi` | id in the form `doi_________::md5(doi)` |
|
||||||
| <ul><li>`instance`</li> <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`\attributes\types\schemaOrg`</li></ul> | Use the vocabulary **_dnet:publication_resource_** to find a synonym for one of these terms and get the `instance.type`. Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: <ul><li>`publication`</li> <li>`dataset`</li> <li> `software`</li> <li>`otherresearchproduct`</li></ul> |
|
| <ul><li>`instance`</li> <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`\attributes\types\schemaOrg`</li></ul> | Use the vocabulary **_dnet:publication_resource_** to find a synonym for one of these terms and get the `instance.type`. Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: <ul><li>`publication`</li> <li>`dataset`</li> <li> `software`</li> <li>`otherresearchproduct`</li></ul> |
|
||||||
| `pid` | `\attributes\doi` | `scheme = doi` |
|
| `pid` | `\attributes\doi` | `scheme = doi` |
|
||||||
| `originalid` | `\attributes\doi` | |
|
| `originalid` | `\attributes\doi` | |
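
The `id` rule in the table above can be sketched as follows. This is a minimal illustration rather than the aggregator's actual code: the helper name `make_openaire_id` is hypothetical, and trimming and lowercasing the DOI before hashing is an assumption, since the table only states the `prefix::md5(pid)` shape.

```python
import hashlib


def make_openaire_id(prefix: str, pid: str) -> str:
    """Build an identifier of the form `<prefix>::md5(<pid>)`.

    Hypothetical helper: the prefix is passed in already padded,
    exactly as it is written in the mapping table.
    """
    # Assumption: the PID is trimmed and lowercased before hashing.
    normalized = pid.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}::{digest}"


# Made-up DOI, used only to show the resulting shape.
example_id = make_openaire_id("doi_________", "10.1234/Example")
print(example_id)
```

The same helper would cover other PID types by passing a different prefix.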
|
||||||
|
|
|
@ -7,6 +7,8 @@ This section describes the mapping implemented for [MEDLINE/PubMed](https://pubm
|
||||||
The native data is collected from the [ftp baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) site.
|
The native data is collected from the [ftp baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) site.
|
||||||
It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.
|
It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.
|
||||||
|
|
||||||
|
## Incremental harvesting
|
||||||
|
PubMed also publishes its updates on a separate FTP entry point, the [ftp baseline update](https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/) site. We collect the new files and generate the new dataset by upserting the existing items.
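
As a rough sketch of the upsert step, assuming each parsed record is a dictionary that carries its PMID under a `pmid` key (the record layout and the `upsert` name are illustrative, not taken from the actual workflow), a record from an update file replaces the existing record with the same PMID or is added if it is new:

```python
from typing import Dict, Iterable


def upsert(dataset: Dict[str, dict], updates: Iterable[dict]) -> Dict[str, dict]:
    """Return a new dataset where each update replaces or adds the record with the same PMID."""
    merged = dict(dataset)  # shallow copy, so the input dataset is left untouched
    for record in updates:
        # Assumption: every record carries its PMID under the key "pmid".
        merged[record["pmid"]] = record
    return merged


# Made-up records: one existing item is updated, one new item is added.
baseline = {"1": {"pmid": "1", "title": "old title"}}
update_file = [
    {"pmid": "1", "title": "new title"},
    {"pmid": "2", "title": "added"},
]
current = upsert(baseline, update_file)
```

How deleted records are handled is not covered by this sketch.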
|
||||||
## Mapping
|
## Mapping
|
||||||
|
|
||||||
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
|
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
|
||||||
|
@ -15,9 +17,9 @@ The table below describes the mapping from the XML baseline records to the OpenA
|
||||||
| *OpenAIRE Result field path* | PubMed record field xpath | Notes |
|
| *OpenAIRE Result field path* | PubMed record field xpath | Notes |
|
||||||
|--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
|--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||||
| **Publication Mapping** | | |
|
| **Publication Mapping** | | |
|
||||||
| `id` | ?? | ?? |
|
| `id` | ?? | id in the form `pmid_________::md5(pmid)` |
|
||||||
| `pid` | `//PMID` | `classid = classname = pmid` |
|
| `pid` | `//PMID` | `classid = classname = pmid` |
|
||||||
| `publicationdate` | `//PubmedPubDate` | apply the function GraphCleaningFunctions.cleanDate before assign it |
|
| `publicationdate` | `//PubmedPubDate` | clean and normalize the format of the date to be YYYY-mm-dd |
|
||||||
| `maintitle` | `//Title` | |
|
| `maintitle` | `//Title` | |
|
||||||
| `description` | `//AbstractText` | |
|
| `description` | `//AbstractText` | |
|
||||||
| `language` | `//Language` | cleaning vocabulary -> dnet:languages |
|
| `language` | `//Language` | cleaning vocabulary -> dnet:languages |
|
||||||
|
@ -31,31 +33,11 @@ The table below describes the mapping from the XML baseline records to the OpenA
|
||||||
| `container.conferencedate` | `//Journal/PubDate` | map the date of the Journal |
|
| `container.conferencedate` | `//Journal/PubDate` | map the date of the Journal |
|
||||||
| `container.name` | `//Journal/Title` | name of the journal |
|
| `container.name` | `//Journal/Title` | name of the journal |
|
||||||
| `container.vol` | `//Journal/Volume` | journal volume |
|
| `container.vol` | `//Journal/Volume` | journal volume |
|
||||||
| `container.issPrinted` | `//Journal/ISSN` | ?? |
|
| `container.issPrinted` | `//Journal/ISSN` | the journal ISSN |
|
||||||
| `container.iss` | `//Journal/Issue` | The journal issue |
|
| `container.iss` | `//Journal/Issue` | The journal issue |
|
||||||
| **Instance Mapping** | | |
|
| **Instance Mapping** | | |
|
||||||
| `instance.type` | `//PublicationType` | if the article has the typology `Journal Article`, we apply this type; otherwise we look for a term that matches the vocabulary, and if none matches we discard it |
|
| `instance.type` | `//PublicationType` | if the article has the typology `Journal Article`, we apply this type; otherwise we look for a term that matches the vocabulary, and if none matches we discard it |
|
||||||
| `instance.pid` | `//PMID` | map the PMID to the pid of the instance |
|
| `instance.pid` | `//PMID` | map the PMID to the pid of the instance |
|
||||||
| `instance.url` | `//PMID` | creates the URL by prepending `https://pubmed.ncbi.nlm.nih.gov/` to the PMID |
|
| `instance.url` | `//PMID` | creates the URL by prepending `https://pubmed.ncbi.nlm.nih.gov/` to the PMID |
|
||||||
| `instance.alternateIdentifier` | `//ArticleId[./@IdType="doi"]` | |
|
| `instance.alternateIdentifier` | `//ArticleId[./@IdType="doi"]` | |
|
||||||
| `instance.publicationdate` | `//PubmedPubDate` | |
|
| `instance.publicationdate` | `//PubmedPubDate` | clean and normalize the format of the date to be YYYY-mm-dd |
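
Three of the notes in the PubMed table (date normalization, URL construction, and the choice of `instance.type`) can be sketched as below. The function names, the assumption that the date arrives as separate numeric year, month and day strings, and the `vocabulary` set standing in for the real vocabulary lookup are all illustrative and not taken from the actual implementation.

```python
from datetime import date
from typing import Iterable, Optional, Set

PUBMED_BASE_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def normalize_date(year: str, month: str, day: str) -> str:
    """Return the date zero-padded as year-month-day.

    Assumption: the three components are numeric strings.
    Raises ValueError when they do not form a valid calendar date.
    """
    return date(int(year), int(month), int(day)).isoformat()


def instance_url(pmid: str) -> str:
    """Build the instance URL by prepending the PubMed base URL to the PMID."""
    return PUBMED_BASE_URL + pmid.strip()


def pick_instance_type(
    publication_types: Iterable[str], vocabulary: Set[str]
) -> Optional[str]:
    """Prefer `Journal Article`, else the first term found in the vocabulary.

    Returns None when nothing matches, which the caller treats as "discard".
    `vocabulary` is a hypothetical stand-in for the real vocabulary lookup.
    """
    types = list(publication_types)
    if "Journal Article" in types:
        return "Journal Article"
    for term in types:
        if term in vocabulary:
            return term
    return None
```

For example, `normalize_date("2021", "3", "7")` returns `2021-03-07`, and `pick_instance_type(["Comment"], {"Review"})` returns `None`.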
|
||||||
|
|
||||||
|
|
||||||
| *OpenAIRE Relation field path* | PubMed record field xpath | Notes |
|
|
||||||
|--------------------------------|---------------------------|-------|
|
|
||||||
| | | |
|
|
||||||
|
|
||||||
#TODO
|
|
||||||
|
|
||||||
Missing item mapped
|
|
||||||
|
|
|
@ -66,7 +66,8 @@ const sidebars = {
|
||||||
items: [
|
items: [
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/doiboost' },
|
{ type: 'doc', id: 'data-provision/aggregation/doiboost' },
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/pubmed' },
|
{ type: 'doc', id: 'data-provision/aggregation/pubmed' },
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/datacite' }
|
{ type: 'doc', id: 'data-provision/aggregation/datacite' },
|
||||||
|
{ type: 'doc', id: 'data-provision/aggregation/ebi' },
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|