reviewed Pubmed Mapping, added EBI page

Sandro La Bruzzo 2022-10-21 14:58:16 +02:00
parent 93bad11a04
commit 89cc05d25a
3 changed files with 9 additions and 26 deletions

@ -35,7 +35,7 @@ The table below describes the mapping from the XML baseline records to the OpenA
| OpenAIRE Result field path | Datacite record JSON path | # Notes | | OpenAIRE Result field path | Datacite record JSON path | # Notes |
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `id` | `\attributes\doi` | the identifier will be created by folloing the openaire PID generation policy | | `id` | `\attributes\doi` | id in the form `doi_________::md5(doi)` |
| <ul><li>`instance`</li> <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`attributes\types\schemaOrg`</li></ul> | Use the vocabulary **_dnet:publication_resource_** to find a synonym to one of these terms and get the `instance.type`. Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: <ul><li>`publication`</li> <li>`dataset`</li> <li> `software`</li> <li>`otherresearchproduct`</li></ul> | | <ul><li>`instance`</li> <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`attributes\types\schemaOrg`</li></ul> | Use the vocabulary **_dnet:publication_resource_** to find a synonym to one of these terms and get the `instance.type`. Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: <ul><li>`publication`</li> <li>`dataset`</li> <li> `software`</li> <li>`otherresearchproduct`</li></ul> |
| `pid` | `\attributes\doi` | `scheme = doi` | | `pid` | `\attributes\doi` | `scheme = doi` |
| `originalid` | `\attributes\doi` | | | `originalid` | `\attributes\doi` | |

@ -7,6 +7,8 @@ This section describes the mapping implemented for [MEDLINE/PubMed](https://pubm
The native data is collected from the [ftp baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) site. The native data is collected from the [ftp baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) site.
It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html. It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.
## Incremental harvesting
PubMed exposes an FTP entry point with all the update files: [ftp baseline update](https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/). We collect each new file and generate the new dataset by upserting the existing items.
## Mapping ## Mapping
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format. The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
@ -15,9 +17,9 @@ The table below describes the mapping from the XML baseline records to the OpenA
| *OpenAIRE Result field path* | PubMed record field xpath | Notes | | *OpenAIRE Result field path* | PubMed record field xpath | Notes |
|--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------| |--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Publication Mapping** | | | | **Publication Mapping** | | |
| `id` | ?? | ?? | | `id` | ?? | id in the form `pmid_________::md5(pmid)` |
| `pid` | `//PMID` | `classid = classname = pmid` | | `pid` | `//PMID` | `classid = classname = pmid` |
| `publicationdate` | `//PubmedPubDate` | apply the function GraphCleaningFunctions.cleanDate before assign it | | `publicationdate` | `//PubmedPubDate` | clean and normalize the format of the date to be YYYY-mm-dd |
| `maintitle` | `//Title` | | | `maintitle` | `//Title` | |
| `description` | `//AbstractText` | | | `description` | `//AbstractText` | |
| `language` | `//Language` | cleaning vocabulary -> dnet:languages | | `language` | `//Language` | cleaning vocabulary -> dnet:languages |
@ -31,31 +33,11 @@ The table below describes the mapping from the XML baseline records to the OpenA
| `container.conferencedate` | `//Journal/PubDate` | map the date of the Journal | | `container.conferencedate` | `//Journal/PubDate` | map the date of the Journal |
| `container.name` | `//Journal/Title` | name of the journal | | `container.name` | `//Journal/Title` | name of the journal |
| `container.vol` | `//Journal/Volume` | journal volume | | `container.vol` | `//Journal/Volume` | journal volume |
| `container.issPrinted` | `//Journal/ISSN` | ?? | | `container.issPrinted` | `//Journal/ISSN` | the journal issn |
| `container.iss` | `//Journal/Issue` | The journal issue | | `container.iss` | `//Journal/Issue` | The journal issue |
| **Instance Mapping** | | | | **Instance Mapping** | | |
| `instance.type` | `//PublicationType` | if the article contains the typology `Journal Article` then we apply this type, else we have to find a term that matches the vocabulary, otherwise we discard it | | `instance.type` | `//PublicationType` | if the article contains the typology `Journal Article` then we apply this type, else we have to find a term that matches the vocabulary, otherwise we discard it |
| `instance.pid` | `//PMID` | map the pmid in the pid in the instance | | `instance.pid` | `//PMID` | map the pmid in the pid in the instance |
| `instance.url` | `//PMID` | creates the URL by prepending `https://pubmed.ncbi.nlm.nih.gov/` to the PMID | | `instance.url` | `//PMID` | creates the URL by prepending `https://pubmed.ncbi.nlm.nih.gov/` to the PMID |
| `instance.alternateIdentifier` | `//ArticleId[./@IdType="doi"]` | | | `instance.alternateIdentifier` | `//ArticleId[./@IdType="doi"]` | |
| `instance.publicationdate` | `//PubmedPubDate` | | | `instance.publicationdate` | `//PubmedPubDate` | clean and normalize the format of the date to be YYYY-mm-dd |
| *OpenAIRE Relation field path* | PubMed record field xpath | Notes |
|--------------------------------|---------------------------|-------|
| | | |
#TODO
Missing item mapped
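
The notes in the PubMed mapping table above (`id`, `instance.url`, `publicationdate`) can be illustrated with a minimal Python sketch. This is not the OpenAIRE implementation: the function names are hypothetical, the `id` prefix is copied exactly as written in the table, hashing the PMID as a UTF-8 string is an assumption, and the date helper only stands in for whatever cleaning the real pipeline applies.

```python
import datetime
import hashlib

# Prefix copied verbatim from the `id` row of the mapping table (assumption:
# the real pipeline uses exactly this string).
PMID_PREFIX = "pmid_________"
PUBMED_BASE_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def make_id(pmid: str) -> str:
    """Build an identifier of the form `<prefix>::md5(pmid)`."""
    # Assumption: the MD5 is computed over the PMID string encoded as UTF-8.
    digest = hashlib.md5(pmid.encode("utf-8")).hexdigest()
    return f"{PMID_PREFIX}::{digest}"


def make_url(pmid: str) -> str:
    """Prepend the PubMed base URL to the PMID, as described for `instance.url`."""
    return PUBMED_BASE_URL + pmid


def normalize_date(year: str, month: str, day: str) -> str:
    """Return the date as YYYY-mm-dd; raises ValueError for an invalid date."""
    return datetime.date(int(year), int(month), int(day)).isoformat()
```

For instance, `make_url("12345")` yields `https://pubmed.ncbi.nlm.nih.gov/12345`, and `normalize_date("2022", "3", "7")` yields `2022-03-07`.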

@ -66,7 +66,8 @@ const sidebars = {
items: [ items: [
{ type: 'doc', id: 'data-provision/aggregation/doiboost' }, { type: 'doc', id: 'data-provision/aggregation/doiboost' },
{ type: 'doc', id: 'data-provision/aggregation/pubmed' }, { type: 'doc', id: 'data-provision/aggregation/pubmed' },
{ type: 'doc', id: 'data-provision/aggregation/datacite' } { type: 'doc', id: 'data-provision/aggregation/datacite' },
{ type: 'doc', id: 'data-provision/aggregation/ebi' },
] ]
}, },
{ {