diff --git a/docs/data-provision/aggregation/datacite.md b/docs/data-provision/aggregation/datacite.md index 6d838fd..13b67cd 100644 --- a/docs/data-provision/aggregation/datacite.md +++ b/docs/data-provision/aggregation/datacite.md @@ -35,7 +35,7 @@ The table below describes the mapping from the XML baseline records to the OpenA | OpenAIRE Result field path | Datacite record JSON path | # Notes | |--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `id` | `\attributes\doi` | the identifier will be created by folloing the openaire PID generation policy | +| `id` | `\attributes\doi` | id in the form `doi_________::md5(doi)` | | | | Use the vocabulary **_dnet:publication_resource_** to find a synonym to one of these terms and get the `instance.type`. Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: | | `pid` | `\attributes\doi` | `scheme = doi` | | `originalid` | `\attributes\doi` | | diff --git a/docs/data-provision/aggregation/pubmed.md b/docs/data-provision/aggregation/pubmed.md index c0c6ac6..d2355ff 100644 --- a/docs/data-provision/aggregation/pubmed.md +++ b/docs/data-provision/aggregation/pubmed.md @@ -7,6 +7,8 @@ This section describes the mapping implemented for [MEDLINE/PubMed](https://pubm The native data is collected from the [ftp baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) site. 
 It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.
 
+## Incremental harvesting
+PubMed publishes its updates as separate files in a dedicated FTP directory ([ftp update files](https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/)). We collect each new file and generate the new version of the dataset by upserting its records into the existing ones.
 ## Mapping
 
 The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
@@ -15,9 +17,9 @@ The table below describes the mapping from the XML baseline records to the OpenA
 | *OpenAIRE Result field path* | PubMed record field xpath | Notes |
 |--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | **Publication Mapping** | | |
-| `id` | ?? | ?? |
+| `id` | ?? | id in the form `pmid_________::md5(pmid)` |
 | `pid` | `//PMID` | `classid = classname = pmid` |
-| `publicationdate` | `//PubmedPubDate` | apply the function GraphCleaningFunctions.cleanDate before assign it |
+| `publicationdate` | `//PubmedPubDate` | date cleaned and normalized to the `YYYY-MM-DD` format |
 | `maintitle` | `//Title` | |
 | `description` | `//AbstractText` | |
 | `language` | `//Language` | cleaning vocabulary -> dnet:languages |
@@ -31,31 +33,11 @@ The table below describes the mapping from the XML baseline records to the OpenA
 | `container.conferencedate` | `//Journal/PubDate` | map the date of the Journal |
 | `container.name` | `//Journal/Title` | name of the journal |
 | `container.vol` | `//Journal/Volume` | journal volume |
-| `container.issPrinted` | `//Journal/ISSN` | ?? |
+| `container.issPrinted` | `//Journal/ISSN` | the journal ISSN |
 | `container.iss` | `//Journal/Issue` | The journal issue |
 | **Instance Mapping** | | |
 | `instance.type` | `//PublicationType` | if the article contains the typology `Journal Article` then we apply this type else We have to find a terms that match the vocabulary otherwise we discard it |
 | `instance.pid` | `//PMID` | map the pmid in the pid in the instance |
 | `instance.url` | `//PMID` | creates the URL by prepending `https://pubmed.ncbi.nlm.nih.gov/` to the PMId |
 | `instance.alternateIdentifier` | `//ArticleId[./@IdType="doi"]` | |
-| `instance.publicationdate` | `//PubmedPubDate` | |
-
-
-| *OpenAIRE Relation field path* | PubMed record field xpath | Notes |
-|--------------------------------|---------------------------|-------|
-| | | |
-
-#TODO
-
-Missing item mapped
-
-
-
-
-
-
-
-
-
-
-
+| `instance.publicationdate` | `//PubmedPubDate` | date cleaned and normalized to the `YYYY-MM-DD` format |
\ No newline at end of file
diff --git a/sidebars.js b/sidebars.js
index e7501f1..8063572 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -66,7 +66,8 @@ const sidebars = {
       items: [
         { type: 'doc', id: 'data-provision/aggregation/doiboost' },
         { type: 'doc', id: 'data-provision/aggregation/pubmed' },
-        { type: 'doc', id: 'data-provision/aggregation/datacite' }
+        { type: 'doc', id: 'data-provision/aggregation/datacite' },
+        { type: 'doc', id: 'data-provision/aggregation/ebi' },
       ]
     },
     {
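The mapping rules introduced in this change can be sketched as follows: the `md5`-based identifiers, the `YYYY-MM-DD` date normalization, the `instance.url` construction, and the upsert used for incremental harvesting. This is a minimal, hypothetical Node.js sketch, not the OpenAIRE implementation. The helper names (`buildId`, `normalizeDate`, `pubmedUrl`, `upsert`) are illustrative. The PID is assumed to be hashed exactly as received, with no lower-casing. A record from an update file is assumed to replace the stored record with the same `id` as a whole.

```javascript
const crypto = require('crypto');

// Namespace prefixes copied from the mapping tables (datacite.md and pubmed.md).
const DOI_PREFIX = 'doi_________';
const PMID_PREFIX = 'pmid_________';

// Hypothetical helper: builds an id of the form `<prefix>::md5(<pid>)`.
// Assumption: the PID is hashed exactly as received (no lower-casing or trimming).
function buildId(prefix, pid) {
  const digest = crypto.createHash('md5').update(pid, 'utf8').digest('hex');
  return `${prefix}::${digest}`;
}

const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Hypothetical helper: normalizes the Year/Month/Day values of a <PubmedPubDate>
// element to YYYY-MM-DD. Month may be numeric ("3") or an English abbreviation
// ("Mar"). Returns null when a component is missing or the date does not exist.
function normalizeDate(year, month, day) {
  if (!year || !month || !day) return null;
  const y = Number(year);
  const d = Number(day);
  let m = Number(month);
  if (Number.isNaN(m)) {
    // indexOf returns -1 for unknown names, which yields the invalid month 0.
    m = MONTHS.indexOf(String(month).slice(0, 3).toLowerCase()) + 1;
  }
  if (![y, m, d].every(Number.isInteger)) return null;
  // Round-trip through a UTC Date to reject values such as 2021-02-30 or month 13.
  const date = new Date(Date.UTC(y, m - 1, d));
  if (date.getUTCFullYear() !== y || date.getUTCMonth() !== m - 1 || date.getUTCDate() !== d) {
    return null;
  }
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(y, 4)}-${pad(m, 2)}-${pad(d, 2)}`;
}

// instance.url: prepend the PubMed base URL to the PMID.
function pubmedUrl(pmid) {
  return 'https://pubmed.ncbi.nlm.nih.gov/' + pmid;
}

// Incremental harvesting: upsert the records of an update file into the existing
// dataset, keyed by `id`. Assumption: an updated record replaces the stored one
// as a whole (no field-level merge).
function upsert(existing, updates) {
  const byId = new Map(existing.map((record) => [record.id, record]));
  for (const record of updates) {
    byId.set(record.id, record);
  }
  return [...byId.values()];
}
```

For example, `normalizeDate('2019', 'Mar', '7')` returns `'2019-03-07'`. An impossible date such as 30 February yields `null` instead of being silently rolled over into March.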