reviewed Pubmed Mapping, added EBI page

2022-10-21 14:58:16 +02:00 · 2022-10-21 14:58:16 +02:00 · 89cc05d25a
parent 93bad11a04
commit 89cc05d25a
3 changed files with 9 additions and 26 deletions
--- a/docs/data-provision/aggregation/datacite.md
+++ b/docs/data-provision/aggregation/datacite.md
@ -35,7 +35,7 @@ The table below describes the mapping from the XML baseline records to the OpenA

 | OpenAIRE Result field path                             | Datacite record JSON path                                                                                                                     | # Notes                                                                                                                                                                                                                                                                                                                                                                       |
 |--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `id`                                                   | `\attributes\doi`                                                                                                                             | the identifier will be created by folloing the openaire PID generation policy                                                                                                                                                                                                                                                                                                 |
+| `id`                                                   | `\attributes\doi`                      | id in the form `doi_________::md5(doi)`                 |
 | <ul><li>`instance`</li>  <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li>  <li> `\attributes\types\resourceTypeGeneral` </li>  <li>`attributes\types\schemaOrg`</li></ul> | Use the vocabulary **_dnet:publication_resource_**  to find a synonym to one of these terms and get the `instance.type`. Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to  generate one of the following main entities: <ul><li>`publication`</li>  <li>`dataset`</li> <li> `software`</li>  <li>`otherresearchproduct`</li></ul> |
 | `pid`                                                  | `\attributes\doi`                                                                                                                             | `scheme = doi`                                                                                                                                                                                                                                                                                                                                                                |
 | `originalid`                                           | `\attributes\doi`                                                                                                                             |                                                                                                                                                                                                                                                                                                                                                                               |
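The Datacite `id` rule above (`doi_________::md5(doi)`) can be sketched as follows. This is a minimal JavaScript illustration, not the OpenAIRE implementation. The helper name `openaireId` is hypothetical. The DOI is hashed exactly as given, because the mapping does not say whether it is normalized (for example, lowercased) before hashing.

```javascript
// Minimal sketch of the `doi_________::md5(doi)` identifier rule.
// Hypothetical helper, not taken from the OpenAIRE codebase.
const crypto = require("crypto");

// Namespace prefix for DOI-based records, copied from the mapping table.
const DOI_PREFIX = "doi_________";

// Builds `<prefix>::md5(<value>)` with a lowercase hex MD5 digest.
// The value is hashed as-is: any normalization OpenAIRE applies first is not specified here.
function openaireId(prefix, value) {
  const digest = crypto.createHash("md5").update(value, "utf8").digest("hex");
  return `${prefix}::${digest}`;
}

console.log(openaireId(DOI_PREFIX, "10.1234/example"));
```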
--- a/docs/data-provision/aggregation/pubmed.md
+++ b/docs/data-provision/aggregation/pubmed.md
@ -7,6 +7,8 @@ This section describes the mapping implemented for [MEDLINE/PubMed](https://pubm
 The native data is collected from the [ftp baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) site. 
 It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.

+## Incremental harvesting
+PubMed publishes its updates as files on a dedicated FTP entry point, the [FTP update files](https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/). We collect each new update file and generate the new dataset by upserting its records into the existing dataset.
 ## Mapping

 The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
@ -15,9 +17,9 @@ The table below describes the mapping from the XML baseline records to the OpenA
 | *OpenAIRE Result field path*   | PubMed record field xpath      | Notes                                                                                                                                                         |
 |--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | **Publication Mapping**        |                                |                                                                                                                                                               |
-| `id`                           | ??                             | ??                                                                                                                                                            |
+| `id`                           | ??                             | id in the form `pmid_________::md5(pmid)`                                                                                                                     |
 | `pid`                          | `//PMID`                       | `classid = classname = pmid`                                                                                                                                  |
-| `publicationdate`              | `//PubmedPubDate`              | apply the function GraphCleaningFunctions.cleanDate before assign it                                                                                          |
+| `publicationdate`              | `//PubmedPubDate`              | clean the date and normalize it to the format YYYY-MM-DD                                                                                                      |
 | `maintitle`                    | `//Title`                      |                                                                                                                                                               |
 | `description`                  | `//AbstractText`               |                                                                                                                                                               |
 | `language`                     | `//Language`                   | cleaning vocabulary -> dnet:languages                                                                                                                         |
@ -31,31 +33,11 @@ The table below describes the mapping from the XML baseline records to the OpenA
 | `container.conferencedate`     | `//Journal/PubDate`            | map the date of the Journal                                                                                                                                   |
 | `container.name`               | `//Journal/Title`              | name of the journal                                                                                                                                           |
 | `container.vol`                | `//Journal/Volume`             | journal volume                                                                                                                                                |
-| `container.issPrinted`         | `//Journal/ISSN`               | ??                                                                                                                                                            |
+| `container.issPrinted`         | `//Journal/ISSN`               | the journal ISSN                                                                                                                                              |
 | `container.iss`                | `//Journal/Issue`              | The journal issue                                                                                                                                             |
 | **Instance Mapping**           |                                |                                                                                                                                                               |
 | `instance.type`                | `//PublicationType`            | if the article has the publication type `Journal Article`, this type is applied; otherwise we look for a term that matches the vocabulary and discard the record if none is found |
 | `instance.pid`                 | `//PMID`                       | map the pmid in the pid in the instance                                                                                                                       |
 | `instance.url`                 | `//PMID`                       | creates the URL by prepending `https://pubmed.ncbi.nlm.nih.gov/` to the PMID                                                                                  |
 | `instance.alternateIdentifier` | `//ArticleId[./@IdType="doi"]` |                                                                                                                                                               |
-| `instance.publicationdate`     | `//PubmedPubDate`              |                                                                                                                                                               |
-
-
-| *OpenAIRE Relation field path* | PubMed record field xpath | Notes |
-|--------------------------------|---------------------------|-------|
-|                                |                           |       |
-
-#TODO
-
-Missing item mapped
-
-
-
-
-
-
-
-
-
-
-
+| `instance.publicationdate`     | `//PubmedPubDate`              | clean the date and normalize it to the format YYYY-MM-DD                                                                                                      |
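The PubMed rules above can be sketched as follows:

- the `id` form;
- the `instance.url` construction;
- the YYYY-MM-DD date normalization;
- the upsert step described under *Incremental harvesting*.

This is a minimal JavaScript illustration under stated assumptions, not the OpenAIRE implementation (the mapping mentions `GraphCleaningFunctions.cleanDate`, which is not reproduced here). All helper names are hypothetical. The sketch assumes the following:

- The Year/Month/Day parts have already been extracted from `//PubmedPubDate`.
- A missing month or day defaults to `01`.
- An updated record fully replaces the stored record with the same `id`.

```javascript
// Minimal sketch of PubMed mapping rules; helper names and input shapes are hypothetical.
const crypto = require("crypto");

// Namespace prefix for PubMed records, copied from the mapping table.
const PMID_PREFIX = "pmid_________";

// `id`: `pmid_________::md5(pmid)`, hashing the PMID string as-is.
function pubmedId(pmid) {
  const digest = crypto.createHash("md5").update(pmid, "utf8").digest("hex");
  return `${PMID_PREFIX}::${digest}`;
}

// `instance.url`: prepend the PubMed base URL to the PMID.
function pubmedUrl(pmid) {
  return `https://pubmed.ncbi.nlm.nih.gov/${pmid}`;
}

const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"];

// `publicationdate`: normalize date parts to YYYY-MM-DD.
// Assumptions: the month is numeric ("1", "01") or an English abbreviation ("Jan"),
// a missing month or day defaults to 01, and unusable input yields null.
function normalizeDate(year, month, day) {
  if (!/^\d{4}$/.test(String(year ?? ""))) return null;
  let m = 1;
  if (month != null && month !== "") {
    const idx = MONTHS.indexOf(String(month).slice(0, 3).toLowerCase());
    m = idx >= 0 ? idx + 1 : parseInt(month, 10);
  }
  const d = day != null && day !== "" ? parseInt(day, 10) : 1;
  if (!(m >= 1 && m <= 12) || !(d >= 1 && d <= 31)) return null;
  const pad = (n) => String(n).padStart(2, "0");
  return `${year}-${pad(m)}-${pad(d)}`;
}

// Incremental harvesting: upsert the records of an update file into the
// existing dataset, keyed by `id`. A record with an existing id replaces it;
// a record with a new id is appended.
function upsert(existing, updates) {
  const merged = new Map(existing.map((r) => [r.id, r]));
  for (const r of updates) merged.set(r.id, r);
  return [...merged.values()];
}

const base = [{ id: pubmedId("1"), title: "old" }];
const update = [{ id: pubmedId("1"), title: "new" }, { id: pubmedId("2"), title: "added" }];
console.log(upsert(base, update).map((r) => r.title)); // [ 'new', 'added' ]
console.log(pubmedUrl("1"), normalizeDate("2020", "Jan", "5")); // https://pubmed.ncbi.nlm.nih.gov/1 2020-01-05
```

Keying the upsert on the generated `id` keeps the operation idempotent: re-applying the same update file leaves the dataset unchanged.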
--- a/sidebars.js
+++ b/sidebars.js
@ -66,7 +66,8 @@ const sidebars = {
          items: [
            { type: 'doc', id: 'data-provision/aggregation/doiboost' },
            { type: 'doc', id: 'data-provision/aggregation/pubmed' },
-            { type: 'doc', id: 'data-provision/aggregation/datacite' }
+            { type: 'doc', id: 'data-provision/aggregation/datacite' },
+            { type: 'doc', id: 'data-provision/aggregation/ebi' },
          ]
        },
        {