aggregation section #2

Merged
schatz merged 27 commits from aggregation into main 2022-11-09 12:01:13 +01:00
3 changed files with 9 additions and 26 deletions
Showing only changes of commit 89cc05d25a

@ -35,7 +35,7 @@ The table below describes the mapping from the XML baseline records to the OpenA
| OpenAIRE Result field path | Datacite record JSON path | # Notes |
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `id` | `\attributes\doi` | the identifier will be created by folloing the openaire PID generation policy |
| `id` | `\attributes\doi` | id in the form `doi_________::md5(doi)` |
| <ul><li>`instance`</li> <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li> <li> `\attributes\types\resourceTypeGeneral` </li> <li>`attributes\types\schemaOrg`</li></ul> | Use the vocabulary **_dnet:publication_resource_** to find a synonym to one of these terms and get the `instance.type`. Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: <ul><li>`publication`</li> <li>`dataset`</li> <li> `software`</li> <li>`otherresearchproduct`</li></ul> |
| `pid` | `\attributes\doi` | `scheme = doi` |
| `originalid` | `\attributes\doi` | |
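
The `id` note in the hunk above (`doi_________::md5(doi)`) can be illustrated with a minimal sketch. It assumes the identifier is the literal prefix from the Notes column, a `::` separator, and the hexadecimal MD5 digest of the DOI string encoded as UTF-8. The helper name `make_id` and the sample DOI are hypothetical, and any normalization applied to the DOI before hashing is not described in this diff, so none is performed here.

```python
import hashlib


def make_id(prefix: str, pid: str) -> str:
    """Return '<prefix>::<md5 hex digest of pid>' (hypothetical helper)."""
    # Assumption: the PID is hashed exactly as given, encoded as UTF-8.
    digest = hashlib.md5(pid.encode("utf-8")).hexdigest()
    return f"{prefix}::{digest}"


if __name__ == "__main__":
    # Prefix copied from the Notes column; the DOI is a made-up example.
    print(make_id("doi_________", "10.1234/example"))
```

The same helper would cover the PubMed case by passing the `pmid` prefix and the PMID instead.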

@ -7,6 +7,8 @@ This section describes the mapping implemented for [MEDLINE/PubMed](https://pubm
The native data is collected from the [ftp baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) site.
It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.
## Incremental harvesting
Pubmed exposes an entry point FTP with all the updates for each one. [ftp baseline update](https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/). We collect the new file and generate the new dataset by upserting the existing item.

Remove the fullstop before the link ?

Updated.
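
For readers of the "Incremental harvesting" paragraph, the upsert step might look roughly like the sketch below. It assumes that records are keyed by PMID, that a record from an update file fully replaces any existing record with the same PMID, and that records are plain dictionaries with a `pmid` key. The function name and record shape are hypothetical, not taken from the actual workflow.

```python
def upsert(dataset: dict, updates: list) -> dict:
    """Return a new dataset where each update replaces or adds a record by PMID."""
    merged = dict(dataset)  # shallow copy, the input dataset is left untouched
    for record in updates:
        # Assumption: the newest record for a PMID wins.
        merged[record["pmid"]] = record
    return merged


if __name__ == "__main__":
    baseline = {"1": {"pmid": "1", "title": "old title"}}
    update_file = [
        {"pmid": "1", "title": "revised title"},
        {"pmid": "2", "title": "new record"},
    ]
    print(upsert(baseline, update_file))
```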

## Mapping
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
@ -15,9 +17,9 @@ The table below describes the mapping from the XML baseline records to the OpenA
| *OpenAIRE Result field path* | PubMed record field xpath | Notes |
|--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|

Remove italics from the table header.

Removed.

| **Publication Mapping** | | |
| `id` | ?? | ?? |
| `id` | ?? | id in the form `pmid_________::md5(pmid)` |
| `pid` | `//PMID` | `classid = classname = pmid` |

??

Filled

| `publicationdate` | `//PubmedPubDate` | apply the function GraphCleaningFunctions.cleanDate before assign it |
| `publicationdate` | `//PubmedPubDate` | clean and normalize the format of the date to be YYYY-mm-dd |
| `maintitle` | `//Title` | |
| `description` | `//AbstractText` | |
| `language` | `//Language` | cleaning vocabulary -> dnet:languages |
@ -31,31 +33,11 @@ The table below describes the mapping from the XML baseline records to the OpenA
| `container.conferencedate` | `//Journal/PubDate` | map the date of the Journal |
| `container.name` | `//Journal/Title` | name of the journal |
| `container.vol` | `//Journal/Volume` | journal volume |
| `container.issPrinted` | `//Journal/ISSN` | ?? |
| `container.issPrinted` | `//Journal/ISSN` | the journal issn |
| `container.iss` | `//Journal/Issue` | The journal issue |
| **Instance Mapping** | | |
| `instance.type` | `//PublicationType` | if the article contains the typology `Journal Article` then we apply this type else We have to find a terms that match the vocabulary otherwise we discard it |
| `instance.pid` | `//PMID` | map the pmid in the pid in the instance |
| `instance.url` | `//PMID` | creates the URL by prepending `https://pubmed.ncbi.nlm.nih.gov/` to the PMId |
| `instance.alternateIdentifier` | `//ArticleId[./@IdType="doi"]` | |
| `instance.publicationdate` | `//PubmedPubDate` | |
| *OpenAIRE Relation field path* | PubMed record field xpath | Notes |
|--------------------------------|---------------------------|-------|
| | | |
#TODO
Missing item mapped
| `instance.publicationdate` | `//PubmedPubDate` | clean and normalize the format of the date to be YYYY-mm-dd |
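
Two of the notes in the table above, the date normalization and the `instance.url` construction, can be sketched as follows. This is only an illustration: it assumes the date arrives as separate numeric year, month, and day strings, and the function names are hypothetical. Real records may need extra cleaning (for example textual months or missing days) that this diff does not describe.

```python
from datetime import date

# Base URL taken from the `instance.url` note in the table.
PUBMED_BASE_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def normalize_date(year: str, month: str, day: str) -> str:
    """Return the date as a zero-padded YYYY-mm-dd string."""
    # Assumption: all three parts are numeric; date() raises ValueError otherwise.
    return date(int(year), int(month), int(day)).isoformat()


def pubmed_url(pmid: str) -> str:
    """Build the record URL by prepending the PubMed base URL to the PMID."""
    return PUBMED_BASE_URL + pmid


if __name__ == "__main__":
    print(normalize_date("2019", "1", "5"))
    print(pubmed_url("12345"))  # made-up PMID
```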

@ -66,7 +66,8 @@ const sidebars = {
items: [
{ type: 'doc', id: 'data-provision/aggregation/doiboost' },

Is it ok to use only "DOIBoost" here as the title of the item in the sidebar ? If yes, we add a "label" here.

Thanks for the hint. I'm not sure what would be better for the end user reading this doc. On one hand, DOIBoost means nothing, hence I'm tempted to leave the longer title (listing the different providers); on the other hand, aesthetically speaking, I surely prefer the short version.

It's good to know anyway that we can build a cleaner TOC.

{ type: 'doc', id: 'data-provision/aggregation/pubmed' },
{ type: 'doc', id: 'data-provision/aggregation/datacite' }
{ type: 'doc', id: 'data-provision/aggregation/datacite' },
{ type: 'doc', id: 'data-provision/aggregation/ebi' },
]
},
{