aggregation section #2
@@ -35,7 +35,7 @@ The table below describes the mapping from the XML baseline records to the OpenA
| OpenAIRE Result field path | Datacite record JSON path | Notes |
|---|---|---|
| `id` | `\attributes\doi` | the identifier will be created by following the OpenAIRE PID generation policy |
| `id` | `\attributes\doi` | id in the form `doi_________::md5(doi)` |
| <ul><li>`instance`</li> <li>`instance.type`</li></ul> | <ul><li>`\attributes\types\resourceType`</li> <li>`\attributes\types\resourceTypeGeneral`</li> <li>`\attributes\types\schemaOrg`</li></ul> | Use the **_dnet:publication_resource_** vocabulary to find a synonym of one of these terms and obtain the `instance.type`. The `instance.type` is then looked up in the **_dnet:result_typologies_** vocabulary to determine which of the following main entities is generated: <ul><li>`publication`</li> <li>`dataset`</li> <li>`software`</li> <li>`otherresearchproduct`</li></ul> |
| `pid` | `\attributes\doi` | `scheme = doi` |
| `originalid` | `\attributes\doi` | |

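The `id` rows describe identifiers of the form `doi_________::md5(doi)`. A minimal sketch of that construction, assuming the md5 digest is rendered as lowercase hex and the identifier is hashed exactly as received; any normalization applied before hashing (for example lowercasing a DOI) is not described here and is left out. `buildOpenaireId` is a hypothetical helper; the same shape applies to the PubMed `pmid_________::md5(pmid)` form.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds an identifier of the form `<prefix>::md5(<value>)`.
// Assumption: the value is hashed as-is (no DOI/PMID normalization) and the
// digest is rendered as 32 lowercase hex characters.
function buildOpenaireId(prefix, value) {
  const digest = crypto.createHash('md5').update(value, 'utf8').digest('hex');
  return `${prefix}::${digest}`;
}

// Prefix copied from the mapping table above.
const DOI_PREFIX = 'doi_________';

console.log(buildOpenaireId(DOI_PREFIX, '10.1234/example'));
```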
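The `instance.type` row describes a two-step vocabulary lookup. The sketch below models it with two in-memory maps whose entries are hypothetical placeholders, not the actual content of **_dnet:publication_resource_** or **_dnet:result_typologies_**. The order in which the three DataCite type fields are tried, and the case- and whitespace-insensitive matching, are assumptions.

```javascript
// Hypothetical stand-ins for the two vocabularies: the real dnet:publication_resource
// and dnet:result_typologies entries are not reproduced here.
const publicationResourceSynonyms = new Map([
  ['journalarticle', 'Article'],
  ['article', 'Article'],
  ['dataset', 'Dataset'],
  ['software', 'Software'],
  ['softwaresourcecode', 'Software'],
  ['other', 'Other'],
]);
const resultTypologies = new Map([
  ['Article', 'publication'],
  ['Dataset', 'dataset'],
  ['Software', 'software'],
  ['Other', 'otherresearchproduct'],
]);

// Assumption: synonyms are matched case-insensitively and ignoring whitespace.
const normalizeTerm = (term) => term.replace(/\s+/g, '').toLowerCase();

// Resolves instance.type and the main entity from a DataCite `attributes.types` object.
// Assumption: resourceType, resourceTypeGeneral and schemaOrg are tried in this order.
function resolveDataciteType(types) {
  const candidates = [types.resourceType, types.resourceTypeGeneral, types.schemaOrg];
  for (const term of candidates) {
    if (typeof term !== 'string') continue;
    const instanceType = publicationResourceSynonyms.get(normalizeTerm(term));
    if (instanceType === undefined) continue;
    const entity = resultTypologies.get(instanceType);
    if (entity !== undefined) return { instanceType, entity };
  }
  return null; // none of the terms has a synonym in the vocabulary
}
```

Returning `null` leaves the caller free to decide what happens to a record whose type cannot be resolved, a case the table does not specify.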
@@ -7,6 +7,8 @@ This section describes the mapping implemented for [MEDLINE/PubMed](https://pubm
The native data is collected from the [ftp baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/) site.
It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.

## Incremental harvesting
PubMed publishes all the update files on a dedicated FTP entry point ([ftp baseline update](https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/)). We collect each new file and generate the new dataset by upserting its records into the existing ones.
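The upsert step can be pictured as a keyed merge in which each update file overrides the records already present. A minimal sketch, assuming records have already been parsed into objects carrying a `pmid` field (a hypothetical shape) and that update files are applied oldest first, so the most recent version of a record wins. Record deletions are not covered.

```javascript
// Hypothetical record shape: { pmid: string, ...otherFields }.
// Upserts the records of one update file into the current dataset, keyed by PMID:
// existing records are replaced, new ones are inserted.
function upsertRecords(dataset, updateRecords) {
  const merged = new Map(dataset);
  for (const record of updateRecords) {
    merged.set(record.pmid, record);
  }
  return merged;
}

// Applies a sequence of update files (assumed to be sorted oldest first)
// on top of the baseline records.
function applyUpdates(baselineRecords, updateFiles) {
  let dataset = new Map(baselineRecords.map((r) => [r.pmid, r]));
  for (const fileRecords of updateFiles) {
    dataset = upsertRecords(dataset, fileRecords);
  }
  return dataset;
}
```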

## Mapping

The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.

@@ -15,9 +17,9 @@ The table below describes the mapping from the XML baseline records to the OpenA
| *OpenAIRE Result field path* | PubMed record field xpath | Notes |
|---|---|---|
schatz
commented
Remove italics from the table header.
claudio.atzori
commented
Removed.
| **Publication Mapping** | | |
| `id` | ?? | ?? |
| `id` | ?? | id in the form `pmid_________::md5(pmid)` |
| `pid` | `//PMID` | `classid = classname = pmid` |
schatz
commented
??
claudio.atzori
commented
Filled
| `publicationdate` | `//PubmedPubDate` | apply the function GraphCleaningFunctions.cleanDate before assigning it |
| `publicationdate` | `//PubmedPubDate` | clean and normalize the date format to YYYY-MM-DD |
| `maintitle` | `//Title` | |
| `description` | `//AbstractText` | |
| `language` | `//Language` | cleaned using the `dnet:languages` vocabulary |

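The `publicationdate` notes ask for dates normalized to YYYY-MM-DD. The mapping relies on `GraphCleaningFunctions.cleanDate`, whose behaviour is not reproduced here; the sketch below only illustrates the target format. It assumes the `Year`, `Month` and `Day` values of a date element have already been extracted as strings, accepts numeric or three-letter English months, and returns `null` for incomplete or impossible dates (all assumptions).

```javascript
const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Builds a YYYY-MM-DD string from year/month/day strings.
// Months may be numeric ("3", "03") or three-letter abbreviations ("Mar").
// Returns null when a part is missing or the date is not a valid calendar date.
function normalizePubDate(year, month, day) {
  if (!year || !month || !day) return null;
  let m = Number.parseInt(month, 10);
  if (Number.isNaN(m)) {
    m = MONTHS.indexOf(month.slice(0, 3).toLowerCase()) + 1; // 0 when not found
  }
  const y = Number.parseInt(year, 10);
  const d = Number.parseInt(day, 10);
  if (Number.isNaN(y) || Number.isNaN(d) || m < 1 || m > 12) return null;
  // Reject impossible days such as 2021-02-30 by round-tripping through Date.UTC.
  const date = new Date(Date.UTC(y, m - 1, d));
  if (date.getUTCFullYear() !== y || date.getUTCMonth() !== m - 1 || date.getUTCDate() !== d) {
    return null;
  }
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(y, 4)}-${pad(m, 2)}-${pad(d, 2)}`;
}
```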
@@ -31,31 +33,11 @@ The table below describes the mapping from the XML baseline records to the OpenA
| `container.conferencedate` | `//Journal/PubDate` | the publication date of the journal |
| `container.name` | `//Journal/Title` | name of the journal |
| `container.vol` | `//Journal/Volume` | journal volume |
| `container.issPrinted` | `//Journal/ISSN` | ?? |
| `container.issPrinted` | `//Journal/ISSN` | the journal ISSN |
| `container.iss` | `//Journal/Issue` | the journal issue |
| **Instance Mapping** | | |
| `instance.type` | `//PublicationType` | if the article contains the `Journal Article` publication type, that type is applied; otherwise we look for a term that matches the vocabulary and, if none is found, we discard it |
| `instance.pid` | `//PMID` | the PMID is mapped as the `pid` of the instance |
| `instance.url` | `//PMID` | the URL is created by prepending `https://pubmed.ncbi.nlm.nih.gov/` to the PMID |
| `instance.alternateIdentifier` | `//ArticleId[./@IdType="doi"]` | |
| `instance.publicationdate` | `//PubmedPubDate` | |

| *OpenAIRE Relation field path* | PubMed record field xpath | Notes |
|--------------------------------|---------------------------|-------|
| | | |

#TODO

Missing item mapped

| `instance.publicationdate` | `//PubmedPubDate` | clean and normalize the date format to YYYY-MM-DD |

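The instance rows combine a type-selection rule with identifiers derived from the PMID. A sketch under stated assumptions. The record is already parsed into a hypothetical object with `pmid`, `publicationTypes` and an optional `doi`. The vocabulary is modelled as a `Set` of accepted terms passed by the caller. "Discard it" is read as discarding the record, so `null` is returned. The `{ scheme, value }` shape for identifiers mirrors the `scheme = doi` note in the DataCite table but is otherwise assumed.

```javascript
const PUBMED_BASE_URL = 'https://pubmed.ncbi.nlm.nih.gov/';

// Picks instance.type: "Journal Article" wins; otherwise the first publication
// type found in the (hypothetical) vocabulary term set; otherwise null.
function selectInstanceType(publicationTypes, vocabularyTerms) {
  if (publicationTypes.includes('Journal Article')) return 'Journal Article';
  const match = publicationTypes.find((t) => vocabularyTerms.has(t));
  return match === undefined ? null : match;
}

// Builds the instance fields from a parsed record; returns null when the record is discarded.
function buildInstance(record, vocabularyTerms) {
  const type = selectInstanceType(record.publicationTypes, vocabularyTerms);
  if (type === null) return null;
  const instance = {
    type,
    pid: [{ scheme: 'pmid', value: record.pmid }],
    url: [PUBMED_BASE_URL + record.pmid],
    alternateIdentifier: [],
  };
  // //ArticleId[./@IdType="doi"] becomes an alternate identifier when present.
  if (record.doi) instance.alternateIdentifier.push({ scheme: 'doi', value: record.doi });
  return instance;
}
```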
@@ -66,7 +66,8 @@ const sidebars = {
items: [
{ type: 'doc', id: 'data-provision/aggregation/doiboost' },
schatz
commented
Is it ok to use only "DOIBoost" here as the title of the item in the sidebar? If yes, we add a "label" here.
claudio.atzori
commented
Thanks for the hint. I'm not sure what would be better for the end user reading this doc. On one hand, DOIBoost means nothing, hence I'm tempted to leave the longer title (listing the different providers); on the other hand, aesthetically speaking, I surely prefer the short version.
It's good to know anyway that we can build a cleaner TOC.
{ type: 'doc', id: 'data-provision/aggregation/pubmed' },
{ type: 'doc', id: 'data-provision/aggregation/datacite' }
{ type: 'doc', id: 'data-provision/aggregation/datacite' },
{ type: 'doc', id: 'data-provision/aggregation/ebi' },
]
},
{

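For reference, the short sidebar title discussed in the review would be set through the `label` property of a Docusaurus doc item, which overrides the default title derived from the doc. A sketch of the aggregation `items` list with that option; whether to adopt the short `DOIBoost` label remains the choice debated in the comments.

```javascript
// Sidebar items for the aggregation section, with an explicit label for DOIBoost.
// `label` overrides the sidebar title that Docusaurus would otherwise derive from the doc.
const aggregationItems = [
  { type: 'doc', id: 'data-provision/aggregation/doiboost', label: 'DOIBoost' },
  { type: 'doc', id: 'data-provision/aggregation/pubmed' },
  { type: 'doc', id: 'data-provision/aggregation/datacite' },
  { type: 'doc', id: 'data-provision/aggregation/ebi' },
];
```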
Remove the full stop before the link?
Updated.