From ae41daf81d427bd357061784c464cfafff81d8fe Mon Sep 17 00:00:00 2001
From: Sandro La Bruzzo
Date: Wed, 12 Oct 2022 12:16:35 +0200
Subject: [PATCH] completed Documentation of Datacite

---
 docs/data-provision/aggregation/datacite.md | 38 ++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/docs/data-provision/aggregation/datacite.md b/docs/data-provision/aggregation/datacite.md
index 125fe01..3ca23e7 100644
--- a/docs/data-provision/aggregation/datacite.md
+++ b/docs/data-provision/aggregation/datacite.md
@@ -41,6 +41,42 @@ The table below describes the mapping from the XML baseline records to the OpenA
 | `id` | `\attributes\doi`|the identifier will be created by folloing the openaire PID generation policy |
 | | | Use the vocabulary **_dnet:publication_resource_** to find a synonym to one of these terms and get the `instance.type`. Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities: |
 | `pid` | `\attributes\doi` | `scheme = doi` |
+| `originalid` | `\attributes\doi` | |
 | `dateofcollection` | `attributes\updated` | the timestamp is defined in milliseconds we convert to "yyyy-MM-dd'T'HH:mm:ssZ" format |
-| `author` | `\attributes\creators` | Each creator field will be mapped in the author entity below the subfield|
+| `author` | `\attributes\creators` | Each creator is mapped to an author entity with the subfields below. **A record with no creator is skipped.** |
+| `author.fullname` | `\attributes\creators\name` | if the name is not defined, we construct it from the given and family names |
+| `author.rank` | | Incremental index starting from 1 |
+| `author.name` | `\attributes\creators\givenName` | |
+| `author.surname` | `\attributes\creators\familyName` | |
+| `author.pid` | `\attributes\creators\nameIdentifiers` | the list of PIDs associated with the creator |
+| `author.pid.scheme` | `\attributes\creators\nameIdentifiers` | mapped using the vocabulary **dnet:pid_types** |
+| `author.pid.value` | `\attributes\creators\nameIdentifiers\nameIdentifier` | the PID value |
+| `maintitle` | `\attributes\titles` | Titles whose title type is null or Main |
+| `subtitle` | `\attributes\titles` | Titles whose title type is Subtitle; the OpenAIRE title type vocabulary uses the DataCite title type vocabulary |
+| **date section** | | For each date, in particular for DOIs starting with _10.14457_, we apply a Thai date fix: convert the date to ThaiBuddhistDate and reformat it to the local one (see ticket [#6791](https://support.openaire.eu/issues/6791)) |
+|`publicationdate` | `\attributes\dates` | where `dateType` is **issued** |
+|`publicationdate` | `\attributes\publicationYear` | we create a date in the format `01-01-publicationYear` |
+|`embargoenddate` | `\attributes\dates` | where `dateType` is **available** |
+| `subjects` | `\attributes\subject` | `scheme=keywords` |
+| `description` | `\attributes\descriptions` | |
+| `publisher` | `\attributes\publisher` | |
+| `language` | `\attributes\language` | cleaned by using vocabulary `dnet:languages` |
+| `publisher` | `\attributes\publisher` | |
+| `instance.license` | `\attributes\rightsList` | if the rights value starts with http and matches a particular regex |
+| `instance.accessright` | `\attributes\rightsList` | |
+
+
+### Mapping Relation
+
+
+| OpenAIRE Relation Semantic and inverse | Datacite record JSON path | Source/Target type | Notes |
+|-------------------------------------------|-------------------------------|-------------------------------|---------|
+| `isProducedBy` |`attributes\fundingReferences` | `Result/Project`| we check whether the funding reference matches the pattern `(info:eu-repo/grantagreement/ec/h2020/)(\d{6})(.*)`|
+| `IsProvidedBy` | | `Result/DataSource` | Datasource is always Datacite|
+| `IsHostedBy` | `\attributes\relationships\client\id` | `Result/DataSource` | we defined a curated clientId/Datasource map; if a match is found, we create a _hostedBy Relation_ |
+
+
+### Relation Resolution
+
+