Normalising DOI urls #177

Merged
claudio.atzori merged 1 commit from instance_group_by_url into beta 2021-12-23 12:40:17 +01:00

Original description in #7330 (https://support.openaire.eu/issues/7330).

In the XML representation of the instances, there are cases where the instance.webresource is a DOI link.

This link can take several forms, using either host:

  • doi.org
  • dx.doi.org

with

  • http://
  • https://
  • or no protocol at all.

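An example record is this publication: http://beta.services.openaire.eu/search/v2/api/resources2?pid=10.1016%2Fj.jas.2019.105013&pidtype=doi&type=publications&format=json

Since all these URLs point to the same DOI, we should preprocess them into the form recommended by the Crossref and DataCite guidelines indicated in #7159 (https://support.openaire.eu/issues/7159), that is, https://doi.org, so that their instances can be merged.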
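
The preprocessing described above could look roughly like the following. This is only an illustrative sketch in Python, not the code merged in this PR: the names `normalise_doi_url` and `group_by_normalised_url` are hypothetical, and it assumes that a DOI link is any URL whose host is `doi.org` or `dx.doi.org` followed by a path starting with `10.`.

```python
import re
from collections import defaultdict

# Assumption: a DOI link is an optional http(s) scheme, an optional "dx." prefix,
# the host doi.org, and a path that starts with the DOI prefix "10.".
_DOI_URL = re.compile(
    r"^(?:https?://)?(?:dx\.)?doi\.org/(?P<doi>10\..+)$",
    re.IGNORECASE,
)


def normalise_doi_url(url: str) -> str:
    """Return the https://doi.org/ form of a DOI link, or the input unchanged."""
    match = _DOI_URL.match(url.strip())
    if match is None:
        # Not recognised as a DOI link: leave it as it is.
        return url
    return "https://doi.org/" + match.group("doi")


def group_by_normalised_url(urls):
    """Group the original URLs under their normalised form."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise_doi_url(url)].append(url)
    return dict(groups)


variants = [
    "http://dx.doi.org/10.1016/j.jas.2019.105013",
    "https://doi.org/10.1016/j.jas.2019.105013",
    "doi.org/10.1016/j.jas.2019.105013",
]
grouped = group_by_normalised_url(variants)
```

With the three variants above, `grouped` ends up with a single key, `https://doi.org/10.1016/j.jas.2019.105013`, so the corresponding instances would fall into the same group. URLs that do not match the pattern are passed through untouched.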

claudio.atzori added the enhancement label 2021-12-23 12:38:59 +01:00
claudio.atzori self-assigned this 2021-12-23 12:38:59 +01:00
claudio.atzori added 1 commit 2021-12-23 12:38:59 +01:00
claudio.atzori merged commit 278cf08421 into beta 2021-12-23 12:40:17 +01:00
Reference: D-Net/dnet-hadoop#177