pre-cleaning to set missing vocabulary names #33

Closed
opened 2020-07-30 14:44:44 +02:00 by claudio.atzori · 1 comment

While investigating the issue originally reported in [#5833](https://issue.openaire.research-infrastructures.eu/issues/5833), it turned out that none of the author PIDs from ORCID (actionset `orcidworks-no-doi`) define the name of the vocabulary in `pid.qualifier.classid`.

As a consequence, the cleaning process, which is driven by the field type, cannot tell which vocabulary to use for each individual field: it assumes that every qualifier subject to cleaning declares the name of the vocabulary to be used.

The same case may also occur in other fields, so a possible way to cover them is to add a pre-cleaning mapping that defines the missing vocabulary names.
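
To make the pre-cleaning idea concrete, here is a minimal sketch of a mapping that fills in a vocabulary name when a qualifier does not declare one. It is not the code from the actual implementation: the `Qualifier` class, the field path `author.pid.qualifier`, the `DEFAULT_VOCABULARIES` map and the value `dnet:pid_types` are all assumptions made for illustration and do not mirror the dnet-hadoop data model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every name below is illustrative, not the dnet-hadoop model.
class PreCleaningSketch {

    // Minimal stand-in for a qualifier: a term plus the vocabulary it belongs to.
    static final class Qualifier {
        final String term;
        String vocabulary; // null or blank when the source did not declare it

        Qualifier(String term, String vocabulary) {
            this.term = term;
            this.vocabulary = vocabulary;
        }
    }

    // Assumed mapping from field path to the vocabulary used when none is declared.
    static final Map<String, String> DEFAULT_VOCABULARIES = new HashMap<>();
    static {
        DEFAULT_VOCABULARIES.put("author.pid.qualifier", "dnet:pid_types");
    }

    // Sets the vocabulary only when it is missing; declared values are left untouched.
    static Qualifier fillMissingVocabulary(String fieldPath, Qualifier q) {
        if (q == null) {
            return null;
        }
        boolean missing = q.vocabulary == null || q.vocabulary.trim().isEmpty();
        if (missing) {
            String fallback = DEFAULT_VOCABULARIES.get(fieldPath);
            if (fallback != null) {
                q.vocabulary = fallback;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        Qualifier orcid = fillMissingVocabulary(
            "author.pid.qualifier", new Qualifier("orcid", null));
        System.out.println(orcid.term + " -> " + orcid.vocabulary);
    }
}
```

Running such a step before the regular cleaning would let the cleaning keep its assumption that every qualifier names its vocabulary, while fields with no entry in the map simply pass through unchanged.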
claudio.atzori added the enhancement label 2020-07-30 14:44:44 +02:00
claudio.atzori self-assigned this 2020-07-30 14:44:44 +02:00
Author
Owner

Functionality implemented in [4ff8007518](https://code-repo.d4science.org/D-Net/dnet-hadoop/commit/4ff8007518dd5c9ce7b56ae95ce171dcbfd8b47a).

The cleaning workflow was also deployed in `/lib/dnet/oa/graph/clean/oozie_app`.
Reference: D-Net/dnet-hadoop#33