Priority to records from delegated authorities #187

Merged
miriam.baglioni merged 10 commits from delegated_authorities into beta 2 years ago
Owner

When a record is aggregated from multiple sources that are each considered authoritative for minting specific PIDs, different mappings may be applied to each copy, resulting in inconsistent field values. To overcome the issue, the idea is to include such records only once in the graph.

For the time being, this case seems to involve only Zenodo as a delegated authority from Datacite, and the policy we're going to implement picks the version from Zenodo (as it is assumed to be richer).

This "selection" can be performed when the entities in the graph sharing the same identifier are grouped together, but the graph pipeline does not currently include any such operation between the materialisation of the raw graph and the start of the deduplication workflow.

This implies that we must introduce a new grouping phase, producing a new graph materialisation. The implementation of the procedure can share the same code, extended to support this additional business logic; to this end, the grouping Spark job was factored out into the dhp-common module.

Note that the project temporarily depends on dhp-schemas 2.10.26-SNAPSHOT until that version is released.
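
As an illustration only, the selection policy described above could be sketched as follows. This is not the code in this PR (the actual job is a Spark job in dhp-common); the record layout, the `pid` and `source` field names, the source labels and the sample identifiers are all assumptions made for the example. It also assumes that groups without a Zenodo record are left untouched for the later deduplication workflow.

```python
from collections import defaultdict

# Hypothetical source label; the real datasource identifier may differ.
DELEGATED_AUTHORITY = "Zenodo"


def select_records(records):
    """Group records by PID and keep only the delegated authority's version.

    Assumption: each record is a dict with 'pid' and 'source' keys.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["pid"]].append(record)

    selected = []
    for group in groups.values():
        delegated = [r for r in group if r["source"] == DELEGATED_AUTHORITY]
        if delegated:
            # The delegated authority provides this PID: keep a single
            # copy from it and drop the versions from the other sources.
            selected.append(delegated[0])
        else:
            # No delegated authority involved: leave the group as it is.
            selected.extend(group)
    return selected


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"pid": "10.5281/zenodo.1", "source": "Datacite"},
        {"pid": "10.5281/zenodo.1", "source": "Zenodo"},
        {"pid": "10.1234/abc", "source": "Datacite"},
    ]
    for record in select_records(sample):
        print(record["pid"], record["source"])
```

In this sketch the first PID is kept once, in its Zenodo version, while the second PID passes through unchanged.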

claudio.atzori added the enhancement label 2 years ago
claudio.atzori added 2 commits 2 years ago
claudio.atzori requested review from miriam.baglioni 2 years ago
claudio.atzori added 2 commits 2 years ago
claudio.atzori added 1 commit 2 years ago
claudio.atzori added 1 commit 2 years ago
claudio.atzori added 2 commits 2 years ago
claudio.atzori requested review from sandro.labruzzo 2 years ago
claudio.atzori added 1 commit 2 years ago
claudio.atzori added 1 commit 2 years ago
Collaborator

The PR is fine with me. I think it can be integrated.
Note: I have skipped the modifications for the provision and iis workflows

Owner

The PR is fine with me. I think it can be integrated.

miriam.baglioni merged commit a70b0990c9 into beta 2 years ago

Reviewers

miriam.baglioni was requested for review 2 years ago
sandro.labruzzo was requested for review 2 years ago
The pull request has been merged as a70b0990c9.
Reference: D-Net/dnet-hadoop#187