Priority to records from delegated authorities
#187
Merged
miriam.baglioni
merged 10 commits from delegated_authorities
into beta
2 years ago
When a record is aggregated from multiple sources that are each considered authoritative for minting specific PIDs, a different mapping may be applied to each copy, resulting in inconsistent field values. To overcome the issue, the idea is to include such records only once in the graph.
For the time being, this case seems to involve only Zenodo as a delegated authority from Datacite. The policy we are going to implement picks the version from Zenodo, as it is assumed to be richer.
This "selection" can be performed when the entities in the graph sharing the same identifier are grouped together, but the graph pipeline does not currently include any such operation between the materialisation of the raw graph and the deduplication workflow.
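As an illustration only, the grouping and selection described above could be sketched as follows. This is plain Python rather than the project's Spark job, and `Record`, `group_by_pid`, `select_records` and the `"Zenodo"` source label are hypothetical names chosen for the example, not identifiers from the codebase. The sketch assumes that groups without a record from the delegated authority are left untouched.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical label for the delegated authority; the real datasource
# identifier used in the graph may differ.
DELEGATED_AUTHORITY = "Zenodo"


@dataclass
class Record:
    pid: str                      # persistent identifier shared across sources
    source: str                   # datasource the record was aggregated from
    fields: dict = field(default_factory=dict)


def group_by_pid(records):
    """Group records sharing the same identifier, preserving input order."""
    groups = defaultdict(list)
    for record in records:
        groups[record.pid].append(record)
    return groups


def select_records(records, delegated=DELEGATED_AUTHORITY):
    """Keep only the delegated authority's version when one exists in a group.

    Groups with no record from the delegated authority are returned unchanged.
    """
    selected = []
    for group in group_by_pid(records).values():
        preferred = [r for r in group if r.source == delegated]
        # Keep the first delegated-authority record, if any; otherwise keep all.
        selected.extend(preferred[:1] if preferred else group)
    return selected
```

With two records for the same (made-up) PID, one from Datacite and one from Zenodo, `select_records` would return only the Zenodo one, so the record enters the graph once.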
This implies that we must introduce a new grouping phase, producing a new graph materialization. The implementation of the procedure can share the same code, extended to support this further business logic; to this aim, the grouping Spark job was factored out into the `dhp-common` module.
Note that the project temporarily depends on `dhp-schemas 2.10.26-SNAPSHOT` until it is released.
Note: I have skipped the modifications for the provision and iis workflows
The PR is fine with me. I think it can be integrated.
Merged commit a70b0990c9 into beta 2 years ago