Priority to records from delegated authorities #187

Merged
miriam.baglioni merged 10 commits from delegated_authorities into beta 2022-01-26 16:02:50 +01:00

When a record is aggregated from multiple sources that are all considered authoritative for minting specific PIDs, different mappings may be applied to each copy, depending on the case, resulting in inconsistent field values. To overcome the issue, the idea is to include such records only once in the graph.

For the time being, this case seems to involve only Zenodo as a delegated authority from Datacite, and the policy we're going to implement picks the version from Zenodo (as it is assumed to be richer).

This "selection" can be performed when the entities in the graph sharing the same identifier are grouped together, but the graph pipeline does not currently include any such operation between the materialisation of the raw graph and the deduplication workflow.

This implies that we must introduce a new grouping phase, producing a new graph materialization. The implementation can share the same code, extended to support this additional business logic; to this end, the grouping Spark job was factored out into the dhp-common module.
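As a rough illustration of the selection policy described above, here is a minimal, language-neutral sketch in plain Python. It is not the Spark job from this PR: the `Record` type, the `source` field, the `DELEGATED_AUTHORITIES` set and the `select_per_pid` function are all hypothetical names introduced for the example, and the sample identifiers are placeholders. It assumes that records sharing a PID are grouped together and that, when one of them comes from a delegated authority, only that version is kept; groups without one are left untouched for the existing logic to handle.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List

# Hypothetical configuration: the PR description only mentions Zenodo.
DELEGATED_AUTHORITIES = {"zenodo"}


@dataclass(frozen=True)
class Record:
    pid: str     # persistent identifier shared by the copies of a record
    source: str  # hypothetical name of the source the copy came from
    title: str


def select_per_pid(records: Iterable[Record]) -> List[Record]:
    """Group records by PID; keep only the delegated-authority copy if any."""
    selected: List[Record] = []
    # groupby needs its input sorted by the grouping key.
    by_pid = sorted(records, key=lambda r: r.pid)
    for _, group in groupby(by_pid, key=lambda r: r.pid):
        candidates = list(group)
        preferred = [r for r in candidates if r.source in DELEGATED_AUTHORITIES]
        if preferred:
            # A delegated authority provided this record: keep that copy only.
            selected.append(preferred[0])
        else:
            # No delegated authority involved: leave the group as it is.
            selected.extend(candidates)
    return selected


# Placeholder data, invented for illustration only.
sample = [
    Record("10.5281/zenodo.0000001", "datacite", "Dataset (Datacite mapping)"),
    Record("10.5281/zenodo.0000001", "zenodo", "Dataset (Zenodo mapping)"),
    Record("10.1234/example", "datacite", "Another record"),
]
result = select_per_pid(sample)
```

In this sketch the Datacite copy of the first PID is dropped in favour of the Zenodo one, while the second record passes through unchanged.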

Note that the project temporarily depends on dhp-schemas 2.10.26-SNAPSHOT until that version is released.

claudio.atzori added the enhancement label 2022-01-19 13:00:14 +01:00
claudio.atzori added 2 commits 2022-01-19 13:00:15 +01:00
claudio.atzori requested review from miriam.baglioni 2022-01-19 13:02:05 +01:00
claudio.atzori added 2 commits 2022-01-19 17:17:16 +01:00
claudio.atzori added 1 commit 2022-01-19 18:16:02 +01:00
claudio.atzori added 1 commit 2022-01-21 13:02:49 +01:00
claudio.atzori added 2 commits 2022-01-21 13:59:50 +01:00
claudio.atzori requested review from sandro.labruzzo 2022-01-21 14:00:23 +01:00
claudio.atzori added 1 commit 2022-01-21 14:30:13 +01:00
claudio.atzori added 1 commit 2022-01-25 14:28:27 +01:00

The PR is fine with me. I think it can be integrated.
Note: I have skipped the modifications for the provision and iis workflows


The PR is fine with me. I think it can be integrated.

miriam.baglioni merged commit a70b0990c9 into beta 2022-01-26 16:02:50 +01:00