Support for the PromoteAction strategy [master] #391

Merged
claudio.atzori merged 2 commits from promote_actions_join_type_master into master 2024-02-08 15:12:17 +01:00

This PR introduces support for two strategies the workflow can use to promote the actionset contents in the graph.

Until now, the only strategy was to upsert each record against the corresponding entity table, merging it with the record matched by OpenAIRE id and inserting the records that do not match. This turned out to introduce the noisy or stable records often contained in the actionset, which pollute the graph and may cause failures along the graph processing pipeline.

The upsert strategy remains the default, but this modification allows the workflow caller to specify a different one, named ENRICH, which only updates the matching records and discards the non-matching ones.

Hence, the Oozie workflow definition now supports a new optional parameter named promoteActionStrategy that can assume two values:

  • UPSERT (default)
  • ENRICH
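
To illustrate the difference between the two values, here is a minimal, self-contained sketch in plain Python. It is not the code in this PR: the `promote` and `merge_records` functions, the dictionary-based tables and the "non-null action fields win" merge rule are all assumptions made for the example. Only the strategy names `UPSERT` and `ENRICH` and the matching by id come from the description above.

```python
from enum import Enum


class PromoteActionStrategy(Enum):
    # The two values accepted by the promoteActionStrategy parameter.
    UPSERT = "UPSERT"
    ENRICH = "ENRICH"


def merge_records(graph_record, action_record):
    # Hypothetical merge rule: non-null fields from the action payload
    # override the fields of the graph record. The real merge logic differs.
    merged = dict(graph_record)
    merged.update({k: v for k, v in action_record.items() if v is not None})
    return merged


def promote(graph_table, action_payloads, strategy=PromoteActionStrategy.UPSERT):
    # Both tables are modelled as dicts keyed by record id (an assumption).
    result = dict(graph_table)
    for record_id, payload in action_payloads.items():
        if record_id in result:
            # Matching records are merged under both strategies.
            result[record_id] = merge_records(result[record_id], payload)
        elif strategy is PromoteActionStrategy.UPSERT:
            # UPSERT inserts records that have no match in the graph.
            result[record_id] = dict(payload)
        # ENRICH discards records that have no match in the graph.
    return result


graph = {"id1": {"id": "id1", "title": "A", "subject": None}}
actions = {
    "id1": {"id": "id1", "title": None, "subject": "physics"},
    "id2": {"id": "id2", "title": "B", "subject": None},
}

upserted = promote(graph, actions)
enriched = promote(graph, actions, PromoteActionStrategy.ENRICH)
```

In this sketch `upserted` contains both `id1` and `id2`, while `enriched` contains only the updated `id1`. In join terms, UPSERT behaves like a full outer join between the graph table and the actionset, and ENRICH like a left outer join that keeps every graph record.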
claudio.atzori added 2 commits 2024-02-08 15:12:08 +01:00
claudio.atzori merged commit f21133229a into master 2024-02-08 15:12:17 +01:00
Reference: D-Net/dnet-hadoop#391