Support for the PromoteAction strategy [master] #391

Merged
claudio.atzori merged 2 commits from promote_actions_join_type_master into master 3 months ago

This PR introduces support for two strategies that the workflow can use to promote the contents of an actionset into the graph.

Until now, the only available strategy was to upsert each record into the corresponding entity table: records that match an existing record by OpenAIRE id are merged with it, and records with no match are inserted. This turned out to introduce the noisy or stale records that actionsets often contain, which pollute the graph and may cause failures along the graph processing pipeline.

UPSERT remains the default strategy, but the workflow caller can now specify a different one, named ENRICH, which only updates the matching records and discards the non-matching ones.
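The difference between the two strategies can be illustrated with a minimal Java sketch. It is not the workflow's implementation: it assumes an entity table and an actionset can be modelled as in-memory maps keyed by OpenAIRE id, and it uses a hypothetical merge function in which the action payload simply overrides the graph payload. The real workflow runs on Spark and applies entity-specific merge logic. The names PromoteStrategySketch, Strategy, promote and merge are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PromoteStrategySketch {

    // Hypothetical mirror of the two values accepted by the workflow.
    enum Strategy { UPSERT, ENRICH }

    // Hypothetical merge: the action payload wins over the graph payload.
    // The real workflow applies entity-specific merge logic instead.
    static String merge(String graphValue, String actionValue) {
        return actionValue;
    }

    // Promotes the actionset into a copy of the graph table, keyed by id.
    static Map<String, String> promote(Map<String, String> graph,
                                       Map<String, String> actions,
                                       Strategy strategy) {
        Map<String, String> result = new LinkedHashMap<>(graph);
        for (Map.Entry<String, String> action : actions.entrySet()) {
            String id = action.getKey();
            if (result.containsKey(id)) {
                // Both strategies merge records matched by id.
                result.put(id, merge(result.get(id), action.getValue()));
            } else if (strategy == Strategy.UPSERT) {
                // Only UPSERT inserts records with no match in the graph.
                result.put(id, action.getValue());
            }
            // ENRICH discards non-matching records.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> graph = new LinkedHashMap<>();
        graph.put("a", "graph-a");
        graph.put("b", "graph-b");

        Map<String, String> actions = new LinkedHashMap<>();
        actions.put("a", "action-a"); // matches an existing record
        actions.put("x", "action-x"); // no match in the graph

        // prints {a=action-a, b=graph-b, x=action-x}
        System.out.println(promote(graph, actions, Strategy.UPSERT));
        // prints {a=action-a, b=graph-b}
        System.out.println(promote(graph, actions, Strategy.ENRICH));
    }
}
```

In both cases the matched record a is merged; only UPSERT lets the unmatched record x into the table, which is exactly how noisy actionset records can end up in the graph.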

Hence, the Oozie workflow definition now supports a new optional parameter, promoteActionStrategy, which accepts two values:

  • UPSERT (default)
  • ENRICH
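How a job might resolve this optional parameter can be sketched as follows, assuming its value reaches the Java side as a plain string (for instance as a job argument) and is null or blank when the caller omits it. Only the parameter name, its two values and the UPSERT default come from this PR; the class, enum and method names are hypothetical.

```java
public class PromoteActionStrategyParam {

    // Hypothetical enum mirroring the values accepted by promoteActionStrategy.
    enum PromoteActionStrategy { UPSERT, ENRICH }

    // Falls back to UPSERT (the default) when the parameter is absent or blank.
    static PromoteActionStrategy resolve(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return PromoteActionStrategy.UPSERT;
        }
        // Enum.valueOf throws IllegalArgumentException for unknown values,
        // so a misspelled strategy fails fast instead of silently upserting.
        return PromoteActionStrategy.valueOf(raw.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));     // prints UPSERT
        System.out.println(resolve("ENRICH")); // prints ENRICH
    }
}
```

Failing on an unrecognised value, rather than defaulting, is a design choice of this sketch: a typo in the workflow configuration would otherwise reintroduce the upsert behaviour the caller was trying to avoid.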
claudio.atzori added 2 commits 3 months ago
claudio.atzori merged commit f21133229a into master 3 months ago
Command line instructions:

Step 1:

From your project repository, check out a new branch and test the changes.
git checkout -b promote_actions_join_type_master master
git pull origin promote_actions_join_type_master

Step 2:

Merge the changes and update on Gitea.
git checkout master
git merge --no-ff promote_actions_join_type_master
git push origin master
Reference: D-Net/dnet-hadoop#391