Patch the identifiers (source/target) in the relations #125

Merged
claudio.atzori merged 3 commits from fct_project_id_replacement into master 3 years ago
Owner

This PR extends the oozie workflow responsible for the creation of the raw graph with an optional phase in which the identifiers included in the graph relations (source and target) are updated with the values provided by a correspondence map (oldId -> newId).

To activate this phase, when running the raw_all workflow make sure to

  • set the flag shouldPatchRelations = true
  • set the path pointing to the HDFS location where the correspondence map is stored (idMappingPath). Note that the oozie workflow also checks that the path exists
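
For readers who want to see the intended effect of the patching phase in isolation, here is a minimal plain-Java sketch (no Spark, no oozie). The class `RelationPatchSketch`, the record `Rel` and the method `patch` are hypothetical names introduced only for illustration; they are not taken from the PR. The sketch also assumes that identifiers without an entry in the correspondence map are left unchanged, which the description above does not state explicitly.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RelationPatchSketch {

    // Hypothetical stand-in for the graph relation model: only the two identifiers matter here.
    public record Rel(String source, String target) {}

    // Replace source and target with the mapped id when the map (oldId -> newId) has an entry.
    // Assumption: identifiers missing from the map are kept as they are.
    public static List<Rel> patch(List<Rel> rels, Map<String, String> idMapping) {
        return rels.stream()
            .map(r -> new Rel(
                idMapping.getOrDefault(r.source(), r.source()),
                idMapping.getOrDefault(r.target(), r.target())))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up identifiers, used only to exercise the sketch.
        Map<String, String> idMapping = Map.of("old::1", "new::1");
        List<Rel> rels = List.of(new Rel("old::1", "x::2"), new Rel("x::3", "old::1"));
        System.out.println(patch(rels, idMapping));
    }
}
```

In the actual workflow the same substitution is expressed as Spark joins between the relations and the mapping, so this in-memory version only illustrates the semantics, not the implementation.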
claudio.atzori added the enhancement label 3 years ago
alessia.bardi was assigned by claudio.atzori 3 years ago
miriam.baglioni was assigned by claudio.atzori 3 years ago
claudio.atzori self-assigned this 3 years ago
claudio.atzori added 1 commit 3 years ago
miriam.baglioni requested changes 3 years ago
@@ -0,0 +81,4 @@
rels
.joinWith(idMapping, rels.col("source").equalTo(idMapping.col("oldId")), "full")
.filter((FilterFunction<Tuple2<Relation, RelationIdMapping>>) t -> Objects.nonNull(t._1()))
Collaborator

I think you could replace the filter step with a left join: it returns all the original relations, which is exactly what the full join followed by the filter gives you.
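
The equivalence behind this suggestion can be checked on a small in-memory model. The sketch below is plain Java rather than Spark, and `JoinEquivalenceSketch`, `Pair` and its methods are hypothetical names used only for illustration. It assumes each oldId appears at most once in the mapping (it is modelled as a `Map`).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class JoinEquivalenceSketch {

    // A joined row: either side may be null, as in an outer join.
    public record Pair(String source, String newId) {}

    // Left outer join: one row per relation source; newId is null when the source is unmapped.
    public static List<Pair> leftJoin(List<String> sources, Map<String, String> idMapping) {
        return sources.stream()
            .map(s -> new Pair(s, idMapping.get(s)))
            .collect(Collectors.toList());
    }

    // Full outer join: the left-join rows plus one row for each mapping entry matching no relation.
    public static List<Pair> fullJoin(List<String> sources, Map<String, String> idMapping) {
        List<Pair> rows = new ArrayList<>(leftJoin(sources, idMapping));
        Set<String> seen = new HashSet<>(sources);
        idMapping.forEach((oldId, newId) -> {
            if (!seen.contains(oldId)) {
                rows.add(new Pair(null, newId));
            }
        });
        return rows;
    }

    // Full join followed by the non-null filter on the left side, as in the reviewed code.
    public static List<Pair> fullJoinThenFilter(List<String> sources, Map<String, String> idMapping) {
        return fullJoin(sources, idMapping).stream()
            .filter(p -> Objects.nonNull(p.source()))
            .collect(Collectors.toList());
    }
}
```

Under these assumptions `fullJoinThenFilter` and `leftJoin` return the same rows, because the filter discards exactly the rows the full join adds for unmatched mapping entries. In Spark this would correspond to passing `"left"` instead of `"full"` as the join type of `joinWith` and dropping the filter.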

claudio.atzori marked this conversation as resolved
claudio.atzori added 1 commit 3 years ago
claudio.atzori changed target branch from master to beta 3 years ago
claudio.atzori changed target branch from beta to master 3 years ago
claudio.atzori added 1 commit 3 years ago
claudio.atzori merged commit f83dd70e1c into master 3 years ago
Poster
Owner

PR also integrated in the beta branch with e87e1805c4280e9d8ed9be9f733d331401273118

Poster
Owner

One further fix added

patching relation identifier phase to be run at the end, i.e. includes also claimed relations

  • beta commit 5d08ad86ae45478db3742fef51c7c0ae38f30e34
  • master commit e725c88ebb6eedea235abfc07af932756c9e2397

The pull request has been merged as f83dd70e1c.
Reference: D-Net/dnet-hadoop#125