Patch the identifiers (source/target) in the relations #125

Merged
claudio.atzori merged 3 commits from fct_project_id_replacement into master 2021-07-29 12:11:28 +02:00

This PR extends the oozie workflow responsible for creating the raw graph with an optional phase in which the identifiers included in the graph relations (source and target) are updated with the values provided by a correspondence map (oldId -> newId).

To activate this phase, when running the raw_all workflow make sure to:

  • set the flag shouldPatchRelations = true
  • set the path pointing to the HDFS location where the correspondence map is stored (idMappingPath). Note that the oozie workflow also checks that the path exists
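
As a rough illustration of what the patching phase does conceptually, here is a minimal plain-Java sketch with no Spark involved. The `Rel` class and the `patch` method are hypothetical stand-ins and not the classes used in this PR, and the sketch assumes that an identifier without an entry in the correspondence map is left unchanged.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PatchRelationsSketch {

    // Hypothetical, simplified stand-in for a graph relation: only the two identifiers.
    static final class Rel {
        final String source;
        final String target;

        Rel(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    // Replace source and target with the mapped value (oldId -> newId) when the
    // map has an entry for them; otherwise keep the original identifier (assumption).
    static List<Rel> patch(List<Rel> rels, Map<String, String> idMapping) {
        return rels.stream()
            .map(r -> new Rel(
                idMapping.getOrDefault(r.source, r.source),
                idMapping.getOrDefault(r.target, r.target)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> idMapping = Map.of("oldId", "newId");
        List<Rel> patched = patch(
            List.of(new Rel("oldId", "other"), new Rel("other", "oldId")),
            idMapping);
        for (Rel r : patched) {
            System.out.println(r.source + " -> " + r.target);
        }
    }
}
```

In the actual workflow the same idea is applied to Spark datasets via joins, as the reviewed snippet further down in this conversation shows.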
claudio.atzori added the
enhancement
label 2021-07-27 17:24:17 +02:00
alessia.bardi was assigned by claudio.atzori 2021-07-27 17:24:17 +02:00
miriam.baglioni was assigned by claudio.atzori 2021-07-27 17:24:17 +02:00
claudio.atzori self-assigned this 2021-07-27 17:24:17 +02:00
claudio.atzori added 1 commit 2021-07-27 17:24:18 +02:00
miriam.baglioni requested changes 2021-07-29 11:31:43 +02:00
@ -0,0 +81,4 @@
rels
.joinWith(idMapping, rels.col("source").equalTo(idMapping.col("oldId")), "full")
.filter((FilterFunction<Tuple2<Relation, RelationIdMapping>>) t -> Objects.nonNull(t._1()))

I think you could replace the filter step with a left join: you get all the original relations, which is what you obtain with the full join and the subsequent filter.
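
The equivalence behind this suggestion can be sketched in plain Java, without Spark. The class and method names below are hypothetical and only model the join semantics: relation sources play the left side, the oldId -> newId map plays the right side, and a joined row is a pair whose missing side is null.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

class JoinEquivalenceSketch {

    // Left join: every source is kept, paired with its newId or with null.
    static List<SimpleEntry<String, String>> leftJoin(
            List<String> sources, Map<String, String> idMapping) {
        List<SimpleEntry<String, String>> out = new ArrayList<>();
        for (String s : sources) {
            out.add(new SimpleEntry<>(s, idMapping.get(s)));
        }
        return out;
    }

    // Full join: the left-join rows plus one row (null, newId) for every
    // mapping entry whose oldId matches no source.
    static List<SimpleEntry<String, String>> fullJoin(
            List<String> sources, Map<String, String> idMapping) {
        List<SimpleEntry<String, String>> out = leftJoin(sources, idMapping);
        Set<String> seen = new HashSet<>(sources);
        for (Map.Entry<String, String> e : idMapping.entrySet()) {
            if (!seen.contains(e.getKey())) {
                out.add(new SimpleEntry<>(null, e.getValue()));
            }
        }
        return out;
    }

    // Full join followed by the non-null filter on the left side,
    // mirroring the pattern in the reviewed snippet.
    static List<SimpleEntry<String, String>> fullJoinThenFilter(
            List<String> sources, Map<String, String> idMapping) {
        return fullJoin(sources, idMapping).stream()
            .filter(p -> Objects.nonNull(p.getKey()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sources = List.of("a", "b", "a");
        Map<String, String> idMapping = Map.of("a", "A", "z", "Z");
        System.out.println(
            leftJoin(sources, idMapping).equals(fullJoinThenFilter(sources, idMapping)));
    }
}
```

The filter only discards the rows that the full join adds for unmatched mapping entries, so a left join produces the same rows without materialising them first.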
claudio.atzori marked this conversation as resolved
claudio.atzori added 1 commit 2021-07-29 11:36:25 +02:00
claudio.atzori changed target branch from master to beta 2021-07-29 11:37:07 +02:00
claudio.atzori changed target branch from beta to master 2021-07-29 11:37:47 +02:00
claudio.atzori added 1 commit 2021-07-29 11:38:23 +02:00
claudio.atzori merged commit f83dd70e1c into master 2021-07-29 12:11:28 +02:00
Author
Owner

PR also integrated in the beta branch with e87e1805c4280e9d8ed9be9f733d331401273118
Author
Owner

One further fix added:

The patching relation identifier phase is now run at the end, i.e. it also includes claimed relations.

  • beta commit 5d08ad86ae45478db3742fef51c7c0ae38f30e34
  • master commit e725c88ebb6eedea235abfc07af932756c9e2397