Patch the identifiers (source/target) in the relations #125
Reference: D-Net/dnet-hadoop#125
This PR extends the oozie workflow responsible for creating the `raw` graph with an optional phase in which the identifiers in the graph relations (`source` and `target`) are updated with the values provided by a correspondence map (`oldId -> newId`).

To activate this phase when running the `raw_all` workflow, ensure to set `shouldPatchRelations = true` and to provide the location of the map (`idMappingPath`). Note that the oozie workflow also checks that the path exists.

@@ -0,0 +81,4 @@
rels
.joinWith(idMapping, rels.col("source").equalTo(idMapping.col("oldId")), "full")
.filter((FilterFunction<Tuple2<Relation, RelationIdMapping>>) t -> Objects.nonNull(t._1()))
I think you could replace the filter step with a left join: it returns all the original relations, which is exactly what the full join followed by the filter does.
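The equivalence mentioned in this comment can be illustrated without Spark. The following is only a minimal sketch on plain Java collections, not the code of this PR: `Rel`, `Joined`, `PatchRelationsSketch` and all method names are hypothetical stand-ins, and the join is modelled on `source` only, as in the snippet above. It assumes Java 16 or later (records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

final class PatchRelationsSketch {

    // Hypothetical stand-in for a graph relation: only source and target are modelled.
    record Rel(String source, String target) {}

    // One joined row: rel is null for mapping entries that matched no relation,
    // newId is null for relations that matched no mapping entry.
    record Joined(Rel rel, String newId) {}

    // Left join on source = oldId: every relation is kept, matched or not.
    static List<Joined> leftJoin(List<Rel> rels, Map<String, String> idMapping) {
        List<Joined> out = new ArrayList<>();
        for (Rel r : rels) {
            out.add(new Joined(r, idMapping.get(r.source())));
        }
        return out;
    }

    // Full join followed by the nonNull filter on the left side, as in the reviewed code.
    static List<Joined> fullJoinThenFilter(List<Rel> rels, Map<String, String> idMapping) {
        List<Joined> full = new ArrayList<>(leftJoin(rels, idMapping));
        Set<String> usedOldIds = rels.stream().map(Rel::source).collect(Collectors.toSet());
        // A full join additionally emits the mapping entries that matched no relation...
        idMapping.forEach((oldId, newId) -> {
            if (!usedOldIds.contains(oldId)) {
                full.add(new Joined(null, newId));
            }
        });
        // ...and the filter drops exactly those rows again.
        return full.stream()
                .filter(j -> Objects.nonNull(j.rel()))
                .collect(Collectors.toList());
    }

    // Patches both identifiers: ids without a mapping entry are left unchanged.
    static List<Rel> patch(List<Rel> rels, Map<String, String> idMapping) {
        return rels.stream()
                .map(r -> new Rel(
                        idMapping.getOrDefault(r.source(), r.source()),
                        idMapping.getOrDefault(r.target(), r.target())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Rel> rels = List.of(new Rel("old1", "x"), new Rel("y", "old1"));
        Map<String, String> idMapping = Map.of("old1", "new1", "unused", "new2");
        System.out.println(fullJoinThenFilter(rels, idMapping).equals(leftJoin(rels, idMapping)));
        System.out.println(patch(rels, idMapping));
    }
}
```

In this sketch the rows added by the full join are the only ones the filter removes, so the left join alone produces the same result with one step less.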
PR also integrated in the `beta` branch with e87e1805c4.
One further fix added: the phase patching the relation identifiers now runs at the end, i.e. it also includes the claimed relations.
`beta`: commit 5d08ad86ae
`master`: commit e725c88ebb