Patch the identifiers (source/target) in the relations #125

Merged
claudio.atzori merged 3 commits from fct_project_id_replacement into master 2021-07-29 12:11:28 +02:00
1 changed file with 2 additions and 4 deletions
Showing only changes of commit 1923c1ce21

@@ -80,8 +80,7 @@ public class PatchRelationsApplication {
 final Dataset<RelationIdMapping> idMapping = Utils.readPath(spark, idMappingPath, RelationIdMapping.class);
 rels
-    .joinWith(idMapping, rels.col("source").equalTo(idMapping.col("oldId")), "full")
-    .filter((FilterFunction<Tuple2<Relation, RelationIdMapping>>) t -> Objects.nonNull(t._1()))
+    .joinWith(idMapping, rels.col("source").equalTo(idMapping.col("oldId")), "left")
     .map((MapFunction<Tuple2<Relation, RelationIdMapping>, Relation>) t -> {
claudio.atzori marked this conversation as resolved
I think you could replace the filter step with a left join: it returns all the original relations, which is exactly what the full join followed by the filter does.
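The equivalence the comment relies on can be checked outside Spark. The following is a minimal plain-Java sketch, not the code from this PR: the class `JoinEquivalenceSketch` and its methods are hypothetical, relations are reduced to their source id, and the id mapping is modelled as a `Map` from `oldId` to `newId` (which assumes each `oldId` appears at most once). It models a full outer join followed by a non-null filter on the left side, and a left join, and applies the same patch step to both.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration: a plain-Java model of the two join strategies.
class JoinEquivalenceSketch {

    // Left join: one pair per relation; the right side is null when no mapping matches.
    static List<SimpleEntry<String, String>> leftJoin(List<String> sources, Map<String, String> idMapping) {
        return sources.stream()
            .map(s -> new SimpleEntry<String, String>(s, idMapping.get(s)))
            .collect(Collectors.toList());
    }

    // Full outer join: the left-join pairs plus one (null, newId) pair per unmatched mapping row.
    static List<SimpleEntry<String, String>> fullJoin(List<String> sources, Map<String, String> idMapping) {
        List<SimpleEntry<String, String>> out = new ArrayList<>(leftJoin(sources, idMapping));
        Set<String> seen = new HashSet<>(sources);
        idMapping.forEach((oldId, newId) -> {
            if (!seen.contains(oldId)) {
                out.add(new SimpleEntry<String, String>(null, newId));
            }
        });
        return out;
    }

    // Patch step: take the new id when a mapping matched, otherwise keep the original id.
    static List<String> patch(List<SimpleEntry<String, String>> pairs) {
        return pairs.stream()
            .map(p -> Optional.ofNullable(p.getValue()).orElse(p.getKey()))
            .collect(Collectors.toList());
    }

    // Old shape: full join, then drop pairs that have no relation on the left side.
    static List<String> viaFullJoinAndFilter(List<String> sources, Map<String, String> idMapping) {
        return patch(fullJoin(sources, idMapping).stream()
            .filter(p -> Objects.nonNull(p.getKey()))
            .collect(Collectors.toList()));
    }

    // New shape: the left join already yields exactly the pairs that survive the filter.
    static List<String> viaLeftJoin(List<String> sources, Map<String, String> idMapping) {
        return patch(leftJoin(sources, idMapping));
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("a", "b", "c");
        Map<String, String> idMapping = new HashMap<>();
        idMapping.put("a", "A"); // matches a relation
        idMapping.put("z", "Z"); // matches nothing; only the full join emits it
        System.out.println(viaFullJoinAndFilter(sources, idMapping).equals(viaLeftJoin(sources, idMapping)));
    }
}
```

In this model the only rows the full join adds are the unmatched mapping rows, and those are exactly the ones the filter discards, so the left join avoids producing them in the first place. Row order is preserved here only because the sketch uses lists; a Spark join gives no such guarantee.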
         final Relation r = t._1();
         Optional.ofNullable(t._2())
@@ -89,8 +88,7 @@ public class PatchRelationsApplication {
             .ifPresent(r::setSource);
         return r;
     }, Encoders.bean(Relation.class))
-    .joinWith(idMapping, rels.col("target").equalTo(idMapping.col("oldId")), "full")
-    .filter((FilterFunction<Tuple2<Relation, RelationIdMapping>>) t -> Objects.nonNull(t._1()))
+    .joinWith(idMapping, rels.col("target").equalTo(idMapping.col("oldId")), "left")
     .map((MapFunction<Tuple2<Relation, RelationIdMapping>, Relation>) t -> {
         final Relation r = t._1();
         Optional.ofNullable(t._2())