Refactor Dedup using Spark Dataframe API, initial support for Scala 2.12 and Spark 3.4 #324
Reference: D-Net/dnet-hadoop#324
Refactor SparkCreateSimRels process to use Spark Dataframe API:
Build for both Spark 2.4 with Scala 2.11 and Spark 3.4 with Scala 2.12 (WIP)
Refactor SparkWhitelistSimRels to use the Spark Dataframe API
Fix the JsonPath cache contention performance problem by using a ConcurrentHashMap
Improve blacklist filtering performance by using precompiled Patterns
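
The last two items can be illustrated with a minimal sketch. It is not the code from this PR: the class names `DedupCachingSketch`, `CompiledPath` and `Blacklist` are hypothetical, `CompiledPath` is only a stand-in for a compiled JsonPath expression, and using full-string `matches()` for the blacklist check is an assumption. The sketch only shows the two general techniques: a `ConcurrentHashMap` with `computeIfAbsent` so that threads reading the cache do not contend on a single lock, and regular expressions compiled once up front instead of once per record.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DedupCachingSketch {

    // Hypothetical stand-in for a compiled JsonPath expression.
    static final class CompiledPath {
        final String expression;

        CompiledPath(String expression) {
            this.expression = expression;
        }
    }

    // Shared cache: ConcurrentHashMap lets many threads look up entries
    // without serializing on one global lock.
    static final Map<String, CompiledPath> PATH_CACHE = new ConcurrentHashMap<>();

    // Counts how many times an expression was actually compiled (for illustration only).
    static final AtomicInteger COMPILATIONS = new AtomicInteger();

    // computeIfAbsent runs the mapping function at most once per key.
    static CompiledPath compiledPath(String expression) {
        return PATH_CACHE.computeIfAbsent(expression, e -> {
            COMPILATIONS.incrementAndGet();
            return new CompiledPath(e);
        });
    }

    // Blacklist whose regular expressions are compiled once, at construction time.
    static final class Blacklist {
        private final List<Pattern> patterns;

        Blacklist(List<String> regexes) {
            this.patterns = regexes.stream()
                .map(Pattern::compile)
                .collect(Collectors.toList());
        }

        // Assumption: a value is blacklisted when it fully matches any pattern.
        boolean matches(String value) {
            for (Pattern p : patterns) {
                if (p.matcher(value).matches()) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

Calling `String.matches(regex)` inside a per-record loop recompiles the expression on every call, which is the cost the precompiled list avoids.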