Commit Graph

2 Commits

Author:  Giambattista Bloisi
SHA1:    727ccbe575
Date:    2023-09-21 14:25:26 +02:00
Message: Add profiles for different spark versions: spark-24, spark-34, spark-35

Author:  Giambattista Bloisi
SHA1:    e64c2854a3
Date:    2023-07-24 15:36:24 +02:00
Message: Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
         JsonPath cache contention fixed by using a ConcurrentHashMap
         Blacklist filtering performance improvement
         Minor performance improvements when evaluating similarity
         Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
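
The deterministic-sorting note in commit e64c2854a3 can be illustrated with a minimal sketch. This is not the code from the commit: the class `DeterministicSort`, the `Element` type, and its `identity` and `ordering` fields are hypothetical names chosen for illustration. It only shows the general idea of breaking ties on the ordering field with a second, unique key, so that the result no longer depends on the order in which elements arrive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DeterministicSort {

    // Hypothetical cluster element; the field names are assumptions, not taken from the project.
    static final class Element {
        final String identity; // unique identifier of the element
        final String ordering; // value of the ordering field (may repeat across elements)

        Element(String identity, String ordering) {
            this.identity = identity;
            this.ordering = ordering;
        }
    }

    // Sort by the ordering field first, then by identity to break ties.
    // With the ordering field alone, elements sharing a value would keep
    // whatever relative order the input happened to have.
    static List<Element> sortCluster(List<Element> cluster) {
        List<Element> sorted = new ArrayList<>(cluster);
        sorted.sort(Comparator
                .comparing((Element e) -> e.ordering)
                .thenComparing(e -> e.identity));
        return sorted;
    }

    // Helper that returns only the identities, in sorted order.
    static List<String> sortedIds(List<Element> cluster) {
        List<String> ids = new ArrayList<>();
        for (Element e : sortCluster(cluster)) {
            ids.add(e.identity);
        }
        return ids;
    }

    public static void main(String[] args) {
        // The same three elements presented in two different input orders.
        List<Element> first = Arrays.asList(
                new Element("id2", "x"), new Element("id1", "x"), new Element("id3", "a"));
        List<Element> second = Arrays.asList(
                new Element("id1", "x"), new Element("id3", "a"), new Element("id2", "x"));

        System.out.println(sortedIds(first));
        System.out.println(sortedIds(second));
    }
}
```

Both calls print the same sequence of identities, because the two elements that share the ordering value "x" are ranked by their identity rather than by their position in the input.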