Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Sandro La Bruzzo
168bfb496a
adopted dedup to the new schema
2020-07-31 09:06:57 +02:00
Claudio Atzori
6f5b899038
reformatted code according to the updated style descriptor
2020-04-28 11:23:29 +02:00
Claudio Atzori
a0bdbacdae
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:52:31 +02:00
Claudio Atzori
7a3f8085f7
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:45:40 +02:00
Claudio Atzori
ad7a131b18
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin , applied to each java class in the project
2020-04-18 12:42:58 +02:00
miconis
a61763d149
structure for sparksimrel changed to be compliant with mockito testing
2020-04-02 18:37:53 +02:00