dnet-hadoop/dhp-pace-core/src/main/java/eu/dnetlib/pace/clustering
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
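The commit message above says that sorting of clustered elements became deterministic by ordering on both the ordering field and the identity field. A minimal sketch of that idea follows. It is not the code from this commit: the `Element` class, its `id` and `orderKey` fields, and the `sortedIds` helper are hypothetical names used only for illustration. A comparator on the ordering field alone leaves elements with equal keys in input-dependent order, so a tie-break on the identity field is added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class DeterministicSortSketch {

    // Hypothetical stand-in for a clustered element; not a class from dhp-pace-core.
    static final class Element {
        final String id;       // identity field (assumed unique per element)
        final String orderKey; // ordering field (may repeat across elements)

        Element(String id, String orderKey) {
            this.id = id;
            this.orderKey = orderKey;
        }
    }

    // Compare on the ordering field first, then break ties on the identity field,
    // so the result no longer depends on the order in which elements arrive.
    static final Comparator<Element> BY_ORDER_THEN_ID =
        Comparator.comparing((Element e) -> e.orderKey)
            .thenComparing((Element e) -> e.id);

    // Returns the ids of the elements in sorted order without mutating the input list.
    static List<String> sortedIds(List<Element> elements) {
        List<Element> copy = new ArrayList<>(elements);
        copy.sort(BY_ORDER_THEN_ID);
        List<String> ids = new ArrayList<>();
        for (Element e : copy) {
            ids.add(e.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Element> input = Arrays.asList(
            new Element("id2", "2020"),
            new Element("id1", "2020"),
            new Element("id3", "2019"));

        // Same elements in reverse arrival order.
        List<Element> reversed = new ArrayList<>(input);
        Collections.reverse(reversed);

        System.out.println(sortedIds(input));    // prints [id3, id1, id2]
        System.out.println(sortedIds(reversed)); // prints [id3, id1, id2]
    }
}
```

With a comparator on `orderKey` alone, `id1` and `id2` would compare as equal, and a stable sort would keep them in arrival order, which can differ between runs of a distributed job.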
AbstractClusteringFunction.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
Acronyms.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00
ClusteringClass.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00
ClusteringFunction.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
ImmutableFieldValue.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00
KeywordsClustering.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
LastNameFirstInitial.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
LowercaseClustering.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
NGramUtils.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
NgramPairs.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
Ngrams.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
PersonClustering.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
PersonHash.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00
RandomClusteringFunction.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00
SortedNgramPairs.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
SpaceTrimmingFieldValue.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00
SuffixPrefix.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00
UrlClustering.java Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
WordsStatsSuffixPrefixChain.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00
WordsSuffixPrefix.java New sources formatted by maven plugin 2023-07-06 10:28:53 +02:00