Claudio Atzori
d4871b31e8
WIP: extended provision workflow to create the JSON based payload
2024-03-08 11:43:20 +01:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Serafeim Chatzopoulos
0b5bf53b45
Remove unnecessary indexed fields from Solr
2023-02-23 12:42:42 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
Claudio Atzori
d9532446eb
imported more diffs from master branch; code formatting
2020-12-10 16:14:16 +01:00
Claudio Atzori
3f34757c63
merged from master
2020-11-19 14:34:54 +01:00
Claudio Atzori
d9e07a242b
extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location
2020-11-18 14:34:55 +01:00
Claudio Atzori
8177ce7939
test for XmlIndexingJob based on a local miniSolrCluster
2020-11-18 10:58:05 +01:00
Claudio Atzori
3a11a387a9
data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed
2020-08-03 14:28:08 +02:00
Claudio Atzori
bac37b3973
fixed children expansion in XML records
2020-05-04 11:51:17 +02:00
Claudio Atzori
6f5b899038
reformatted code according to the updated style descriptor
2020-04-28 11:23:29 +02:00
Claudio Atzori
a0bdbacdae
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:52:31 +02:00
Claudio Atzori
7a3f8085f7
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:45:40 +02:00
Claudio Atzori
ad7a131b18
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin , applied to each java class in the project
2020-04-18 12:42:58 +02:00
Claudio Atzori
77f59b1b10
dataset based provision WIP
2020-04-06 19:37:27 +02:00