Commit Graph

27 Commits

Author SHA1 Message Date
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Serafeim Chatzopoulos bc1bf55507 Add AbstractSplittedRecordPlugin 2022-04-07 14:33:04 +03:00
Claudio Atzori 663b1556d7 manually integrating PR#140 #140 2021-09-15 16:40:25 +02:00
Sandro La Bruzzo 3c6fc2096c fix bug on oai iterator that skip record cleaned 2021-09-07 10:46:26 +02:00
Miriam Baglioni 8769dd8eef GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:20:56 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Claudio Atzori 777536ce91 [aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122) 2021-07-07 11:23:48 +02:00
Andreas Czerniak ebf3f47a02 from&until more OAI2.0 compl., adding tfs 2021-07-07 09:29:49 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Claudio Atzori 7df2461ccc indent XML records collected from oai-pmh endpoints 2021-02-25 16:19:12 +01:00
Claudio Atzori b592d78bb4 WIP: collectorWorker error reporting, generalised reported implementation 2021-02-17 10:28:01 +01:00
Claudio Atzori cf27905a71 WIP: collectorWorker error reporting, added report messages 2021-02-16 16:53:14 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Claudio Atzori 50add4c61b added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common 2021-02-08 12:19:38 +01:00
Claudio Atzori 40df0f987d better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils 2021-02-06 20:12:00 +01:00
Claudio Atzori a8a758925e better logging, WIP: collectorWorker error reporting 2021-02-05 19:18:05 +01:00
Claudio Atzori deb85706db imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2 2021-02-04 17:24:52 +01:00
Claudio Atzori 40764cf626 better logging, WIP: collectorWorker error reporting 2021-02-04 14:06:02 +01:00
Claudio Atzori e04045089f better logging, WIP: collectorWorker error reporting 2021-02-03 17:58:22 +01:00
Claudio Atzori 0e8a4f9f1a better logging, WIP: collectorWorker error reporting 2021-02-03 12:33:41 +01:00
Sandro La Bruzzo 0276180039 WIP mdstore
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Sandro La Bruzzo 98b9498b57 Removed old messaging system not quite used from collection and Transformation workflow
code refactor
2021-01-28 09:51:17 +01:00
Claudio Atzori 6f5b899038 reformatted code according to the updated style descriptor 2020-04-28 11:23:29 +02:00
Claudio Atzori a0bdbacdae switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:52:31 +02:00
Claudio Atzori 7a3f8085f7 switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:45:40 +02:00
Claudio Atzori ad7a131b18 introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project 2020-04-18 12:42:58 +02:00
Sandro La Bruzzo 5a8a323f2a dhp-collection-worker integrated in dhp-workflows 2019-10-24 11:36:59 +02:00