Commit Graph

213 Commits

Author SHA1 Message Date
Claudio Atzori 909729a2fc [dedup] tweaking num partitions, minor changes 2023-05-17 10:16:22 +02:00
Claudio Atzori 062abfd669 fixed NPE, removed unused stuff 2022-12-06 12:04:00 +01:00
Claudio Atzori 0aa725083f extended dedup testing 2022-11-17 16:13:43 +01:00
Claudio Atzori 3dbc637d3e code formatting 2022-11-17 09:55:41 +01:00
Claudio Atzori ddff0e8999 merging duplicates using IdentifierComparator 2022-11-11 16:10:25 +01:00
Claudio Atzori 5af5a8ae42 added IdentifierComparator 2022-11-09 14:20:59 +01:00
Claudio Atzori c26222623f [maven-release-plugin] prepare for next development iteration 2022-04-07 13:32:22 +02:00
Claudio Atzori 86585a6b27 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:32:19 +02:00
Claudio Atzori ad85d88eaf [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 13:28:35 +02:00
Claudio Atzori 598e11dfd7 [maven-release-plugin] prepare for next development iteration 2022-04-07 13:27:02 +02:00
Claudio Atzori db3d9877a5 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 13:26:58 +02:00
Claudio Atzori 3bba6d6e38 [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 12:23:17 +02:00
Claudio Atzori 2ac2d928bd [maven-release-plugin] prepare for next development iteration 2022-04-07 12:18:47 +02:00
Claudio Atzori 85bc722ff4 [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 12:18:43 +02:00
Claudio Atzori bc05b6168a [maven-release-plugin] rollback the release of dhp-1.2.4 2022-04-07 11:49:06 +02:00
Claudio Atzori 505420fd61 [maven-release-plugin] prepare for next development iteration 2022-04-07 11:34:06 +02:00
Claudio Atzori 66e718981e [maven-release-plugin] prepare release dhp-1.2.4 2022-04-07 11:34:02 +02:00
Claudio Atzori 61319b2e83 updated dhp-schema version; set entity-level dataInfo before & after merging the fields from the group of duplicates 2022-03-25 16:38:33 +01:00
miconis c959639bd5 dependency updated to the new pace-core version 2022-03-15 16:33:03 +01:00
miconis 8991d097b4 bug fix in the DedupRecordFactory, DataInfo set before merge 2022-02-24 17:13:12 +01:00
Claudio Atzori 391aa1373b added unit test 2022-01-19 17:13:21 +01:00
Claudio Atzori 44a937f4ed factored out entity grouping implementation, extended to consider results from delegated authorities rather than identical records from other sources 2022-01-19 12:24:52 +01:00
Claudio Atzori f4538f3c4c cleanup 2021-11-19 11:33:10 +01:00
Claudio Atzori 2b46b87f56 fixed filtering criteria applied in SparkCopyRelationsNoOpenorgs to keep the parent/child relations from OpenOrgs 2021-11-19 11:30:29 +01:00
Claudio Atzori a24b9f8268 [dedup] trivial refactoring 2021-11-18 17:12:02 +01:00
Claudio Atzori c0750fb17c avoid non necessary count operations over large spark datasets 2021-11-18 17:11:31 +01:00
Claudio Atzori 0a727d325d [dedup] increased number of partitions in the consistency phase 2021-11-16 08:43:41 +01:00
miconis 611ca511db set configuration property in openorgs duplicates wf 2021-10-07 15:39:55 +02:00
miconis 9646b9fd98 implementation of the http call for the update of openorgs suggestions 2021-10-07 11:29:11 +02:00
miconis 853333bdde implementation of the whitelist for similarity relations 2021-09-20 16:21:47 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Claudio Atzori 2fff24df55 code formatting 2021-07-28 11:34:19 +02:00
Sandro La Bruzzo 3920c69bc8 change implementation of resolve Relation to generate jsonRdd in output 2021-07-25 09:51:36 +02:00
Sandro La Bruzzo 058b636d4d added control to check if the entity exists 2021-07-22 16:08:54 +02:00
Claudio Atzori 41b551562e applying PR#115 (DatePicker) on stable_ids 2021-06-17 09:33:50 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Claudio Atzori 27ab8a704d adjusted poms to align with the external dhp-schema module 2021-04-27 10:12:27 +02:00
Claudio Atzori ef4bfd82e2 code formatting 2021-04-27 10:09:31 +02:00
Claudio Atzori faa8f6f4e2 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-27 09:57:03 +02:00
miconis 6d5c14e030 assertions updated in entity merger test 2021-04-27 09:47:49 +02:00
Claudio Atzori c2bb03c8b5 depending on external dhp-schemas module 2021-04-23 17:57:35 +02:00
miconis 3c12eeadce bug fix in propagation of relations 2021-04-22 11:44:33 +02:00
Claudio Atzori 8f309b72ff [dedup] using node names consistently across the workflow 2021-04-21 17:54:51 +02:00
Claudio Atzori 815b9f4d56 [openorgs dedup] fixed workflow parameter declarations. Introduced support for resuming the execution from intermediate steps 2021-04-20 17:24:45 +02:00
Claudio Atzori 45057440c1 code formatting 2021-04-16 17:28:25 +02:00
miconis 7ad573d023 bug fix: changed join in propagaterelations without applying filter on the id 2021-04-16 16:40:42 +02:00
miconis f64e57c112 refactoring of the id generation, sparkcreatemergerels collects entities to create root id after a join 2021-04-15 10:59:24 +02:00
miconis 3525a8f504 id generation of representative record moved to the SparkCreateMergeRel job 2021-04-14 18:06:07 +02:00