Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Claudio Atzori
ddff0e8999
merging duplicates using IdentifierComparator
2022-11-11 16:10:25 +01:00
Claudio Atzori
61319b2e83
updated dhp-schema version; set entity-level dataInfo before & after merging the fields from the group of duplicates
2022-03-25 16:38:33 +01:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
miconis
6d5c14e030
assertions updated in entity merger test
2021-04-27 09:47:49 +02:00
miconis
2355cc4e9b
minor changes and bug fix
2021-03-29 10:07:12 +02:00
Claudio Atzori
e5da4ee9b1
dedup workflow using the common PidComparator
2020-11-04 15:02:02 +01:00
miconis
c4a59d1b9a
merge with the master to port the new packages
2020-10-20 16:07:30 +02:00
miconis
708d887e64
minor changes
2020-10-20 15:12:19 +02:00
Sandro La Bruzzo
734934e2eb
fixed error on empty intersection with publication and relation on export to OAF
2020-10-08 17:29:29 +02:00
Sandro La Bruzzo
eec418cd26
moved AuthorMerger into dhp-common
2020-10-08 10:33:55 +02:00
miconis
5a8bc329c5
bug fix in the result merge: it now takes the correct bestaccessright based on the license instead of the trust
2020-10-06 15:26:44 +02:00
miconis
e3f7798d1b
minor changes in dedup tests, bug fix in the idgenerator and pace-core version update
2020-09-29 15:31:46 +02:00
miconis
d47352cbc7
refactoring of the procedure for the id generation, minor changes and addition of a comparison on the original id and the origin datasource
2020-07-24 20:10:47 +02:00
miconis
b260fee787
implementation of the dedup_id generation using pids to make the graph more stable
2020-07-22 17:29:48 +02:00
Alessia Bardi
7e96105947
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-12 19:29:12 +02:00
Alessia Bardi
b7a39731a6
assert, not print
2020-07-12 19:28:56 +02:00
Michele Artini
e1ae964bc4
stats
2020-07-10 16:12:08 +02:00
Alessia Bardi
853e8d7987
test for software merge
2020-07-08 17:03:53 +02:00
Claudio Atzori
7b288a94cb
code formatting
2020-05-26 09:54:13 +02:00
miconis
da1e5cf557
implementation of the result title merge: main title with the highest trust, distinct among the others
2020-05-25 18:02:57 +02:00
Claudio Atzori
7181807e64
code formatting
2020-05-23 09:51:48 +02:00
miconis
0fd0c7d725
reimplementation of the similarity between two authors: it now takes into account both name and surname. threshold raised to 1.0 if the name is too short
2020-05-22 17:24:57 +02:00
Claudio Atzori
3cf2796ac6
code formatting
2020-05-22 12:34:00 +02:00
miconis
8bbd1d0501
reimplementation of the author merging in deduprecord creation. implementation of the test class.
2020-05-21 11:52:14 +02:00