Giambattista Bloisi
d2d173773e
Precompile blacklist patterns before evaluating clustering criteria
Enable Junit 5 tests in maven builds
Make path comparisons platform-independent
Read String resource files assuming they are encoded in UTF-8
Fix a few test conditions
2023-06-16 09:41:11 +02:00
miconis
b2cbc09fda
bug fix in the normalization of a legalname, city map updated and transliteration support added
2022-03-15 14:59:13 +01:00
miconis
e168d95ec0
bug fix in the authormatch comparator, implementation of tests
2022-01-13 11:58:28 +01:00
miconis
5e8757a457
implementation of new comparators for publication dedup configuration update
2021-12-27 17:35:02 +01:00
miconis
451114418d
implementation of the instance type comparator and its tests
2021-11-04 15:20:57 +01:00
Sandro La Bruzzo
b6c4f4acf3
upgraded maven version of commons-lang
2020-02-10 12:38:40 +01:00
miconis
eeeb374480
minor changes in comparators
2020-01-24 10:01:11 +01:00
miconis
6a27fb14a8
update in the implementation of the tree: addition of new logic aggregations and statistics
2020-01-14 11:42:43 +02:00
miconis
5676e625bd
implementation of romansmatch and re-implementation of the getNumber function. New terms in the translation map and update of the configuration
2019-11-28 16:54:44 +01:00
miconis
79e62787cf
jarowinklernormalizedname split into 3 different comparators: citymatch, keywordmatch and jarowinkler. Implementation of the TreeStatistic support functions
2019-11-20 10:45:00 +01:00
miconis
5b3adb3e65
code cleaning, distribution of the classes in packages and implementation of the new configuration
2019-11-07 12:47:12 +01:00
miconis
1cbb48f77b
minor changes
2019-10-08 16:49:07 +02:00
miconis
03c1b334d5
translation map moved into the json configuration, support for synonyms added in the configuration, now the configuration is an argument of conditions, distancealgos and clusteringfunctions
2019-10-08 14:53:52 +02:00
miconis
f0b4c4cbd4
addition of a fixSpecial function to address the problem with special characters in organization names, addition of new terms in translation maps
2019-08-06 17:06:05 +02:00
miconis
84974dcdfa
restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames
2019-07-19 17:10:29 +02:00
miconis
0509ea8d1e
bug fixing in the keywordsclustering class
2019-07-08 11:01:49 +02:00
miconis
2b866cfbeb
addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes
2019-07-08 09:44:02 +02:00
Michele De Bonis
f87790f701
update of the comparator for legalnames of organizations
2019-03-21 14:27:27 +01:00
Michele De Bonis
9ff83d6567
implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity
2018-12-20 09:54:41 +01:00
Michele De Bonis
0bd20c565a
implementation of the decision tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing
2018-12-12 16:30:03 +01:00
Michele De Bonis
23c5a16525
addition of cities check
2018-11-16 16:11:03 +01:00
Michele De Bonis
3a517a6551
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2018-11-12 14:11:26 +01:00
Michele De Bonis
33387a3532
configuration file updated, addition of condition on domain
2018-11-12 14:11:15 +01:00
Claudio Atzori
925a437597
getting rid of spark libs from dnet-pace-core
2018-11-12 12:46:06 +01:00
Michele De Bonis
4337e83950
implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test
2018-11-05 17:22:59 +01:00
Sandro La Bruzzo
a043d0c716
added d-net pace core module and ignored target folder
2018-10-02 10:37:54 +02:00