Commit Graph

57 Commits

Author SHA1 Message Date
Giambattista Bloisi b0ade43608 Precompile blacklist patterns before evaluating clustering criteria
Enable JUnit 5 tests in Maven builds
Make path comparisons platform-independent
Read String resource files assuming they are encoded in UTF-8
Fix a few test conditions
2023-06-16 09:41:11 +02:00
Michele De Bonis cb595c87bb implementation of support for author deduplication: cosine similarity comparator and double array JSON parser 2023-04-17 11:06:27 +02:00
Michele De Bonis 6a6c266dde implementation of author dedup configuration and lnfi clustering function 2023-01-31 11:53:10 +01:00
Michele De Bonis 14f6346676 implementation of the new software configuration 2022-11-22 17:48:34 +01:00
Michele De Bonis 9fee2ed611 minor changes 2022-11-21 14:35:46 +01:00
miconis 9ddd24ba36 implementation of comparators and clustering function for the author deduplication 2022-04-19 10:18:09 +02:00
miconis a965233dd0 bug fix in the normalization of a legalname, city map updated and transliteration support added 2022-03-15 14:59:13 +01:00
miconis 2f1ba56f61 bug fix in the authormatch comparator, implementation of tests 2022-01-13 11:58:28 +01:00
miconis a224bf70a4 implementation of new comparators for publication dedup configuration update 2021-12-27 17:35:02 +01:00
miconis 8f1db32921 implementation of the instance type comparator and its tests 2021-11-04 15:20:57 +01:00
miconis fbb1b66bfb dedup test implementation & graph drawing tools 2021-09-13 14:53:19 +02:00
miconis 4bce4f2e8e minor change: version updated 2021-05-03 16:05:39 +02:00
miconis 4988e9f80d implementation of cross comparison for different fields, addition of a clustering mechanism to collapse keys from different clustering functions into the same cluster 2021-05-03 15:37:41 +02:00
miconis ed0d5d3e1d implementation of the workflow to dedup entities, addition of the module to run the workflow on the cluster 2020-12-04 15:41:31 +01:00
miconis 07ab904d60 implementation of the clustering function for the suffixprefix chain 2020-07-16 18:57:55 +02:00
miconis f933fd33e0 implemented new function for clustering 2020-07-02 17:04:17 +02:00
miconis 6e9b27f37d implementation of the mechanism to truncate the string and the lists 2020-04-24 14:36:42 +02:00
miconis 5c8f6febee minor changes in comparators 2020-01-24 10:01:11 +01:00
miconis b3748b8d77 minor changes 2019-12-18 16:20:35 +01:00
miconis b21b1b8f61 implementation of new aggregation in the tree node processing 2019-12-18 16:19:36 +01:00
miconis 20fcfe6328 implementation of new aggregation in the tree node processing 2019-12-18 16:19:26 +01:00
Sandro La Bruzzo d924f28b93 fixed wrong use of jspath 2019-12-18 09:29:44 +01:00
miconis 84aaa65501 implementation of new json comparator and update of the publication configuration 2019-12-17 09:16:26 +01:00
Sandro La Bruzzo 5c01ae4c92 merged JqMapping branch into tree2 2019-12-13 11:30:02 +01:00
Sandro La Bruzzo 16c670a5d5 Improved deduplication 2019-12-05 14:14:25 +01:00
miconis 49f9beb4a8 implementation of romansmatch and re-implementation of the getNumber function. New terms in the translation map and update of the configuration 2019-11-28 16:54:44 +01:00
miconis f791730330 addition of one term to the translation maps in the configurations 2019-11-27 15:48:37 +01:00
miconis 8c0d346005 the param map has been updated: now it accepts string parameters 2019-11-21 09:37:56 +01:00
miconis ddd40540aa jarowinklernormalizedname split into 3 different comparators: citymatch, keywordmatch and jarowinkler. Implementation of the TreeStatistic support functions 2019-11-20 10:45:00 +01:00
miconis 0973899865 code cleaning, distribution of the classes in packages and implementation of the new configuration 2019-11-07 12:47:12 +01:00
miconis 30a873265f merged the latest changes from the master branch into tree2. Addition of the configuration as a parameter of the comparator, so that the comparator can access it 2019-10-29 16:38:42 +01:00
miconis 5f249fd56c minor changes 2019-10-23 16:37:20 +02:00
miconis c9863debfa minor changes and configuration updates (synonym field added) 2019-10-23 16:31:45 +02:00
miconis 50b7a12b3f normalization of the term in the translation map added 2019-10-08 15:13:45 +02:00
miconis 26b383fea2 translation map moved into the json configuration, support for synonyms added in the configuration, the configuration is now an argument of conditions, distancealgos and clusteringfunctions 2019-10-08 14:53:52 +02:00
Claudio Atzori 74c6462b49 updated translation map and some tests 2019-09-25 10:15:13 +02:00
miconis d71dae5fd2 implementation of the conditions in tree nodes, removal of the conditions part of the configuration 2019-08-09 15:41:49 +02:00
miconis a5c5d2f01b implementation of the decision tree. It takes the place of the distance algos; necessaryConditions and sufficientConditions are still there. The model contains only the path, type and name of the field. ignoreMissing is still in the model because it is used by the conditions. 2019-08-09 10:08:34 +02:00
miconis 8c867101ef addition of a fixSpecial function to address the problem with special characters in organization names, addition of new terms in translation maps 2019-08-06 17:06:05 +02:00
miconis 4502b44337 addition of the BlockUtils class for meta-blocking, implementation of a new local test with edge filtering example 2019-08-06 12:09:34 +02:00
miconis a85576c27e restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames 2019-07-19 17:10:29 +02:00
miconis 3c6f8d1e44 bug fixing in the keywordsclustering class 2019-07-08 11:01:49 +02:00
miconis 15bec5e876 addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes 2019-07-08 09:44:02 +02:00
miconis 54e4d0af04 exact match condition gives undefined if a field is missing; ignoreMissing semantics changed: if true, the comparison is performed in any case; if false, it gives -1 when a field is missing 2019-06-18 14:05:31 +02:00
miconis 7e7018c51f addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator 2019-04-03 09:40:14 +02:00
miconis 4bd5a9beee minor changes 2019-03-26 15:48:21 +01:00
Michele De Bonis 662448e584 update of the comparator for legalnames of organizations 2019-03-21 14:27:27 +01:00
Michele De Bonis 0735f3a822 implementation of the test classes and minor changes 2019-02-08 12:56:47 +01:00
Michele De Bonis 7a8d28991f implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity 2018-12-20 09:54:41 +01:00
Michele De Bonis 3d4372ced9 addition of cities check 2018-11-16 16:11:03 +01:00