Commit Graph

43 Commits

Author SHA1 Message Date
miconis 451114418d implementation of the instance type comparator and its tests 2021-11-04 15:20:57 +01:00
miconis 5a52aed8e1 dedup test implementation & graph drawing tools 2021-09-13 14:53:19 +02:00
miconis fad803bd46 implementation of cross comparison for different fields, addition of clustering mechanism to collapse keys from different clustering functions on the same cluster 2021-05-03 15:37:41 +02:00
miconis e65526848a implementation of the wf to dedup entities, addition of the module to run the wf on the cluster 2020-12-04 15:41:31 +01:00
miconis 7188648bdc implementation of the clustering function for the suffixprefix chain 2020-07-16 18:57:55 +02:00
miconis 33eadb7c9c implemented new function for clustering 2020-07-02 17:04:17 +02:00
miconis 7bc00a3f5f implementation of the mechanism to truncate the string and the lists 2020-04-24 14:36:42 +02:00
miconis eeeb374480 minor changes in comparators 2020-01-24 10:01:11 +01:00
miconis 72ca3bb9ba implementation of new aggregation in the tree node processing 2019-12-18 16:19:36 +01:00
Sandro La Bruzzo 492049b8bc fixed wrong use of jspath 2019-12-18 09:29:44 +01:00
miconis 159cb2a493 implementation of new json comparator and update of the publication configuration 2019-12-17 09:16:26 +01:00
Sandro La Bruzzo d09193a094 merged JqMapping branch into tree2 2019-12-13 11:30:02 +01:00
Sandro La Bruzzo bd79999fb8 Improved deduplication 2019-12-05 14:14:25 +01:00
miconis 5676e625bd implementation of romansmatch and re-implementation of the getNumber function. New terms in the translation map and update of the configuration 2019-11-28 16:54:44 +01:00
miconis 493b385b5b addition of one term to the translation maps in the configurations 2019-11-27 15:48:37 +01:00
miconis 40808200f0 the param map has been updated: now it accepts string parameters 2019-11-21 09:37:56 +01:00
miconis 79e62787cf jarowinklernormalizedname split into 3 different comparators: citymatch, keywordmatch and jarowinkler. Implementation of the TreeStatistic support functions 2019-11-20 10:45:00 +01:00
miconis 5b3adb3e65 code cleaning, distribution of the classes in packages and implementation of the new configuration 2019-11-07 12:47:12 +01:00
miconis 3ff5be675b put the last modification of the master branch into the tree2. Addition of the configuration as parameter of the comparator. This is to allow the comparator to access it 2019-10-29 16:38:42 +01:00
miconis 2ffaa235a2 minor changes and configuration updates (synonym field added) 2019-10-23 16:31:45 +02:00
miconis 7998f37ce1 normalization of the term in the translation map added 2019-10-08 15:13:45 +02:00
miconis 03c1b334d5 translation map moved in json configuration, support for synonyms added in the configuration, now the configuration is argument of conditions, distancealgos and clusteringfunctions 2019-10-08 14:53:52 +02:00
Claudio Atzori fda7f1ce93 updated translation map and some tests 2019-09-25 10:15:13 +02:00
miconis 4bcf353a72 implementation of the conditions in tree nodes. get rid of the conditions part of the configuration 2019-08-09 15:41:49 +02:00
miconis f0b4c4cbd4 addition of a fixSpecial function to address the problem with special characters in organization names, addition of new terms in translation maps 2019-08-06 17:06:05 +02:00
miconis 85070ce3fe addition of the BlockUtils class for meta-blocking, implementation of a new local test with edge filtering example 2019-08-06 12:09:34 +02:00
miconis 84974dcdfa restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames 2019-07-19 17:10:29 +02:00
miconis 0509ea8d1e bug fixing in the keywordsclustering class 2019-07-08 11:01:49 +02:00
miconis 2b866cfbeb addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes 2019-07-08 09:44:02 +02:00
miconis e7d170d0eb exact match condition gives undefined if a field is missing, ignoremissing semantics changed: if true, the comparison is now performed in any case; if false, it gives -1 when a field is missing 2019-06-18 14:05:31 +02:00
miconis f738c2b641 addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator 2019-04-03 09:40:14 +02:00
miconis e9894ed089 minor changes 2019-03-26 15:48:21 +01:00
Michele De Bonis f87790f701 update of the comparator for legalnames of organizations 2019-03-21 14:27:27 +01:00
Michele De Bonis b02aa08833 implementation of the test classes and minor changes 2019-02-08 12:56:47 +01:00
Michele De Bonis 9ff83d6567 implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity 2018-12-20 09:54:41 +01:00
Michele De Bonis 23c5a16525 addition of cities check 2018-11-16 16:11:03 +01:00
Michele De Bonis 5d81c04d0b deleted useless imports 2018-11-06 09:48:22 +01:00
Michele De Bonis 4337e83950 implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test 2018-11-05 17:22:59 +01:00
Michele De Bonis 7c59c3ebf0 serialization test added. useless getter methods ignored by json serialization 2018-10-29 16:16:11 +01:00
Michele De Bonis 0d03030694 DedupConf parsed using Jackson library 2018-10-29 11:13:55 +01:00
Michele De Bonis 0375f1cec9 implementation of the toString methods changed: from Gson to Jackson 2018-10-26 14:55:59 +02:00
Michele De Bonis d059bf68b8 modification in the initialization of clustering functions, distance algos and conditions. 2018-10-25 15:15:40 +02:00
Sandro La Bruzzo a043d0c716 added d-net pace core module and ignored target folder 2018-10-02 10:37:54 +02:00