28 Commits

Author | SHA1 | Message | Date
miconis | 3ff5be675b | put the last modification of the master branch into the tree2. Addition of the configuration as parameter of the comparator. This is to allow the comparator to access it | 2019-10-29 16:38:42 +01:00
miconis | 2ffaa235a2 | minor changes and configuration updates (synonym field added) | 2019-10-23 16:31:45 +02:00
miconis | 03c1b334d5 | translation map moved in json configuration, support for synonyms added in the configuration, now the configuration is argument of conditions, distancealgos and clusteringfunctions | 2019-10-08 14:53:52 +02:00
miconis | 93b332cbe5 | translation map updated | 2019-09-25 09:53:06 +02:00
miconis | 4bcf353a72 | implementation of the conditions in tree nodes. get rid of the conditions part of the configuration | 2019-08-09 15:41:49 +02:00
miconis | 72b14ec36b | implementation of the decision tree. It takes place of the distance algos, necessaryConditions and sufficientConditions are still there. The model contains only path, type and name of the field. ignoreMissing is still in the model because it is used by the conditions. | 2019-08-09 10:08:34 +02:00
miconis | f0b4c4cbd4 | addition of a fixSpecial function to address the problem with special character in organization names, addition of new terms in translation maps | 2019-08-06 17:06:05 +02:00
miconis | 85070ce3fe | addition of the BlockUtils class for meta-blocking, implementation of a new local test with edge filtering example | 2019-08-06 12:09:34 +02:00
miconis | 84974dcdfa | restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames | 2019-07-19 17:10:29 +02:00
miconis | 2b866cfbeb | addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes | 2019-07-08 09:44:02 +02:00
miconis | e7d170d0eb | exact match condition gives undefined if a field is missing, ignoremissing semantics changed: now performs the comparison in any case if =true, if false gives -1 in case of missing | 2019-06-18 14:05:31 +02:00
miconis | a5526f6254 | implementation of the integration test, addition of document blocks to group entities after clustering | 2019-05-21 16:38:26 +02:00
miconis | 3018031621 | branch cities merged into master | 2019-04-03 12:22:33 +02:00
Michele De Bonis | f87790f701 | update of the comparator for legalnames of organizations | 2019-03-21 14:27:27 +01:00
Michele De Bonis | b02aa08833 | implementation of the test classes and minor changes | 2019-02-08 12:56:47 +01:00
Michele De Bonis | 9ff83d6567 | implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity | 2018-12-20 09:54:41 +01:00
Michele De Bonis | 0bd20c565a | implementation of the decisional tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing | 2018-12-12 16:30:03 +01:00
Claudio Atzori | d72960f8b9 | apply limits (length, size) to pace Fields | 2018-11-20 10:51:38 +01:00
Claudio Atzori | e5a77f0a53 | added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument | 2018-11-19 17:37:57 +01:00
Michele De Bonis | 23c5a16525 | addition of cities check | 2018-11-16 16:11:03 +01:00
Michele De Bonis | 33387a3532 | configuration file updated, addition of condition on domain | 2018-11-12 14:11:15 +01:00
Michele De Bonis | c84b5005e6 | configuration files changed: dedupRun instead of run, assertion updated in tests | 2018-11-06 11:02:00 +01:00
Michele De Bonis | 5d81c04d0b | deleted useless imports | 2018-11-06 09:48:22 +01:00
Michele De Bonis | 4337e83950 | implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test | 2018-11-05 17:22:59 +01:00
Claudio Atzori | 9f513352fb | added DiffPatchMatch utility. Resumed commented tests! | 2018-10-31 10:49:11 +01:00
Sandro La Bruzzo | 674ea3909f | Added First Spark Implementation of dedup | 2018-10-12 12:53:47 +02:00
Sandro La Bruzzo | 67e5f9858b | Added FSpark Implementation of dedup | 2018-10-11 15:19:20 +02:00
Sandro La Bruzzo | d0edb7b773 | Added First Implementation of Spark Test | 2018-10-02 17:07:17 +02:00