Commit Graph

68 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Michele De Bonis | 9ff83d6567 | implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity | 2018-12-20 09:54:41 +01:00 |
| Michele De Bonis | 0bd20c565a | implementation of the decisional tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing | 2018-12-12 16:30:03 +01:00 |
| Claudio Atzori | d72960f8b9 | apply limits (length, size) to pace Fields | 2018-11-20 10:51:38 +01:00 |
| Claudio Atzori | e5a77f0a53 | added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument | 2018-11-19 17:37:57 +01:00 |
| Michele De Bonis | 23c5a16525 | addition of cities check | 2018-11-16 16:11:03 +01:00 |
| Michele De Bonis | 33387a3532 | configuration file updated, addition of condition on domain | 2018-11-12 14:11:15 +01:00 |
| Michele De Bonis | c84b5005e6 | configuration files changed: dedupRun instead of run, assertion updated in tests | 2018-11-06 11:02:00 +01:00 |
| Michele De Bonis | 5d81c04d0b | deleted useless imports | 2018-11-06 09:48:22 +01:00 |
| Michele De Bonis | 4337e83950 | implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test | 2018-11-05 17:22:59 +01:00 |
| Claudio Atzori | 9f513352fb | added DiffPatchMatch utility. Resumed commented tests! | 2018-10-31 10:49:11 +01:00 |
| Michele De Bonis | 1d678ddc9c | update in the discovery of clustering, conditions and distance functions (annotated with custom annotations) | 2018-10-24 12:09:41 +02:00 |
| Claudio Atzori | bc4505e0e6 | revised PidMatch implementation, cleanup | 2018-10-20 08:38:19 +02:00 |
| Claudio Atzori | 0bab8cf704 | tests and relative resources migrated from openaire-mapping-utils | 2018-10-18 15:30:51 +02:00 |
| Claudio Atzori | f27655e96c | updated maven project structure | 2018-10-18 11:56:26 +02:00 |
| Michele De Bonis | 1f0eeaf7ab | update of the spark test | 2018-10-18 10:12:44 +02:00 |
| Sandro La Bruzzo | 674ea3909f | Added First Spark Implementation of dedup | 2018-10-12 12:53:47 +02:00 |
| Sandro La Bruzzo | 67e5f9858b | Added FSpark Implementation of dedup | 2018-10-11 15:19:20 +02:00 |
| Sandro La Bruzzo | d0edb7b773 | Added First Implementation of Spark Test | 2018-10-02 17:07:17 +02:00 |