Commit Graph

40 Commits

Author SHA1 Message Date
miconis 15bec5e876 addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes 2019-07-08 09:44:02 +02:00
Claudio Atzori 15d7b584f3 optimized classpath resolvers 2019-06-19 10:01:35 +02:00
Claudio Atzori f2bc665403 avoid to divide by zero: in case of missing values, return undefined response 2019-06-18 14:45:15 +02:00
Claudio Atzori e3f86b92c8 cleanup 2019-06-18 14:44:42 +02:00
miconis 54e4d0af04 exact match condition gives undefined if a field is missing, ignoremissing semantics changed: now performs the comparison in any case if =true, if false gives -1 in case of missing 2019-06-18 14:05:31 +02:00
miconis e8db8f2abb implementation of the integration test, addition of document blocks to group entities after clustering 2019-05-21 16:38:26 +02:00
miconis 1d29bae47c branch cities merged into master 2019-04-03 12:22:33 +02:00
miconis 7e7018c51f addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator 2019-04-03 09:40:14 +02:00
miconis 4bd5a9beee minor changes 2019-03-26 15:48:21 +01:00
Michele De Bonis 662448e584 update of the comparator for legalnames of organizations 2019-03-21 14:27:27 +01:00
Claudio Atzori 470c4b0f20 default configuration includes configurationId 2019-02-18 09:07:23 +01:00
Claudio Atzori 968cd47436 replace existing attributes when loading default configuration 2019-02-17 12:48:25 +01:00
Michele De Bonis 0735f3a822 implementation of the test classes and minor changes 2019-02-08 12:56:47 +01:00
Michele De Bonis 7a8d28991f implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity 2018-12-20 09:54:41 +01:00
Michele De Bonis 39613dbbd6 implementation of the decisional tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing 2018-12-12 16:30:03 +01:00
Claudio Atzori f1c68d8ba3 apply limits (length, size) to pace Fields 2018-11-20 10:51:38 +01:00
Claudio Atzori c2d4cb3ba6 added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument 2018-11-19 17:37:57 +01:00
Claudio Atzori 0dfb2ea600 added distance function fot software titles 2018-11-17 09:11:38 +01:00
Michele De Bonis 3d4372ced9 addition of cities check 2018-11-16 16:11:03 +01:00
Claudio Atzori 399e4bc80f default (empty) configuration should be aligned with the updated model 2018-11-15 16:52:56 +01:00
Claudio Atzori 59bab8dba4 less verbose logging 2018-11-13 09:07:45 +01:00
Claudio Atzori 478ad72cb8 propagate exceptions in case of serialization errors, removed configuration pretty printing, removed unused class ScoredResult 2018-11-12 15:52:18 +01:00
Michele De Bonis 72a9b3139e Merge branch 'master' of https://github.com/dnet-team/dnet-dedup 2018-11-12 14:11:26 +01:00
Michele De Bonis b5062f5429 configuration file updated, addition of condition on domain 2018-11-12 14:11:15 +01:00
Claudio Atzori b7bc7f0401 getting rid of spark libs from dnet-pace-core 2018-11-12 12:46:06 +01:00
Michele De Bonis b247a86e69 configuration files changed: dedupRun instead of run, assertion updated in tests 2018-11-06 11:02:00 +01:00
Michele De Bonis 4c8485d0bb deleted useless imports 2018-11-06 09:48:22 +01:00
Michele De Bonis 748189af10 implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test 2018-11-05 17:22:59 +01:00
Claudio Atzori e296f7a81c added DiffPatchMatch utility. Resumed commented tests! 2018-10-31 10:49:11 +01:00
Michele De Bonis dc41b76643 serialization test added. useless getter methods ignored by json serialization 2018-10-29 16:16:11 +01:00
Michele De Bonis ea36007d1f DedupConf parsed using Jackson library 2018-10-29 11:13:55 +01:00
Michele De Bonis 8b4762bf54 implementation of the toString methonds changed: from Gson to Jackson 2018-10-26 14:55:59 +02:00
Michele De Bonis 3cf3dc1934 modification in the initialization of clustering functions, distance algos and conditions. 2018-10-25 15:15:40 +02:00
Michele De Bonis 1cbbc3f15a update in the discovery of clustering, conditions and distance functions (annotated with custom annotations) 2018-10-24 12:09:41 +02:00
Claudio Atzori 4d379c2227 revised PidMatch implementation, cleanup 2018-10-20 08:38:19 +02:00
Claudio Atzori 1b46966383 updated maven project structure 2018-10-18 11:56:26 +02:00
Michele De Bonis 72ebf7c0f3 update of the spark test 2018-10-18 10:12:44 +02:00
Sandro La Bruzzo 1bb5c26e6d Added FSpark Implementation of dedup 2018-10-11 15:19:20 +02:00
Sandro La Bruzzo d1c73bcf90 Added First Implementation of Spark Test 2018-10-02 17:07:17 +02:00
Sandro La Bruzzo 476c3d7b07 added d-net pace core module and ignored target folder 2018-10-02 10:37:54 +02:00