Commit Graph

5115 Commits

Author SHA1 Message Date
luosolo c10770cd3e added module that allows to collect data into HDFS 2019-03-12 15:40:55 +01:00
Claudio Atzori f2394fcd9f [maven-release-plugin] prepare for next development iteration 2019-02-18 09:09:14 +01:00
Claudio Atzori 722431dde1 [maven-release-plugin] prepare release dnet-dedup-3.0.8 2019-02-18 09:09:07 +01:00
Claudio Atzori 470c4b0f20 default configuration includes configurationId 2019-02-18 09:07:23 +01:00
Claudio Atzori ccb7e83196 [maven-release-plugin] prepare for next development iteration 2019-02-17 12:56:19 +01:00
Claudio Atzori 7d8e62d4cc [maven-release-plugin] prepare release dnet-dedup-3.0.7 2019-02-17 12:56:11 +01:00
Claudio Atzori 968cd47436 replace existing attributes when loading default configuration 2019-02-17 12:48:25 +01:00
Michele De Bonis 0735f3a822 implementation of the test classes and minor changes 2019-02-08 12:56:47 +01:00
Michele De Bonis 7a8d28991f implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity 2018-12-20 09:54:41 +01:00
Michele De Bonis 39613dbbd6 implementation of the decisional tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing 2018-12-12 16:30:03 +01:00
Claudio Atzori f1c68d8ba3 apply limits (length, size) to pace Fields 2018-11-20 10:51:38 +01:00
Claudio Atzori c5979ffe18 [maven-release-plugin] prepare for next development iteration 2018-11-19 17:41:45 +01:00
Claudio Atzori 9869dff1d2 [maven-release-plugin] prepare release dnet-dedup-3.0.6 2018-11-19 17:41:37 +01:00
Claudio Atzori c2d4cb3ba6 added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument 2018-11-19 17:37:57 +01:00
Claudio Atzori 394fcafd41 [maven-release-plugin] prepare for next development iteration 2018-11-17 09:13:16 +01:00
Claudio Atzori 397554130c [maven-release-plugin] prepare release dnet-dedup-3.0.5 2018-11-17 09:13:09 +01:00
Claudio Atzori 0dfb2ea600 added distance function fot software titles 2018-11-17 09:11:38 +01:00
Michele De Bonis 3d4372ced9 addition of cities check 2018-11-16 16:11:03 +01:00
Claudio Atzori 55a9b4f501 [maven-release-plugin] prepare for next development iteration 2018-11-16 09:18:00 +01:00
Claudio Atzori 35ab630493 [maven-release-plugin] prepare release dnet-dedup-3.0.4 2018-11-16 09:17:53 +01:00
Claudio Atzori 399e4bc80f default (empty) configuration should be aligned with the updated model 2018-11-15 16:52:56 +01:00
Claudio Atzori 59bab8dba4 less verbose logging 2018-11-13 09:07:45 +01:00
Claudio Atzori 478ad72cb8 propagate exceptions in case of serialization errors, removed configuration pretty printing, removed unused class ScoredResult 2018-11-12 15:52:18 +01:00
Claudio Atzori f7616c7a8a [maven-release-plugin] prepare for next development iteration 2018-11-12 14:23:36 +01:00
Claudio Atzori df4b871c8b [maven-release-plugin] prepare release dnet-dedup-3.0.3 2018-11-12 14:23:29 +01:00
Michele De Bonis 72a9b3139e Merge branch 'master' of https://github.com/dnet-team/dnet-dedup 2018-11-12 14:11:26 +01:00
Michele De Bonis b5062f5429 configuration file updated, addition of condition on domain 2018-11-12 14:11:15 +01:00
Claudio Atzori 2a509b18fa [maven-release-plugin] prepare for next development iteration 2018-11-12 12:46:50 +01:00
Claudio Atzori e247218987 [maven-release-plugin] prepare release dnet-dedup-3.0.2 2018-11-12 12:46:42 +01:00
Claudio Atzori b7bc7f0401 getting rid of spark libs from dnet-pace-core 2018-11-12 12:46:06 +01:00
Claudio Atzori 3dacba37ea [maven-release-plugin] prepare for next development iteration 2018-11-12 11:40:42 +01:00
Claudio Atzori 8cc2517f5d [maven-release-plugin] prepare release dnet-dedup-3.0.1 2018-11-12 11:40:34 +01:00
Claudio Atzori 851ae5eec3 [maven-release-plugin] rollback the release of dnet-dedup-3.0.1 2018-11-12 11:39:07 +01:00
Claudio Atzori f283d58a6e [maven-release-plugin] prepare release dnet-dedup-3.0.1 2018-11-12 11:38:52 +01:00
Claudio Atzori 6d09041288 [maven-release-plugin] rollback the release of dnet-dedup-3.0.1 2018-11-12 11:28:28 +01:00
Claudio Atzori 46cee13596 [maven-release-plugin] prepare for next development iteration 2018-11-12 11:24:06 +01:00
Claudio Atzori e1c69ad24e [maven-release-plugin] prepare release dnet-dedup-3.0.1 2018-11-12 11:23:57 +01:00
Michele De Bonis b247a86e69 configuration files changed: dedupRun instead of run, assertion updated in tests 2018-11-06 11:02:00 +01:00
Michele De Bonis 4c8485d0bb deleted useless imports 2018-11-06 09:48:22 +01:00
Michele De Bonis 748189af10 implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test 2018-11-05 17:22:59 +01:00
Claudio Atzori e296f7a81c added DiffPatchMatch utility. Resumed commented tests! 2018-10-31 10:49:11 +01:00
Michele De Bonis dc41b76643 serialization test added. useless getter methods ignored by json serialization 2018-10-29 16:16:11 +01:00
Michele De Bonis ea36007d1f DedupConf parsed using Jackson library 2018-10-29 11:13:55 +01:00
Michele De Bonis 8b4762bf54 implementation of the toString methonds changed: from Gson to Jackson 2018-10-26 14:55:59 +02:00
Michele De Bonis 3cf3dc1934 modification in the initialization of clustering functions, distance algos and conditions. 2018-10-25 15:15:40 +02:00
Michele De Bonis 1cbbc3f15a update in the discovery of clustering, conditions and distance functions (annotated with custom annotations) 2018-10-24 12:09:41 +02:00
Claudio Atzori 4d379c2227 revised PidMatch implementation, cleanup 2018-10-20 08:38:19 +02:00
Claudio Atzori 3197f26691 [maven-release-plugin] prepare for next development iteration 2018-10-18 12:17:34 +02:00
Claudio Atzori 63815be2d6 [maven-release-plugin] prepare release dnet-dedup-3.0.0 2018-10-18 12:17:27 +02:00
Claudio Atzori ed14476b06 [maven-release-plugin] rollback the release of dnet-dedup-3.0.0 2018-10-18 12:13:03 +02:00