Commit Graph

192 Commits

Author SHA1 Message Date
miconis d71dae5fd2 implementation of the conditions in tree nodes. get rid of the conditions part of the configuration 2019-08-09 15:41:49 +02:00
miconis a5c5d2f01b implementation of the decision tree. It takes place of the distance algos, necessaryConditions and sufficientConditions are still there. The model contains only path, type and name of the field. ignoreMissing is still in the model because it is used by the conditions. 2019-08-09 10:08:34 +02:00
miconis 8c867101ef addition of a fixSpecial function to address the problem with special character in organization names, addition of new terms in translation maps 2019-08-06 17:06:05 +02:00
miconis 4502b44337 addition of the BlockUtils class for meta-blocking, implementation of a new local test with edge filtering example 2019-08-06 12:09:34 +02:00
miconis cffb712a99 Merge branch 'master' of https://github.com/dnet-team/dnet-dedup 2019-07-19 17:10:53 +02:00
miconis a85576c27e restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames 2019-07-19 17:10:29 +02:00
Claudio Atzori 6cb846331a [maven-release-plugin] prepare for next development iteration 2019-07-08 11:12:52 +02:00
Claudio Atzori c04d2232c2 [maven-release-plugin] prepare release dnet-dedup-3.0.13 2019-07-08 11:12:45 +02:00
miconis fb5e38db26 Merge branch 'master' of https://github.com/dnet-team/dnet-dedup 2019-07-08 11:02:29 +02:00
miconis 3c6f8d1e44 bug fixing in the keywordsclustering class 2019-07-08 11:01:49 +02:00
Claudio Atzori a69022617d [maven-release-plugin] prepare for next development iteration 2019-07-08 10:11:24 +02:00
Claudio Atzori c6baeb93d4 [maven-release-plugin] prepare release dnet-dedup-3.0.12 2019-07-08 10:11:17 +02:00
miconis f5de20a508 [maven-release-plugin] rollback the release of dnet-dedup-3.0.12 2019-07-08 10:00:48 +02:00
miconis ba50aa8654 [maven-release-plugin] prepare for next development iteration 2019-07-08 09:48:10 +02:00
miconis 7065110a21 [maven-release-plugin] prepare release dnet-dedup-3.0.12 2019-07-08 09:48:03 +02:00
miconis 15bec5e876 addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes 2019-07-08 09:44:02 +02:00
Claudio Atzori 2dcffb965f [maven-release-plugin] prepare for next development iteration 2019-06-19 10:02:39 +02:00
Claudio Atzori 85126c59f7 [maven-release-plugin] prepare release dnet-dedup-3.0.11 2019-06-19 10:02:32 +02:00
Claudio Atzori 15d7b584f3 optimized classpath resolvers 2019-06-19 10:01:35 +02:00
Claudio Atzori ff4956def9 [maven-release-plugin] prepare for next development iteration 2019-06-18 14:46:34 +02:00
Claudio Atzori eb5ce312a3 [maven-release-plugin] prepare release dnet-dedup-3.0.10 2019-06-18 14:46:27 +02:00
Claudio Atzori f2bc665403 avoid to divide by zero: in case of missing values, return undefined response 2019-06-18 14:45:15 +02:00
Claudio Atzori e3f86b92c8 cleanup 2019-06-18 14:44:42 +02:00
miconis 54e4d0af04 exact match condition gives undefined if a field is missing, ignoremissing semantics changed: now performs the comparison in any case if =true, if false gives -1 in case of missing 2019-06-18 14:05:31 +02:00
miconis e8db8f2abb implementation of the integration test, addition of document blocks to group entities after clustering 2019-05-21 16:38:26 +02:00
Claudio Atzori f7a3bdf3f8 [maven-release-plugin] prepare for next development iteration 2019-04-03 12:35:00 +02:00
Claudio Atzori 98c179c8fb [maven-release-plugin] prepare release dnet-dedup-3.0.9 2019-04-03 12:34:52 +02:00
miconis 3e61a90c8f [maven-release-plugin] rollback the release of dnet-dedup-3.0.9 2019-04-03 12:27:28 +02:00
miconis 15fb9eb883 [maven-release-plugin] prepare for next development iteration 2019-04-03 12:26:05 +02:00
miconis a1ff4daa7f [maven-release-plugin] prepare release dnet-dedup-3.0.9 2019-04-03 12:25:56 +02:00
miconis 1d29bae47c branch cities merged into master 2019-04-03 12:22:33 +02:00
miconis 7e7018c51f addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator 2019-04-03 09:40:14 +02:00
miconis 4bd5a9beee minor changes 2019-03-26 15:48:21 +01:00
Michele De Bonis 662448e584 update of the comparator for legalnames of organizations 2019-03-21 14:27:27 +01:00
Claudio Atzori f2394fcd9f [maven-release-plugin] prepare for next development iteration 2019-02-18 09:09:14 +01:00
Claudio Atzori 722431dde1 [maven-release-plugin] prepare release dnet-dedup-3.0.8 2019-02-18 09:09:07 +01:00
Claudio Atzori 470c4b0f20 default configuration includes configurationId 2019-02-18 09:07:23 +01:00
Claudio Atzori ccb7e83196 [maven-release-plugin] prepare for next development iteration 2019-02-17 12:56:19 +01:00
Claudio Atzori 7d8e62d4cc [maven-release-plugin] prepare release dnet-dedup-3.0.7 2019-02-17 12:56:11 +01:00
Claudio Atzori 968cd47436 replace existing attributes when loading default configuration 2019-02-17 12:48:25 +01:00
Michele De Bonis 0735f3a822 implementation of the test classes and minor changes 2019-02-08 12:56:47 +01:00
Michele De Bonis 7a8d28991f implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity 2018-12-20 09:54:41 +01:00
Michele De Bonis 39613dbbd6 implementation of the decisional tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing 2018-12-12 16:30:03 +01:00
Claudio Atzori f1c68d8ba3 apply limits (length, size) to pace Fields 2018-11-20 10:51:38 +01:00
Claudio Atzori c5979ffe18 [maven-release-plugin] prepare for next development iteration 2018-11-19 17:41:45 +01:00
Claudio Atzori 9869dff1d2 [maven-release-plugin] prepare release dnet-dedup-3.0.6 2018-11-19 17:41:37 +01:00
Claudio Atzori c2d4cb3ba6 added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument 2018-11-19 17:37:57 +01:00
Claudio Atzori 394fcafd41 [maven-release-plugin] prepare for next development iteration 2018-11-17 09:13:16 +01:00
Claudio Atzori 397554130c [maven-release-plugin] prepare release dnet-dedup-3.0.5 2018-11-17 09:13:09 +01:00
Claudio Atzori 0dfb2ea600 added distance function fot software titles 2018-11-17 09:11:38 +01:00
Michele De Bonis 3d4372ced9 addition of cities check 2018-11-16 16:11:03 +01:00
Claudio Atzori 55a9b4f501 [maven-release-plugin] prepare for next development iteration 2018-11-16 09:18:00 +01:00
Claudio Atzori 35ab630493 [maven-release-plugin] prepare release dnet-dedup-3.0.4 2018-11-16 09:17:53 +01:00
Claudio Atzori 399e4bc80f default (empty) configuration should be aligned with the updated model 2018-11-15 16:52:56 +01:00
Claudio Atzori 59bab8dba4 less verbose logging 2018-11-13 09:07:45 +01:00
Claudio Atzori 478ad72cb8 propagate exceptions in case of serialization errors, removed configuration pretty printing, removed unused class ScoredResult 2018-11-12 15:52:18 +01:00
Claudio Atzori f7616c7a8a [maven-release-plugin] prepare for next development iteration 2018-11-12 14:23:36 +01:00
Claudio Atzori df4b871c8b [maven-release-plugin] prepare release dnet-dedup-3.0.3 2018-11-12 14:23:29 +01:00
Michele De Bonis 72a9b3139e Merge branch 'master' of https://github.com/dnet-team/dnet-dedup 2018-11-12 14:11:26 +01:00
Michele De Bonis b5062f5429 configuration file updated, addition of condition on domain 2018-11-12 14:11:15 +01:00
Claudio Atzori 2a509b18fa [maven-release-plugin] prepare for next development iteration 2018-11-12 12:46:50 +01:00
Claudio Atzori e247218987 [maven-release-plugin] prepare release dnet-dedup-3.0.2 2018-11-12 12:46:42 +01:00
Claudio Atzori b7bc7f0401 getting rid of spark libs from dnet-pace-core 2018-11-12 12:46:06 +01:00
Claudio Atzori 3dacba37ea [maven-release-plugin] prepare for next development iteration 2018-11-12 11:40:42 +01:00
Claudio Atzori 8cc2517f5d [maven-release-plugin] prepare release dnet-dedup-3.0.1 2018-11-12 11:40:34 +01:00
Claudio Atzori 851ae5eec3 [maven-release-plugin] rollback the release of dnet-dedup-3.0.1 2018-11-12 11:39:07 +01:00
Claudio Atzori f283d58a6e [maven-release-plugin] prepare release dnet-dedup-3.0.1 2018-11-12 11:38:52 +01:00
Claudio Atzori 6d09041288 [maven-release-plugin] rollback the release of dnet-dedup-3.0.1 2018-11-12 11:28:28 +01:00
Claudio Atzori 46cee13596 [maven-release-plugin] prepare for next development iteration 2018-11-12 11:24:06 +01:00
Claudio Atzori e1c69ad24e [maven-release-plugin] prepare release dnet-dedup-3.0.1 2018-11-12 11:23:57 +01:00
Michele De Bonis b247a86e69 configuration files changed: dedupRun instead of run, assertion updated in tests 2018-11-06 11:02:00 +01:00
Michele De Bonis 4c8485d0bb deleted useless imports 2018-11-06 09:48:22 +01:00
Michele De Bonis 748189af10 implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test 2018-11-05 17:22:59 +01:00
Claudio Atzori e296f7a81c added DiffPatchMatch utility. Resumed commented tests! 2018-10-31 10:49:11 +01:00
Michele De Bonis dc41b76643 serialization test added. useless getter methods ignored by json serialization 2018-10-29 16:16:11 +01:00
Michele De Bonis ea36007d1f DedupConf parsed using Jackson library 2018-10-29 11:13:55 +01:00
Michele De Bonis 8b4762bf54 implementation of the toString methonds changed: from Gson to Jackson 2018-10-26 14:55:59 +02:00
Michele De Bonis 3cf3dc1934 modification in the initialization of clustering functions, distance algos and conditions. 2018-10-25 15:15:40 +02:00
Michele De Bonis 1cbbc3f15a update in the discovery of clustering, conditions and distance functions (annotated with custom annotations) 2018-10-24 12:09:41 +02:00
Claudio Atzori 4d379c2227 revised PidMatch implementation, cleanup 2018-10-20 08:38:19 +02:00
Claudio Atzori 3197f26691 [maven-release-plugin] prepare for next development iteration 2018-10-18 12:17:34 +02:00
Claudio Atzori 63815be2d6 [maven-release-plugin] prepare release dnet-dedup-3.0.0 2018-10-18 12:17:27 +02:00
Claudio Atzori ed14476b06 [maven-release-plugin] rollback the release of dnet-dedup-3.0.0 2018-10-18 12:13:03 +02:00
Claudio Atzori 82d5dce114 [maven-release-plugin] prepare release dnet-dedup-3.0.0 2018-10-18 12:12:45 +02:00
Claudio Atzori 4f29124607 [maven-release-plugin] rollback the release of dnet-dedup-3.0.0 2018-10-18 12:00:45 +02:00
Claudio Atzori 5a48937ae1 [maven-release-plugin] prepare for next development iteration 2018-10-18 11:58:43 +02:00
Claudio Atzori 5aec80345f [maven-release-plugin] prepare release dnet-dedup-3.0.0 2018-10-18 11:58:36 +02:00
Claudio Atzori 1b46966383 updated maven project structure 2018-10-18 11:56:26 +02:00
Michele De Bonis 72ebf7c0f3 update of the spark test 2018-10-18 10:12:44 +02:00
Sandro La Bruzzo 1bb5c26e6d Added FSpark Implementation of dedup 2018-10-11 15:19:20 +02:00
Sandro La Bruzzo d1c73bcf90 Added First Implementation of Spark Test 2018-10-02 17:07:17 +02:00
Sandro La Bruzzo 476c3d7b07 added d-net pace core module and ignored target folder 2018-10-02 10:37:54 +02:00