Claudio Atzori
|
cbec51e922
|
avoid division by zero: return an undefined response when values are missing
|
2019-06-18 14:45:15 +02:00 |
Claudio Atzori
|
7063d286e0
|
cleanup
|
2019-06-18 14:44:42 +02:00 |
miconis
|
e7d170d0eb
|
exact match condition returns undefined if a field is missing; ignoremissing semantics changed: if true, the comparison is performed in any case; if false, it returns -1 when a field is missing
|
2019-06-18 14:05:31 +02:00 |
miconis
|
a5526f6254
|
implementation of the integration test, addition of document blocks to group entities after clustering
|
2019-05-21 16:38:26 +02:00 |
miconis
|
3018031621
|
branch cities merged into master
|
2019-04-03 12:22:33 +02:00 |
miconis
|
f738c2b641
|
addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator
|
2019-04-03 09:40:14 +02:00 |
Michele De Bonis
|
f87790f701
|
update of the comparator for legalnames of organizations
|
2019-03-21 14:27:27 +01:00 |
Claudio Atzori
|
cabc2d21c2
|
replace existing attributes when loading default configuration
|
2019-02-17 12:48:25 +01:00 |
Michele De Bonis
|
b02aa08833
|
implementation of the test classes and minor changes
|
2019-02-08 12:56:47 +01:00 |
Michele De Bonis
|
9ff83d6567
|
implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity
|
2018-12-20 09:54:41 +01:00 |
Michele De Bonis
|
0bd20c565a
|
implementation of the decision tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing
|
2018-12-12 16:30:03 +01:00 |
Claudio Atzori
|
d72960f8b9
|
apply limits (length, size) to pace Fields
|
2018-11-20 10:51:38 +01:00 |
Claudio Atzori
|
e5a77f0a53
|
added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument
|
2018-11-19 17:37:57 +01:00 |
Claudio Atzori
|
a0e0df1cfd
|
added distance function for software titles
|
2018-11-17 09:11:38 +01:00 |
Michele De Bonis
|
23c5a16525
|
addition of cities check
|
2018-11-16 16:11:03 +01:00 |
Claudio Atzori
|
fa657a05e6
|
default (empty) configuration should be aligned with the updated model
|
2018-11-15 16:52:56 +01:00 |
Claudio Atzori
|
e4ae7d426a
|
less verbose logging
|
2018-11-13 09:07:45 +01:00 |
Claudio Atzori
|
9a14b0ecbc
|
propagate exceptions in case of serialization errors, removed configuration pretty printing, removed unused class ScoredResult
|
2018-11-12 15:52:18 +01:00 |
Michele De Bonis
|
3a517a6551
|
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
|
2018-11-12 14:11:26 +01:00 |
Michele De Bonis
|
33387a3532
|
configuration file updated, addition of condition on domain
|
2018-11-12 14:11:15 +01:00 |
Claudio Atzori
|
925a437597
|
getting rid of spark libs from dnet-pace-core
|
2018-11-12 12:46:06 +01:00 |
Michele De Bonis
|
c84b5005e6
|
configuration files changed: dedupRun instead of run, assertion updated in tests
|
2018-11-06 11:02:00 +01:00 |
Michele De Bonis
|
4337e83950
|
implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test
|
2018-11-05 17:22:59 +01:00 |
Claudio Atzori
|
9f513352fb
|
added DiffPatchMatch utility. Resumed commented tests!
|
2018-10-31 10:49:11 +01:00 |
Michele De Bonis
|
7c59c3ebf0
|
serialization test added. useless getter methods ignored by json serialization
|
2018-10-29 16:16:11 +01:00 |
Michele De Bonis
|
0d03030694
|
DedupConf parsed using Jackson library
|
2018-10-29 11:13:55 +01:00 |
Michele De Bonis
|
0375f1cec9
|
implementation of the toString methods changed: from Gson to Jackson
|
2018-10-26 14:55:59 +02:00 |
Michele De Bonis
|
d059bf68b8
|
modification in the initialization of clustering functions, distance algos and conditions.
|
2018-10-25 15:15:40 +02:00 |
Michele De Bonis
|
1d678ddc9c
|
update in the discovery of clustering, conditions and distance functions (annotated with custom annotations)
|
2018-10-24 12:09:41 +02:00 |
Claudio Atzori
|
bc4505e0e6
|
revised PidMatch implementation, cleanup
|
2018-10-20 08:38:19 +02:00 |
Claudio Atzori
|
f27655e96c
|
updated maven project structure
|
2018-10-18 11:56:26 +02:00 |
Michele De Bonis
|
1f0eeaf7ab
|
update of the spark test
|
2018-10-18 10:12:44 +02:00 |
Sandro La Bruzzo
|
67e5f9858b
|
Added Spark Implementation of dedup
|
2018-10-11 15:19:20 +02:00 |
Sandro La Bruzzo
|
d0edb7b773
|
Added First Implementation of Spark Test
|
2018-10-02 17:07:17 +02:00 |
Sandro La Bruzzo
|
a043d0c716
|
added d-net pace core module and ignored target folder
|
2018-10-02 10:37:54 +02:00 |