Giambattista Bloisi
d2d173773e
Precompile blacklist patterns before evaluating clustering criteria
...
Enable JUnit 5 tests in Maven builds
Make path comparisons platform-independent
Read String resource files assuming they are encoded in UTF-8
Fix a few test conditions
2023-06-16 09:41:11 +02:00
Michele De Bonis
7e2e7dcdcd
implementation of the support for authors deduplication: cosinesimilarity comparator and double array json parser
2023-04-17 11:06:27 +02:00
miconis
fb2eed9f0e
implementation of the java version of the graph processor
2022-04-19 15:29:29 +02:00
miconis
e168d95ec0
bug fix in the authormatch comparator, implementation of tests
2022-01-13 11:58:28 +01:00
miconis
5e8757a457
implementation of new comparators for publication dedup configuration update
2021-12-27 17:35:02 +01:00
miconis
5a52aed8e1
dedup test implementation & graph drawing tools
2021-09-13 14:53:19 +02:00
miconis
e65526848a
implementation of the wf to dedup entities, addition of the module to run the wf on the cluster
2020-12-04 15:41:31 +01:00
miconis
5021e5048f
fixed error in the treeprocessor: it used th=-1 as the default value, now it uses th=1
2020-09-29 12:01:25 +02:00
miconis
b7a27ace62
clusteringtester removed in order to move it to dnet-dedup-openaire
2020-07-13 11:15:09 +02:00
miconis
12621b1c45
implementation of a class to test the clustering functions
2020-07-12 10:13:54 +02:00
miconis
b3ec4194da
implementation of the test for the dedup and addition of new support classes
2020-06-11 10:46:46 +02:00
miconis
a73bc6cddc
minor changes
2020-03-20 18:02:52 +01:00
miconis
cc86591fad
minor changes
2020-01-20 16:45:16 +01:00
miconis
6a27fb14a8
update in the implementation of the tree: addition of new logic aggregations and statistics
2020-01-14 11:42:43 +02:00
Sandro La Bruzzo
d09193a094
merged JqMapping branch into tree2
2019-12-13 11:30:02 +01:00
Sandro La Bruzzo
bd79999fb8
Improved deduplication
2019-12-05 14:14:25 +01:00
miconis
79e62787cf
jarowinklernormalizedname split into 3 different comparators: citymatch, keywordmatch and jarowinkler. Implementation of the TreeStatistic support functions
2019-11-20 10:45:00 +01:00
miconis
5b3adb3e65
code cleaning, distribution of the classes in packages and implementation of the new configuration
2019-11-07 12:47:12 +01:00
miconis
3ff5be675b
brought the latest modifications of the master branch into tree2. Addition of the configuration as a parameter of the comparator, so that the comparator can access it
2019-10-29 16:38:42 +01:00
miconis
2ffaa235a2
minor changes and configuration updates (synonym field added)
2019-10-23 16:31:45 +02:00
miconis
03c1b334d5
translation map moved into the JSON configuration, support for synonyms added to the configuration; the configuration is now an argument of conditions, distancealgos and clusteringfunctions
2019-10-08 14:53:52 +02:00
miconis
93b332cbe5
translation map updated
2019-09-25 09:53:06 +02:00
miconis
4bcf353a72
implementation of the conditions in tree nodes, removal of the conditions part of the configuration
2019-08-09 15:41:49 +02:00
miconis
72b14ec36b
implementation of the decision tree. It takes the place of the distance algos; necessaryConditions and sufficientConditions are still there. The model contains only path, type and name of the field. ignoreMissing is still in the model because it is used by the conditions.
2019-08-09 10:08:34 +02:00
miconis
f0b4c4cbd4
addition of a fixSpecial function to address the problem with special characters in organization names, addition of new terms in translation maps
2019-08-06 17:06:05 +02:00
miconis
85070ce3fe
addition of the BlockUtils class for meta-blocking, implementation of a new local test with edge filtering example
2019-08-06 12:09:34 +02:00
miconis
84974dcdfa
restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames
2019-07-19 17:10:29 +02:00
miconis
2b866cfbeb
addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes
2019-07-08 09:44:02 +02:00
miconis
e7d170d0eb
exact match condition gives undefined if a field is missing; ignoremissing semantics changed: if true, the comparison is performed in any case; if false, it gives -1 when a field is missing
2019-06-18 14:05:31 +02:00
miconis
a5526f6254
implementation of the integration test, addition of document blocks to group entities after clustering
2019-05-21 16:38:26 +02:00
miconis
3018031621
branch cities merged into master
2019-04-03 12:22:33 +02:00
miconis
14c3afba23
clean up
2019-04-03 11:35:25 +02:00
miconis
f738c2b641
addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator
2019-04-03 09:40:14 +02:00
miconis
1dbb765343
minor changes
2019-03-26 15:40:40 +01:00
Michele De Bonis
f87790f701
update of the comparator for legalnames of organizations
2019-03-21 14:27:27 +01:00
Michele De Bonis
b02aa08833
implementation of the test classes and minor changes
2019-02-08 12:56:47 +01:00
Michele De Bonis
9ff83d6567
implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity
2018-12-20 09:54:41 +01:00
Michele De Bonis
0bd20c565a
implementation of the decision tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing
2018-12-12 16:30:03 +01:00
Claudio Atzori
d72960f8b9
apply limits (length, size) to pace Fields
2018-11-20 10:51:38 +01:00
Claudio Atzori
e5a77f0a53
added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument
2018-11-19 17:37:57 +01:00
Michele De Bonis
23c5a16525
addition of cities check
2018-11-16 16:11:03 +01:00
Michele De Bonis
33387a3532
configuration file updated, addition of condition on domain
2018-11-12 14:11:15 +01:00
Michele De Bonis
c84b5005e6
configuration files changed: dedupRun instead of run, assertion updated in tests
2018-11-06 11:02:00 +01:00
Michele De Bonis
5d81c04d0b
deleted useless imports
2018-11-06 09:48:22 +01:00
Michele De Bonis
4337e83950
implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test
2018-11-05 17:22:59 +01:00
Claudio Atzori
9f513352fb
added DiffPatchMatch utility. Resumed commented tests!
2018-10-31 10:49:11 +01:00
Michele De Bonis
1d678ddc9c
update in the discovery of clustering, conditions and distance functions (annotated with custom annotations)
2018-10-24 12:09:41 +02:00
Claudio Atzori
0bab8cf704
tests and related resources migrated from openaire-mapping-utils
2018-10-18 15:30:51 +02:00
Claudio Atzori
f27655e96c
updated maven project structure
2018-10-18 11:56:26 +02:00
Michele De Bonis
1f0eeaf7ab
update of the spark test
2018-10-18 10:12:44 +02:00