84 Commits

Author SHA1 Message Date
Claudio Atzori c3d67f709a adjusted dedup configuration for result entities: using new wordssuffixprefix clustering function, removed ngrampairs, adjusted queueMaxSize (800) and slidingWindowSize (80) 2020-07-02 17:35:22 +02:00
Claudio Atzori 7b288a94cb code formatting 2020-05-26 09:54:13 +02:00
miconis da1e5cf557 implementation of the result title merge. main title with higher trust, distinct between the others 2020-05-25 18:02:57 +02:00
Claudio Atzori 7181807e64 code formatting 2020-05-23 09:51:48 +02:00
miconis 0fd0c7d725 reimplementation of the sim between two authors. now it takes into account both name and surname. threshold incremented to 1.0 if the name is too short 2020-05-22 17:24:57 +02:00
Claudio Atzori 3cf2796ac6 code formatting 2020-05-22 12:34:00 +02:00
miconis 8bbd1d0501 reimplementation of the author merging in deduprecord creation. implementation of the test class. 2020-05-21 11:52:14 +02:00
Claudio Atzori 42f1a2bf94 bumped project version to 1.2.0-SNAPSHOT 2020-05-11 10:05:57 +02:00
Claudio Atzori fd519df616 new rels produced by dedup workflow must be unique 2020-05-08 19:00:38 +02:00
miconis 3df703f67d mergerels added to propagate relations 2020-05-04 12:08:12 +02:00
miconis 62e467eb0c assertion numbers updated to fit the new implementation of the pace-core 2020-04-28 11:46:23 +02:00
Claudio Atzori 6f5b899038 reformatted code according to the updated style descriptor 2020-04-28 11:23:29 +02:00
Claudio Atzori a0bdbacdae switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:52:31 +02:00
Claudio Atzori 7a3f8085f7 switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:45:40 +02:00
Claudio Atzori 278fc9d276 code formatting 2020-04-23 18:51:38 +02:00
miconis 8d258c85ff spark dedup test fixed, sample for dataset and orp added, test implemented 2020-04-23 18:16:20 +02:00
Claudio Atzori 91e72a6944 Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests 2020-04-21 12:06:08 +02:00
miconis 5c9ef08a8e spark dedup test fixed 2020-04-21 10:19:04 +02:00
Claudio Atzori eb8a020859 fixed behaviour of DedupRecordFactory 2020-04-20 18:44:06 +02:00
miconis 1102e32462 SparkDedupTest updated and organization dump fixed 2020-04-20 16:49:01 +02:00
Claudio Atzori ad7a131b18 introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project 2020-04-18 12:42:58 +02:00
miconis 6a089ec287 minor changes 2020-04-16 12:15:38 +02:00
miconis cd4d9a148f creating temporary directories in dedup test 2020-04-16 12:13:26 +02:00
miconis 0be2e72be5 further implementation of tests for the deduplication of each entity. publication dump added, empty entity files created 2020-04-08 18:02:30 +02:00
miconis 56fbe689f0 implementation of the tests for each spark action 2020-04-06 16:30:31 +02:00
miconis 53fd624c34 implemented test for sparkcreatesimrels 2020-04-03 18:32:25 +02:00
miconis a61763d149 structure for sparksimrel changed to be compliant with mockito testing 2020-04-02 18:37:53 +02:00
miconis bfa5bc74df minor changes 2020-04-01 19:05:48 +02:00
miconis 9802bcb9fe dedup testing 2020-04-01 18:48:31 +02:00
Claudio Atzori 673e744649 moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa 2020-03-27 10:42:17 +01:00
Sandro La Bruzzo e71e001b58 commented test that doesn't work 2020-03-26 14:15:21 +01:00
Claudio Atzori cd7dc3e1ae dhp-dedup-openaire workflow tests upgraded to junit5 2020-03-25 18:04:23 +01:00
Michele Artini ebe45003d9 fixed some junit packages 2020-03-25 16:45:03 +01:00
Claudio Atzori 71ae7dd272 renamed module dnet-dedup to dnet-dedup-openaire 2020-03-25 15:57:09 +01:00