
4642 Commits

Author SHA1 Message Date
Michele Artini 346a1d2b5a update eventId generator 2020-07-18 09:40:36 +02:00
Sandro La Bruzzo 9116d75b3e Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-17 18:01:30 +02:00
Miriam Baglioni d7d84c8217 - 2020-07-17 14:03:23 +02:00
Miriam Baglioni 47c7122773 changed priority from beta to production 2020-07-17 12:56:35 +02:00
Michele Artini 442f30930c removed duplicated fields 2020-07-17 12:25:36 +02:00
Michele Artini 3adedd0a68 trust truncated to 3 decimals 2020-07-17 11:58:11 +02:00
Claudio Atzori 1781609508 code formatting 2020-07-16 19:06:56 +02:00
Claudio Atzori db8b90a156 renamed CORE -> BETA 2020-07-16 19:05:13 +02:00
miconis a5a3ea24f8 [maven-release-plugin] prepare for next development iteration 2020-07-16 18:59:25 +02:00
miconis 840fe8f4d3 [maven-release-plugin] prepare release dnet-dedup-4.0.4 2020-07-16 18:59:22 +02:00
miconis 07ab904d60 implementation of the clustering function for the suffixprefix chain 2020-07-16 18:57:55 +02:00
Miriam Baglioni 44e1c40c42 merge upstream 2020-07-16 18:49:38 +02:00
Claudio Atzori 878f2b931c Merge branch 'master' into merge_graph 2020-07-16 16:34:24 +02:00
Claudio Atzori cc5d13da85 introduced parameter shouldIndex (true|false) 2020-07-16 13:46:39 +02:00
Claudio Atzori b098cc3cbe avoid repeating identical values for fields: source, description 2020-07-16 13:45:53 +02:00
Claudio Atzori 805de4eca1 fix: filter the blocks with size = 1 2020-07-16 10:11:32 +02:00
Claudio Atzori eaf7defe0c [maven-release-plugin] prepare for next development iteration 2020-07-15 17:57:09 +02:00
Claudio Atzori ff2c8eba12 [maven-release-plugin] prepare release dnet-dedup-4.0.3 2020-07-15 17:57:04 +02:00
Claudio Atzori 7cc3742a26 removed maven release.property 2020-07-15 17:52:27 +02:00
Claudio Atzori 14611ea450 reverted to 4.0.3-SNAPSHOT 2020-07-15 17:37:36 +02:00
Claudio Atzori 9f20f23870 Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files" (This reverts commit 51d91fa520.) 2020-07-15 17:35:56 +02:00
Claudio Atzori 9efcd8e245 Revert "reverted to 4.0.3-SNAPSHOT" (This reverts commit ec97983ce1.) 2020-07-15 17:28:37 +02:00
Claudio Atzori ba493f9ab8 [maven-release-plugin] rollback the release of dnet-dedup-4.0.3 2020-07-15 17:24:43 +02:00
Claudio Atzori 6c98d4c436 [maven-release-plugin] prepare release dnet-dedup-4.0.3 2020-07-15 17:24:25 +02:00
Claudio Atzori ec97983ce1 reverted to 4.0.3-SNAPSHOT 2020-07-15 17:20:12 +02:00
Claudio Atzori 51d91fa520 wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files 2020-07-15 17:13:45 +02:00
Claudio Atzori b79ea97107 Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files" (This reverts commit d2861950ac.) 2020-07-15 17:11:46 +02:00
Claudio Atzori 92aadbfc7b [maven-release-plugin] prepare release dnet-dedup-4.0.3 2020-07-15 17:04:20 +02:00
Claudio Atzori d2861950ac wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files 2020-07-15 16:49:47 +02:00
Claudio Atzori 4b9fb2ffb8 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-15 11:26:04 +02:00
Claudio Atzori 5033c25587 code formatting 2020-07-15 11:26:00 +02:00
Claudio Atzori b90389bac4 code formatting 2020-07-15 11:24:48 +02:00
Claudio Atzori 4e6f46e8fa filter blocks with one record only 2020-07-15 11:22:20 +02:00
Michele Artini 262c29463e relations with multiple datasources 2020-07-15 09:18:40 +02:00
Claudio Atzori 7d6e269b40 reverted CreateRelatedEntitiesJob_phase1 to its previous state 2020-07-13 22:54:04 +02:00
Claudio Atzori 8e97598eb4 avoid to NPE in case of null instances 2020-07-13 20:46:14 +02:00
Claudio Atzori 06def0c0cb SparkBlockStats allows to repartition the input rdd via the numPartitions workflow parameter 2020-07-13 20:09:06 +02:00
miconis b52c246aed merge done 2020-07-13 19:57:02 +02:00
miconis b8a45041fd minor changes 2020-07-13 19:53:18 +02:00
Claudio Atzori 66f9f6d323 adjusted parameters for the dedup stats workflow 2020-07-13 19:26:46 +02:00
miconis 03ecfa5ebd implementation of the test class for the new block stats spark action 2020-07-13 18:48:23 +02:00
miconis 10e08ccf45 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-13 18:22:45 +02:00
miconis 9258e4f095 implementation of a new workflow to compute statistics on the blocks 2020-07-13 18:22:34 +02:00
Claudio Atzori c6f6fb0f28 code formatting 2020-07-13 16:46:13 +02:00
Claudio Atzori 8d2102d7d2 Merge branch 'deduptesting' 2020-07-13 16:32:43 +02:00
Claudio Atzori 344a90c2e6 updated assertions in propagateRelationTest 2020-07-13 16:32:04 +02:00
Claudio Atzori 1143f426aa WIP SparkCreateMergeRels distinct relations 2020-07-13 16:13:36 +02:00
Claudio Atzori 8c67938ad0 configurable number of partitions used in the SparkCreateSimRels phase 2020-07-13 16:07:07 +02:00
Claudio Atzori c73168b18e Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting 2020-07-13 15:54:58 +02:00
Claudio Atzori c8284bab06 WIP SparkCreateMergeRels distinct relations 2020-07-13 15:54:51 +02:00