Author | Commit | Message | Date
Michele Artini | 346a1d2b5a | update eventId generator | 2020-07-18 09:40:36 +02:00
Sandro La Bruzzo | 9116d75b3e | Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop | 2020-07-17 18:01:30 +02:00
Miriam Baglioni | d7d84c8217 | - | 2020-07-17 14:03:23 +02:00
Miriam Baglioni | 47c7122773 | changed priority from beta to production | 2020-07-17 12:56:35 +02:00
Michele Artini | 442f30930c | removed duplicated fields | 2020-07-17 12:25:36 +02:00
Michele Artini | 3adedd0a68 | trust truncated to 3 decimals | 2020-07-17 11:58:11 +02:00
Claudio Atzori | 1781609508 | code formatting | 2020-07-16 19:06:56 +02:00
Claudio Atzori | db8b90a156 | renamed CORE -> BETA | 2020-07-16 19:05:13 +02:00
miconis | a5a3ea24f8 | [maven-release-plugin] prepare for next development iteration | 2020-07-16 18:59:25 +02:00
miconis | 840fe8f4d3 | [maven-release-plugin] prepare release dnet-dedup-4.0.4 | 2020-07-16 18:59:22 +02:00
miconis | 07ab904d60 | implementation of the clustering function for the suffixprefix chain | 2020-07-16 18:57:55 +02:00
Miriam Baglioni | 44e1c40c42 | merge upstream | 2020-07-16 18:49:38 +02:00
Claudio Atzori | 878f2b931c | Merge branch 'master' into merge_graph | 2020-07-16 16:34:24 +02:00
Claudio Atzori | cc5d13da85 | introduced parameter shouldIndex (true|false) | 2020-07-16 13:46:39 +02:00
Claudio Atzori | b098cc3cbe | avoid repeating identical values for fields: source, description | 2020-07-16 13:45:53 +02:00
Claudio Atzori | 805de4eca1 | fix: filter the blocks with size = 1 | 2020-07-16 10:11:32 +02:00
Claudio Atzori | eaf7defe0c | [maven-release-plugin] prepare for next development iteration | 2020-07-15 17:57:09 +02:00
Claudio Atzori | ff2c8eba12 | [maven-release-plugin] prepare release dnet-dedup-4.0.3 | 2020-07-15 17:57:04 +02:00
Claudio Atzori | 7cc3742a26 | removed maven release.property | 2020-07-15 17:52:27 +02:00
Claudio Atzori | 14611ea450 | reverted to 4.0.3-SNAPSHOT | 2020-07-15 17:37:36 +02:00
Claudio Atzori | 9f20f23870 | Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files" This reverts commit 51d91fa520. | 2020-07-15 17:35:56 +02:00
Claudio Atzori | 9efcd8e245 | Revert "reverted to 4.0.3-SNAPSHOT" This reverts commit ec97983ce1. | 2020-07-15 17:28:37 +02:00
Claudio Atzori | ba493f9ab8 | [maven-release-plugin] rollback the release of dnet-dedup-4.0.3 | 2020-07-15 17:24:43 +02:00
Claudio Atzori | 6c98d4c436 | [maven-release-plugin] prepare release dnet-dedup-4.0.3 | 2020-07-15 17:24:25 +02:00
Claudio Atzori | ec97983ce1 | reverted to 4.0.3-SNAPSHOT | 2020-07-15 17:20:12 +02:00
Claudio Atzori | 51d91fa520 | wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files | 2020-07-15 17:13:45 +02:00
Claudio Atzori | b79ea97107 | Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files" This reverts commit d2861950ac. | 2020-07-15 17:11:46 +02:00
Claudio Atzori | 92aadbfc7b | [maven-release-plugin] prepare release dnet-dedup-4.0.3 | 2020-07-15 17:04:20 +02:00
Claudio Atzori | d2861950ac | wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files | 2020-07-15 16:49:47 +02:00
Claudio Atzori | 4b9fb2ffb8 | Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop | 2020-07-15 11:26:04 +02:00
Claudio Atzori | 5033c25587 | code formatting | 2020-07-15 11:26:00 +02:00
Claudio Atzori | b90389bac4 | code formatting | 2020-07-15 11:24:48 +02:00
Claudio Atzori | 4e6f46e8fa | filter blocks with one record only | 2020-07-15 11:22:20 +02:00
Michele Artini | 262c29463e | relations with multiple datasources | 2020-07-15 09:18:40 +02:00
Claudio Atzori | 7d6e269b40 | reverted CreateRelatedEntitiesJob_phase1 to its previous state | 2020-07-13 22:54:04 +02:00
Claudio Atzori | 8e97598eb4 | avoid to NPE in case of null instances | 2020-07-13 20:46:14 +02:00
Claudio Atzori | 06def0c0cb | SparkBlockStats allows to repartition the input rdd via the numPartitions workflow parameter | 2020-07-13 20:09:06 +02:00
miconis | b52c246aed | merge done | 2020-07-13 19:57:02 +02:00
miconis | b8a45041fd | minor changes | 2020-07-13 19:53:18 +02:00
Claudio Atzori | 66f9f6d323 | adjusted parameters for the dedup stats workflow | 2020-07-13 19:26:46 +02:00
miconis | 03ecfa5ebd | implementation of the test class for the new block stats spark action | 2020-07-13 18:48:23 +02:00
miconis | 10e08ccf45 | Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop | 2020-07-13 18:22:45 +02:00
miconis | 9258e4f095 | implementation of a new workflow to compute statistics on the blocks | 2020-07-13 18:22:34 +02:00
Claudio Atzori | c6f6fb0f28 | code formatting | 2020-07-13 16:46:13 +02:00
Claudio Atzori | 8d2102d7d2 | Merge branch 'deduptesting' | 2020-07-13 16:32:43 +02:00
Claudio Atzori | 344a90c2e6 | updated assertions in propagateRelationTest | 2020-07-13 16:32:04 +02:00
Claudio Atzori | 1143f426aa | WIP SparkCreateMergeRels distinct relations | 2020-07-13 16:13:36 +02:00
Claudio Atzori | 8c67938ad0 | configurable number of partitions used in the SparkCreateSimRels phase | 2020-07-13 16:07:07 +02:00
Claudio Atzori | c73168b18e | Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting | 2020-07-13 15:54:58 +02:00
Claudio Atzori | c8284bab06 | WIP SparkCreateMergeRels distinct relations | 2020-07-13 15:54:51 +02:00