Miriam Baglioni
332258d199
split the classes related to the communities dump and to the whole graph dump
2020-07-24 17:21:48 +02:00
Claudio Atzori
56bbfdc65d
introduced parameter 'numPartitions', driving the hive DB table data partitioning. Currently specified only for table 'project'
2020-07-23 08:54:10 +02:00
Sandro La Bruzzo
9ab594ccf6
fixed test
2020-07-21 10:36:21 +02:00
Claudio Atzori
ebf60020ac
map results as OPRs when //CobjCategory/@type is missing and the vocabulary dnet:result_typologies doesn't resolve the super type
2020-07-20 19:01:10 +02:00
Miriam Baglioni
355d7e426e
added dump for project - not finished
2020-07-20 18:54:43 +02:00
Miriam Baglioni
a2f01e5259
added getter and setter
2020-07-20 18:54:17 +02:00
Miriam Baglioni
40bbe94f7c
merge with master fork
2020-07-20 18:10:03 +02:00
Miriam Baglioni
2a15494b16
merge upstream
2020-07-20 18:05:01 +02:00
Miriam Baglioni
23160b4d29
realignment of the workflow classes with the changes in the structure of the module
2020-07-20 18:04:30 +02:00
Miriam Baglioni
b904e0699a
-
2020-07-20 18:02:53 +02:00
Miriam Baglioni
3aab7680f6
changed the test results
2020-07-20 18:00:43 +02:00
Miriam Baglioni
cde0300801
moved from projects to project
2020-07-20 17:57:35 +02:00
Miriam Baglioni
5076e4f320
changed test to comply with the modifications
2020-07-20 17:55:18 +02:00
Miriam Baglioni
08dbd99455
changed to dump the whole results graph by using classes already implemented for communities. Added a class to also dump organization
2020-07-20 17:54:28 +02:00
Miriam Baglioni
e47ea9349c
extended some types by adding provenance as the couple (provenance, trust) and moved some classes so they can also be used by the complete graph dump
2020-07-20 17:46:27 +02:00
Claudio Atzori
32f5e466e3
imports cleanup
2020-07-20 17:42:58 +02:00
Claudio Atzori
54ac583923
code formatting
2020-07-20 17:37:08 +02:00
Claudio Atzori
124e7ce19c
in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available
2020-07-20 17:33:37 +02:00
Claudio Atzori
050dda223d
Merge pull request 'removed duplicated fields' ( #25 ) from unique_field_in_lists into master
Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists.
The task to update the model in such a way is added on #9#issuecomment-1583
Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori
e0c4cf6f7b
added parameter to drive the graph merge strategy: priority (BETA|PROD)
2020-07-20 10:48:01 +02:00
Claudio Atzori
94ccdb4852
Merge branch 'master' into merge_graph
2020-07-20 10:14:55 +02:00
Claudio Atzori
0937c9998f
Merge branch 'deduptesting'
2020-07-20 10:00:20 +02:00
Claudio Atzori
105176105c
updated dnet-pace-core dependency to version 4.0.4 to include the latest clustering function
2020-07-20 09:59:47 +02:00
Claudio Atzori
de72b1c859
cleanup
2020-07-20 09:59:11 +02:00
Michele Artini
331a3cbdd0
fixed originalId
2020-07-20 09:50:29 +02:00
Michele Artini
c59c5369b1
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-18 09:40:54 +02:00
Michele Artini
346a1d2b5a
update eventId generator
2020-07-18 09:40:36 +02:00
Sandro La Bruzzo
9116d75b3e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-17 18:01:30 +02:00
Miriam Baglioni
d7d84c8217
-
2020-07-17 14:03:23 +02:00
Miriam Baglioni
47c7122773
changed priority from beta to production
2020-07-17 12:56:35 +02:00
Michele Artini
442f30930c
removed duplicated fields
2020-07-17 12:25:36 +02:00
Claudio Atzori
1781609508
code formatting
2020-07-16 19:06:56 +02:00
Claudio Atzori
db8b90a156
renamed CORE -> BETA
2020-07-16 19:05:13 +02:00
Miriam Baglioni
44e1c40c42
merge upstream
2020-07-16 18:49:38 +02:00
Claudio Atzori
878f2b931c
Merge branch 'master' into merge_graph
2020-07-16 16:34:24 +02:00
Claudio Atzori
cc5d13da85
introduced parameter shouldIndex (true|false)
2020-07-16 13:46:39 +02:00
Claudio Atzori
b098cc3cbe
avoid repeating identical values for fields: source, description
2020-07-16 13:45:53 +02:00
Claudio Atzori
805de4eca1
fix: filter the blocks with size = 1
2020-07-16 10:11:32 +02:00
Claudio Atzori
4b9fb2ffb8
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-15 11:26:04 +02:00
Claudio Atzori
5033c25587
code formatting
2020-07-15 11:26:00 +02:00
Claudio Atzori
b90389bac4
code formatting
2020-07-15 11:24:48 +02:00
Claudio Atzori
4e6f46e8fa
filter blocks with one record only
2020-07-15 11:22:20 +02:00
Michele Artini
262c29463e
relations with multiple datasources
2020-07-15 09:18:40 +02:00
Claudio Atzori
7d6e269b40
reverted CreateRelatedEntitiesJob_phase1 to its previous state
2020-07-13 22:54:04 +02:00
Claudio Atzori
8e97598eb4
avoid NPE in case of null instances
2020-07-13 20:46:14 +02:00
Claudio Atzori
06def0c0cb
SparkBlockStats allows repartitioning the input RDD via the numPartitions workflow parameter
2020-07-13 20:09:06 +02:00
miconis
b52c246aed
merge done
2020-07-13 19:57:02 +02:00
miconis
b8a45041fd
minor changes
2020-07-13 19:53:18 +02:00
Claudio Atzori
66f9f6d323
adjusted parameters for the dedup stats workflow
2020-07-13 19:26:46 +02:00
miconis
03ecfa5ebd
implementation of the test class for the new block stats spark action
2020-07-13 18:48:23 +02:00