Miriam Baglioni
|
ad24c8478f
|
added missing parameter
|
2020-03-24 16:19:59 +01:00 |
Miriam Baglioni
|
46094a3eec
|
bug fixing for implementation with dataset
|
2020-03-24 16:19:36 +01:00 |
Claudio Atzori
|
51ff68db66
|
Merge branch 'dedupTest' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dedupTest
|
2020-03-24 11:18:19 +01:00 |
Claudio Atzori
|
1e869e7bed
|
using method available from currently used library
|
2020-03-24 11:17:44 +01:00 |
miconis
|
f0d72b76a8
|
package structure fixed
|
2020-03-24 10:51:40 +01:00 |
Claudio Atzori
|
aaedbb1b8b
|
WIP: dedup workflow, stage 2
|
2020-03-24 09:59:28 +01:00 |
Michele Artini
|
e3760c7f39
|
fix a bug with organization countries
|
2020-03-24 08:43:56 +01:00 |
Claudio Atzori
|
8b0ba3d76a
|
postprocessing script correctly run as hive2 action
|
2020-03-23 17:40:39 +01:00 |
miconis
|
93e2291291
|
minor changes
|
2020-03-23 17:17:56 +01:00 |
miconis
|
f7890a90df
|
implementation of the mechanism that checks the existence of a mergerel file
|
2020-03-23 17:13:30 +01:00 |
Miriam Baglioni
|
ad712f2d79
|
added the needed variables in the config and read the variables in the workflow
|
2020-03-23 17:11:36 +01:00 |
Miriam Baglioni
|
f1e9fe9752
|
changed implementation using dataset and query on hive
|
2020-03-23 17:11:00 +01:00 |
Miriam Baglioni
|
f09cd1e911
|
removed unneeded variable in the configuration
|
2020-03-23 17:10:14 +01:00 |
Miriam Baglioni
|
9418e3d4fa
|
read dataset from files instead of using hive tables
|
2020-03-23 17:09:27 +01:00 |
Miriam Baglioni
|
a7bf037306
|
remove unused class
|
2020-03-23 14:36:43 +01:00 |
Miriam Baglioni
|
8ab8b6b0bf
|
minor
|
2020-03-23 14:35:23 +01:00 |
Miriam Baglioni
|
30d58fd98c
|
change the configuration of the workflow
|
2020-03-23 14:32:49 +01:00 |
Miriam Baglioni
|
a440152b46
|
refactoring
|
2020-03-23 14:30:56 +01:00 |
Miriam Baglioni
|
47561f3597
|
changed the implementation from rdd to dataset obtained from sql queries (on hive)
|
2020-03-23 11:58:32 +01:00 |
miconis
|
c20e179f5a
|
structure of the workflows updated
|
2020-03-23 11:43:49 +01:00 |
Claudio Atzori
|
658d40ccbe
|
WIP trying to use hive2 actions
|
2020-03-23 11:14:54 +01:00 |
Claudio Atzori
|
ecb64e4998
|
Merge branch 'migration_wfs_regular_all_steps'
|
2020-03-23 08:57:01 +01:00 |
Michele Artini
|
15160032bd
|
fixed a bug setting some organization fields
|
2020-03-23 08:39:14 +01:00 |
Claudio Atzori
|
a4c52661a0
|
WIP: fixing dedup workflows
|
2020-03-20 19:17:24 +01:00 |
Claudio Atzori
|
6cb0a9bff0
|
dedup wf directory structure aligned with project commons
|
2020-03-20 16:48:14 +01:00 |
miconis
|
e16e644faf
|
implementation of the workflow for entity update and for relations update
|
2020-03-20 13:01:56 +01:00 |
przemek
|
638b78f96a
|
Merge remote-tracking branch 'origin/master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-03-19 15:12:56 +01:00 |
miconis
|
6d879e2ee1
|
integration of the new AtomicAction class
|
2020-03-19 15:10:42 +01:00 |
miconis
|
6e0fb8efa0
|
minor changes
|
2020-03-19 15:08:03 +01:00 |
miconis
|
4e82a24af2
|
minor changes and implementation of the create connected components action
|
2020-03-19 15:01:07 +01:00 |
Claudio Atzori
|
36236dd1c1
|
action migration workflow produces eu.dnetlib.dhp.schema.action.AtomicAction(s)
|
2020-03-19 14:00:38 +01:00 |
Claudio Atzori
|
a0ab15a64c
|
need to stick to guava:11.0.2 as it is the version used by the hadoop components (oozie client for sure). The latest version (28.2-jre) breaks the oozie workflow submission
|
2020-03-19 13:58:58 +01:00 |
Sandro La Bruzzo
|
0594b92a6d
|
implemented relation with dataset
|
2020-03-19 11:11:07 +01:00 |
Claudio Atzori
|
1850a02ae4
|
added simpler, AtomicAction replacement, based on the dhp.Oaf model
|
2020-03-19 10:44:16 +01:00 |
miconis
|
679b5869e5
|
implementation of the lookup procedure to take dedup conf from the resource profiles
|
2020-03-18 17:41:56 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |
miconis
|
f32eae5ce9
|
implementation of the spark action for the simrel creation
|
2020-03-18 14:27:49 +01:00 |
Claudio Atzori
|
c7e0730720
|
compress the output produced by migration steps 1 and 2
|
2020-03-18 09:34:57 +01:00 |
Claudio Atzori
|
2f11e37602
|
fixed expansion of path variables
|
2020-03-17 19:41:07 +01:00 |
Claudio Atzori
|
2795b0b096
|
no need to mkdir the all_entities file
|
2020-03-17 17:22:14 +01:00 |
Claudio Atzori
|
19746ad308
|
when reuseContent, reset ${workingPath}/all_entities
|
2020-03-17 17:17:06 +01:00 |
Claudio Atzori
|
2f0c85eeb3
|
updated parameters for regular_all_steps workflow, introduced flag 'reuseContent'
|
2020-03-17 17:04:58 +01:00 |
Miriam Baglioni
|
67ea3cf3ed
|
changed the way the file with info on resource or relation is read: from sequenceFile to textFile
|
2020-03-17 16:32:05 +01:00 |
Miriam Baglioni
|
b4652d018c
|
moved the creation of new dir to common class.
|
2020-03-17 16:31:24 +01:00 |
Claudio Atzori
|
b8290b5851
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:45:30 +01:00 |
Claudio Atzori
|
4706f24ec5
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:23:54 +01:00 |
Claudio Atzori
|
aeb01fa353
|
reading from newline delimited json textfiles instead of sequence files
|
2020-03-17 11:57:24 +01:00 |
Miriam Baglioni
|
92f4e0001d
|
Merge branch 'bulktag'
|
2020-03-16 13:33:27 +01:00 |
Miriam Baglioni
|
ab08a37024
|
Merge remote-tracking branch 'upstream/master'
|
2020-03-16 12:45:23 +01:00 |
Claudio Atzori
|
af835f2f98
|
when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities)
|
2020-03-15 18:07:59 +01:00 |