Miriam Baglioni
|
6989fb9c8a
|
changed the project test according to the newly introduced join with the db project codes
|
2020-05-28 23:53:24 +02:00 |
Miriam Baglioni
|
782984d8e5
|
added needed parameter
|
2020-05-28 23:52:41 +02:00 |
Miriam Baglioni
|
01f7876595
|
fix issue with flatMap - the return type must not be null
|
2020-05-28 23:50:32 +02:00 |
Miriam Baglioni
|
773735f870
|
added the path to the file containing the projects code from the db
|
2020-05-28 17:30:45 +02:00 |
Miriam Baglioni
|
6a15067a64
|
added one step in the workflow
|
2020-05-28 17:30:09 +02:00 |
Miriam Baglioni
|
5309a99a70
|
modified the PrepareProjects to consider those in the db
|
2020-05-28 17:29:53 +02:00 |
Miriam Baglioni
|
b737ed8236
|
added part to read projects from the openaire db to filter out those in the csv file that are not in the db
|
2020-05-28 17:29:21 +02:00 |
Miriam Baglioni
|
35b7279147
|
changed test because data are now saved as SequenceFile, and because the group by decreases the number of produced updates
|
2020-05-28 10:26:12 +02:00 |
Miriam Baglioni
|
df44db686a
|
refactoring
|
2020-05-28 10:07:00 +02:00 |
Miriam Baglioni
|
87b07f4af8
|
removed unused variables
|
2020-05-28 10:05:43 +02:00 |
Miriam Baglioni
|
1060977272
|
added fs actions to remove and then create the workingDir
|
2020-05-28 10:04:36 +02:00 |
Miriam Baglioni
|
96d1a3c431
|
deleted the file where the csv files were stored
|
2020-05-28 10:04:10 +02:00 |
Miriam Baglioni
|
669c05c771
|
added groupBy before creating Actions
|
2020-05-28 10:00:45 +02:00 |
Miriam Baglioni
|
1855453434
|
changed the outputdir of the last step
|
2020-05-27 17:59:36 +02:00 |
Miriam Baglioni
|
ac8025f469
|
-
|
2020-05-22 15:29:41 +02:00 |
Miriam Baglioni
|
50ad83b97f
|
-
|
2020-05-22 15:27:19 +02:00 |
Miriam Baglioni
|
473c6d3a23
|
produces AtomicActions instead of Projects
|
2020-05-22 15:26:57 +02:00 |
Miriam Baglioni
|
4589c428b1
|
generates action sets and saves them in the hdfs path for the action sets
|
2020-05-21 16:30:39 +02:00 |
Miriam Baglioni
|
055eec5a77
|
added resource for prepare project test
|
2020-05-20 13:54:10 +02:00 |
Miriam Baglioni
|
9079bc1f61
|
-
|
2020-05-20 13:53:32 +02:00 |
Miriam Baglioni
|
67ba4fde57
|
added test for prepare projects step
|
2020-05-20 13:53:08 +02:00 |
Miriam Baglioni
|
3c0eb12d3e
|
removed the not zipped files
|
2020-05-20 10:31:05 +02:00 |
Miriam Baglioni
|
c0d9e02340
|
zipped test resources that are too big
|
2020-05-20 10:30:25 +02:00 |
Miriam Baglioni
|
5e9c9fa87c
|
tests
|
2020-05-20 10:29:57 +02:00 |
Miriam Baglioni
|
faed7521bf
|
added resources for testing
|
2020-05-20 10:29:29 +02:00 |
Miriam Baglioni
|
75491482de
|
added a new preparation step to replicate each project for the programme it is associated to
|
2020-05-20 10:28:56 +02:00 |
Miriam Baglioni
|
eb0e47ba53
|
parameters for h2020 programme
|
2020-05-20 10:26:44 +02:00 |
Miriam Baglioni
|
08218d2f3f
|
new workflow with added steps
|
2020-05-19 18:44:25 +02:00 |
Miriam Baglioni
|
457293ccc0
|
test for the various steps of project update with programme
|
2020-05-19 18:43:42 +02:00 |
Miriam Baglioni
|
9447d78ef3
|
added preparation classes
|
2020-05-19 18:42:50 +02:00 |
Miriam Baglioni
|
f0f14caf99
|
removed script files for shell actions not performed
|
2020-05-18 13:06:16 +02:00 |
Miriam Baglioni
|
23bbac7d7c
|
-
|
2020-05-18 13:05:03 +02:00 |
Miriam Baglioni
|
abc45f2708
|
added dnet-45 HttpConnector and related Classes, produced the POJO for projects and programme
|
2020-05-18 13:04:06 +02:00 |
Miriam Baglioni
|
5a648016ef
|
parameters from the GetFile class
|
2020-05-15 18:18:50 +02:00 |
Miriam Baglioni
|
83c262a483
|
workflow to download the files
|
2020-05-15 18:18:31 +02:00 |
Miriam Baglioni
|
22cb9e0da7
|
simple code to get file from URL
|
2020-05-15 18:18:01 +02:00 |
Claudio Atzori
|
0825321d0b
|
improved unit tests in dhp-aggregation
|
2020-05-05 12:39:04 +02:00 |
Claudio Atzori
|
439c6255a2
|
cleanup
|
2020-04-29 19:09:07 +02:00 |
Claudio Atzori
|
6f5b899038
|
reformatted code according to the updated style descriptor
|
2020-04-28 11:23:29 +02:00 |
Claudio Atzori
|
a0bdbacdae
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:52:31 +02:00 |
Claudio Atzori
|
7a3f8085f7
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:45:40 +02:00 |
Claudio Atzori
|
9147af7fed
|
actionsets migration workflow moved in dhp-workflows/dhp-actionmanager
|
2020-04-20 15:24:33 +02:00 |
Claudio Atzori
|
d714bfb4d4
|
collectedfrom field moved in common parent class Oaf.java
|
2020-04-20 12:25:19 +02:00 |
Claudio Atzori
|
ad7a131b18
|
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project
|
2020-04-18 12:42:58 +02:00 |
Claudio Atzori
|
6b5f9ca9cb
|
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
|
2020-04-10 17:53:07 +02:00 |
Claudio Atzori
|
7061d07727
|
ActionSets migration serializes the output as plain text files instead of SequenceFiles
|
2020-04-01 14:58:22 +02:00 |
Michele Artini
|
f6e86b44a6
|
tests
|
2020-03-27 11:46:37 +01:00 |
Michele Artini
|
408be3c632
|
test and fixed a problem with datacite namespaces
|
2020-03-27 11:44:50 +01:00 |
Claudio Atzori
|
c0e825e713
|
dhp-aggregation workflow tests upgraded to junit5
|
2020-03-25 17:59:45 +01:00 |
Michele Artini
|
ebe45003d9
|
fixed some junit packages
|
2020-03-25 16:45:03 +01:00 |
Michele Artini
|
fd57722c69
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-25 15:56:49 +01:00 |
Michele Artini
|
2559299da4
|
tests
|
2020-03-25 12:25:00 +01:00 |
Michele Artini
|
0fda2c3a30
|
some tests on db records
|
2020-03-25 09:43:58 +01:00 |
Michele Artini
|
e3760c7f39
|
fix a bug with organization countries
|
2020-03-24 08:43:56 +01:00 |
Claudio Atzori
|
ecb64e4998
|
Merge branch 'migration_wfs_regular_all_steps'
|
2020-03-23 08:57:01 +01:00 |
Michele Artini
|
15160032bd
|
fixed a bug setting some organization fields
|
2020-03-23 08:39:14 +01:00 |
Claudio Atzori
|
36236dd1c1
|
action migration workflow produces eu.dnetlib.dhp.schema.action.AtomicAction(s)
|
2020-03-19 14:00:38 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |
Claudio Atzori
|
c7e0730720
|
compress the output produced by migration steps 1 and 2
|
2020-03-18 09:34:57 +01:00 |
Claudio Atzori
|
2f11e37602
|
fixed expansion of path variables
|
2020-03-17 19:41:07 +01:00 |
Claudio Atzori
|
2795b0b096
|
no need to mkdir the all_entities file
|
2020-03-17 17:22:14 +01:00 |
Claudio Atzori
|
19746ad308
|
when reuseContent, reset ${workingPath}/all_entities
|
2020-03-17 17:17:06 +01:00 |
Claudio Atzori
|
2f0c85eeb3
|
updated parameters for regular_all_steps workflow, introduced flag 'reuseContent'
|
2020-03-17 17:04:58 +01:00 |
Claudio Atzori
|
b8290b5851
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:45:30 +01:00 |
Claudio Atzori
|
4706f24ec5
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:23:54 +01:00 |
Claudio Atzori
|
af835f2f98
|
when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities)
|
2020-03-15 18:07:59 +01:00 |
Claudio Atzori
|
9c84e21b87
|
added workflow to migrate latest version of each actionset content from DM to OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF datamodel
|
2020-03-13 15:56:52 +01:00 |
Michele Artini
|
b6efa9d6ab
|
Configuration of the SequenceFile Writer
|
2020-03-05 15:49:14 +01:00 |
Michele Artini
|
755eade2fb
|
fix creation ids
|
2020-03-04 14:49:45 +01:00 |
Michele Artini
|
e7167b996a
|
logs and closeable
|
2020-03-04 10:46:36 +01:00 |
Michele Artini
|
4b29a121b0
|
migration using spark in step2
|
2020-03-02 16:12:14 +01:00 |
Michele Artini
|
5445a57102
|
migration using spark in step2
|
2020-03-02 16:11:59 +01:00 |
Michele Artini
|
93665773ea
|
Fixed a problem with JavaRDD Union
|
2020-02-25 15:59:21 +01:00 |
Michele Artini
|
5d3739b5cf
|
migration of claims
|
2020-02-19 15:11:17 +01:00 |
Michele Artini
|
173f1df1e5
|
saved a query for openaire production database
|
2020-02-19 10:15:08 +01:00 |
Sandro La Bruzzo
|
9a2d74ac82
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-19 10:13:45 +01:00 |
Sandro La Bruzzo
|
e5d7cdf422
|
fixed sql query
|
2020-02-19 10:13:36 +01:00 |
Claudio Atzori
|
6a288625e5
|
fixed workflow outgoing node
|
2020-02-17 15:04:33 +01:00 |
Sandro La Bruzzo
|
76ee85141a
|
added oozie job for DNET migration and implemented Spark job for extracting entities
|
2020-02-17 12:31:44 +01:00 |
Michele Artini
|
176c5606bd
|
aligned with origin/master, aligned model and mapping
|
2020-02-17 10:40:53 +01:00 |
Michele Artini
|
80cb52593f
|
bug fixing
|
2020-02-13 15:34:13 +01:00 |
Michele Artini
|
cdea0dae75
|
bug fixing
|
2020-02-12 16:34:00 +01:00 |
Michele Artini
|
69336195d3
|
simplifications
|
2020-02-12 11:12:38 +01:00 |
Michele Artini
|
06c2fd6df9
|
bug fixing
|
2020-02-11 15:29:50 +01:00 |
Michele Artini
|
5fc09b179c
|
bug fixing
|
2020-02-11 12:48:03 +01:00 |
Michele Artini
|
95740767e0
|
Ready for tests
|
2020-02-10 16:04:06 +01:00 |
Michele Artini
|
181e8498d4
|
...
|
2020-02-07 16:02:49 +01:00 |
Michele Artini
|
bb1533a07e
|
partial commit
|
2020-02-05 15:35:40 +01:00 |
Michele Artini
|
fbb0fc140b
|
partial implementation of migration
|
2020-02-04 15:25:47 +01:00 |
Michele Artini
|
6bfe2dc96e
|
partial implementation
|
2020-01-22 16:00:23 +01:00 |
Michele Artini
|
f6eccdde33
|
partial implementation
|
2020-01-21 14:17:05 +01:00 |
Michele Artini
|
cd114f1c3b
|
partial update
|
2020-01-21 12:32:10 +01:00 |
Michele Artini
|
b35c59eb42
|
partial implementation of entities from db
|
2020-01-20 16:04:19 +01:00 |
Michele Artini
|
81f82b5d34
|
partial implementation of applications to migrate entities
|
2020-01-17 15:26:21 +01:00 |
Sandro La Bruzzo
|
abd9034da0
|
implemented DedupRecord factory with the merge of publications
|
2019-12-11 15:43:24 +01:00 |
miconis
|
4b66b471a4
|
implementation of the sorting by trust mechanism and the merge of oaf entities
|
2019-12-10 14:57:16 +01:00 |
Sandro La Bruzzo
|
cc63706347
|
Implemented deduplication on spark
|
2019-12-06 13:38:00 +01:00 |
Claudio Atzori
|
1e7a2ac41d
|
align parameter names, graph import procedure WIP
|
2019-11-04 17:41:01 +01:00 |
Claudio Atzori
|
c8bb81cd9a
|
align dependencies with IIS cluster
|
2019-10-29 18:10:20 +01:00 |
Sandro La Bruzzo
|
5a8a323f2a
|
dhp-collection-worker integrated in dhp-workflows
|
2019-10-24 11:36:59 +02:00 |
Claudio Atzori
|
c7654b6fe3
|
renamed collection & transformation oozie workflow files
|
2019-10-18 09:42:20 +02:00 |
Claudio Atzori
|
27db5afdad
|
integrating the oozie workflow build/deploy/run mechanism, took inspiration from iis
|
2019-10-17 18:38:30 +02:00 |
Sandro La Bruzzo
|
bbb87d0e3d
|
implemented saxonHE on transformation spark job
|
2019-10-10 11:33:51 +02:00 |
Sandro La Bruzzo
|
4b8c7c279d
|
Added documentation on a class, and reused ArgumentApplicationParser on dhp-aggregation
|
2019-10-07 17:02:53 +02:00 |
Sandro La Bruzzo
|
53ec9bccca
|
changed the implementation of RabbitMQ communication
|
2019-04-16 12:28:01 +02:00 |
Sandro La Bruzzo
|
403c13eebf
|
Implemented message manager, fixed bug on collection worker, implemented Collection and Transform spark job
|
2019-04-11 15:39:29 +02:00 |
Sandro La Bruzzo
|
ded6aef5e1
|
moved collector worker
|
2019-04-03 16:05:16 +02:00 |
Sandro La Bruzzo
|
c2ecbf5572
|
moved collector worker
|
2019-04-03 16:03:36 +02:00 |
Sandro La Bruzzo
|
12c65eab4c
|
implemented command line
|
2019-03-25 15:18:31 +01:00 |
Sandro La Bruzzo
|
6156562893
|
Added test
|
2019-03-18 10:47:28 +01:00 |
Sandro La Bruzzo
|
e67d9ee1a9
|
added first implementation of dnet-workflows
|
2019-03-18 10:44:35 +01:00 |