2247 Commits

Author SHA1 Message Date
Claudio Atzori 923d19ea8e mdstore read lock/unlock when bulk copying records from mongodb to hdfs 2021-05-04 18:06:21 +02:00
Claudio Atzori ba86835951 using common constants from ModelConstants 2021-05-04 11:51:52 +02:00
Michele Artini f4bd2b5619 recert file SparkDedupTest.java 2021-05-04 10:26:14 +02:00
Michele Artini b4877da363 Merge branch 'stable_ids' into prepare_ror_actionset 2021-05-03 08:13:55 +02:00
Alessia Bardi 9a20057615 fixed query for organisations' pids 2021-04-29 15:23:39 +02:00
Michele Artini 6692128234 Merge branch 'stable_ids' into prepare_ror_actionset 2021-04-29 13:24:08 +02:00
Michele Artini a278d67175 parse input file 2021-04-29 11:34:47 +02:00
Claudio Atzori f6ccd54d87 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-29 10:10:01 +02:00
Claudio Atzori 91e7220f20 cleaned up workflow for actionset migration, adjusted dnet|cnr* dependency versions 2021-04-29 10:09:52 +02:00
Michele Artini f77ba34126 pid types 2021-04-29 09:50:05 +02:00
Michele Artini 7c5cd86927 annotations and tests 2021-04-29 09:29:19 +02:00
Michele Artini b5cf505cc6 partial implementation of the ROR->actionset workflow 2021-04-28 16:00:24 +02:00
Enrico Ottonello c537986b7c deleted folders with merged data immediately before merge phases 2021-04-28 11:25:25 +02:00
Sandro La Bruzzo 2129e9caa7 updated pangaea transformation to parse directly the xml 2021-04-28 10:21:03 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Sandro La Bruzzo 63c0303137 removed unused import, add log 2021-04-27 12:17:23 +02:00
Sandro La Bruzzo 74484d2823 bug fixing 2021-04-27 12:13:44 +02:00
Sandro La Bruzzo c74b03d59c Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-04-27 11:31:07 +02:00
Sandro La Bruzzo 7f8848ecdd added first implementation of Pangaea Mapping 2021-04-27 11:30:37 +02:00
Claudio Atzori 27ab8a704d adjusted poms to align with the external dhp-schema module 2021-04-27 10:12:27 +02:00
Claudio Atzori a7cf449b36 cleanup 2021-04-27 10:11:26 +02:00
Claudio Atzori fa42026590 fixed PersonCleaner extension functions 2021-04-27 10:10:06 +02:00
Claudio Atzori ef4bfd82e2 code formatting 2021-04-27 10:09:31 +02:00
Claudio Atzori faa8f6f4e2 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-27 09:57:03 +02:00
miconis 6d5c14e030 assertions updated in entity merger test 2021-04-27 09:47:49 +02:00
Claudio Atzori c2bb03c8b5 depending on external dhp-schemas module 2021-04-23 17:57:35 +02:00
Claudio Atzori c25238480c making ODF record parsing namespace unaware (#6629) 2021-04-23 17:34:57 +02:00
miconis d0e3366c34 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-04-22 11:45:19 +02:00
miconis 3c12eeadce bug fix in propagation of relations 2021-04-22 11:44:33 +02:00
Claudio Atzori e5abbec2ba [orcid] download of the lambda file defined in a script 2021-04-22 11:22:10 +02:00
Claudio Atzori 55964cbd81 [orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation 2021-04-22 10:18:09 +02:00
Claudio Atzori 8f309b72ff [dedup] using node names consistently across the workflow 2021-04-21 17:54:51 +02:00
Claudio Atzori 52244f813a merging from enrico.ottonello/dnet-hadoop:orcid-no-doi 2021-04-21 12:24:09 +02:00
Sandro La Bruzzo fd29307b84 updated workflow name 2021-04-21 09:21:41 +02:00
Claudio Atzori 815b9f4d56 [openorgs dedup] fixed workflow parameter declarations. Introduced support for resuming the execution from intermediate steps 2021-04-20 17:24:45 +02:00
Claudio Atzori d0d477cca3 code formatting 2021-04-20 12:50:34 +02:00
miconis 0393cdce42 addition of alternative names in export queries 2021-04-20 12:45:21 +02:00
miconis cadd0a5de8 modification of the queries for openorgs: they now consider also pending orgs 2021-04-20 12:06:56 +02:00
Sandro La Bruzzo e06c7f32f6 updated id figshare as described in #6377 2021-04-20 10:18:07 +02:00
Sandro La Bruzzo dbe0d0378e resolved ticket #6377 2021-04-20 09:44:44 +02:00
Sandro La Bruzzo 524e5f3092 Improved parallelization on transformation wf on hadoop 2021-04-19 15:17:25 +02:00
Sandro La Bruzzo cdfe01bbae improved parallelization on transformation job 2021-04-19 15:14:52 +02:00
Sandro La Bruzzo 3ae67b7a1d Merge remote-tracking branch 'origin/stable_ids' into stable_ids 2021-04-16 17:36:57 +02:00
Sandro La Bruzzo a16e5299f9 applied unique function on the final dataset 2021-04-16 17:36:48 +02:00
Claudio Atzori 45057440c1 code formatting 2021-04-16 17:28:25 +02:00
Enrico Ottonello 34ca792a55 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-04-16 17:18:46 +02:00
Enrico Ottonello 27068aacd1 wf to move orcid-no-doi dataset on the folder ready the import 2021-04-16 17:17:47 +02:00
miconis 7ad573d023 bug fix: changed join in propagaterelations without applying filter on the id 2021-04-16 16:40:42 +02:00
Sandro La Bruzzo 67085da305 fixed NPE 2021-04-16 11:05:58 +02:00
Sandro La Bruzzo 644aa8f40c Merge remote-tracking branch 'origin/stable_ids' into stable_ids 2021-04-16 09:14:26 +02:00