Commit Graph

494 Commits

Author SHA1 Message Date
Enrico Ottonello 1edcd53581 added shell actions to download all 11 activities files from ORCID 2020-04-28 20:25:09 +02:00
Enrico Ottonello a1861b9eaa workflow works in parallel on 2 activity files 2020-04-24 18:33:37 +02:00
Enrico Ottonello 941e94af06 added workflow for generating authors with dois data sequence file 2020-04-24 15:50:40 +02:00
Enrico Ottonello c03ac6e5bb Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost 2020-04-23 15:26:06 +02:00
Enrico Ottonello 4a6aea1a37 fix format problem 2020-04-23 15:25:39 +02:00
Sandro La Bruzzo fdc0523e4c Merge remote-tracking branch 'origin/master' into doiboost 2020-04-23 09:34:13 +02:00
Sandro La Bruzzo 4ba386d996 improved crossref mapping 2020-04-23 09:33:48 +02:00
Claudio Atzori 8851050814 replaced hive_db_name with hiveDbName 2020-04-23 08:36:40 +02:00
Claudio Atzori 91f81107b1 applying code formatting 2020-04-23 07:52:32 +02:00
Claudio Atzori 1e7583c5a6 filtered invisible records in data provision workflow 2020-04-23 07:51:34 +02:00
Claudio Atzori 9ddafd46ca fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory 2020-04-23 07:50:18 +02:00
Claudio Atzori ade4cb97af fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB 2020-04-22 18:24:06 +02:00
Sandro La Bruzzo bb6c9785b4 Merge remote-tracking branch 'origin/master' into doiboost 2020-04-22 15:00:57 +02:00
Sandro La Bruzzo 157915988c improved crossref mapping 2020-04-22 15:00:44 +02:00
Enrico Ottonello 5977f08e92 merged 2020-04-22 14:50:50 +02:00
Enrico Ottonello 7d759947ae used vtd for parsing orcid xml record, set 4g heapspace 2020-04-22 14:41:19 +02:00
Claudio Atzori ba4339f142 excluded org.apache.hadoop:hadoop-common from the dnet-actionmanager-common dependency to avoid multiple transitive jaxb-impl versions to conflict when instantiating the ISLookup client stub 2020-04-22 14:23:09 +02:00
Claudio Atzori e81960335c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-04-22 10:46:37 +02:00
Michele Artini 9e4d58f505 ResultType 2020-04-22 10:07:26 +02:00
Claudio Atzori c891661822 small adjustments in the graph2hive workflow 2020-04-21 18:52:23 +02:00
Claudio Atzori 0b55795d4d small adjustments in the provisioning workflow 2020-04-21 16:15:04 +02:00
Claudio Atzori 88fbb3a353 added sparkSqlWarehouseDir to the default extra spark options passed to each workflow 2020-04-21 16:13:43 +02:00
Claudio Atzori cd320efa96 added extra spark options to graph to hive workflow 2020-04-21 16:12:20 +02:00
Claudio Atzori 91e72a6944 Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests 2020-04-21 12:06:08 +02:00
miconis 5c9ef08a8e spark dedup test fixed 2020-04-21 10:19:04 +02:00
Sandro La Bruzzo 3624947a7f Merge remote-tracking branch 'origin/master' into doiboost 2020-04-21 08:34:24 +02:00
Claudio Atzori d772d967aa restored changes from master branch 2020-04-20 18:53:06 +02:00
Claudio Atzori eb8a020859 fixed behaviour of DedupRecordFactory 2020-04-20 18:44:06 +02:00
Sandro La Bruzzo 039f9b7871 Merge remote-tracking branch 'origin/master' into doiboost 2020-04-20 18:10:29 +02:00
Sandro La Bruzzo e4b105cece improved crossref mapping 2020-04-20 18:10:07 +02:00
Claudio Atzori ede1af3d85 Merge branch 'master' into deduptesting 2020-04-20 16:52:14 +02:00
miconis 1102e32462 SparkDedupTest updated and organization dump fixed 2020-04-20 16:49:01 +02:00
Claudio Atzori 667d23c58b finalising Actionset migration workflow 2020-04-20 16:45:21 +02:00
miconis 4da13e4570 Revert "Merge branch 'master' into deduptesting" This reverts commit 772f75d167, reversing changes made to 5f45f2c77f. 2020-04-20 16:04:49 +02:00
Claudio Atzori 9147af7fed actionsets migration workflow moved in dhp-workflows/dhp-actionmanager 2020-04-20 15:24:33 +02:00
miconis 772f75d167 Merge branch 'master' into deduptesting 2020-04-20 14:50:12 +02:00
Sandro La Bruzzo 5d46ec7d5f fixed name of wrong package 2020-04-20 14:49:32 +02:00
Sandro La Bruzzo 82cc3b707d fixed name of wrong package 2020-04-20 14:47:06 +02:00
Sandro La Bruzzo b2c872cb4d merged master 2020-04-20 14:04:40 +02:00
Sandro La Bruzzo 7029942e06 Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost 2020-04-20 13:26:41 +02:00
Sandro La Bruzzo 0e45f4d450 continue mapping from crossref to OAF 2020-04-20 13:26:29 +02:00
Enrico Ottonello a466648b4b renamed output file 2020-04-20 12:32:03 +02:00
Claudio Atzori d714bfb4d4 collectedfrom field moved in common parent class Oaf.java 2020-04-20 12:25:19 +02:00
Enrico Ottonello 4ae55e3891 added workflow parameters 2020-04-20 12:00:04 +02:00
Michele Artini 8ff7facfa3 fixed collectedFrom ID 2020-04-20 11:09:27 +02:00
Sandro La Bruzzo eef60bb9f4 created structure of oozie wf for ORCID 2020-04-20 10:24:57 +02:00
Sandro La Bruzzo 4d0d9de07e reorganized package and fixed test 2020-04-20 10:02:42 +02:00
Sandro La Bruzzo 618bc1fc72 first implementation of crossrefMapping 2020-04-20 09:53:34 +02:00
Michele Artini 25307965d2 add a default datainfo if missing 2020-04-20 09:43:27 +02:00
Michele Artini d2058fdc47 tests 2020-04-20 09:31:14 +02:00