Commit Graph

1058 Commits

Author SHA1 Message Date
Enrico Ottonello fc80e8c7de added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading 2020-05-18 19:51:29 +02:00
Claudio Atzori f3bc8aed31 lifted memory requirements for country propagation wf 2020-05-18 15:29:10 +02:00
Miriam Baglioni b71fbb68b1 removed the removeOutputDir command from code. Relations are written in Append mode. Erasing the output dir was meant to remove all the relations computed in the previous steps 2020-05-18 13:57:20 +02:00
Miriam Baglioni 629af7cb79 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-18 13:07:36 +02:00
Claudio Atzori ef9a9a9f1a remove the output path when starting 2020-05-15 22:34:19 +02:00
Enrico Ottonello 0b29bb7e3b spark job to download orcid record modified after a fixed date 2020-05-15 19:49:26 +02:00
Claudio Atzori 7838f2c63f init the empty list for author pids mapped from OAF 2020-05-15 17:06:01 +02:00
Claudio Atzori 82b615ab33 NPE check 2020-05-15 16:04:46 +02:00
Miriam Baglioni e26a67c3eb merge with upstream 2020-05-15 15:53:05 +02:00
Claudio Atzori 7a89507ab1 code formatting 2020-05-15 15:16:54 +02:00
Miriam Baglioni 5ec8c49ad5 removed serialization points 2020-05-15 12:49:58 +02:00
Claudio Atzori 1d35836a58 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-15 12:26:31 +02:00
Claudio Atzori cfc8948717 fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids 2020-05-15 12:26:16 +02:00
Michele Artini 2a4e68a292 events recognition 2020-05-15 12:25:37 +02:00
Claudio Atzori a832658296 code formatting 2020-05-15 10:21:09 +02:00
Claudio Atzori b7e198475a added common methods to create HiveDB table identifiers 2020-05-15 10:20:07 +02:00
Claudio Atzori 50d6a2ad3c added output directory removal in the blacklist spark actions; included common global properties in blacklist's workflow.xml 2020-05-15 09:53:37 +02:00
Claudio Atzori 18f46e47b9 added relations to the graph2hive import workflow 2020-05-15 09:34:48 +02:00
Claudio Atzori 9d028ffe1c cleanup 2020-05-15 09:28:55 +02:00
Claudio Atzori fd62359538 cleanup 2020-05-15 09:28:15 +02:00
Claudio Atzori eb64335a54 parallel implementation for graph Hive importer 2020-05-15 09:05:26 +02:00
Miriam Baglioni 94571c9a51 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-14 18:29:55 +02:00
Miriam Baglioni f25db01664 changed the constant references from propagationconstants to modelconstants 2020-05-14 18:29:24 +02:00
Miriam Baglioni d05630d979 removed the constants added in ModelConstants 2020-05-14 18:22:50 +02:00
Miriam Baglioni 42085e8d99 added some constants 2020-05-14 18:22:28 +02:00
Claudio Atzori f044d09315 revised mapping: more accurate mapping for name/surname from datacite format; improved mapping of null values 2020-05-14 15:07:24 +02:00
Miriam Baglioni e7eb4f377e Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-14 10:34:17 +02:00
Miriam Baglioni 8828458acf minor changes 2020-05-14 10:34:12 +02:00
Claudio Atzori ab37953332 added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action 2020-05-14 10:25:41 +02:00
Claudio Atzori 12bfa6702e Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-13 17:01:17 +02:00
Claudio Atzori 5ecacad70a fixed default resource typing in Oaf/Odf mapping 2020-05-13 17:01:11 +02:00
Enrico Ottonello 12756f9d41 multithread (4 threads) test to feed elastic search 2020-05-13 16:11:40 +02:00
Michele Artini c0265213a0 partial implementation 2020-05-13 12:00:27 +02:00
Sandro La Bruzzo a92ee0f41e Merge remote-tracking branch 'origin/master' into doiboost 2020-05-13 10:38:13 +02:00
Sandro La Bruzzo d876f47d06 next step of MAG conversion implemented 2020-05-13 10:38:04 +02:00
Claudio Atzori 1ddd33de41 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-13 09:04:41 +02:00
Claudio Atzori 85f3c55992 fixed node names in blacklist workflow 2020-05-13 09:04:33 +02:00
Miriam Baglioni 43f127448d changed the package name from dhp-propagation to dhp-enrichment for the preparation phase of funding propagation 2020-05-12 18:24:26 +02:00
Enrico Ottonello 08040cef80 spark action to analyze orcid lambda file 2020-05-12 16:57:43 +02:00
Claudio Atzori ec0782e582 renamed jar containing the bulktagging and propagation workflows from dhp-[bulktagging|propagation] to dhp-enrichment; adjusted xml formatting 2020-05-12 15:49:28 +02:00
Miriam Baglioni 1547ca7e15 added blacklist step to the end of the provision wf 2020-05-12 12:17:27 +02:00
Miriam Baglioni 14979f299e changed the configuration factory 2020-05-12 11:28:38 +02:00
Miriam Baglioni f8aef6161a minor modification 2020-05-12 11:28:07 +02:00
Miriam Baglioni 7387f3449a changed the route to find the verb resolver classes 2020-05-12 11:27:38 +02:00
Miriam Baglioni 7687519f00 merged conflicts with upstream branch 2020-05-12 10:03:44 +02:00
Miriam Baglioni 8ffc050b8a fixed problem in communityconfigurationfactory test 2020-05-12 10:01:09 +02:00
Claudio Atzori 527e8169a8 adjusted paths pointing to test configurations, cleanup 2020-05-11 18:17:05 +02:00
Claudio Atzori f9a62ba63b added wf nodes to copy entities to the output path 2020-05-11 18:16:39 +02:00
Miriam Baglioni ad63effb4e removed deletion of working dir 2020-05-11 17:48:22 +02:00
Claudio Atzori c6b028f2af code formatting 2020-05-11 17:38:08 +02:00