Commit Graph

971 Commits

Author SHA1 Message Date
Claudio Atzori 0bdfbb0a57 reintroduced RDD based relation cut off procedure 2020-05-19 15:02:21 +02:00
Claudio Atzori f3bc8aed31 lifted memory requirements for country propagation wf 2020-05-18 15:29:10 +02:00
Miriam Baglioni b71fbb68b1 removed the removeOutputDir command from code. Relations are written in Append. Erasing the output dir meant removing all the relations computed in the previous steps 2020-05-18 13:57:20 +02:00
Miriam Baglioni 629af7cb79 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-18 13:07:36 +02:00
Miriam Baglioni f0f14caf99 removed script files for shell actions not performed 2020-05-18 13:06:16 +02:00
Miriam Baglioni 23bbac7d7c - 2020-05-18 13:05:03 +02:00
Miriam Baglioni 4f1ff7ba73 added dependency to org.apache.commons common-csv 2020-05-18 13:04:39 +02:00
Miriam Baglioni abc45f2708 added dnet-45 HttpConnector and related Classes, produced the POJO for projects and programme 2020-05-18 13:04:06 +02:00
Claudio Atzori ef9a9a9f1a remove the output path when starting 2020-05-15 22:34:19 +02:00
Miriam Baglioni 5a648016ef parameters from the GetFile class 2020-05-15 18:18:50 +02:00
Miriam Baglioni 83c262a483 workflow to download the files 2020-05-15 18:18:31 +02:00
Miriam Baglioni 22cb9e0da7 simple code to get file from URL 2020-05-15 18:18:01 +02:00
Claudio Atzori 7838f2c63f init the empty list for author pids mapped from OAF 2020-05-15 17:06:01 +02:00
Claudio Atzori 82b615ab33 NPE check 2020-05-15 16:04:46 +02:00
Miriam Baglioni 3aaad753fd Merge branch 'master' into dhp_oaf_model 2020-05-15 15:55:23 +02:00
Miriam Baglioni e26a67c3eb merge with upstream 2020-05-15 15:53:05 +02:00
Claudio Atzori eeb2b7e4cd Merge branch 'master' into dhp_oaf_model 2020-05-15 15:17:03 +02:00
Claudio Atzori 7a89507ab1 code formatting 2020-05-15 15:16:54 +02:00
Miriam Baglioni 5ec8c49ad5 removed serialization points 2020-05-15 12:49:58 +02:00
Claudio Atzori 1d35836a58 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-15 12:26:31 +02:00
Claudio Atzori cfc8948717 fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids 2020-05-15 12:26:16 +02:00
Michele Artini 2a4e68a292 events recognition 2020-05-15 12:25:37 +02:00
Claudio Atzori a832658296 code formatting 2020-05-15 10:21:09 +02:00
Claudio Atzori b7e198475a added common methods to create HiveDB table identifiers 2020-05-15 10:20:07 +02:00
Claudio Atzori 50d6a2ad3c added output directory removal in the blacklist spark actions; included common global properties in blacklist's workflow.xml 2020-05-15 09:53:37 +02:00
Claudio Atzori 18f46e47b9 added relations to the graph2hive import workflow 2020-05-15 09:34:48 +02:00
Claudio Atzori 9d028ffe1c cleanup 2020-05-15 09:28:55 +02:00
Claudio Atzori fd62359538 cleanup 2020-05-15 09:28:15 +02:00
Claudio Atzori eb64335a54 parallel implementation for graph Hive importer 2020-05-15 09:05:26 +02:00
Miriam Baglioni 94571c9a51 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-14 18:29:55 +02:00
Miriam Baglioni f25db01664 changed the constant from propagationconstants to modelconstants 2020-05-14 18:29:24 +02:00
Miriam Baglioni d05630d979 removed the constants added in ModelConstants 2020-05-14 18:22:50 +02:00
Miriam Baglioni 42085e8d99 added some constants 2020-05-14 18:22:28 +02:00
Claudio Atzori f044d09315 revised mapping: more accurate mapping for name/surname from datacite format; improved mapping of null values 2020-05-14 15:07:24 +02:00
Miriam Baglioni e7eb4f377e Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-14 10:34:17 +02:00
Miriam Baglioni 8828458acf minor changes 2020-05-14 10:34:12 +02:00
Claudio Atzori ab37953332 added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action 2020-05-14 10:25:41 +02:00
Claudio Atzori 12bfa6702e Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-13 17:01:17 +02:00
Claudio Atzori 5ecacad70a fixed default resource typing in Oaf/Odf mapping 2020-05-13 17:01:11 +02:00
Michele Artini c0265213a0 partial implementation 2020-05-13 12:00:27 +02:00
Claudio Atzori 1ddd33de41 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-13 09:04:41 +02:00
Claudio Atzori 85f3c55992 fixed node names in blacklist workflow 2020-05-13 09:04:33 +02:00
Miriam Baglioni 43f127448d changed the package name from dhp-propagation to dhp-enrichment for the preparation phase of funding propagation 2020-05-12 18:24:26 +02:00
Claudio Atzori ec0782e582 renamed jar containing the bulktagging and propagation workflows from dhp-[bulktagging|propagation] to dhp-enrichment; adjusted xml formatting 2020-05-12 15:49:28 +02:00
Miriam Baglioni 1547ca7e15 added blacklist step to the end of the provision wf 2020-05-12 12:17:27 +02:00
Miriam Baglioni 14979f299e changed the configuration factory 2020-05-12 11:28:38 +02:00
Miriam Baglioni f8aef6161a minor modification 2020-05-12 11:28:07 +02:00
Miriam Baglioni 7387f3449a changed the route to find the verb resolver classes 2020-05-12 11:27:38 +02:00
Miriam Baglioni 7687519f00 merged conflicts with upstream branch 2020-05-12 10:03:44 +02:00
Miriam Baglioni 8ffc050b8a fixed problem in communityconfigurationfactory test 2020-05-12 10:01:09 +02:00