946 Commits

Author SHA1 Message Date
Miriam Baglioni 6750075fbd merge upstream 2020-05-21 16:31:09 +02:00
Miriam Baglioni 4589c428b1 generate action sets and saves them in the hdfs path for the actions sets 2020-05-21 16:30:39 +02:00
miconis 8b35e0e7f0 reimplementation of the author merging in deduprecord creation. implementation of the test class. minor changes 2020-05-21 12:02:44 +02:00
miconis 8bbd1d0501 reimplementation of the author merging in deduprecord creation. implementation of the test class. 2020-05-21 11:52:14 +02:00
Michele Artini e43d4d7778 added a coalesce in sql query 2020-05-21 11:08:07 +02:00
Claudio Atzori dbfb9c19fe minor changes 2020-05-21 10:00:14 +02:00
Michele Artini b3bcbb3129 resolve name of organization countries 2020-05-21 08:41:32 +02:00
Claudio Atzori da4267d0fe Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-20 14:58:22 +02:00
Claudio Atzori d7d2a0637f added extra parameters to the provision indexing workflow 2020-05-20 14:55:38 +02:00
Miriam Baglioni 055eec5a77 added resource for prepare project test 2020-05-20 13:54:10 +02:00
Miriam Baglioni 9079bc1f61 - 2020-05-20 13:53:32 +02:00
Miriam Baglioni 67ba4fde57 added test for prepare projects step 2020-05-20 13:53:08 +02:00
Miriam Baglioni 5e0e554000 Merge branch 'master' into dhp_oaf_model 2020-05-20 10:57:30 +02:00
Miriam Baglioni 76f3f73caa merge upstream 2020-05-20 10:31:40 +02:00
Miriam Baglioni 3c0eb12d3e removed the not zipped files 2020-05-20 10:31:05 +02:00
Miriam Baglioni c0d9e02340 zipped test resources that are too big 2020-05-20 10:30:25 +02:00
Miriam Baglioni 5e9c9fa87c tests 2020-05-20 10:29:57 +02:00
Miriam Baglioni faed7521bf added resources for testing 2020-05-20 10:29:29 +02:00
Miriam Baglioni 75491482de added a new preparation step to replicate each project for the programme it is associated to 2020-05-20 10:28:56 +02:00
Miriam Baglioni 24daa1deaa added to the Project class a new field that is the list of programmes 2020-05-20 10:28:16 +02:00
Miriam Baglioni d323100af0 added the new Programme POJO. It contains the code and the description of the programme 2020-05-20 10:27:27 +02:00
Miriam Baglioni eb0e47ba53 parameters for h2020 programme 2020-05-20 10:26:44 +02:00
Miriam Baglioni 08218d2f3f new workflow with added steps 2020-05-19 18:44:25 +02:00
Miriam Baglioni 457293ccc0 test for the variuos steps of project update with programme 2020-05-19 18:43:42 +02:00
Miriam Baglioni 9447d78ef3 added preparation classes 2020-05-19 18:42:50 +02:00
Michele Artini 85ca5622d4 partial implementation of generation of simple events 2020-05-19 16:17:35 +02:00
Claudio Atzori 0bdfbb0a57 reintroduced RDD based relation cut off procedure 2020-05-19 15:02:21 +02:00
Claudio Atzori f3bc8aed31 lifted memory requirements for country propagation wf 2020-05-18 15:29:10 +02:00
Miriam Baglioni b71fbb68b1 removed the removeOutputDir command from code. Reltions are written in Append. The erase of the output dir ment to remove all the relations computed in the prevoius steps 2020-05-18 13:57:20 +02:00
Miriam Baglioni 629af7cb79 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-18 13:07:36 +02:00
Miriam Baglioni f0f14caf99 removed script files for shell actions not performed 2020-05-18 13:06:16 +02:00
Miriam Baglioni 23bbac7d7c - 2020-05-18 13:05:03 +02:00
Miriam Baglioni 4f1ff7ba73 added dependency to org.apache.commons common-csv 2020-05-18 13:04:39 +02:00
Miriam Baglioni abc45f2708 added dnet-45 HttpConnector and related Classes, produced the POJO for projects and programme 2020-05-18 13:04:06 +02:00
Claudio Atzori ef9a9a9f1a remove the outout path when starting 2020-05-15 22:34:19 +02:00
Miriam Baglioni 5a648016ef parameters from the GetFile class 2020-05-15 18:18:50 +02:00
Miriam Baglioni 83c262a483 workflow to download the files 2020-05-15 18:18:31 +02:00
Miriam Baglioni 22cb9e0da7 simple code to get file from URL 2020-05-15 18:18:01 +02:00
Claudio Atzori 7838f2c63f init the empty list for author pids mapped from OAF 2020-05-15 17:06:01 +02:00
Claudio Atzori 82b615ab33 NPE check 2020-05-15 16:04:46 +02:00
Miriam Baglioni 3aaad753fd Merge branch 'master' into dhp_oaf_model 2020-05-15 15:55:23 +02:00
Miriam Baglioni e26a67c3eb merge with upstream 2020-05-15 15:53:05 +02:00
Claudio Atzori 7a89507ab1 code formatting 2020-05-15 15:16:54 +02:00
Miriam Baglioni 5ec8c49ad5 removed serialization points 2020-05-15 12:49:58 +02:00
Claudio Atzori 1d35836a58 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-15 12:26:31 +02:00
Claudio Atzori cfc8948717 fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids 2020-05-15 12:26:16 +02:00
Michele Artini 2a4e68a292 events recognition 2020-05-15 12:25:37 +02:00
Claudio Atzori a832658296 code formatting 2020-05-15 10:21:09 +02:00
Claudio Atzori b7e198475a added common methods to create HiveDB table identifiers 2020-05-15 10:20:07 +02:00
Claudio Atzori 50d6a2ad3c added output directory removal in the blacklist spark actions; included common global properties in blacklist's workflow.xml 2020-05-15 09:53:37 +02:00