Commit Graph

770 Commits

Author SHA1 Message Date
Miriam Baglioni 74215f6d9f refactoring 2020-05-25 10:38:16 +02:00
Miriam Baglioni dbde2d243a changed due to move of PacePerson from dhp-graph-mapper to dhp-common 2020-05-25 10:35:39 +02:00
Miriam Baglioni f754c424bd changed logic to compute PacePerson only once for each Author to be enriched 2020-05-25 10:35:02 +02:00
Miriam Baglioni 8f51af4e9b added PacePerson to get name and surname for authors having only fullname set 2020-05-25 10:34:30 +02:00
Miriam Baglioni b258f99ece fix for issue that duplicated result 2020-05-25 10:26:48 +02:00
Miriam Baglioni 8f6ce970f9 moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper 2020-05-25 10:25:55 +02:00
Miriam Baglioni 0d1ec1913f added fix to avoid duplication of results 2020-05-22 18:42:25 +02:00
Miriam Baglioni 29066a6b46 applied code cleanup 2020-05-22 15:38:50 +02:00
Miriam Baglioni 8610ad5142 added groupby id to fix multiple results with same id at join step 2020-05-22 15:32:55 +02:00
Miriam Baglioni 1e44703e3e merge upstream 2020-05-22 15:30:07 +02:00
Miriam Baglioni 70389b0a30 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-22 13:53:23 +02:00
Miriam Baglioni 4308f31165 added fix to make test run 2020-05-22 13:13:01 +02:00
Claudio Atzori 946598cfba Merge branch 'master' into provision_indexing 2020-05-22 12:35:41 +02:00
Claudio Atzori 3cf2796ac6 code formatting 2020-05-22 12:34:00 +02:00
Michele Artini dc4621b3cb filter ORCID and MAG identifiers 2020-05-22 12:25:01 +02:00
Michele Artini 9f2d0f1b08 filter ORCID and MAG identifiers 2020-05-22 11:00:27 +02:00
Michele Artini 9de71e54a8 filter ORCID and MAG identifiers 2020-05-22 10:47:39 +02:00
Michele Artini c5f7e17348 author fullnames 2020-05-22 10:08:02 +02:00
Claudio Atzori ad40470040 Merge branch 'master' into provision_indexing 2020-05-22 08:51:22 +02:00
Claudio Atzori 925d933204 making XmlRecordFactory immune to graph encoding changes (mostly to avoid NPEs) 2020-05-22 08:50:44 +02:00
Claudio Atzori b33dd58be4 replaced parameter 'reuseRecords' with 'resumeFrom', allowing to restart the provision workflow execution from any step, useful for manual submissions or debugging 2020-05-22 08:50:06 +02:00
Michele Artini c7ca3cf35b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-21 16:48:20 +02:00
Michele Artini 3e34517479 partial implementation of events with rels 2020-05-21 16:47:53 +02:00
Miriam Baglioni 6750075fbd merge upstream 2020-05-21 16:31:09 +02:00
miconis 8b35e0e7f0 reimplementation of the author merging in deduprecord creation. implementation of the test class. minor changes 2020-05-21 12:02:44 +02:00
miconis 8bbd1d0501 reimplementation of the author merging in deduprecord creation. implementation of the test class. 2020-05-21 11:52:14 +02:00
Michele Artini e43d4d7778 added a coalesce in sql query 2020-05-21 11:08:07 +02:00
Claudio Atzori dbfb9c19fe minor changes 2020-05-21 10:00:14 +02:00
Michele Artini b3bcbb3129 resolve name of organization countries 2020-05-21 08:41:32 +02:00
Claudio Atzori da4267d0fe Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-20 14:58:22 +02:00
Claudio Atzori d7d2a0637f added extra parameters to the provision indexing workflow 2020-05-20 14:55:38 +02:00
Miriam Baglioni 76f3f73caa merge upstream 2020-05-20 10:31:40 +02:00
Michele Artini 85ca5622d4 partial implementation of generation of simple events 2020-05-19 16:17:35 +02:00
Claudio Atzori 0bdfbb0a57 reintroduced RDD based relation cut off procedure 2020-05-19 15:02:21 +02:00
Claudio Atzori f3bc8aed31 lifted memory requirements for country propagation wf 2020-05-18 15:29:10 +02:00
Miriam Baglioni b71fbb68b1 removed the removeOutputDir command from code. Relations are written in Append mode. Erasing the output dir meant removing all the relations computed in the previous steps 2020-05-18 13:57:20 +02:00
Miriam Baglioni 629af7cb79 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-18 13:07:36 +02:00
Claudio Atzori ef9a9a9f1a remove the output path when starting 2020-05-15 22:34:19 +02:00
Claudio Atzori 7838f2c63f init the empty list for author pids mapped from OAF 2020-05-15 17:06:01 +02:00
Claudio Atzori 82b615ab33 NPE check 2020-05-15 16:04:46 +02:00
Miriam Baglioni e26a67c3eb merge with upstream 2020-05-15 15:53:05 +02:00
Claudio Atzori 7a89507ab1 code formatting 2020-05-15 15:16:54 +02:00
Miriam Baglioni 5ec8c49ad5 removed serialization points 2020-05-15 12:49:58 +02:00
Claudio Atzori 1d35836a58 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-15 12:26:31 +02:00
Claudio Atzori cfc8948717 fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids 2020-05-15 12:26:16 +02:00
Michele Artini 2a4e68a292 events recognition 2020-05-15 12:25:37 +02:00
Claudio Atzori a832658296 code formatting 2020-05-15 10:21:09 +02:00
Claudio Atzori 50d6a2ad3c added output directory removal in the blacklist spark actions; included common global properties in blacklist's workflow.xml 2020-05-15 09:53:37 +02:00
Claudio Atzori 18f46e47b9 added relations to the graph2hive import workflow 2020-05-15 09:34:48 +02:00
Claudio Atzori 9d028ffe1c cleanup 2020-05-15 09:28:55 +02:00