Commit Graph

1267 Commits

Author SHA1 Message Date
Enrico Ottonello 7e1c987370 Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost 2020-05-08 14:49:50 +02:00
Enrico Ottonello 9d812788e4 added job to download from orcid the records modified after a fixed date, the info are taken from last_modified.csv on hdfs 2020-05-08 14:49:39 +02:00
Miriam Baglioni 9a29ab7508 got back to the readPath we have before 2020-05-08 13:08:56 +02:00
Miriam Baglioni 28556507e7 - 2020-05-08 12:54:52 +02:00
Claudio Atzori b2192fdcdc simplified reset_outputpath nodes across the workflows, applied common xml formatting 2020-05-08 12:33:31 +02:00
Miriam Baglioni 4c94231cad merge with master fork 2020-05-08 12:25:57 +02:00
Miriam Baglioni 9b4c0d4b3a - 2020-05-08 11:51:45 +02:00
Miriam Baglioni 53952707b6 modified test because of new step of data preparation. It now expects to find ResultCountrySet serialization instead of DatasourceCountry 2020-05-08 11:49:19 +02:00
Claudio Atzori 62ea19f1d3 introduced mapping for ExternalReferences, made urls defined within an instance unique 2020-05-08 09:43:26 +02:00
Claudio Atzori 8c67073a07 force speculative execution to false 2020-05-08 09:42:21 +02:00
Miriam Baglioni d6b9de9f46 Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop 2020-05-07 18:22:59 +02:00
Miriam Baglioni f95d288681 fixed switch of parameters 2020-05-07 18:22:32 +02:00
Claudio Atzori 166aafd936 heavy cleanup 2020-05-07 18:22:26 +02:00
Michele Artini ac0da5a7ee Partial implementation of broker events 2020-05-07 12:31:26 +02:00
Miriam Baglioni fb405275f7 merged with master 2020-05-07 11:48:21 +02:00
Miriam Baglioni e124278934 - 2020-05-07 11:47:11 +02:00
Claudio Atzori 5111671e62 cleanup 2020-05-07 11:47:00 +02:00
Miriam Baglioni 9f8855991c changed Encoders.bean to Encoders.kryo 2020-05-07 11:44:35 +02:00
Miriam Baglioni 207b899d6d merged with upstream 2020-05-07 11:43:53 +02:00
Claudio Atzori e07feb4c5f removed spurious file 2020-05-07 11:42:46 +02:00
Claudio Atzori 5b3f8a0e90 using Encoders.bean instead of kryo 2020-05-07 11:41:41 +02:00
Miriam Baglioni 182225becb Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop 2020-05-07 11:38:17 +02:00
Miriam Baglioni 5efae3acb9 new workflow for job3 2020-05-07 11:38:10 +02:00
Claudio Atzori 73243793b2 Dataset based implementation for SparkCountryPropagationJob3 2020-05-07 11:15:24 +02:00
Claudio Atzori 128c3bf1c8 restored Author bean with simple getter/setter, author pid addition moved into dedicated implementation SparkOrcidToResultFromSemRelJob3 2020-05-07 11:14:56 +02:00
Miriam Baglioni b2fec32c87 new workflow for job3 2020-05-07 10:01:57 +02:00
Miriam Baglioni 29bc8c44b1 changes in the construction of new country set 2020-05-07 10:01:34 +02:00
Miriam Baglioni 55e825acd4 changed the test according to changes in SparkCountryPropagationJob2 2020-05-07 10:01:00 +02:00
Miriam Baglioni 16193cf0ba new workflow and parameter for country propagation 2020-05-07 09:59:58 +02:00
Miriam Baglioni 5a476c7a13 changed the xquery for the cfhb table 2020-05-07 09:58:17 +02:00
Miriam Baglioni 42ad51577a new implementation with one more serialization step 2020-05-07 09:57:49 +02:00
Claudio Atzori 17860d3ab6 general changes in the RAW graph mapping: missing collectedfrom/hostedby causes records to be skipped; factored out most of the constants in ModelConstants class (dhp-schemas) 2020-05-06 13:20:02 +02:00
Claudio Atzori fdfecc9578 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-06 11:28:01 +02:00
Claudio Atzori c79e2f5977 drop workingPath before starting the dedup workflow 2020-05-06 11:27:44 +02:00
Michele Artini 8f30a09d84 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-05 17:12:22 +02:00
Michele Artini ccc609f909 new module for the production of broker events 2020-05-05 17:09:00 +02:00
Miriam Baglioni dd2e698a72 added a sequentialization step on the spark job. Added new parameter 2020-05-05 17:03:43 +02:00
Claudio Atzori 0825321d0b improved unit tests in dhp-aggregation 2020-05-05 12:39:04 +02:00
Miriam Baglioni 252b219dd5 changed the name of some properties 2020-05-05 10:03:32 +02:00
Claudio Atzori 4a8487165c using long param names in wf definition 2020-05-04 19:19:29 +02:00
Claudio Atzori a2fc37df5f adjusted parameters 2020-05-04 19:18:59 +02:00
Claudio Atzori f1b7e14036 code formatting 2020-05-04 19:18:34 +02:00
Claudio Atzori 405f495d54 code formatting 2020-05-04 19:18:12 +02:00
Claudio Atzori c54d7ca18c example measures in serialization test 2020-05-04 17:02:40 +02:00
Claudio Atzori 11938dac5e this commit adds: validated/validationDate to relationships; measure type and simple unit test to indicate the relative serialization 2020-05-04 16:47:07 +02:00
Claudio Atzori 24d8d097b6 sync with master branch 2020-05-04 16:44:13 +02:00
Claudio Atzori de5fbe325c bits of javadoc 2020-05-04 16:00:48 +02:00
Miriam Baglioni 78578c3ccf fixed wrong transition name in workflow 2020-05-04 15:46:24 +02:00
Miriam Baglioni cc7d9b6b19 merge upstream 2020-05-04 13:59:09 +02:00
Miriam Baglioni 3957c815b9 changed the name of some parameters 2020-05-04 13:58:52 +02:00