Commit Graph

1051 Commits

Author SHA1 Message Date
miconis 5d7ac78c41 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-22 17:25:08 +02:00
miconis 0fd0c7d725 reimplementation of the sim between two authors. now it takes into account both name and surname. threshold incremented to 1.0 if the name is too short 2020-05-22 17:24:57 +02:00
Michele Artini eb606dc1e2 partial implementation of events with rels 2020-05-22 17:17:41 +02:00
Miriam Baglioni 29066a6b46 applied code cleanup 2020-05-22 15:38:50 +02:00
Miriam Baglioni 8610ad5142 added groupby id to fix multiple result with same id at join step 2020-05-22 15:32:55 +02:00
Miriam Baglioni 1e44703e3e merge upstream 2020-05-22 15:30:07 +02:00
Sandro La Bruzzo 72278b9375 Merge remote-tracking branch 'origin/master' into doiboost 2020-05-22 15:17:13 +02:00
Sandro La Bruzzo 22936d0877 Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost 2020-05-22 15:15:17 +02:00
Sandro La Bruzzo 9fbb221457 completed mapping of UnpayWall and ORCID 2020-05-22 15:15:09 +02:00
Miriam Baglioni 70389b0a30 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-22 13:53:23 +02:00
Miriam Baglioni 4308f31165 added fix to make test run 2020-05-22 13:13:01 +02:00
Claudio Atzori 946598cfba Merge branch 'master' into provision_indexing 2020-05-22 12:35:41 +02:00
Claudio Atzori 3cf2796ac6 code formatting 2020-05-22 12:34:00 +02:00
Michele Artini dc4621b3cb filter ORCID and MAG identifiers 2020-05-22 12:25:01 +02:00
Michele Artini 9f2d0f1b08 filter ORCID and MAG identifiers 2020-05-22 11:00:27 +02:00
Michele Artini 9de71e54a8 filter ORCID and MAG identifiers 2020-05-22 10:47:39 +02:00
Michele Artini c5f7e17348 author fullnames 2020-05-22 10:08:02 +02:00
Claudio Atzori ad40470040 Merge branch 'master' into provision_indexing 2020-05-22 08:51:22 +02:00
Claudio Atzori 925d933204 making XmlRecordFactory immune to graph encoding changes (mostly to avoid NPEs) 2020-05-22 08:50:44 +02:00
Claudio Atzori b33dd58be4 replaced parameter 'reuseRecords' with 'resumeFrom', allowing the provision workflow execution to be restarted from any step, useful for manual submissions or debugging 2020-05-22 08:50:06 +02:00
Michele Artini c7ca3cf35b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-21 16:48:20 +02:00
Michele Artini 3e34517479 partial implementation of events with rels 2020-05-21 16:47:53 +02:00
Miriam Baglioni 6750075fbd merge upstream 2020-05-21 16:31:09 +02:00
miconis 8b35e0e7f0 reimplementation of the author merging in deduprecord creation. implementation of the test class. minor changes 2020-05-21 12:02:44 +02:00
miconis 8bbd1d0501 reimplementation of the author merging in deduprecord creation. implementation of the test class. 2020-05-21 11:52:14 +02:00
Michele Artini e43d4d7778 added a coalesce in sql query 2020-05-21 11:08:07 +02:00
Claudio Atzori dbfb9c19fe minor changes 2020-05-21 10:00:14 +02:00
Michele Artini b3bcbb3129 resolve name of organization countries 2020-05-21 08:41:32 +02:00
Enrico Ottonello 1109d3b3fc Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost 2020-05-21 00:41:27 +02:00
Enrico Ottonello 869a53040e save to text file format 2020-05-21 00:41:21 +02:00
Sandro La Bruzzo 5818abaab4 fixed Crossref Mapping 2020-05-20 17:05:46 +02:00
Claudio Atzori da4267d0fe Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-20 14:58:22 +02:00
Claudio Atzori d7d2a0637f added extra parameters to the provision indexing workflow 2020-05-20 14:55:38 +02:00
Miriam Baglioni 76f3f73caa merge upstream 2020-05-20 10:31:40 +02:00
Sandro La Bruzzo b771d67e9d next step of MAG conversion implemented 2020-05-20 08:14:03 +02:00
Michele Artini 85ca5622d4 partial implementation of generation of simple events 2020-05-19 16:17:35 +02:00
Claudio Atzori 0bdfbb0a57 reintroduced RDD based relation cut off procedure 2020-05-19 15:02:21 +02:00
Enrico Ottonello 934ad570e0 joined summaries and activities dataset 2020-05-19 12:57:21 +02:00
Enrico Ottonello ca722d4d18 merged 2020-05-19 09:43:12 +02:00
Enrico Ottonello 7362bc3e9d workflow to generate seq(doi,AuthorList) 2020-05-19 09:34:44 +02:00
Sandro La Bruzzo 8c95b50f26 Merge remote-tracking branch 'origin/master' into doiboost 2020-05-19 09:25:04 +02:00
Sandro La Bruzzo 486e850bcc next step of MAG conversion implemented 2020-05-19 09:24:45 +02:00
Enrico Ottonello d4e9075f22 Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost 2020-05-18 19:51:36 +02:00
Enrico Ottonello fc80e8c7de added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading 2020-05-18 19:51:29 +02:00
Claudio Atzori f3bc8aed31 lifted memory requirements for country propagation wf 2020-05-18 15:29:10 +02:00
Miriam Baglioni b71fbb68b1 removed the removeOutputDir command from code. Relations are written in Append. Erasing the output dir meant removing all the relations computed in the previous steps 2020-05-18 13:57:20 +02:00
Miriam Baglioni 629af7cb79 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-18 13:07:36 +02:00
Claudio Atzori ef9a9a9f1a remove the output path when starting 2020-05-15 22:34:19 +02:00
Enrico Ottonello 0b29bb7e3b spark job to download orcid record modified after a fixed date 2020-05-15 19:49:26 +02:00
Claudio Atzori 7838f2c63f init the empty list for author pids mapped from OAF 2020-05-15 17:06:01 +02:00