Commit Graph

1058 Commits

Author SHA1 Message Date
Claudio Atzori de108f54d6 code formatting 2020-05-23 10:21:19 +02:00
Claudio Atzori 6b56cae57d added mapping for bestaccessrights 2020-05-23 09:57:39 +02:00
Claudio Atzori 7181807e64 code formatting 2020-05-23 09:51:48 +02:00
Sandro La Bruzzo 2408083566 implemented filtering step 2020-05-23 08:46:49 +02:00
Sandro La Bruzzo 244f6e50cf Merge remote-tracking branch 'origin/master' into doiboost 2020-05-22 20:52:15 +02:00
Sandro La Bruzzo 147dd389bf minor fix 2020-05-22 20:51:42 +02:00
Miriam Baglioni 0d1ec1913f added fix to avoid duplication of results 2020-05-22 18:42:25 +02:00
miconis 5d7ac78c41 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-22 17:25:08 +02:00
miconis 0fd0c7d725 reimplementation of the sim between two authors. now it takes into account both name and surname. threshold incremented to 1.0 if the name is too short 2020-05-22 17:24:57 +02:00
Michele Artini eb606dc1e2 partial implementation of events with rels 2020-05-22 17:17:41 +02:00
Miriam Baglioni 29066a6b46 applied code cleanup 2020-05-22 15:38:50 +02:00
Miriam Baglioni 8610ad5142 added groupby id to fix multiple result with same id at join step 2020-05-22 15:32:55 +02:00
Miriam Baglioni 1e44703e3e merge upstream 2020-05-22 15:30:07 +02:00
Sandro La Bruzzo 72278b9375 Merge remote-tracking branch 'origin/master' into doiboost 2020-05-22 15:17:13 +02:00
Sandro La Bruzzo 22936d0877 Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost 2020-05-22 15:15:17 +02:00
Sandro La Bruzzo 9fbb221457 completed mapping of UnpayWall and ORCID 2020-05-22 15:15:09 +02:00
Miriam Baglioni 70389b0a30 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-22 13:53:23 +02:00
Miriam Baglioni 4308f31165 added fix to make test run 2020-05-22 13:13:01 +02:00
Claudio Atzori 946598cfba Merge branch 'master' into provision_indexing 2020-05-22 12:35:41 +02:00
Claudio Atzori 3cf2796ac6 code formatting 2020-05-22 12:34:00 +02:00
Michele Artini dc4621b3cb filter ORCID e MAG identifiers 2020-05-22 12:25:01 +02:00
Michele Artini 9f2d0f1b08 filter ORCID e MAG identifiers 2020-05-22 11:00:27 +02:00
Michele Artini 9de71e54a8 filter ORCID e MAG identifiers 2020-05-22 10:47:39 +02:00
Michele Artini c5f7e17348 author fullnames 2020-05-22 10:08:02 +02:00
Claudio Atzori ad40470040 Merge branch 'master' into provision_indexing 2020-05-22 08:51:22 +02:00
Claudio Atzori 925d933204 making XmlRecordFactory immune to graph encoding changes (mostly to avoid NPEs) 2020-05-22 08:50:44 +02:00
Claudio Atzori b33dd58be4 replaced parameter 'reuseRecords' with 'resumeFrom', allowing to restart the provision workflow execution from any step, useful for manual submissions or debugging 2020-05-22 08:50:06 +02:00
Michele Artini c7ca3cf35b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-21 16:48:20 +02:00
Michele Artini 3e34517479 partial implementation of events with rels 2020-05-21 16:47:53 +02:00
Miriam Baglioni 6750075fbd merge upstream 2020-05-21 16:31:09 +02:00
miconis 8b35e0e7f0 reimplementation of the author merging in deduprecord creation. implementation of the test class. minor changes 2020-05-21 12:02:44 +02:00
miconis 8bbd1d0501 reimplementation of the author merging in deduprecord creation. implementation of the test class. 2020-05-21 11:52:14 +02:00
Michele Artini e43d4d7778 added a coalesce in sql query 2020-05-21 11:08:07 +02:00
Claudio Atzori dbfb9c19fe minor changes 2020-05-21 10:00:14 +02:00
Michele Artini b3bcbb3129 resolve name of organization countries 2020-05-21 08:41:32 +02:00
Enrico Ottonello 1109d3b3fc Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost 2020-05-21 00:41:27 +02:00
Enrico Ottonello 869a53040e save to text file format 2020-05-21 00:41:21 +02:00
Sandro La Bruzzo 5818abaab4 fixed Crossref Mapping 2020-05-20 17:05:46 +02:00
Claudio Atzori da4267d0fe Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-20 14:58:22 +02:00
Claudio Atzori d7d2a0637f added extra parameters to the provision indexing workflow 2020-05-20 14:55:38 +02:00
Miriam Baglioni 76f3f73caa merge upstream 2020-05-20 10:31:40 +02:00
Sandro La Bruzzo b771d67e9d next step of MAG conversion implemented 2020-05-20 08:14:03 +02:00
Michele Artini 85ca5622d4 partial implementation of generation of simple events 2020-05-19 16:17:35 +02:00
Claudio Atzori 0bdfbb0a57 reintroduced RDD based relation cut off procedure 2020-05-19 15:02:21 +02:00
Enrico Ottonello 934ad570e0 joined summaries and activities dataset 2020-05-19 12:57:21 +02:00
Enrico Ottonello ca722d4d18 merged 2020-05-19 09:43:12 +02:00
Enrico Ottonello 7362bc3e9d workflow to generate seq(doi,AuthorList) 2020-05-19 09:34:44 +02:00
Sandro La Bruzzo 8c95b50f26 Merge remote-tracking branch 'origin/master' into doiboost 2020-05-19 09:25:04 +02:00
Sandro La Bruzzo 486e850bcc next step of MAG conversion implemented 2020-05-19 09:24:45 +02:00
Enrico Ottonello d4e9075f22 Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost 2020-05-18 19:51:36 +02:00