Commit Graph

367 Commits

Author SHA1 Message Date
Przemysław Jacewicz 2e996d610f WIP: promote job functions implementation snapshot 2020-03-11 17:02:57 +01:00
Przemysław Jacewicz cc63cdc9e6 WIP: promote job implementation snapshot 2020-03-11 17:02:06 +01:00
Przemysław Jacewicz 69540f6f78 Serialization-safe supplier added 2020-03-11 16:59:05 +01:00
Przemysław Jacewicz e6e214dab5 Oaf merge and get strategy added 2020-03-11 16:58:17 +01:00
Przemysław Jacewicz f7454a9ed8 Added equals and hashCode for OAF types 2020-03-11 16:57:28 +01:00
Claudio Atzori 7b6f0c8756 reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki 2020-03-10 17:19:17 +01:00
Claudio Atzori 60aedb1110 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-10 17:09:44 +01:00
Claudio Atzori a3f184fd3f added field websiteurl in related organizations 2020-03-10 17:08:58 +01:00
Claudio Atzori 0e95544495 fixed serialization for datasource subjects 2020-03-10 17:07:44 +01:00
Sandro La Bruzzo 7b28783fb4 updated unpaywall mapping 2020-03-08 17:00:19 +01:00
Michele Artini b6efa9d6ab Configuration of the SequenceFile Writer 2020-03-05 15:49:14 +01:00
Claudio Atzori ccb153de78 updated image 2020-03-05 15:11:42 +01:00
Claudio Atzori 5e342a555c no need to compute the inverse relClass, fixed text() in xpath expressions 2020-03-05 12:51:48 +01:00
Claudio Atzori 6ec04d4e02 specified column used to perform the join operation in the javadoc 2020-03-05 12:50:38 +01:00
Claudio Atzori 960619de98 updated image 2020-03-04 16:51:55 +01:00
Claudio Atzori e89aa52e58 updated image 2020-03-04 16:18:49 +01:00
Claudio Atzori 5474e8ac9f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-04 14:54:46 +01:00
Claudio Atzori d7137e566e added dhp-doc-resources, aimed to include all the documentation resources used in the wiki pages 2020-03-04 14:54:41 +01:00
Michele Artini 7a2a466161 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-03-04 14:50:59 +01:00
Michele Artini 755eade2fb fix creation ids 2020-03-04 14:49:45 +01:00
Claudio Atzori 6379f32466 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-04 10:57:06 +01:00
Claudio Atzori 0233987603 introduced post processing step following the hive DB creation/population 2020-03-04 10:56:50 +01:00
Claudio Atzori 1e563bc15e introduced distinct properties driving the resource usage for the XML record creation and the indexing phase 2020-03-04 10:55:11 +01:00
Claudio Atzori 9af3e904be close the SparkSession at the end 2020-03-04 10:53:31 +01:00
Michele Artini 086af63158 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-03-04 10:46:40 +01:00
Michele Artini e7167b996a logs and closeable 2020-03-04 10:46:36 +01:00
Claudio Atzori 25ceec29ab code formatting 2020-03-04 10:44:24 +01:00
Claudio Atzori 63c00c5e88 fixed typo 2020-03-04 10:43:44 +01:00
Claudio Atzori 9cf5ce2e66 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-02 17:03:10 +01:00
Claudio Atzori bc7cfd5975 indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure 2020-03-02 17:03:07 +01:00
Michele Artini 4b29a121b0 migration using spark in step2 2020-03-02 16:12:14 +01:00
Michele Artini 5445a57102 migration using spark in step2 2020-03-02 16:11:59 +01:00
Sandro La Bruzzo b32655e48e changed code to save intermediate result 2020-02-27 10:18:46 +01:00
Claudio Atzori 60bc2b1a20 drop the hive DB before populating it from scratch 2020-02-27 10:10:55 +01:00
Sandro La Bruzzo f09e065865 incremented number of repartition 2020-02-26 19:26:19 +01:00
Sandro La Bruzzo 071f5c3e52 fixed NPE 2020-02-26 15:42:20 +01:00
Sandro La Bruzzo a1a6fc8315 fixed NPE 2020-02-26 15:42:13 +01:00
Sandro La Bruzzo 1edf02a3ce added log 2020-02-26 15:25:03 +01:00
Sandro La Bruzzo c3ecabd8e8 fixed NPE 2020-02-26 14:40:02 +01:00
Sandro La Bruzzo 5d0f46651b fixed NPE 2020-02-26 14:31:34 +01:00
Sandro La Bruzzo bc342bf73a fixed wrong generation type in summary 2020-02-26 12:49:47 +01:00
Sandro La Bruzzo 3112e21858 fixed typo 2020-02-26 12:22:43 +01:00
Sandro La Bruzzo 119ae6eef5 fixed wrong loop in the workflow 2020-02-26 12:18:50 +01:00
Sandro La Bruzzo 7936583a3d added generation of Scholix collection 2020-02-26 12:09:06 +01:00
Przemysław Jacewicz 02db368dc5 Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype 2020-02-26 11:50:20 +01:00
Sandro La Bruzzo 2ef3705b2c Added Provision workflow 2020-02-26 10:51:35 +01:00
Michele Artini 689908b2e9 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-02-25 16:00:51 +01:00
Michele Artini 93665773ea Fixed a problem with JavaRDD Union 2020-02-25 15:59:21 +01:00
Sandro La Bruzzo b021b8a2e1 Added index wf 2020-02-24 10:15:55 +01:00
Claudio Atzori 6a73fd5da5 in order to reuse the same XmlRecordFactory across different tasks, the state of contexts must be one per record built 2020-02-21 09:17:19 +01:00