Commit Graph

342 Commits

Author SHA1 Message Date
Claudio Atzori 0233987603 introduced post processing step following the hive DB creation/population 2020-03-04 10:56:50 +01:00
Claudio Atzori 1e563bc15e introduced distinct properties driving the resource usage for the XML record creation and the indexing phase 2020-03-04 10:55:11 +01:00
Claudio Atzori 9af3e904be close the SparkSession at the end 2020-03-04 10:53:31 +01:00
Michele Artini 086af63158 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-03-04 10:46:40 +01:00
Michele Artini e7167b996a logs and closeable 2020-03-04 10:46:36 +01:00
Claudio Atzori 25ceec29ab code formatting 2020-03-04 10:44:24 +01:00
Claudio Atzori 63c00c5e88 fixed typo 2020-03-04 10:43:44 +01:00
Claudio Atzori 9cf5ce2e66 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-02 17:03:10 +01:00
Claudio Atzori bc7cfd5975 indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure 2020-03-02 17:03:07 +01:00
Michele Artini 4b29a121b0 migration using spark in step2 2020-03-02 16:12:14 +01:00
Michele Artini 5445a57102 migration using spark in step2 2020-03-02 16:11:59 +01:00
Sandro La Bruzzo b32655e48e changed code to save intermediate result 2020-02-27 10:18:46 +01:00
Claudio Atzori 60bc2b1a20 drop the hive DB before populating it from scratch 2020-02-27 10:10:55 +01:00
Sandro La Bruzzo f09e065865 incremented number of repartition 2020-02-26 19:26:19 +01:00
Sandro La Bruzzo 071f5c3e52 fixed NPE 2020-02-26 15:42:20 +01:00
Sandro La Bruzzo a1a6fc8315 fixed NPE 2020-02-26 15:42:13 +01:00
Sandro La Bruzzo 1edf02a3ce added log 2020-02-26 15:25:03 +01:00
Sandro La Bruzzo c3ecabd8e8 fixed NPE 2020-02-26 14:40:02 +01:00
Sandro La Bruzzo 5d0f46651b fixed NPE 2020-02-26 14:31:34 +01:00
Sandro La Bruzzo bc342bf73a fixed wrong generation type in summary 2020-02-26 12:49:47 +01:00
Sandro La Bruzzo 3112e21858 fixed typo 2020-02-26 12:22:43 +01:00
Sandro La Bruzzo 119ae6eef5 fixed wrong loop in the workflow 2020-02-26 12:18:50 +01:00
Sandro La Bruzzo 7936583a3d added generation of Scholix collection 2020-02-26 12:09:06 +01:00
Sandro La Bruzzo 2ef3705b2c Added Provision workflow 2020-02-26 10:51:35 +01:00
Michele Artini 689908b2e9 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-02-25 16:00:51 +01:00
Michele Artini 93665773ea Fixed a problem with JavaRDD Union 2020-02-25 15:59:21 +01:00
Sandro La Bruzzo b021b8a2e1 Added index wf 2020-02-24 10:15:55 +01:00
Claudio Atzori 6a73fd5da5 in order to reuse the same XmlRecordFactory across different tasks, the state of contexts must be one per record built 2020-02-21 09:17:19 +01:00
Michele Artini 4c94e74a84 Added a missing dependency 2020-02-20 11:43:32 +01:00
Michele Artini d49cd2fdc6 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-02-20 11:21:54 +01:00
Claudio Atzori d42dde52ba implemented method to merge relations 2020-02-19 17:29:05 +01:00
Claudio Atzori 5e5e32cb48 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-02-19 16:56:52 +01:00
Claudio Atzori 33185fd0b7 ISLookupClientFactory moved to dhp-common 2020-02-19 16:56:38 +01:00
Michele Artini 5d3739b5cf migration of claims 2020-02-19 15:11:17 +01:00
Michele Artini 173f1df1e5 saved a query for openaire production database 2020-02-19 10:15:08 +01:00
Sandro La Bruzzo 9a2d74ac82 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-02-19 10:13:45 +01:00
Sandro La Bruzzo e5d7cdf422 fixed sql query 2020-02-19 10:13:36 +01:00
Sandro La Bruzzo 2b8675462f refactoring code 2020-02-19 10:07:08 +01:00
Claudio Atzori ed76521d9b removed stale test resources, will be re-added later on 2020-02-18 11:51:08 +01:00
Claudio Atzori 0f364605ff removed stale tests, need to reimplement them anyway 2020-02-18 11:48:19 +01:00
Claudio Atzori 6a288625e5 fixed workflow outgoing node 2020-02-17 15:04:33 +01:00
Claudio Atzori 1b18fd4d54 sync with master branch 2020-02-17 13:49:46 +01:00
Claudio Atzori 5bae30f399 adding readme for dhp-schema 2020-02-17 13:38:33 +01:00
Sandro La Bruzzo 4f04759738 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-02-17 12:31:58 +01:00
Sandro La Bruzzo 76ee85141a added oozie job for DNET migration and implemented Spark job for extracting entities 2020-02-17 12:31:44 +01:00
Claudio Atzori c460e2d281 Update 'dhp-workflows/docs/oozie-installer.markdown' 2020-02-17 11:54:48 +01:00
Sandro La Bruzzo fe93c709f1 Merge branch 'master' of michele.artini/dnet-hadoop into master 2020-02-17 10:43:08 +01:00
Michele Artini 176c5606bd aligned with origin/master, aligned model and mapping 2020-02-17 10:40:53 +01:00
Claudio Atzori 56d1810a66 working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr 2020-02-14 12:28:52 +01:00
Claudio Atzori 1ee1baa8c0 Merge branch 'master' into provision_indexing 2020-02-13 18:17:07 +01:00