106 Commits

Author SHA1 Message Date
Claudio Atzori ff30f99c65 using newline delimited json files for the raw graph materialization. Introduced contentPath parameter 2020-04-15 16:16:20 +02:00
Alessia Bardi 550a9f82ed Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-04-14 17:53:01 +02:00
Alessia Bardi a68fae9bcb now supporting openaire 4.0 compliance 2020-04-14 17:52:48 +02:00
Sandro La Bruzzo c36239e693 fixed incremental indexing 2020-04-14 17:47:36 +02:00
Claudio Atzori 82e8341f50 reorganizing parameter names in the provision workflow 2020-04-14 15:54:41 +02:00
Claudio Atzori 6b5f9ca9cb raw graph creation workflow moved under dhp-graph-mapper, claims integration is included 2020-04-10 17:53:07 +02:00
Claudio Atzori 47f3d9b757 unit test for GraphHiveImporterJob 2020-04-08 13:24:43 +02:00
Claudio Atzori d74e128aa6 Utility classes moved in dhp-common and dhp-schemas 2020-04-07 11:56:22 +02:00
Claudio Atzori 377e1ba840 [maven-release-plugin] prepare for next development iteration 2020-03-30 20:06:00 +02:00
Claudio Atzori 76d9315129 [maven-release-plugin] prepare release dhp-1.1.6 2020-03-30 20:05:56 +02:00
Claudio Atzori ef429010ee removed log file and job-override.properties 2020-03-30 20:00:58 +02:00
Sandro La Bruzzo 62cc257e5c fixed step1 workflow 2020-03-27 17:07:34 +01:00
Claudio Atzori 1767dfaa3f method can be protected, it is meant to be used only in tests 2020-03-27 14:31:26 +01:00
Sandro La Bruzzo a4b6a51168 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-03-27 13:48:56 +01:00
Sandro La Bruzzo 15d9106b3f Fixed merge of dhp dedup 2020-03-27 13:48:44 +01:00
Claudio Atzori e196fff212 adjusted path for source resource in unit test 2020-03-27 13:45:10 +01:00
Sandro La Bruzzo 8c9a56a0c8 refactored package name 2020-03-27 13:19:33 +01:00
Sandro La Bruzzo a9935f80d4 refactor class name and workflow name for graph mapper, added javadoc 2020-03-27 13:16:24 +01:00
Claudio Atzori 673e744649 moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa 2020-03-27 10:42:17 +01:00
Claudio Atzori 098fabab3f reorganizing content under dhp-workflows/dhp-graph-mapper 2020-03-26 19:44:19 +01:00
Claudio Atzori 77c4294924 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-26 18:26:52 +01:00
Claudio Atzori 43cbcda7ef unit test for SparkGraphImporterJob 2020-03-26 18:26:40 +01:00
Sandro La Bruzzo 0cd022ad6a merge with master 2020-03-26 14:08:29 +01:00
Claudio Atzori abcd3f5bf5 added sample data for unit tests 2020-03-26 11:12:52 +01:00
Claudio Atzori 9dff4adbc3 dhp-graph-mapper workflow tests upgraded to junit5 2020-03-25 18:25:12 +01:00
Michele Artini ebe45003d9 fixed some junit packages 2020-03-25 16:45:03 +01:00
Claudio Atzori 2180cc4fe7 more fields included in result view definition 2020-03-25 11:21:46 +01:00
Claudio Atzori 8b0ba3d76a postprocessing script correctly run as hive2 action 2020-03-23 17:40:39 +01:00
Claudio Atzori 658d40ccbe WIP trying to use hive2 actions 2020-03-23 11:14:54 +01:00
Sandro La Bruzzo 0594b92a6d implemented relation with dataset 2020-03-19 11:11:07 +01:00
Claudio Atzori abe8fb69a2 added global properties, moved postprocessing script inside the oozie_app directory 2020-03-18 15:43:54 +01:00
Claudio Atzori 8fe7ae1482 xml formatting 2020-03-13 15:53:56 +01:00
Sandro La Bruzzo addaaa091f migrate relation from RDD to Dataset 2020-03-13 09:13:20 +01:00
Claudio Atzori 7b6f0c8756 reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki 2020-03-10 17:19:17 +01:00
Claudio Atzori 0233987603 introduced post processing step following the hive DB creation/population 2020-03-04 10:56:50 +01:00
Claudio Atzori 9af3e904be close the SparkSession at the end 2020-03-04 10:53:31 +01:00
Claudio Atzori 25ceec29ab code formatting 2020-03-04 10:44:24 +01:00
Claudio Atzori 60bc2b1a20 drop the hive DB before populating it from scratch 2020-02-27 10:10:55 +01:00
Sandro La Bruzzo 2b8675462f refactoring code 2020-02-19 10:07:08 +01:00
Claudio Atzori 1b18fd4d54 sync with master branch 2020-02-17 13:49:46 +01:00
Sandro La Bruzzo 76ee85141a added oozie job for DNET migration and implemented Spark job for extracting entities 2020-02-17 12:31:44 +01:00
Michele Artini 176c5606bd aligned with origin/master, aligned model and mapping 2020-02-17 10:40:53 +01:00
Claudio Atzori 1ee1baa8c0 Merge branch 'master' into provision_indexing 2020-02-13 18:17:07 +01:00
Claudio Atzori a3d0b57b25 [maven-release-plugin] prepare for next development iteration 2020-02-13 18:11:33 +01:00
Claudio Atzori 6ed9a15bc8 [maven-release-plugin] prepare release dhp-1.1.5 2020-02-13 18:11:31 +01:00
Claudio Atzori 49e648f7c3 bumped version 2020-02-13 18:09:31 +01:00
Claudio Atzori 1fee6e2b7e implemented XML records construction and serialization, indexing WIP 2020-02-13 16:53:27 +01:00
Sandro La Bruzzo 19a80e4638 implemented workflow for aggregation and generation of infospace graph 2020-01-24 09:58:55 +01:00
Michele Artini b35c59eb42 partial implementation of entities from db 2020-01-20 16:04:19 +01:00
Sandro La Bruzzo abd9034da0 implemented DedupRecord factory with the merge of publications 2019-12-11 15:43:24 +01:00