499 Commits

Author SHA1 Message Date
Claudio Atzori 439c6255a2 cleanup 2020-04-29 19:09:07 +02:00
Claudio Atzori 77ac995770 cleaned up poms, added descriptions 2020-04-29 18:44:17 +02:00
Claudio Atzori 64d790a266 updated maven plugin dependencies 2020-04-29 16:56:18 +02:00
Claudio Atzori fe81f674ec updated maven-javadoc-plugin to v3.2.0, disabled doclint to avoid compilation to fail in case of incomplete javadoc tags 2020-04-29 16:19:57 +02:00
Claudio Atzori 0ab13b703b added LICENSE file - AGPL-3.0 2020-04-29 16:11:17 +02:00
Claudio Atzori 8fd81e863d added default value for the external_stats_db_name 2020-04-29 15:36:24 +02:00
Claudio Atzori c6f3ff4462 stats workflow content relocated into common package; added <global> property definitions in stats workflow.xml 2020-04-29 14:29:27 +02:00
miconis e0d14fe4f8 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-29 13:02:53 +02:00
miconis 0352d3b0ba entity dumps in dedup compressed 2020-04-29 13:02:34 +02:00
Michele Artini c43b4c8962 formatting 2020-04-29 12:56:58 +02:00
Michele Artini a5d7007005 Fix relations in migration; Fix pom.xml in dhp-stats-update 2020-04-29 12:05:41 +02:00
Claudio Atzori 3616d0f88d Merge pull request 'Adding the stats workflow to the dnet-hadoop hierarchy' (#6) from spyros/dnet-hadoop:master into master; Integrating stats update workflow. 2020-04-29 10:35:02 +02:00
Claudio Atzori 964972d29a added data provision workflow definition WIP 2020-04-29 09:25:50 +02:00
miconis 62e467eb0c assertion numbers updated to fit the new implementation of the pace-core 2020-04-28 11:46:23 +02:00
Claudio Atzori 6f5b899038 reformatted code according to the updated style descriptor 2020-04-28 11:23:29 +02:00
Claudio Atzori e6d68d1364 added customised style for automatic code formatting, introduced automatic import sorting plugin net.revelc.code:impsort-maven-plugin 2020-04-28 11:09:50 +02:00
Claudio Atzori ac25f2d8d1 integrated changes from master 2020-04-28 08:55:28 +02:00
Claudio Atzori a0bdbacdae switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:52:31 +02:00
Claudio Atzori d3fd05e3c5 switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:52:23 +02:00
Claudio Atzori 7a3f8085f7 switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:45:40 +02:00
Michele Artini 1260d03eba skip empty projects 2020-04-27 13:51:13 +02:00
Claudio Atzori fad94c2155 updated dependency dnet-pace-core to version 4.0.1 to include 7bc00a3f5f 2020-04-24 16:47:10 +02:00
Claudio Atzori 268462623a refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity 2020-04-24 14:42:01 +02:00
Claudio Atzori a3e480d1c9 implemented DispatchEntitiesApplication using spark2 datasets 2020-04-24 14:36:53 +02:00
Claudio Atzori 48157e0fc4 GraphHiveImporterJob moved to a dedicated package 2020-04-24 14:32:28 +02:00
Claudio Atzori 5100527400 added default value for resulttype field 2020-04-23 19:14:37 +02:00
Claudio Atzori 278fc9d276 code formatting 2020-04-23 18:51:38 +02:00
miconis 5414236644 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-23 18:17:23 +02:00
miconis 8d258c85ff spark dedup test fixed, sample for dataset and orp added, test implemented 2020-04-23 18:16:20 +02:00
Michele Artini 072eae3803 fixed a problem with missing contexts 2020-04-23 16:42:49 +02:00
Michele Artini b164d96874 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-23 16:19:16 +02:00
Michele Artini d920ce501e fixed a problem with missing instances 2020-04-23 16:18:40 +02:00
Claudio Atzori 8851050814 replaced hive_db_name with hiveDbName 2020-04-23 08:36:40 +02:00
Claudio Atzori 91f81107b1 applying code formatting 2020-04-23 07:52:32 +02:00
Claudio Atzori 1e7583c5a6 filtered invisible records in data provision workflow 2020-04-23 07:51:34 +02:00
Claudio Atzori 9ddafd46ca fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory 2020-04-23 07:50:18 +02:00
Claudio Atzori ade4cb97af fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB 2020-04-22 18:24:06 +02:00
Claudio Atzori ba4339f142 excluded org.apache.hadoop:hadoop-common from the dnet-actionmanager-common dependency to avoid multiple transitive jaxb-impl versions to conflict when instantiating the ISLookup client stub 2020-04-22 14:23:09 +02:00
Claudio Atzori e81960335c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-04-22 10:46:37 +02:00
Michele Artini 9e4d58f505 ResultType 2020-04-22 10:07:26 +02:00
Claudio Atzori c891661822 small adjustments in the graph2hive workflow 2020-04-21 18:52:23 +02:00
Claudio Atzori 0b55795d4d small adjustments in the provisioning workflow 2020-04-21 16:15:04 +02:00
Claudio Atzori 88fbb3a353 added sparkSqlWarehouseDir to the default extra spark options passed to each workflow 2020-04-21 16:13:43 +02:00
Claudio Atzori cd320efa96 added extra spark options to graph to hive workflow 2020-04-21 16:12:20 +02:00
Claudio Atzori 91e72a6944 Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests 2020-04-21 12:06:08 +02:00
miconis 5c9ef08a8e spark dedup test fixed 2020-04-21 10:19:04 +02:00
Claudio Atzori d772d967aa restored changes from master branch 2020-04-20 18:53:06 +02:00
Claudio Atzori eb8a020859 fixed behaviour of DedupRecordFactory 2020-04-20 18:44:06 +02:00
Claudio Atzori ede1af3d85 Merge branch 'master' into deduptesting 2020-04-20 16:52:14 +02:00
miconis 1102e32462 SparkDedupTest updated and organization dump fixed 2020-04-20 16:49:01 +02:00