Commit Graph

375 Commits

Author SHA1 Message Date
Claudio Atzori 268462623a refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity 2020-04-24 14:42:01 +02:00
Claudio Atzori a3e480d1c9 implmented DispatchEntitiesApplication using spark2 datasets 2020-04-24 14:36:53 +02:00
Claudio Atzori 48157e0fc4 GraphHiveImporterJob moved in dedicate package 2020-04-24 14:32:28 +02:00
Claudio Atzori 278fc9d276 code formatting 2020-04-23 18:51:38 +02:00
miconis 5414236644 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-23 18:17:23 +02:00
miconis 8d258c85ff spark dedup test fixed, sample for dataset and orp added, test implemented 2020-04-23 18:16:20 +02:00
Michele Artini 072eae3803 fixed a problem with missing contexts 2020-04-23 16:42:49 +02:00
Michele Artini b164d96874 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-23 16:19:16 +02:00
Michele Artini d920ce501e fixed a problem with missing instances 2020-04-23 16:18:40 +02:00
Claudio Atzori 8851050814 replaced hive_db_name with hiveDbName 2020-04-23 08:36:40 +02:00
Claudio Atzori 91f81107b1 applying code formatting 2020-04-23 07:52:32 +02:00
Claudio Atzori 1e7583c5a6 filtered invisible records in data provision workflow 2020-04-23 07:51:34 +02:00
Claudio Atzori 9ddafd46ca fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory 2020-04-23 07:50:18 +02:00
Claudio Atzori ade4cb97af fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB 2020-04-22 18:24:06 +02:00
Claudio Atzori e81960335c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-04-22 10:46:37 +02:00
Michele Artini 9e4d58f505 ResultType 2020-04-22 10:07:26 +02:00
Claudio Atzori c891661822 small adjustments in the graph2hive workflow 2020-04-21 18:52:23 +02:00
Claudio Atzori 0b55795d4d small adjustments in the provisioning workflow 2020-04-21 16:15:04 +02:00
Claudio Atzori 88fbb3a353 added sparkSqlWarehouseDir to the default extra spark options passed to each workflow 2020-04-21 16:13:43 +02:00
Claudio Atzori cd320efa96 added extra spark options to graph to hive workflow 2020-04-21 16:12:20 +02:00
Claudio Atzori 91e72a6944 Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests 2020-04-21 12:06:08 +02:00
miconis 5c9ef08a8e spark dedup test fixed 2020-04-21 10:19:04 +02:00
Claudio Atzori d772d967aa restored changes from master branch 2020-04-20 18:53:06 +02:00
Claudio Atzori eb8a020859 fixed behaviour of DedupRecordFactory 2020-04-20 18:44:06 +02:00
Claudio Atzori ede1af3d85 Merge branch 'master' into deduptesting 2020-04-20 16:52:14 +02:00
miconis 1102e32462 SparkDedupTest updated and organization dump fixed 2020-04-20 16:49:01 +02:00
Claudio Atzori 667d23c58b finalising Actionset migration workflow 2020-04-20 16:45:21 +02:00
miconis 4da13e4570 Revert "Merge branch 'master' into deduptesting"
This reverts commit 772f75d167, reversing
changes made to 5f45f2c77f.
2020-04-20 16:04:49 +02:00
Claudio Atzori 9147af7fed actionsets migration workflow moved in dhp-workflows/dhp-actionmanager 2020-04-20 15:24:33 +02:00
miconis 772f75d167 Merge branch 'master' into deduptesting 2020-04-20 14:50:12 +02:00
Claudio Atzori d714bfb4d4 collectedfrom field moved in common parent class Oaf.java 2020-04-20 12:25:19 +02:00
Michele Artini 8ff7facfa3 fixed collectedFrom ID 2020-04-20 11:09:27 +02:00
Michele Artini 25307965d2 add a default datainfo if missing 2020-04-20 09:43:27 +02:00
Michele Artini d2058fdc47 tests 2020-04-20 09:31:14 +02:00
Michele Artini 478a958f09 tests 2020-04-20 09:15:27 +02:00
Claudio Atzori 5f45f2c77f Merge branch 'master' into deduptesting 2020-04-18 12:46:40 +02:00
Claudio Atzori ad7a131b18 introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project 2020-04-18 12:42:58 +02:00
Claudio Atzori a2938dd059 cleanup 2020-04-18 12:24:22 +02:00
Claudio Atzori 9374ff03ea Merge branch 'master' into deduptesting 2020-04-18 12:06:58 +02:00
Claudio Atzori 71813795f6 various refactorings on the dnet-dedup-openaire workflow 2020-04-18 12:06:23 +02:00
miconis 6450bb0daa test for softwares dedup added. definition of orp, dataset and sw dedup configurations 2020-04-17 17:31:59 +02:00
Claudio Atzori 038ac7afd7 relation consistency workflow separated from dedup scan and creation of CCs 2020-04-17 13:12:44 +02:00
Claudio Atzori c92bfeeaee Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-04-17 13:07:52 +02:00
Sandro La Bruzzo 01ea7721f3 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-17 12:12:25 +02:00
Sandro La Bruzzo 5e2fa996aa fixed problem with conversion of long into string 2020-04-17 12:11:51 +02:00
miconis 418cf94642 implementation of the deletedbyinference test in propagating relations 2020-04-17 10:40:21 +02:00
Claudio Atzori cb0952428e Merge branch 'master' into deduptesting 2020-04-16 14:42:25 +02:00
Claudio Atzori cc21bbfb1a Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting 2020-04-16 14:41:37 +02:00
Claudio Atzori ec5dfc068d added spark.sql.shuffle.partitions=3840 to dedup scan wf 2020-04-16 14:41:28 +02:00
Claudio Atzori 09f356b047 Merge pull request 'Closes #7: subdirs inside graph table dirs' (#8) from przemyslaw.jacewicz/dnet-hadoop:przemyslawjacewicz_7_distcp_configuration_fix into master
Run the code from this PR in isolation and it worked fine. Thanks!
2020-04-16 14:38:46 +02:00