Commit Graph

407 Commits

Author SHA1 Message Date
Claudio Atzori 17860d3ab6 general changes in the RAW graph mapping: missing collectedfrom/hostedby causes records to be skipped; factored out most of the constants in ModelConstants class (dhp-schemas) 2020-05-06 13:20:02 +02:00
Claudio Atzori fdfecc9578 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-05-06 11:28:01 +02:00
Claudio Atzori c79e2f5977 drop workingPath before starting the dedup workflow 2020-05-06 11:27:44 +02:00
Michele Artini 8f30a09d84 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-05 17:12:22 +02:00
Michele Artini ccc609f909 new module for the production of broker events 2020-05-05 17:09:00 +02:00
Claudio Atzori 0825321d0b improved unit tests in dhp-aggregation 2020-05-05 12:39:04 +02:00
Claudio Atzori 4a8487165c using long param names in wf definition 2020-05-04 19:19:29 +02:00
Claudio Atzori a2fc37df5f adjusted parameters 2020-05-04 19:18:59 +02:00
Claudio Atzori f1b7e14036 code formatting 2020-05-04 19:18:34 +02:00
miconis 085cf173d7 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-05-04 12:08:20 +02:00
miconis 3df703f67d mergerels added to propagate relations 2020-05-04 12:08:12 +02:00
Claudio Atzori bac37b3973 fixed children expansion in XML records 2020-05-04 11:51:17 +02:00
Claudio Atzori 077ccd8743 stats wf properties cleanup 2020-05-04 11:41:46 +02:00
Michele Artini eb9bd42970 fixed a problem with journals 2020-04-30 11:06:05 +02:00
Michele Artini a0a6109bbc fixed a problem with journals 2020-04-30 11:03:46 +02:00
Claudio Atzori 439c6255a2 cleanup 2020-04-29 19:09:07 +02:00
Claudio Atzori 77ac995770 cleaned up poms, added descriptions 2020-04-29 18:44:17 +02:00
Claudio Atzori 8fd81e863d added default value for the external_stats_db_name 2020-04-29 15:36:24 +02:00
Claudio Atzori c6f3ff4462 stats workflow content relocated into common package; added <global> property definitions in stats workflow.xml 2020-04-29 14:29:27 +02:00
miconis e0d14fe4f8 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-29 13:02:53 +02:00
miconis 0352d3b0ba entity dumps in dedup compressed 2020-04-29 13:02:34 +02:00
Michele Artini c43b4c8962 formatting 2020-04-29 12:56:58 +02:00
Michele Artini a5d7007005 Fix relations in migration; Fix pom.xml in dhp-stats-update 2020-04-29 12:05:41 +02:00
Claudio Atzori 3616d0f88d Merge pull request 'Adding the stats workflow to the dnet-hadoop hierarchy' (#6) from spyros/dnet-hadoop:master into master; Integrating stats update workflow. 2020-04-29 10:35:02 +02:00
Claudio Atzori 964972d29a added data provision workflow definition WIP 2020-04-29 09:25:50 +02:00
miconis 62e467eb0c assertion numbers updated to fit the new implementation of the pace-core 2020-04-28 11:46:23 +02:00
Claudio Atzori 6f5b899038 reformatted code according to the updated style descriptor 2020-04-28 11:23:29 +02:00
Claudio Atzori ac25f2d8d1 integrated changes from master 2020-04-28 08:55:28 +02:00
Claudio Atzori a0bdbacdae switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:52:31 +02:00
Claudio Atzori 7a3f8085f7 switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin 2020-04-27 14:45:40 +02:00
Michele Artini 1260d03eba skip empty projects 2020-04-27 13:51:13 +02:00
Claudio Atzori 268462623a refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity 2020-04-24 14:42:01 +02:00
Claudio Atzori a3e480d1c9 implemented DispatchEntitiesApplication using spark2 datasets 2020-04-24 14:36:53 +02:00
Claudio Atzori 48157e0fc4 GraphHiveImporterJob moved to a dedicated package 2020-04-24 14:32:28 +02:00
Claudio Atzori 278fc9d276 code formatting 2020-04-23 18:51:38 +02:00
miconis 5414236644 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-23 18:17:23 +02:00
miconis 8d258c85ff spark dedup test fixed, sample for dataset and orp added, test implemented 2020-04-23 18:16:20 +02:00
Michele Artini 072eae3803 fixed a problem with missing contexts 2020-04-23 16:42:49 +02:00
Michele Artini b164d96874 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-04-23 16:19:16 +02:00
Michele Artini d920ce501e fixed a problem with missing instances 2020-04-23 16:18:40 +02:00
Claudio Atzori 8851050814 replaced hive_db_name with hiveDbName 2020-04-23 08:36:40 +02:00
Claudio Atzori 91f81107b1 applying code formatting 2020-04-23 07:52:32 +02:00
Claudio Atzori 1e7583c5a6 filtered invisible records in data provision workflow 2020-04-23 07:51:34 +02:00
Claudio Atzori 9ddafd46ca fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory 2020-04-23 07:50:18 +02:00
Claudio Atzori ade4cb97af fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB 2020-04-22 18:24:06 +02:00
Claudio Atzori e81960335c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-04-22 10:46:37 +02:00
Michele Artini 9e4d58f505 ResultType 2020-04-22 10:07:26 +02:00
Claudio Atzori c891661822 small adjustments in the graph2hive workflow 2020-04-21 18:52:23 +02:00
Claudio Atzori 0b55795d4d small adjustments in the provisioning workflow 2020-04-21 16:15:04 +02:00
Claudio Atzori 88fbb3a353 added sparkSqlWarehouseDir to the default extra spark options passed to each workflow 2020-04-21 16:13:43 +02:00