Commit Graph

3685 Commits

Author SHA1 Message Date
Sandro La Bruzzo 0594b92a6d implemented relation with dataset 2020-03-19 11:11:07 +01:00
Claudio Atzori 1850a02ae4 added simpler, AtomicAction replacement, based on the dhp.Oaf model 2020-03-19 10:44:16 +01:00
miconis 679b5869e5 implementation of the lookup procedure to take dedup conf from the resource profiles 2020-03-18 17:41:56 +01:00
Claudio Atzori abe8fb69a2 added global properties, moved postprocessing script inside the oozie_app directory 2020-03-18 15:43:54 +01:00
miconis f32eae5ce9 implementation of the spark action for the simrel creation 2020-03-18 14:27:49 +01:00
Claudio Atzori c7e0730720 compress the output produced by migration steps 1 and 2 2020-03-18 09:34:57 +01:00
Claudio Atzori 2f11e37602 fixed expansion of path variables 2020-03-17 19:41:07 +01:00
Claudio Atzori 2795b0b096 no need to mkdir the all_entities file 2020-03-17 17:22:14 +01:00
Claudio Atzori 19746ad308 when reuseContent, reset ${workingPath}/all_entities 2020-03-17 17:17:06 +01:00
Claudio Atzori 2f0c85eeb3 updated parameters for regular_all_steps workflow, introduced flag 'reuseContent' 2020-03-17 17:04:58 +01:00
Miriam Baglioni 67ea3cf3ed changed the way to read the file with info on resource or relation. From sequenceFile to textFile 2020-03-17 16:32:05 +01:00
Miriam Baglioni b4652d018c moved the creation of new dir to common class. 2020-03-17 16:31:24 +01:00
Claudio Atzori b8290b5851 updated parameters for regular_all_steps workflow 2020-03-17 15:45:30 +01:00
Claudio Atzori 4706f24ec5 updated parameters for regular_all_steps workflow 2020-03-17 15:23:54 +01:00
Claudio Atzori aeb01fa353 reading from newline delimited json textfiles instead of sequence files 2020-03-17 11:57:24 +01:00
Miriam Baglioni 92f4e0001d Merge branch 'bulktag' 2020-03-16 13:33:27 +01:00
Miriam Baglioni ab08a37024 Merge remote-tracking branch 'upstream/master' 2020-03-16 12:45:23 +01:00
Claudio Atzori af835f2f98 when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities) 2020-03-15 18:07:59 +01:00
Claudio Atzori 9c84e21b87 added workflow to migrate latest version of each actionset content from DM to OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF datamodel 2020-03-13 15:56:52 +01:00
Claudio Atzori 8fe7ae1482 xml formatting 2020-03-13 15:53:56 +01:00
Claudio Atzori 23a929177d updates to the graph require this to be an actual class 2020-03-13 14:56:35 +01:00
Przemysław Jacewicz d0c9b0cdd6 WIP promote job functions updated 2020-03-13 12:36:42 +01:00
Przemysław Jacewicz 8d9b3c5de2 WIP action payload mapping into OAF type moved, (local) graph table name enum created, tests fixed 2020-03-13 10:01:39 +01:00
Przemysław Jacewicz 5cc560c7e5 Removed unnecessary dependency on old OAF model 2020-03-13 09:57:46 +01:00
Sandro La Bruzzo addaaa091f migrate relation from RDD to Dataset 2020-03-13 09:13:20 +01:00
Przemysław Jacewicz 3f24593e51 WIP: promote job tests and test resources implementation snapshot 2020-03-11 17:06:29 +01:00
Przemysław Jacewicz 2e996d610f WIP: promote job functions implementation snapshot 2020-03-11 17:02:57 +01:00
Przemysław Jacewicz cc63cdc9e6 WIP: promote job implementation snapshot 2020-03-11 17:02:06 +01:00
Przemysław Jacewicz 69540f6f78 Serialization-safe supplier added 2020-03-11 16:59:05 +01:00
Przemysław Jacewicz e6e214dab5 Oaf merge and get strategy added 2020-03-11 16:58:17 +01:00
Przemysław Jacewicz f7454a9ed8 Added equals and hashCode for OAF types 2020-03-11 16:57:28 +01:00
Claudio Atzori 7b6f0c8756 reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki 2020-03-10 17:19:17 +01:00
Claudio Atzori 60aedb1110 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-10 17:09:44 +01:00
Claudio Atzori a3f184fd3f added field websiteurl in related organizations 2020-03-10 17:08:58 +01:00
Claudio Atzori 0e95544495 fixed serialization for datasource subjects 2020-03-10 17:07:44 +01:00
Sandro La Bruzzo 7b28783fb4 updated unpaywall mapping 2020-03-08 17:00:19 +01:00
Michele Artini b6efa9d6ab Configuration of the SequenceFile Writer 2020-03-05 15:49:14 +01:00
Claudio Atzori ccb153de78 updated image 2020-03-05 15:11:42 +01:00
Claudio Atzori 5e342a555c no need to compute the inverse relClass, fixed text() in xpath expressions 2020-03-05 12:51:48 +01:00
Claudio Atzori 6ec04d4e02 specified column used to perform the join operation in the javadoc 2020-03-05 12:50:38 +01:00
Claudio Atzori 960619de98 updated image 2020-03-04 16:51:55 +01:00
Claudio Atzori e89aa52e58 updated image 2020-03-04 16:18:49 +01:00
Claudio Atzori 5474e8ac9f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-04 14:54:46 +01:00
Claudio Atzori d7137e566e added dhp-doc-resources, aimed to include all the documentation resources used in the wiki pages 2020-03-04 14:54:41 +01:00
Michele Artini 7a2a466161 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-03-04 14:50:59 +01:00
Michele Artini 755eade2fb fix creation ids 2020-03-04 14:49:45 +01:00
Claudio Atzori 6379f32466 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-04 10:57:06 +01:00
Claudio Atzori 0233987603 introduced post processing step following the hive DB creation/population 2020-03-04 10:56:50 +01:00
Claudio Atzori 1e563bc15e introduced distinct properties driving the resource usage for the XML record creation and the indexing phase 2020-03-04 10:55:11 +01:00
Claudio Atzori 9af3e904be close the SparkSession at the end 2020-03-04 10:53:31 +01:00