Commit Graph

301 Commits

Author SHA1 Message Date
Miriam Baglioni 46094a3eec bug fixing for implementation with dataset 2020-03-24 16:19:36 +01:00
Miriam Baglioni ad712f2d79 added the needed variables in the config and read the variables in the workflow 2020-03-23 17:11:36 +01:00
Miriam Baglioni f1e9fe9752 changed implementation using dataset and query on hive 2020-03-23 17:11:00 +01:00
Miriam Baglioni f09cd1e911 removed unuseful variable in the configuration 2020-03-23 17:10:14 +01:00
Miriam Baglioni 9418e3d4fa read dataset from files instead of using hive tables 2020-03-23 17:09:27 +01:00
Miriam Baglioni a7bf037306 remove unused class 2020-03-23 14:36:43 +01:00
Miriam Baglioni 8ab8b6b0bf minor 2020-03-23 14:35:23 +01:00
Miriam Baglioni 30d58fd98c change the configuration of the workflow 2020-03-23 14:32:49 +01:00
Miriam Baglioni a440152b46 refactoring 2020-03-23 14:30:56 +01:00
Miriam Baglioni 47561f3597 changed the implementation from rdd to dataset got from sql queries (on hive) 2020-03-23 11:58:32 +01:00
Miriam Baglioni 67ea3cf3ed changed the way to read the file with info on resource or relation. From sequenceFile to textFile 2020-03-17 16:32:05 +01:00
Miriam Baglioni b4652d018c moved the creation of new dir to common class. 2020-03-17 16:31:24 +01:00
Miriam Baglioni 92f4e0001d Merge branch 'bulktag' 2020-03-16 13:33:27 +01:00
Miriam Baglioni ab08a37024 Merge remote-tracking branch 'upstream/master' 2020-03-16 12:45:23 +01:00
Claudio Atzori af835f2f98 when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities) 2020-03-15 18:07:59 +01:00
Claudio Atzori 9c84e21b87 added workflow to migrate latest version of each actionset content from DM to OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF datamodel 2020-03-13 15:56:52 +01:00
Claudio Atzori 8fe7ae1482 xml formatting 2020-03-13 15:53:56 +01:00
Claudio Atzori 23a929177d updates to the graph require this to be an actual class 2020-03-13 14:56:35 +01:00
Claudio Atzori 7b6f0c8756 reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki 2020-03-10 17:19:17 +01:00
Claudio Atzori 60aedb1110 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-10 17:09:44 +01:00
Claudio Atzori a3f184fd3f added field websiteurl in related organizations 2020-03-10 17:08:58 +01:00
Claudio Atzori 0e95544495 fixed serialization for datasource subjects 2020-03-10 17:07:44 +01:00
Michele Artini b6efa9d6ab Configuration of the SequenceFile Writer 2020-03-05 15:49:14 +01:00
Claudio Atzori ccb153de78 updated image 2020-03-05 15:11:42 +01:00
Claudio Atzori 5e342a555c no need to compute the inverse relClass, fixed text() in xpath expressions 2020-03-05 12:51:48 +01:00
Claudio Atzori 6ec04d4e02 specified column used to perform the join operation in the javadoc 2020-03-05 12:50:38 +01:00
Claudio Atzori 960619de98 updated image 2020-03-04 16:51:55 +01:00
Claudio Atzori e89aa52e58 updated image 2020-03-04 16:18:49 +01:00
Claudio Atzori 5474e8ac9f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-04 14:54:46 +01:00
Claudio Atzori d7137e566e added dhp-doc-resources, aimed to include all the documentation resources used in the wiki pages 2020-03-04 14:54:41 +01:00
Michele Artini 7a2a466161 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-03-04 14:50:59 +01:00
Michele Artini 755eade2fb fix creation ids 2020-03-04 14:49:45 +01:00
Claudio Atzori 6379f32466 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-04 10:57:06 +01:00
Claudio Atzori 0233987603 introduced post processing step following the hive DB creation/population 2020-03-04 10:56:50 +01:00
Claudio Atzori 1e563bc15e introduced distinct properties driving the resouce usage for the XML record creation and the indexing phase 2020-03-04 10:55:11 +01:00
Claudio Atzori 9af3e904be close the SparkSession at the end 2020-03-04 10:53:31 +01:00
Michele Artini 086af63158 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-03-04 10:46:40 +01:00
Michele Artini e7167b996a logs and closeable 2020-03-04 10:46:36 +01:00
Claudio Atzori 25ceec29ab code formatting 2020-03-04 10:44:24 +01:00
Claudio Atzori 63c00c5e88 fixed typo 2020-03-04 10:43:44 +01:00
Miriam Baglioni c37f2bd1b5 moved some classes to package to make code clearer 2020-03-03 16:42:23 +01:00
Miriam Baglioni d9d2060561 implementation for bulk tagging 2020-03-03 16:38:50 +01:00
Miriam Baglioni e80f80ca93 properties and workflow for new propagation 2020-03-02 17:03:31 +01:00
Claudio Atzori 9cf5ce2e66 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-03-02 17:03:10 +01:00
Claudio Atzori bc7cfd5975 indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure 2020-03-02 17:03:07 +01:00
Miriam Baglioni 50080c1b3c changed the implementation of addAll method. Before adding all the items in a collection, we check if the accumulator set is not empty 2020-03-02 16:41:37 +01:00
Miriam Baglioni 02815dd2cf update result for community moved in propagationconstants 2020-03-02 16:40:56 +01:00
Miriam Baglioni 95f8c3092f update for new propagation implementation and moving of updateResult for community business logic since the same can be used for result to community from organization and result to community from semrel 2020-03-02 16:40:17 +01:00
Miriam Baglioni 3d63f35dcb implementation of new propagation. Result to community for results linked to given organization. We exploit the hasAuthorInstitution semantic link to discover which results are related to institutions 2020-03-02 16:39:03 +01:00
Michele Artini 4b29a121b0 migration using spark in step2 2020-03-02 16:12:14 +01:00