Claudio Atzori
|
a0ab15a64c
|
need to stick with guava:11.0.2, as it is the version used by the hadoop components (the oozie client, for sure). The latest version (28.2-jre) breaks the oozie workflow submission
|
2020-03-19 13:58:58 +01:00 |
Sandro La Bruzzo
|
0594b92a6d
|
implemented relation with dataset
|
2020-03-19 11:11:07 +01:00 |
Claudio Atzori
|
1850a02ae4
|
added a simpler AtomicAction replacement, based on the dhp.Oaf model
|
2020-03-19 10:44:16 +01:00 |
miconis
|
679b5869e5
|
implementation of the lookup procedure to take dedup conf from the resource profiles
|
2020-03-18 17:41:56 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |
miconis
|
f32eae5ce9
|
implementation of the spark action for the simrel creation
|
2020-03-18 14:27:49 +01:00 |
Claudio Atzori
|
c7e0730720
|
compress the output produced by migration steps 1 and 2
|
2020-03-18 09:34:57 +01:00 |
Claudio Atzori
|
2f11e37602
|
fixed expansion of path variables
|
2020-03-17 19:41:07 +01:00 |
Claudio Atzori
|
2795b0b096
|
no need to mkdir the all_entities file
|
2020-03-17 17:22:14 +01:00 |
Claudio Atzori
|
19746ad308
|
when reuseContent is set, reset ${workingPath}/all_entities
|
2020-03-17 17:17:06 +01:00 |
Claudio Atzori
|
2f0c85eeb3
|
updated parameters for regular_all_steps workflow, introduced flag 'reuseContent'
|
2020-03-17 17:04:58 +01:00 |
Claudio Atzori
|
b8290b5851
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:45:30 +01:00 |
Claudio Atzori
|
4706f24ec5
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:23:54 +01:00 |
Claudio Atzori
|
aeb01fa353
|
reading from newline-delimited JSON text files instead of sequence files
|
2020-03-17 11:57:24 +01:00 |
Claudio Atzori
|
af835f2f98
|
when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities)
|
2020-03-15 18:07:59 +01:00 |
Claudio Atzori
|
9c84e21b87
|
added workflow to migrate the latest version of each actionset content from the DM to the OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF data model
|
2020-03-13 15:56:52 +01:00 |
Claudio Atzori
|
8fe7ae1482
|
xml formatting
|
2020-03-13 15:53:56 +01:00 |
Claudio Atzori
|
23a929177d
|
updates to the graph require this to be an actual class
|
2020-03-13 14:56:35 +01:00 |
Sandro La Bruzzo
|
addaaa091f
|
migrate relation from RDD to Dataset
|
2020-03-13 09:13:20 +01:00 |
Claudio Atzori
|
7b6f0c8756
|
reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki
|
2020-03-10 17:19:17 +01:00 |
Claudio Atzori
|
60aedb1110
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-10 17:09:44 +01:00 |
Claudio Atzori
|
a3f184fd3f
|
added field websiteurl in related organizations
|
2020-03-10 17:08:58 +01:00 |
Claudio Atzori
|
0e95544495
|
fixed serialization for datasource subjects
|
2020-03-10 17:07:44 +01:00 |
Sandro La Bruzzo
|
7b28783fb4
|
updated unpaywall mapping
|
2020-03-08 17:00:19 +01:00 |
Michele Artini
|
b6efa9d6ab
|
Configuration of the SequenceFile Writer
|
2020-03-05 15:49:14 +01:00 |
Claudio Atzori
|
ccb153de78
|
updated image
|
2020-03-05 15:11:42 +01:00 |
Claudio Atzori
|
5e342a555c
|
no need to compute the inverse relClass, fixed text() in xpath expressions
|
2020-03-05 12:51:48 +01:00 |
Claudio Atzori
|
6ec04d4e02
|
specified in the javadoc the column used to perform the join operation
|
2020-03-05 12:50:38 +01:00 |
Claudio Atzori
|
960619de98
|
updated image
|
2020-03-04 16:51:55 +01:00 |
Claudio Atzori
|
e89aa52e58
|
updated image
|
2020-03-04 16:18:49 +01:00 |
Claudio Atzori
|
5474e8ac9f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-04 14:54:46 +01:00 |
Claudio Atzori
|
d7137e566e
|
added dhp-doc-resources, aimed to include all the documentation resources used in the wiki pages
|
2020-03-04 14:54:41 +01:00 |
Michele Artini
|
7a2a466161
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-04 14:50:59 +01:00 |
Michele Artini
|
755eade2fb
|
fix creation ids
|
2020-03-04 14:49:45 +01:00 |
Claudio Atzori
|
6379f32466
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-04 10:57:06 +01:00 |
Claudio Atzori
|
0233987603
|
introduced post processing step following the hive DB creation/population
|
2020-03-04 10:56:50 +01:00 |
Claudio Atzori
|
1e563bc15e
|
introduced distinct properties driving the resource usage for the XML record creation and the indexing phase
|
2020-03-04 10:55:11 +01:00 |
Claudio Atzori
|
9af3e904be
|
close the SparkSession at the end
|
2020-03-04 10:53:31 +01:00 |
Michele Artini
|
086af63158
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-04 10:46:40 +01:00 |
Michele Artini
|
e7167b996a
|
logs and closeable
|
2020-03-04 10:46:36 +01:00 |
Claudio Atzori
|
25ceec29ab
|
code formatting
|
2020-03-04 10:44:24 +01:00 |
Claudio Atzori
|
63c00c5e88
|
fixed typo
|
2020-03-04 10:43:44 +01:00 |
Claudio Atzori
|
9cf5ce2e66
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-02 17:03:10 +01:00 |
Claudio Atzori
|
bc7cfd5975
|
indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure
|
2020-03-02 17:03:07 +01:00 |
Michele Artini
|
4b29a121b0
|
migration using spark in step2
|
2020-03-02 16:12:14 +01:00 |
Michele Artini
|
5445a57102
|
migration using spark in step2
|
2020-03-02 16:11:59 +01:00 |
Sandro La Bruzzo
|
b32655e48e
|
changed code to save intermediate result
|
2020-02-27 10:18:46 +01:00 |
Claudio Atzori
|
60bc2b1a20
|
drop the hive DB before populating it from scratch
|
2020-02-27 10:10:55 +01:00 |
Sandro La Bruzzo
|
f09e065865
|
increased the number of repartitions
|
2020-02-26 19:26:19 +01:00 |
Sandro La Bruzzo
|
071f5c3e52
|
fixed NPE
|
2020-02-26 15:42:20 +01:00 |