Claudio Atzori
a5f23d8a4c
WIP: materialize graph as Hive DB
2020-08-07 15:33:07 +02:00
Claudio Atzori
cce21eafc2
WIP: materialize graph as Hive DB, configured spark actions to include hive support
2020-08-06 21:48:29 +02:00
Claudio Atzori
fde8738ed2
added hiveMetastoreUris parameter to default parameters
2020-08-06 21:46:47 +02:00
Claudio Atzori
3be0e5c2cd
Merge branch 'master' into graph_db
2020-08-05 12:54:48 +02:00
Alessia Bardi
a29565ff57
code formatting
2020-08-04 12:55:27 +02:00
Alessia Bardi
01db29e208
fixes redmine issue #5846: datacite and its different namespace declarations
2020-08-04 12:53:48 +02:00
Alessia Bardi
b4e4e5f858
do not duplicate result PIDs
2020-08-04 12:52:14 +02:00
Alessia Bardi
09a323d18d
testing a dataset from Nakala
2020-08-04 12:50:52 +02:00
Alessia Bardi
c35bf486cc
added handle among the possible PIDs
2020-08-04 12:50:12 +02:00
Claudio Atzori
f3ce97ecf9
WIP: materialize graph as Hive DB, mergeAggregatorGraphs [added workflow node to drop the DB]
2020-08-04 12:29:42 +02:00
Claudio Atzori
771bf8bcc4
WIP: materialize graph as Hive DB, mergeAggregatorGraphs
2020-08-04 12:26:09 +02:00
Claudio Atzori
0da1d2c0c9
introduced GraphFormat.DEFAULT, indicating a common value to be used across the workflows
2020-08-04 12:25:31 +02:00
Claudio Atzori
1fcc28968e
integrated changes from master
2020-08-04 10:57:44 +02:00
Claudio Atzori
da2f8af72d
adjusted MergeClaimsApplication param specs
2020-08-03 19:56:16 +02:00
Alessia Bardi
8cc067fe76
specific test for claims
2020-08-03 11:17:50 +02:00
Claudio Atzori
a89b6cc3ba
Merge pull request 'nsprefix_blacklist' (#34) from nsprefix_blacklist into master
2020-07-31 11:52:23 +02:00
Sandro La Bruzzo
0c3bc9ea4b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-31 09:07:18 +02:00
Sandro La Bruzzo
168bfb496a
adapted dedup to the new schema
2020-07-31 09:06:57 +02:00
Michele Artini
652b13abb6
Merge branch 'master' into nsprefix_blacklist
2020-07-31 07:58:37 +02:00
Claudio Atzori
cd631bb5bc
defaults fixed in the cleaning workflow; forces result.publisher to NULL when result.publisher.value is empty
2020-07-30 17:03:53 +02:00
Claudio Atzori
4bbfcf1ac6
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-30 16:25:06 +02:00
Claudio Atzori
4ff8007518
added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step
2020-07-30 16:24:39 +02:00
Michele Artini
bdece15ca0
blacklist of nsprefix
2020-07-30 16:13:38 +02:00
Sandro La Bruzzo
c97c8f0c44
implemented new oozie job to extract entities in a separate dataset
2020-07-30 12:13:58 +02:00
Claudio Atzori
4ff184973b
code formatting
2020-07-30 11:33:03 +02:00
Sandro La Bruzzo
3010a362bc
updated the provision workflow in the aggregation phase: removed the JSON RDD serialization and used Spark Dataset instead
2020-07-30 09:25:56 +02:00
Sandro La Bruzzo
487226f669
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-30 09:25:39 +02:00
Sandro La Bruzzo
16ae3c9ccf
updated the provision workflow in the aggregation phase: removed the JSON RDD serialization and used Spark Dataset instead
2020-07-30 09:25:32 +02:00
Claudio Atzori
9e594cf4c2
WIP: materialize graph as Hive DB, aggregator graph
2020-07-29 19:25:11 +02:00
Claudio Atzori
ee0b2191f8
adjusted test assertions to reflect update ordering defined in SortableRelationKey
2020-07-29 14:37:35 +02:00
Claudio Atzori
fd289d389c
adjusted dedup stats block test assertions to reflect updated configuration
2020-07-29 14:36:52 +02:00
Michele Artini
8ba94833bd
added an es prop
2020-07-29 14:16:08 +02:00
Claudio Atzori
2dbac631c9
WIP: factoring out utilities into dhp-workflows-common
2020-07-29 13:08:20 +02:00
Claudio Atzori
91811ab43a
Merge branch 'master' into graph_db
2020-07-28 15:15:27 +02:00
Claudio Atzori
6f11c0496e
fixed typo in module name dhp-worfklow-profiles -> dhp-workflow-profiles
2020-07-28 15:01:58 +02:00
Claudio Atzori
b0b6c6bd47
WIP dhp-workflows-common
2020-07-28 14:59:14 +02:00
Claudio Atzori
f680eb3e12
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-28 14:10:56 +02:00
Claudio Atzori
985b360c31
fixed typo in module name dhp-worfklow-profiles -> dhp-workflow-profiles
2020-07-28 14:10:52 +02:00
Michele Artini
3acd632123
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-28 12:02:30 +02:00
Michele Artini
35e6e9c064
tests
2020-07-28 12:02:15 +02:00
Claudio Atzori
ee832f358e
Merge pull request 'stats_wf_extensions_and_corrections' (#28) from spyros/dnet-hadoop:stats_wf_extensions_and_corrections into master
Thank you guys! The updated workflow will be made available to the beta & production orchestration systems under the HDFS path
```/lib/dnet/oa/graph/stats/oozie_app```
2020-07-27 16:02:03 +02:00
Antonis Lempesis
4ac8ebe427
correctly calculating the project duration
2020-07-24 19:50:40 +03:00
Antonis Lempesis
18d9464b52
creating shadow db only if it does not exist...
2020-07-24 19:50:40 +03:00
Antonis Lempesis
e217d496ab
added the dest db...
2020-07-24 19:50:40 +03:00
Antonis Lempesis
b16bb68b9f
added the target db name...
2020-07-24 19:50:40 +03:00
Antonis Lempesis
1ee7eeedf3
added the source db name...
2020-07-24 19:50:40 +03:00
Antonis Lempesis
cecbbfa0fc
added missing tables and views: contexts, creation_date, funder
2020-07-24 19:50:40 +03:00
Antonis Lempesis
25b7a615f5
moved datasource_sources table creation into the datasource section
2020-07-24 19:50:40 +03:00
Antonis Lempesis
a8da4ab9c0
years in projects are now integers
2020-07-24 19:50:40 +03:00
Antonis Lempesis
c9cfc165d9
not using impala since the resulting tables are not visible
2020-07-24 19:50:40 +03:00