1370 Commits

Author SHA1 Message Date
Claudio Atzori 1fcc28968e integrated changes from master 2020-08-04 10:57:44 +02:00
Claudio Atzori da2f8af72d adjusted MergeClaimsApplication param specs 2020-08-03 19:56:16 +02:00
Alessia Bardi 8cc067fe76 specific test for claims 2020-08-03 11:17:50 +02:00
Claudio Atzori a89b6cc3ba Merge pull request 'nsprefix_blacklist' (#34) from nsprefix_blacklist into master 2020-07-31 11:52:23 +02:00
Sandro La Bruzzo 0c3bc9ea4b Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-31 09:07:18 +02:00
Sandro La Bruzzo 168bfb496a adopted dedup to the new schema 2020-07-31 09:06:57 +02:00
Michele Artini 652b13abb6 Merge branch 'master' into nsprefix_blacklist 2020-07-31 07:58:37 +02:00
Claudio Atzori cd631bb5bc defaults fixed in the cleaning workflow: forces result.publisher to NULL when result.publisher.value is empty 2020-07-30 17:03:53 +02:00
Claudio Atzori 4bbfcf1ac6 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-30 16:25:06 +02:00
Claudio Atzori 4ff8007518 added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step 2020-07-30 16:24:39 +02:00
Michele Artini bdece15ca0 blacklist of nsprefix 2020-07-30 16:13:38 +02:00
Sandro La Bruzzo c97c8f0c44 implemented new oozie job to extract entities in a separate dataset 2020-07-30 12:13:58 +02:00
Claudio Atzori 4ff184973b code formatting 2020-07-30 11:33:03 +02:00
Sandro La Bruzzo 3010a362bc updated the provision workflow in the aggregation phase. Removed serialization in JSON RDD and used Spark Dataset 2020-07-30 09:25:56 +02:00
Sandro La Bruzzo 487226f669 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-30 09:25:39 +02:00
Sandro La Bruzzo 16ae3c9ccf updated the provision workflow in the aggregation phase. Removed serialization in JSON RDD and used Spark Dataset 2020-07-30 09:25:32 +02:00
Claudio Atzori 9e594cf4c2 WIP: materialize graph as Hive DB, aggregator graph 2020-07-29 19:25:11 +02:00
Claudio Atzori ee0b2191f8 adjusted test assertions to reflect update ordering defined in SortableRelationKey 2020-07-29 14:37:35 +02:00
Claudio Atzori fd289d389c adjusted dedup stats block test assertions to reflect updated configuration 2020-07-29 14:36:52 +02:00
Michele Artini 8ba94833bd added an es prop 2020-07-29 14:16:08 +02:00
Claudio Atzori 2dbac631c9 WIP: factoring out utilities into dhp-workflows-common 2020-07-29 13:08:20 +02:00
Claudio Atzori 91811ab43a Merge branch 'master' into graph_db 2020-07-28 15:15:27 +02:00
Claudio Atzori 6f11c0496e fixed typo in module name dhp-worfklow-profiles -> dhp-workflow-profiles 2020-07-28 15:01:58 +02:00
Claudio Atzori b0b6c6bd47 WIP dhp-workflows-common 2020-07-28 14:59:14 +02:00
Claudio Atzori f680eb3e12 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-07-28 14:10:56 +02:00
Claudio Atzori 985b360c31 fixed typo in module name dhp-worfklow-profiles -> dhp-workflow-profiles 2020-07-28 14:10:52 +02:00
Claudio Atzori 7fc27bfdd1 Merge pull request 'islookup_timeout' (#30) from islookup_timeout into master. Thanks, Michele! 2020-07-28 13:53:12 +02:00
Michele Artini 3acd632123 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-07-28 12:02:30 +02:00
Michele Artini 35e6e9c064 tests 2020-07-28 12:02:15 +02:00
Claudio Atzori 2c4196ab22 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into islookup_timeout 2020-07-27 17:40:58 +02:00
Claudio Atzori ee832f358e Merge pull request 'stats_wf_extensions_and_corrections' (#28) from spyros/dnet-hadoop:stats_wf_extensions_and_corrections into master. Thank you Guys! The update workflow will be made available to the beta & production orchestration systems under the HDFS path `/lib/dnet/oa/graph/stats/oozie_app` 2020-07-27 16:02:03 +02:00
Antonis Lempesis 4ac8ebe427 correctly calculating the project duration 2020-07-24 19:50:40 +03:00
Antonis Lempesis 18d9464b52 creating shadow db only if it does not exist... 2020-07-24 19:50:40 +03:00
Antonis Lempesis e217d496ab added the dest db... 2020-07-24 19:50:40 +03:00
Antonis Lempesis b16bb68b9f added the target db name... 2020-07-24 19:50:40 +03:00
Antonis Lempesis 1ee7eeedf3 added the source db name... 2020-07-24 19:50:40 +03:00
Antonis Lempesis cecbbfa0fc added missing tables and views: contexts, creation_date, funder 2020-07-24 19:50:40 +03:00
Antonis Lempesis 25b7a615f5 moved datasource_sources table creation to the datasource section 2020-07-24 19:50:40 +03:00
Antonis Lempesis a8da4ab9c0 years in projects are now integers 2020-07-24 19:50:40 +03:00
Antonis Lempesis c9cfc165d9 not using impala since the resulting tables are not visible 2020-07-24 19:50:40 +03:00
Antonis Lempesis dd3d6a6e15 compute stats for the used and new impala tables 2020-07-24 19:50:40 +03:00
Antonis Lempesis e6f50de6ef Separated impala from hive steps 2020-07-24 19:50:40 +03:00
Antonis Lempesis de49173420 fixed a typo in queries 2020-07-24 19:50:40 +03:00
antleb 391cf80fb8 Added peer-reviewed, green, gold tables and fields in result. Added shortcuts from result-country 2020-07-24 19:50:40 +03:00
antleb 68389d0125 Corrected the script used by the last step of the wf 2020-07-24 19:50:40 +03:00
antleb ec52141f1a changed refereed type from value to classname 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 63cd797aba Comment out step 15 to make it work with the new schema of Claudio 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 138c6ddffa Insert statement to datasource table that takes into account the piwik_id of the openAIRE graph 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 3630794cef Fix to consider the relationships that have been 'virtually deleted' for project_results - defect #5607 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 5546f29e63 Corrections on the shadow schema and the impala table stats calculation 2020-07-24 19:50:40 +03:00