Commit Graph

1485 Commits

Author SHA1 Message Date
Claudio Atzori ee832f358e Merge pull request 'stats_wf_extensions_and_corrections' (#28) from spyros/dnet-hadoop:stats_wf_extensions_and_corrections into master
Thank you Guys! The update workflow will be made available to the beta & production orchestration systems under the HDFS path

```/lib/dnet/oa/graph/stats/oozie_app```
2020-07-27 16:02:03 +02:00
Antonis Lempesis 4ac8ebe427 correctly calculating the project duration 2020-07-24 19:50:40 +03:00
Antonis Lempesis 18d9464b52 creating shadow db only if it does not exist... 2020-07-24 19:50:40 +03:00
Antonis Lempesis e217d496ab added the dest db... 2020-07-24 19:50:40 +03:00
Antonis Lempesis b16bb68b9f added the target db name... 2020-07-24 19:50:40 +03:00
Antonis Lempesis 1ee7eeedf3 added the source db name... 2020-07-24 19:50:40 +03:00
Antonis Lempesis cecbbfa0fc added missing tables and views: contexts, creation_date, funder 2020-07-24 19:50:40 +03:00
Antonis Lempesis 25b7a615f5 moved datasource_sources table creation into the datasource section 2020-07-24 19:50:40 +03:00
Antonis Lempesis a8da4ab9c0 years in projects are now integers 2020-07-24 19:50:40 +03:00
Antonis Lempesis c9cfc165d9 not using impala since the resulting tables are not visible 2020-07-24 19:50:40 +03:00
Antonis Lempesis dd3d6a6e15 compute stats for the used and new impala tables 2020-07-24 19:50:40 +03:00
Antonis Lempesis e6f50de6ef Separated impala from hive steps 2020-07-24 19:50:40 +03:00
Antonis Lempesis de49173420 fixed a typo in queries 2020-07-24 19:50:40 +03:00
antleb 391cf80fb8 Added peer-reviewed, green, gold tables and fields in result. Added shortcuts from result-country 2020-07-24 19:50:40 +03:00
antleb 68389d0125 Corrected the script used by the last step of the wf 2020-07-24 19:50:40 +03:00
antleb ec52141f1a changed refereed type from value to classname 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 63cd797aba Comment out step 15 to make it work with the new schema of Claudio 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 138c6ddffa Insert statement to datasource table that takes into account the piwik_id of the openAIRE graph 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 3630794cef Fix to consider the relationships that have been 'virtually deleted' for project_results - defect #5607 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 5546f29e63 Corrections on the shadow schema and the impala table stats calculation 2020-07-24 19:50:40 +03:00
Spyros Zoupanos adf8a025d2 Adding more relations (Sources, Licences, Additional) and shadow schema as provided and discussed with Antonis Lempesis 2020-07-24 19:50:40 +03:00
Spyros Zoupanos 657a40536b Corrections by Spyros: Script cleanup, corrections and re-arrangement 2020-07-24 19:50:40 +03:00
Giorgos Alexiou 477fa6234d Script re-organisation and adding table invalidations needed for impala 2020-07-24 19:50:40 +03:00
Miriam Baglioni 6c2223d1fc added code to get the openaire id for contexts 2020-07-24 17:30:15 +02:00
Miriam Baglioni afd54c1684 removed not needed upload and refactoring 2020-07-24 17:28:56 +02:00
Miriam Baglioni 7b0569d989 changed to map also the result associated to the whole graph 2020-07-24 17:28:11 +02:00
Miriam Baglioni 082225ad61 - 2020-07-24 17:27:26 +02:00
Miriam Baglioni 968c59d97a added the logic to also dump the products for the whole graph. They will lack the collected-from and context information, which will be materialized as new relations 2020-07-24 17:25:19 +02:00
Miriam Baglioni 00f2b8410a changed the definition of the model to insert provenance information into some classes 2020-07-24 17:23:57 +02:00
Miriam Baglioni c0f3059676 added needed structures to ModelSupport 2020-07-24 17:22:39 +02:00
Miriam Baglioni 332258d199 split the classes related to the communities dump and to the whole graph dump 2020-07-24 17:21:48 +02:00
Claudio Atzori 56bbfdc65d introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project' 2020-07-23 08:54:10 +02:00
Sandro La Bruzzo 9ab594ccf6 fixed test 2020-07-21 10:36:21 +02:00
Claudio Atzori ebf60020ac map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type 2020-07-20 19:01:10 +02:00
Miriam Baglioni 355d7e426e added dump for project - not finished 2020-07-20 18:54:43 +02:00
Miriam Baglioni a2f01e5259 added getter and setter 2020-07-20 18:54:17 +02:00
Miriam Baglioni 40bbe94f7c merge with master fork 2020-07-20 18:10:03 +02:00
Miriam Baglioni 2a15494b16 merge upstream 2020-07-20 18:05:01 +02:00
Miriam Baglioni 23160b4d29 realignment of the workflow classes with the changes in the structure of the module 2020-07-20 18:04:30 +02:00
Miriam Baglioni b904e0699a - 2020-07-20 18:02:53 +02:00
Miriam Baglioni 3aab7680f6 changed the test results 2020-07-20 18:00:43 +02:00
Miriam Baglioni cde0300801 moved from projects to project 2020-07-20 17:57:35 +02:00
Miriam Baglioni 5076e4f320 changed test to comply with the modifications 2020-07-20 17:55:18 +02:00
Miriam Baglioni 08dbd99455 changed to dump the whole results graph by using classes already implemented for communities. Added a class to also dump organizations 2020-07-20 17:54:28 +02:00
Miriam Baglioni e47ea9349c extended some types by adding provenance as the couple (provenance, trust) and moved some classes to be used by the complete graph dump also 2020-07-20 17:46:27 +02:00
Claudio Atzori 32f5e466e3 imports cleanup 2020-07-20 17:42:58 +02:00
Claudio Atzori 54ac583923 code formatting 2020-07-20 17:37:08 +02:00
Claudio Atzori 124e7ce19c in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available 2020-07-20 17:33:37 +02:00
Claudio Atzori 050dda223d Merge pull request 'removed duplicated fields' (#25) from unique_field_in_lists into master
Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists.

The task to update the model in such a way is added on #9#issuecomment-1583

Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori e0c4cf6f7b added parameter to drive the graph merge strategy: priority (BETA|PROD) 2020-07-20 10:48:01 +02:00