Changes to indicators and funders definition #372

Merged
claudio.atzori merged 23 commits from antonis.lempesis/dnet-hadoop:beta into beta 2024-03-26 08:46:21 +01:00
Contributor
  • Changes result_refereed definition
  • Added result_country indicator
  • Added indi_pub_green_with_license indicator
  • Added country from jurisdiction to funders
dimitris.pierrakos added 1 commit 2023-12-22 09:31:01 +01:00
40b98d8182 Changes to indicators and funders definition
- Changes result_refereed definition
- Added result_country indicator
- Added indi_pub_green_with_license indicator
- Added country from jurisdiction to funders
dimitris.pierrakos added 1 commit 2023-12-22 10:05:29 +01:00
ffdd03d2f4 Monitor Irish Stats WF
Parameters (with examples):
stats_db_name=openaire_beta_stats_20231208
monitor_irish_db_name=openaire_beta_stats_monitor_ie_20231208b
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
graph_db_name=openaire_beta_20231208
monitor_irish_db_shadow_name=openaire_beta_stats_monitor_ie_shadow
hive_timeout=150000
hadoop_user_name=dnet.beta
resumeFrom=Step1-buildIrishMonitorDB
dimitris.pierrakos added 1 commit 2024-01-04 14:11:09 +01:00
75bfde043c Historical Snapshots Workflow
Create historical snapshots db with parameters:

hist_db_name=openaire_beta_historical_snapshots_xxx
hist_db_name_prev=openaire_beta_historical_snapshots_xxx (previous run of wf)
stats_db_name=openaire_beta_stats_xxx
stats_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_db_name=openaire_beta_stats_monitor_xxx
monitor_db_prod_name=openaire_beta_stats_monitor
monitor_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
hist_db_prod_name=openaire_beta_historical_snapshots
hist_db_shadow_name=openaire_beta_historical_snapshots_shadow
hist_date=122023
hive_timeout=150000
hadoop_user_name=xxx
resumeFrom=CreateDB
dimitris.pierrakos added 1 commit 2024-01-07 21:54:44 +01:00
antonis.lempesis added 1 commit 2024-01-08 15:01:39 +01:00
dimitris.pierrakos force-pushed beta from b139de8286 to 8b2cbb611e 2024-01-08 23:45:22 +01:00 Compare
dimitris.pierrakos added 1 commit 2024-01-08 23:47:14 +01:00
antonis.lempesis added 1 commit 2024-01-10 22:25:54 +01:00
claudio.atzori requested changes 2024-01-11 11:36:28 +01:00
claudio.atzori left a comment
Owner

Dear Dimitris, this PR mentions a few changes in the definition of one table and the addition of three indicators. I would assume these changes to be confined to the modules

  • dhp-stats-update and/or
  • dhp-stats-promote

However, I see two new submodules are included in the PR:

  • dhp-stats-monitor-irish
  • dhp-stats-hist-snaps

Were they intended to be included in this PR? If so, please describe them. I need to know

  • what is their purpose (high level description)
  • how they fit with the existing pipeline (when they need to be launched)
  • the set of parameters expected for them

Thank you!

Regarding the 2 new wfs:

dhp-stats-monitor-irish

Builds the db for the Irish Monitor and is meant to run after the stats-update one. They are separate wfs because we want to be able to run this on demand, more often than the main wf, at least while we actively develop the monitor and add more and more stuff.

The parameters it expects are (description and example value in parentheses):

  • stats_db_name (stats database) ("openaire_prod_stats_20240125")
  • graph_db_name (graph database) ("openaire_prod_20240125")
  • monitor_irish_db_name (target monitor db) ("openaire_prod_stats_monitor_ie_20240125")
  • monitor_irish_db_prod_name (production monitor db) ("openaire_prod_stats_monitor_ie")
  • monitor_irish_db_shadow_name (shadow monitor db) ("openaire_prod_stats_monitor_ie_shadow")
  • hive_metastore_uris (hive server metastore URIs)
  • hive_jdbc_url (hive server jdbc url)
  • hive_timeout (...)
  • hadoop_user_name (user name of the wf owner)
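
As a rough illustration of the parameter list above, the following sketch checks that every expected parameter is present and renders them as `key=value` lines. The helper `render_properties` is hypothetical and not part of the PR; the database names are the examples given above, `hive_timeout` and `hadoop_user_name` reuse the values from the commit message, and the two hive endpoints are made-up placeholders.

```python
# Parameter names are taken from the list above; the helper is hypothetical.
REQUIRED_PARAMS = [
    "stats_db_name",
    "graph_db_name",
    "monitor_irish_db_name",
    "monitor_irish_db_prod_name",
    "monitor_irish_db_shadow_name",
    "hive_metastore_uris",
    "hive_jdbc_url",
    "hive_timeout",
    "hadoop_user_name",
]


def render_properties(params):
    """Return key=value lines for the workflow, failing on missing parameters."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return "\n".join(f"{name}={params[name]}" for name in REQUIRED_PARAMS)


example = {
    "stats_db_name": "openaire_prod_stats_20240125",
    "graph_db_name": "openaire_prod_20240125",
    "monitor_irish_db_name": "openaire_prod_stats_monitor_ie_20240125",
    "monitor_irish_db_prod_name": "openaire_prod_stats_monitor_ie",
    "monitor_irish_db_shadow_name": "openaire_prod_stats_monitor_ie_shadow",
    # Placeholder endpoints, assumed for illustration only.
    "hive_metastore_uris": "thrift://metastore.example.org:9083",
    "hive_jdbc_url": "jdbc:hive2://hive.example.org:10000",
    # Values reused from the commit message of the Irish monitor workflow.
    "hive_timeout": "150000",
    "hadoop_user_name": "dnet.beta",
}

if __name__ == "__main__":
    print(render_properties(example))
```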

dhp-stats-hist-snaps

Creates and updates a db of some historical snapshots from the graph. It should run after the stats-update and the dhp-stats-monitor-irish.
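
The run order described so far can be sketched as a small dependency graph. This is only an illustration of the ordering stated in this comment, assuming "stats-update" refers to the dhp-stats-update workflow; it does not cover any other stats-related workflow.

```python
from graphlib import TopologicalSorter

# Each workflow maps to the workflows that must finish before it starts,
# as described in this comment.
DEPENDENCIES = {
    "dhp-stats-update": set(),
    "dhp-stats-monitor-irish": {"dhp-stats-update"},
    "dhp-stats-hist-snaps": {"dhp-stats-update", "dhp-stats-monitor-irish"},
}


def run_order(dependencies):
    """Return the workflows in an order that respects the dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, workflow in enumerate(run_order(DEPENDENCIES), start=1):
        print(step, workflow)
```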

I have a few questions myself about the parameters, so I'll ask Dimitris and comment again. In any case, you can schedule the Irish monitor one now (the params are pretty standard) and we'll come back about the historical one.
antonis.lempesis added 1 commit 2024-01-25 15:06:37 +01:00
antonis.lempesis added 1 commit 2024-01-26 01:05:14 +01:00
antonis.lempesis added 1 commit 2024-01-29 20:52:43 +01:00
antonis.lempesis added 1 commit 2024-01-30 15:55:13 +01:00
antonis.lempesis added 1 commit 2024-02-08 11:58:08 +01:00
antonis.lempesis added 3 commits 2024-03-07 11:15:57 +01:00
antonis.lempesis added 1 commit 2024-03-07 13:09:11 +01:00
dimitris.pierrakos added 1 commit 2024-03-19 08:42:27 +01:00
antonis.lempesis added 1 commit 2024-03-21 10:45:26 +01:00
antonis.lempesis added 5 commits 2024-03-22 09:17:14 +01:00
claudio.atzori merged commit d72e7b7487 into beta 2024-03-26 08:46:21 +01:00

@antonis.lempesis although I'm still waiting for the clarification you mentioned above about the parameters for the workflow dhp-stats-hist-snaps, I integrated the PR in the beta codebase. While doing so, I noticed the changes include yet another submodule defining another oozie workflow that was not described yet:

dhp-stats-monitor-update

I can imagine this workflow implements the update of the monitor DBs, hence I take it as a refactoring of parts defined in the dhp-stats-update workflow. Can you confirm this is the case?

If so, please

  1. describe the parameters and
  2. summarise the ordering in which all the various stats-related workflows should be run

Thanks!

ps. in the meantime I will prepare the automated deployment procedures for all the new oozie workflows.

Please ignore this wf. It updates the beta institutional monitor db whenever we need to add a couple more orgs to the monitor, and we run it manually between beta runs. I'll move it to a new project when we split the whole thing.
Reference: D-Net/dnet-hadoop#372