[Stats wf] #372, #405 to production #406

Merged
claudio.atzori merged 28 commits from antonis.lempesis/dnet-hadoop:beta into master 2024-03-26 12:18:27 +01:00

This PR brings the changes from #372 and #405, already merged into the beta branch, to the master branch.

It includes various fixes to the procedures and two new wfs:

dhp-stats-monitor-irish

Builds the db for the Irish Monitor and is meant to run after the stats-update wf. It is kept as a separate wf so that it can be run on demand, more often than the main wf, at least while the monitor is under active development and new content keeps being added.

The parameters it expects are (description and example value in parentheses):

  • stats_db_name (stats database) ("openaire_prod_stats_20240125")
  • graph_db_name (graph database) ("openaire_prod_20240125")
  • monitor_irish_db_name (target monitor db) ("openaire_prod_stats_monitor_ie_20240125")
  • monitor_irish_db_prod_name (production monitor db) ("openaire_prod_stats_monitor_ie")
  • monitor_irish_db_shadow_name (shadow monitor db) ("openaire_prod_stats_monitor_ie_shadow")
  • hive_metastore_uris (hive server metastore URIs)
  • hive_jdbc_url (hive server jdbc url)
  • hive_timeout (...)
  • hadoop_user_name (user name of the wf owner)
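The example values above follow a visible naming pattern: dated stats, graph and target monitor dbs, plus fixed production and shadow monitor names. The sketch below is a hypothetical Python helper (not part of dnet-hadoop) that derives the parameter set from an environment name and a date tag and renders it as key=value lines. The YYYYMMDD tag format and the function names are assumptions inferred from the examples; the Hive URIs, JDBC URL and user name are site-specific and are passed in as placeholders.

```python
# Hypothetical helper (not part of dnet-hadoop): builds the dhp-stats-monitor-irish
# parameters from a date tag, following the naming pattern of the example values.


def monitor_irish_params(env: str, date_tag: str, hive_metastore_uris: str,
                         hive_jdbc_url: str, hive_timeout: int,
                         hadoop_user_name: str) -> dict[str, str]:
    # Assumption: the date tag is YYYYMMDD, as in "openaire_prod_stats_20240125".
    if len(date_tag) != 8 or not date_tag.isdigit():
        raise ValueError(f"expected a YYYYMMDD date tag, got {date_tag!r}")
    prefix = f"openaire_{env}"
    return {
        "stats_db_name": f"{prefix}_stats_{date_tag}",
        "graph_db_name": f"{prefix}_{date_tag}",
        # Dated target db, plus the fixed production and shadow names.
        "monitor_irish_db_name": f"{prefix}_stats_monitor_ie_{date_tag}",
        "monitor_irish_db_prod_name": f"{prefix}_stats_monitor_ie",
        "monitor_irish_db_shadow_name": f"{prefix}_stats_monitor_ie_shadow",
        # Cluster-specific values are passed through unchanged.
        "hive_metastore_uris": hive_metastore_uris,
        "hive_jdbc_url": hive_jdbc_url,
        "hive_timeout": str(hive_timeout),
        "hadoop_user_name": hadoop_user_name,
    }


def to_properties(params: dict[str, str]) -> str:
    # Renders the parameters as key=value lines, one per parameter.
    return "\n".join(f"{key}={value}" for key, value in params.items())


if __name__ == "__main__":
    params = monitor_irish_params(
        env="prod",
        date_tag="20240125",
        hive_metastore_uris="<metastore-uris>",  # placeholder
        hive_jdbc_url="<jdbc-url>",              # placeholder
        hive_timeout=150000,
        hadoop_user_name="<wf-owner>",           # placeholder
    )
    print(to_properties(params))
```

The naming is only a convention: the beta example in the commit log below uses a suffixed target name (`..._20231208b`), so in practice the dated target name may still need to be set by hand.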

dhp-stats-hist-snaps

Creates and updates a db of historical snapshots taken from the graph. It should run after both the stats-update and the dhp-stats-monitor-irish wfs.
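The run order described for the two new wfs can be written as a small dependency graph. The following is a hypothetical Python check, not part of the repository, where "stats-update" stands for the main stats wf. It lists one valid full ordering with the standard-library `graphlib.TopologicalSorter` and verifies that a planned sequence of runs respects the prerequisites, while still allowing dhp-stats-monitor-irish to be re-run on demand.

```python
from graphlib import TopologicalSorter

# Run-order constraints stated in this PR: each wf maps to the wfs that
# must have completed before it. "stats-update" stands for the main stats wf.
DEPENDS_ON: dict[str, set[str]] = {
    "stats-update": set(),
    "dhp-stats-monitor-irish": {"stats-update"},
    "dhp-stats-hist-snaps": {"stats-update", "dhp-stats-monitor-irish"},
}


def respects_order(runs: list[str]) -> bool:
    """True if every wf in `runs` appears only after all of its prerequisites.

    Repeated runs are allowed, e.g. re-running dhp-stats-monitor-irish on demand.
    Unknown wf names raise KeyError.
    """
    completed: set[str] = set()
    for wf in runs:
        if not DEPENDS_ON[wf] <= completed:
            return False
        completed.add(wf)
    return True


if __name__ == "__main__":
    # One valid full ordering of the three wfs.
    print(list(TopologicalSorter(DEPENDS_ON).static_order()))
    print(respects_order(["stats-update", "dhp-stats-monitor-irish",
                          "dhp-stats-monitor-irish", "dhp-stats-hist-snaps"]))
```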

The parameters for this wf still need to be documented.

claudio.atzori added 28 commits 2024-03-26 12:13:00 +01:00
40b98d8182 Changes to indicators and funders definition
- Changes result_refereed definition
- Added result_country indicator
- Added indi_pub_green_with_license indicator
- Added country from jurisdiction to funders
ffdd03d2f4 Monitor Irish Stats WF
Parameters (with examples):
stats_db_name=openaire_beta_stats_20231208
monitor_irish_db_name=openaire_beta_stats_monitor_ie_20231208b
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
graph_db_name=openaire_beta_20231208
monitor_irish_db_shadow_name=openaire_beta_stats_monitor_ie_shadow
hive_timeout=150000
hadoop_user_name=dnet.beta
resumeFrom=Step1-buildIrishMonitorDB
75bfde043c Historical Snapshots Workflow
Create historical snapshots db with parameters:

hist_db_name=openaire_beta_historical_snapshots_xxx
hist_db_name_prev=openaire_beta_historical_snapshots_xxx (previous run of wf)
stats_db_name=openaire_beta_stats_xxx
stats_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_db_name=openaire_beta_stats_monitor_xxx
monitor_db_prod_name=openaire_beta_stats_monitor
monitor_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
hist_db_prod_name=openaire_beta_historical_snapshots
hist_db_shadow_name=openaire_beta_historical_snapshots_shadow
hist_date=122023
hive_timeout=150000
hadoop_user_name=xxx
resumeFrom=CreateDB
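The Historical Snapshots Workflow commit lists its parameters without describing them. A minimal, hypothetical pre-flight check in Python could verify that they are all set before launching the wf. The MMYYYY reading of `hist_date` is an assumption inferred from the example value `122023`, as is the rule that `hist_db_name_prev` must differ from `hist_db_name`; `resumeFrom` is assumed to only select the step to resume from and is treated as optional.

```python
from datetime import date

# Parameter names from the Historical Snapshots Workflow commit.
# resumeFrom is assumed to only pick the starting step, so it is optional here.
REQUIRED = (
    "hist_db_name", "hist_db_name_prev", "stats_db_name", "stats_irish_db_name",
    "monitor_db_name", "monitor_db_prod_name", "monitor_irish_db_name",
    "monitor_irish_db_prod_name", "hist_db_prod_name", "hist_db_shadow_name",
    "hist_date", "hive_timeout", "hadoop_user_name",
)


def hist_date_tag(d: date) -> str:
    # Assumption: hist_date is MMYYYY, inferred from the example value 122023.
    return f"{d.month:02d}{d.year:04d}"


def check_hist_params(params: dict[str, str]) -> list[str]:
    """Returns a list of problems; an empty list means none were found."""
    problems = [f"missing {key}" for key in REQUIRED if not params.get(key)]
    tag = params.get("hist_date", "")
    if tag and not (len(tag) == 6 and tag.isdigit() and 1 <= int(tag[:2]) <= 12):
        problems.append(f"hist_date {tag!r} is not MMYYYY")
    # Assumption: the previous snapshot db must differ from the one being created.
    current = params.get("hist_db_name")
    if current and current == params.get("hist_db_name_prev"):
        problems.append("hist_db_name_prev should point to the previous run's db")
    return problems


if __name__ == "__main__":
    print(hist_date_tag(date(2023, 12, 8)))
    print(check_hist_params({"hist_date": "132023"}))
```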
claudio.atzori merged commit 09a6d17059 into master 2024-03-26 12:18:27 +01:00
Reference: D-Net/dnet-hadoop#406