dnet-hadoop/dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app
Lampros Smyrnaios d46b78b659 dhp-stats-update:
- Set Steps 2-7 and 9 to limit the number of files generated by Spark, from 8000 down to 100, to improve file-transfer and querying performance.
- Allow the workflow to run up to Step10. Step11 seems to have some issues even when using hive-action.
2024-04-18 15:40:27 +03:00
scripts dhp-stats-update: 2024-04-18 15:40:27 +03:00
config-default.xml Added memory to hive 2023-06-07 18:18:23 +03:00
contexts.sh Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script. 2024-03-26 13:29:04 +02:00
copyDataToImpalaCluster.sh Minor updates to the copying operation to Impala Cluster: 2024-04-12 18:12:06 +03:00
createPDFsAggregated.sh Changes for tables and creation of the new indicator indi_is_result_accessible 2023-11-15 14:32:18 +02:00
finalizeImpalaCluster.sh Changes 2023-07-13 15:25:00 +03:00
finalizedb.sh Changes 06022023 2023-02-06 13:18:53 +02:00
indicators.sh Bug fixes 2023-02-20 09:29:20 +02:00
monitor-post.sh Add monitor post step 2023-02-09 13:44:14 +02:00
monitor.sh Update the code which acquires the "IMPALA_HDFS_NODE" to test the "tmp" dir instead of the base dir, and introduce retries to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski". 2024-04-03 13:15:37 +03:00
observatory-post.sh Add monitor post step 2023-02-09 13:44:14 +02:00
observatory-pre.sh Changes 06022023 2023-02-06 13:18:53 +02:00
updateCache.sh [stats] reducing the step22 wait time 2021-10-20 14:14:53 +02:00
workflow.xml dhp-stats-update: 2024-04-18 15:40:27 +03:00