dnet-hadoop

sabeel

Author	SHA1	Message	Date
Claudio Atzori	66548e6a83	Merge pull request 'changes in copy script' (#438) from antonis.lempesis/dnet-hadoop:beta into beta Reviewed-on: D-Net/dnet-hadoop#438	2024-05-27 11:54:03 +02:00
Lampros Smyrnaios	b48ed6e617	Change configuration in the copy-operation to Impala Cluster: Set the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" parameter to "false".	2024-05-23 16:58:12 +03:00
Lampros Smyrnaios	68322843e2	Small updates to the copy-operation to Impala Cluster: - Add a configuration-"switch" to control whether the script exits upon an error or not. - Allow the script to exit when a table could not be created. - Show the elapsed time for processing each database.	2024-05-23 15:07:49 +03:00
Lampros Smyrnaios	c7b32bbacc	Update CopyDataToImpalaCluster: Update the code of acquiring the entities from Ocean cluster, through hive, in order to optimize the process and account for additional reserved keywords in Impala. Co-authored-by: Antonis Lempesis <antleb@di.uoa.gr>	2024-05-23 13:00:19 +03:00
Sandro La Bruzzo	66c1ffc866	merged again from beta (I hope for the last time)	2024-05-22 11:02:46 +02:00
Lampros Smyrnaios	49af2e5740	Miscellaneous updates to the copying operation to Impala Cluster: - Update the algorithm for creating views that depend on other views; overcome some bash-instabilities. - Upon any error, fail the whole process, not just the current DB-creation, as those errors usually indicate a bug in the initial DB-creation, that should be fixed immediately. - Enhance parallel-copy of large files by "hadoop distcp" command. - Reduce the "invalidate metadata" commands to just the current DB's tables, in order to eliminate the general overhead on Impala. - Show the number of tables and views in the logs. - Fix some log-messages.	2024-04-23 17:15:04 +03:00
Sandro La Bruzzo	b84ad0c06e	merged beta	2024-04-19 14:39:59 +02:00
Claudio Atzori	589bce3520	Merge pull request '[pBETA] Improvements to copying data from ocean to impala' (#421) from antonis.lempesis/dnet-hadoop:beta into beta Reviewed-on: D-Net/dnet-hadoop#421	2024-04-16 14:22:32 +02:00
Lampros Smyrnaios	d7da4f814b	Minor updates to the copying operation to Impala Cluster: - Improve logging. - Code optimization/polishing.	2024-04-12 18:12:06 +03:00
Lampros Smyrnaios	14719dcd62	Miscellaneous updates to the copying operation to Impala Cluster: - Update the algorithm for creating views that depend on other views. - Add check for successful execution of the "hadoop distcp" command. - Add a check for successful copy operation of all entities. - Upon facing an error in a DB, exit the method, instead of the whole script. - Improve logging. - Code polishing.	2024-04-12 15:36:13 +03:00
Lampros Smyrnaios	abf0b69f29	Upgrade the copying operation to Impala Cluster: - Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free-up resources. - Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times. - Add error-checks for the creation of tables and views.	2024-04-11 17:12:12 +03:00
Claudio Atzori	26b97aa5ed	Merge pull request '[BETA] fixed the result_country definition and updated the stats DB copy procedure' (#416) from antonis.lempesis/dnet-hadoop:beta into beta Reviewed-on: D-Net/dnet-hadoop#416	2024-04-03 12:36:03 +02:00
Lampros Smyrnaios	b7c8acc563	- Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski". - Fix typos.	2024-04-03 13:15:37 +03:00
Claudio Atzori	730eaffc85	Merge pull request 'correctly selecting the active hdfs node for the impala cluster' (#405) from antonis.lempesis/dnet-hadoop:beta into beta Reviewed-on: D-Net/dnet-hadoop#405	2024-03-26 12:07:46 +01:00
Lampros Smyrnaios	bc8c97182d	Automatically select the ACTIVE HDFS NODE for Impala cluster, in all "copyDataToImpalaCluster.sh" scripts.	2024-03-26 13:01:12 +02:00
Claudio Atzori	ef52128c55	included new stats* workflows in parent pom list of modules, code formatting	2024-03-26 10:42:10 +01:00
Antonis Lempesis	3c79720342	fixed the irish result subset	2024-03-07 14:08:57 +02:00
dimitrispie	6b823100ae	Update buildIrishMonitorDB.sql New indicators added	2024-01-07 22:54:39 +02:00
dimitrispie	ffdd03d2f4	Monitor Irish Stats WF Parameters (with examples): stats_db_name=openaire_beta_stats_20231208 monitor_irish_db_name=openaire_beta_stats_monitor_ie_20231208b monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie graph_db_name=openaire_beta_20231208 monitor_irish_db_shadow_name=openaire_beta_stats_monitor_ie_shadow hive_timeout=150000 hadoop_user_name=dnet.beta resumeFrom=Step1-buildIrishMonitorDB	2023-12-22 11:05:24 +02:00

19 Commits