dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Lampros Smyrnaios	14719dcd62	Miscellaneous updates to the copying operation to Impala Cluster: - Update the algorithm for creating views that depend on other views. - Add check for successful execution of the "hadoop distcp" command. - Add a check for successful copy operation of all entities. - Upon facing an error in a DB, exit the method, instead of the whole script. - Improve logging. - Code polishing.	2024-04-12 15:36:13 +03:00
Lampros Smyrnaios	abf0b69f29	Upgrade the copying operation to Impala Cluster: - Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free up resources. - Hugely improve the performance in every aspect of the copying process: a) speed up file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times. - Add error-checks for the creation of tables and views.	2024-04-11 17:12:12 +03:00
Lampros Smyrnaios	b7c8acc563	- Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski". - Fix typos.	2024-04-03 13:15:37 +03:00
Lampros Smyrnaios	92cc27e7eb	Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script.	2024-03-26 12:34:11 +02:00
dimitrispie	a94a54a2d0	Changes for tables and creation of the new indicator indi_is_result_accessible - Drop table statements for all tables to avoid duplicates in case of wf rerun - Add pdfsaggregated step to create the indi_is_result_accessible table. This step is executed on the new impala cluster only, since the pdfaggregation_i is updated on this cluster.	2023-11-15 14:32:18 +02:00
dimitrispie	163b2ee2a8	Changes 1. Monitor updates 2. Bug fixes during copy to impala cluster	2023-07-13 15:25:00 +03:00
dimitrispie	42b8ce2ba4	Update copyDataToImpalaCluster.sh	2023-06-14 19:23:42 +03:00
dimitrispie	2032b0df40	Bug fixes 1. Remove tables/views from old databases in the new cluster, before dropping the dbs 2. Fix id in result_accessroute, indi_impact_measures, indi_pub_bronze_oa	2023-06-14 19:09:09 +03:00
dimitrispie	2324670714	Split Monitor DBs-Interdisciplinarity indicators - Split DBs Monitor for faster rendering of visualizations - Add interdisciplinarity indicators from result_fos	2023-06-02 13:34:16 +03:00
dimitrispie	86f4f63daf	Updates to steps related to transferring data to impala cluster 1. Remove external table definitions in stats_ext 2. Fix the issue where some views are not created. 3. Added two workflow parameters for also copying the usage stats dbs	2023-05-18 09:33:05 +03:00
dimitrispie	b3f9633205	Update copyDataToImpalaCluster.sh Added option --user to impala-shell command	2023-05-15 12:51:44 +03:00
dimitrispie	00d0d162b6	Update copyDataToImpalaCluster.sh Added a temporary folder to copy the files to avoid permission issues	2023-05-12 12:31:13 +03:00
dimitrispie	032a401cbf	Bug fixes	2023-02-20 09:29:20 +02:00
dimitrispie	3400133c2f	Bug fix	2023-02-13 09:44:00 +02:00
dimitrispie	7b78b15c81	Changes for copying to Impala Cluster	2023-02-13 09:27:00 +02:00
dimitrispie	35ba8bb328	Bug fixes	2023-02-09 12:57:57 +02:00
dimitrispie	2dc6d47270	Changes 06022023	2023-02-06 13:18:53 +02:00
Antonis Lempesis	1ddea4f442	removed 'stored as parquet' from views..	2022-12-21 12:41:33 +02:00

18 Commits