Improvements to copying data from ocean to impala #420

Merged
claudio.atzori merged 4 commits from antonis.lempesis/dnet-hadoop:beta into master 2024-04-16 14:17:48 +02:00

These 4 commits fix the namespace issue with the Impala cluster and also greatly improve performance (from hours to minutes).

antonis.lempesis added 4 commits 2024-04-16 14:15:32 +02:00
abf0b69f29 Upgrade the copying operation to Impala Cluster:
- Use only hive commands in the Ocean Cluster, as "impala-shell" will be removed from there to free up resources.
- Hugely improve the performance of every part of the copying process: a) speed up file transfer and DB deletion, b) eliminate permission assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements, and only while they depend on other views that have not been created yet, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
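
The view-retry idea in point (c) can be illustrated with a minimal Python sketch. This is not the code from the PR: `create_views_with_retry` and the `execute` callback are hypothetical names, and the sketch assumes that a "create view" statement fails only while a view it depends on is still missing. It retries just the failed statements and stops once a full pass creates nothing new.

```python
from typing import Callable, Dict, List


def create_views_with_retry(
    view_statements: Dict[str, str],
    execute: Callable[[str], bool],
) -> List[str]:
    """Create views, retrying only those that failed in the previous pass.

    view_statements maps a (hypothetical) view name to its "create view"
    statement; execute runs one statement and returns True on success.
    Returns the names of the views that could not be created.
    """
    pending = dict(view_statements)
    while pending:
        created_this_pass = []
        for name, statement in pending.items():
            if execute(statement):
                created_this_pass.append(name)
        if not created_this_pass:
            # No progress: the remaining failures are not caused by
            # dependencies that a further pass could satisfy.
            break
        for name in created_this_pass:
            del pending[name]
    return list(pending)
```

Compared with re-running every statement a fixed number of times, this loop does work only for views that are still missing, and it ends as soon as a pass makes no progress.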
14719dcd62 Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
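
The error handling described in this commit (check that "hadoop distcp" succeeded, and on an error abandon only the current DB rather than the whole run) might look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: the function names, the `run` callback and the source/target URLs passed in are all hypothetical, and only the plain `hadoop distcp <source> <target>` invocation is taken as given.

```python
import subprocess
from typing import Callable, Dict, List, Sequence, Tuple


def run_ok(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0


def copy_database(
    db: str,
    source_url: str,
    target_url: str,
    run: Callable[[Sequence[str]], bool] = run_ok,
) -> bool:
    """Copy one DB's files; return False instead of aborting on failure."""
    if not run(["hadoop", "distcp", source_url, target_url]):
        print(f"ERROR: 'hadoop distcp' failed for DB '{db}'; skipping it.")
        return False
    return True


def copy_databases(
    databases: Dict[str, Tuple[str, str]],
    run: Callable[[Sequence[str]], bool] = run_ok,
) -> List[str]:
    """Copy every DB and return the names of those that failed."""
    failed = []
    for db, (source_url, target_url) in databases.items():
        if not copy_database(db, source_url, target_url, run):
            failed.append(db)  # continue with the remaining DBs
    return failed
```

Passing the command runner as a parameter keeps the sketch testable without a Hadoop installation; a caller would use the default `run_ok` in a real environment.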
d7da4f814b Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
claudio.atzori merged commit 013935c593 into master 2024-04-16 14:17:48 +02:00

Reference: D-Net/dnet-hadoop#420