forked from D-Net/dnet-hadoop
convert_hive_to_spark_actions #1
antonis.lempesis
commented 2024-09-23 13:53:08 +02:00
Owner
No description provided.
antonis.lempesis
added 20 commits 2024-09-23 13:53:08 +02:00
db33f7727c
Update "dhp-stats-update" workflow to use "spark"-actions, instead of "hive" ones.
0b897f2f66
Fix and add missing "DROP TABLE" statements, in "dhp-stats-update" sql-scripts.
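Missing `DROP TABLE` statements make a script fail, or silently reuse stale data, when the workflow is re-run. The sketch below is a hypothetical lint helper, not part of the PR. It scans a SQL script and reports every table that is created without an earlier `DROP TABLE` for the same name. The regex, the function name `tables_missing_drop`, and the `${stats_db_name}` placeholder in the usage line are assumptions made for illustration.

```python
import re

# Matches "DROP TABLE [IF EXISTS] name" and "CREATE [EXTERNAL] TABLE [IF NOT EXISTS] name".
# Table names may contain workflow placeholders such as ${stats_db_name} (assumed naming).
STMT_RE = re.compile(
    r"\b(?P<kind>DROP|CREATE)\s+(?:EXTERNAL\s+)?TABLE\s+"
    r"(?:IF\s+(?:NOT\s+)?EXISTS\s+)?(?P<name>[\w.${}]+)",
    re.IGNORECASE,
)


def tables_missing_drop(script: str) -> list[str]:
    """Return tables that are created before any DROP TABLE for the same name."""
    dropped: set[str] = set()
    missing: list[str] = []
    for match in STMT_RE.finditer(script):
        name = match.group("name").lower()
        if match.group("kind").upper() == "DROP":
            dropped.add(name)
        elif name not in dropped and name not in missing:
            missing.append(name)
    return missing
```

For example, `tables_missing_drop(open("step1.sql").read())` (hypothetical file name) would list the tables that still need a preceding `DROP TABLE IF EXISTS ... PURGE`.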
ca091c0f1e
dhp-stats-update:
6f2ebb2a52
Revert Step8 and Step11 to use Hive again, since their "UPDATE" statements are not supported by Spark.
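Deciding which steps must stay on Hive comes down to whether their scripts contain `UPDATE` statements. The following is a hypothetical helper, not taken from the PR, that flags such statements so a step can be routed to a Hive action instead of a Spark one. It assumes statements are separated by `;`, and it strips comments naively, so a semicolon inside a string literal would confuse it.

```python
import re

# Block comments (including /*EOS*/ markers) and "--" line comments.
_COMMENT_RE = re.compile(r"/\*.*?\*/|--[^\n]*", re.DOTALL)


def statements_requiring_hive(script: str) -> list[str]:
    """Return statements whose first keyword is UPDATE (kept on Hive in this workflow)."""
    cleaned = _COMMENT_RE.sub(" ", script)
    flagged = []
    for stmt in cleaned.split(";"):
        stmt = stmt.strip()
        # Check only the leading keyword, so a column named "update" is not flagged.
        if stmt and stmt.split(None, 1)[0].upper() == "UPDATE":
            flagged.append(stmt)
    return flagged
```

A non-empty result for a step's script would indicate that the step, like Step8 and Step11, should keep using a Hive action.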
d46b78b659
dhp-stats-update:
ba533d9f34
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
2616971e2b
dhp-stats-update: remove leftover duplicate line
342223f75c
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
69a9ac7393
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
3c17183d10
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
e0ac494859
Merge branch 'beta' into convert_hive_to_spark_actions
888637773c
Add missing "/*EOS*/" comments.
a644a6f4fe
Catch Spark-sql errors and show a log with the statement that failed.
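The two commits above fit together: `/*EOS*/` comments mark statement boundaries, and a runner executes the statements one by one and logs the one that fails. The sketch below assumes that `/*EOS*/` means "end of statement" and that statements are executed through a callable such as PySpark's `spark.sql`. The names `split_statements`, `run_script` and `StatementError` are hypothetical and do not come from the repository.

```python
import logging

EOS_MARKER = "/*EOS*/"  # assumed end-of-statement marker used in the stats scripts


def split_statements(script: str) -> list[str]:
    """Split a script on /*EOS*/ markers, dropping empty parts and trailing semicolons."""
    statements = []
    for part in script.split(EOS_MARKER):
        stmt = part.strip().rstrip(";").strip()
        if stmt:
            statements.append(stmt)
    return statements


class StatementError(RuntimeError):
    """Raised when a single statement fails; keeps its position and text."""

    def __init__(self, index: int, statement: str, cause: Exception):
        super().__init__(f"Statement #{index} failed: {cause}\n{statement}")
        self.index = index
        self.statement = statement


def run_script(script, execute, log=logging.getLogger("stats-update")):
    """Run each statement with `execute`, logging and re-raising on the first failure."""
    for index, stmt in enumerate(split_statements(script), start=1):
        try:
            execute(stmt)
        except Exception as exc:
            log.error("Failed statement #%d:\n%s", index, stmt)
            raise StatementError(index, stmt, exc) from exc
```

In a PySpark job this could be called as `run_script(script_text, spark.sql)`. Stripping the trailing `;` matters because `spark.sql` parses one statement at a time. A statement without a closing `/*EOS*/` would be merged with the next one, which is exactly the failure that the "missing /*EOS*/ comments" fix addresses.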
fe2275a9b0
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
54e11b6a43
Improve performance and efficiency by rewriting the creation of the "publication", "project", "dataset", "datasource", "software", "otherresearchproduct" and "result" tables so that each one is created in a single query.
aa4d7d5e20
Prioritize the rest of the stats-queries over other tasks on the cluster, by putting them in the "analytics" queue.
7ce051d766
- Update the remaining hive-actions to spark-actions.
7b7dd32ad5
- Fix placement of some "set mapred.job.queue.name=analytics" statements and remove their unused "/*EOS*/" indicator.
ce0aee21cc
Improve performance of transferring the stats-DBs to another cluster and of querying the DBs' tables, by instructing Spark to create up to 100 files per table, instead of thousands.
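The commit message does not show how the 100-file limit is enforced. One standard Spark SQL mechanism is the `COALESCE(n)` partitioning hint (available since Spark 2.4), which caps the number of output partitions, and therefore typically the number of files for an unpartitioned table. The helper below is a hypothetical sketch that builds such a CTAS statement. The function name, the `parquet` default and the example table names are assumptions.

```python
def ctas_with_file_limit(table: str, select_body: str, max_files: int = 100,
                         file_format: str = "parquet") -> str:
    """Build a CTAS statement whose output is capped at `max_files` partitions.

    COALESCE merges partitions without a full shuffle; for an unpartitioned
    table each output partition is usually written as one file.
    """
    if max_files < 1:
        raise ValueError("max_files must be positive")
    return (
        f"CREATE TABLE {table} STORED AS {file_format} AS "
        f"SELECT /*+ COALESCE({max_files}) */ {select_body}"
    )
```

`COALESCE` avoids a shuffle but can also reduce the parallelism of the stage that computes the data. `REPARTITION(n)` adds a shuffle while keeping upstream parallelism, which may be preferable for heavy queries.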
e9686365a2
Improve performance of creating the "result_fos" table, by using a temp-table to cache data, which is requested multiple times.
antonis.lempesis
merged commit c9241dba0d into beta 2024-09-23 13:53:29 +02:00
antonis.lempesis
referenced this issue from a commit 2024-09-23 13:53:30 +02:00
Merge pull request 'convert_hive_to_spark_actions' (#1) from convert_hive_to_spark_actions into beta