Antonis Lempesis
eed6f21025
broadcasting moar
2024-10-10 01:48:49 +03:00
Antonis Lempesis
6f426383e6
added broadcast hints
2024-10-10 01:27:52 +03:00
Antonis Lempesis
a3f1be5857
Analyzing correct table
2024-10-08 23:43:50 +03:00
Antonis Lempesis
a0a559eb81
Analyzing correct table
2024-10-08 13:29:28 +03:00
Antonis Lempesis
d9a8ead200
ANALYZE TABLE is not supported on views.
2024-10-08 01:17:32 +03:00
Antonis Lempesis
9c9b871b2b
deleted duplicate statement
2024-10-08 00:45:46 +03:00
Antonis Lempesis
f89a701467
deleted duplicate statement
2024-10-08 00:27:06 +03:00
Antonis Lempesis
413f8928cd
it's late...
2024-10-07 23:49:15 +03:00
Antonis Lempesis
d6f2d06a9c
finished the statement...
2024-10-07 23:07:55 +03:00
Antonis Lempesis
d645f869bc
it's analyZe not analySe you moron
2024-10-04 11:04:24 +03:00
Antonis Lempesis
e780b15619
it's analyZe not analySe you moron
2024-10-04 00:43:14 +03:00
Antonis Lempesis
d0a96c8a07
using refereed for result_peerreviewed
2024-10-03 23:26:21 +03:00
Antonis Lempesis
b6e9ffb7e3
computing stats after every table creation
2024-10-03 10:57:10 +03:00
Antonis Lempesis
2ed9c5504c
computing stats after every table creation
2024-10-03 10:55:56 +03:00
Antonis Lempesis
f3c179658a
datasource table creation split in steps
2024-09-30 17:12:21 +03:00
Antonis Lempesis
619aa34a15
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-09-23 15:25:59 +03:00
Antonis Lempesis
dbea7a4072
removed duplicate line
2024-09-23 14:57:11 +03:00
Antonis Lempesis
c9241dba0d
Merge pull request 'convert_hive_to_spark_actions' (#1) from convert_hive_to_spark_actions into beta
Reviewed-on: #1
2024-09-23 13:53:28 +02:00
Antonis Lempesis
37ad259296
cleanup
2024-09-05 16:02:44 +03:00
Antonis Lempesis
b64c144abf
added new institutions
2024-09-05 16:00:09 +03:00
Antonis Lempesis
d0590e0e49
added latest institutions
2024-07-23 15:17:15 +03:00
Antonis Lempesis
7d2c0a3723
added new institutions
2024-07-23 15:10:17 +03:00
Lampros Smyrnaios
e9686365a2
Improve performance of creating the "result_fos" table, by using a temp-table to cache data, which is requested multiple times.
2024-07-03 20:24:36 +03:00
Lampros Smyrnaios
ce0aee21cc
Improve performance of transferring the stats-DBs to another cluster and querying the DBs' tables, by ordering Spark to create up to 100 files per table, instead of thousands.
2024-07-03 20:15:33 +03:00
Lampros Smyrnaios
7b7dd32ad5
- Fix placement of some "set mapred.job.queue.name=analytics" statements and remove their unused "/*EOS*/" indicator.
- Add stacktrace-info to failed actions.
2024-07-03 19:53:24 +03:00
Lampros Smyrnaios
7ce051d766
- Update the remaining hive-actions to spark-actions.
- Update the version of shell-actions.
- Fix missing "/*EOS*/" indicators.
2024-07-03 19:49:19 +03:00
Lampros Smyrnaios
aa4d7d5e20
Prioritize the rest of the stats-queries over other tasks on the cluster, by putting them in the "analytics" queue.
2024-07-03 19:14:25 +03:00
Lampros Smyrnaios
54e11b6a43
Improve performance and efficiency by rewriting the creation process of "publication", "project", "dataset", "datasource", "software", "otherresearchproduct" and "result" tables, to be performed in a single query, for each one.
2024-07-03 13:03:15 +03:00
Lampros Smyrnaios
fe2275a9b0
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
# Conflicts:
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
2024-06-25 20:17:47 +03:00
Lampros Smyrnaios
66cd28f70a
- Fix not using the "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission-issues when creating tables with Impala.
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-20 14:33:46 +03:00
Lampros Smyrnaios
c6b1ab2a18
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-06-20 14:33:05 +03:00
Miriam Baglioni
d35edac212
[IrishFunderList] make changes according to 9635 comments 20, 21, 22 and 23
2024-06-20 12:28:28 +02:00
Miriam Baglioni
6421f8fece
Merge remote-tracking branch 'origin/beta' into beta
2024-06-19 11:12:15 +02:00
Miriam Baglioni
ac270f795b
[IrishFunderList] make changes according to 9635 comments 14, 15 and 16
2024-06-19 11:11:52 +02:00
Lampros Smyrnaios
236aed8954
Merge remote-tracking branch 'origin/beta' into beta
2024-06-18 17:12:35 +03:00
Claudio Atzori
dd541f8cf5
Merge pull request 'Miscellaneous updates to the copying operation to Impala Cluster.' (#447) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#447
2024-06-18 15:52:30 +02:00
Lampros Smyrnaios
ff335578ea
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-06-18 14:52:31 +03:00
Lampros Smyrnaios
285416c74e
Merge branch 'beta' into beta
2024-06-18 13:50:38 +02:00
Lampros Smyrnaios
3095047e5e
Miscellaneous updates to the copying operation to Impala Cluster:
- Fix not breaking out of the VIEWS-infinite-loop when the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" is set to "false".
- Exit the script when no HDFS-active-node was found, independently of the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR".
- Fix view_name-recognition in a log-message, by using the more advanced "Perl-Compatible Regular Expressions" in "grep".
- Add error-handling for "compute stats" errors.
2024-06-18 14:40:41 +03:00
Antonis Lempesis
0456f1b788
Merge remote-tracking branch 'origin/beta' into beta
2024-06-14 15:11:30 +03:00
Antonis Lempesis
38636942c7
filtering out deletedbyinference and invisible results from accessroute
2024-06-14 15:11:19 +03:00
Lampros Smyrnaios
d942a1101b
Miscellaneous updates to the copying operation to Impala Cluster:
- Show some counts and the elapsed time for various sub-tasks.
- Code polishing.
2024-06-14 12:14:38 +03:00
Giambattista Bloisi
9bf2bda1c6
Fix: next returned a null value at end of stream
2024-06-12 13:28:51 +02:00
Giambattista Bloisi
d90cb099b8
Fix for paginationStart parameter management
2024-06-11 20:23:44 +02:00
Claudio Atzori
11fe3a4fe0
[graph resolution] use sparkExecutorMemory to define also the memoryOverhead
2024-06-11 14:21:17 +02:00
Claudio Atzori
a8d68c9d29
avoid NPEs
2024-06-11 14:19:24 +02:00
Miriam Baglioni
8fe934810f
Merge remote-tracking branch 'origin/beta' into beta
2024-06-11 10:28:51 +02:00
Miriam Baglioni
9da006e98c
[SDGFoSActionSet] remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) and is useless, since subjects do not go at the level of the instance
2024-06-11 10:28:32 +02:00
Giambattista Bloisi
85c1eae7e0
Fixes for pagination strategy looping at end of download
2024-06-10 19:03:58 +02:00
Claudio Atzori
b0eba210c0
[actionset promotion] use sparkExecutorMemory to define also the memoryOverhead
2024-06-10 16:15:24 +02:00