5267 Commits

Author SHA1 Message Date
Lampros Smyrnaios ce0aee21cc Improve performance of transferring the stats-DBs to another cluster and querying the DBs' tables, by ordering Spark to create up to 100 files per table, instead of thousands. 2024-07-03 20:15:33 +03:00
Lampros Smyrnaios 7b7dd32ad5 - Fix placement of some "set mapred.job.queue.name=analytics" statements and remove their unused "/*EOS*/" indicator.
- Add stacktrace-info to failed actions.
2024-07-03 19:53:24 +03:00
Lampros Smyrnaios 7ce051d766 - Update the remaining hive-actions to spark-actions.
- Update the version of shell-actions.
- Fix missing "/*EOS*/" indicators.
2024-07-03 19:49:19 +03:00
Lampros Smyrnaios aa4d7d5e20 Prioritize the rest of the stats-queries over other tasks on the cluster, by putting them in the "analytics" queue. 2024-07-03 19:14:25 +03:00
Lampros Smyrnaios 54e11b6a43 Improve performance and efficiency by rewriting the creation process of "publication", "project", "dataset", "datasource", "software", "otherresearchproduct" and "result" tables, to be performed in a single query, for each one. 2024-07-03 13:03:15 +03:00
Lampros Smyrnaios fe2275a9b0 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
# Conflicts:
#	dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
2024-06-25 20:17:47 +03:00
Lampros Smyrnaios 66cd28f70a - Fix not using the "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission-issues when creating tables with Impala.
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-20 14:33:46 +03:00
Lampros Smyrnaios c6b1ab2a18 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-06-20 14:33:05 +03:00
Miriam Baglioni d35edac212 [IrishFunderList] make changes according to 9635 comments 20, 21, 22 and 23 2024-06-20 12:28:28 +02:00
Miriam Baglioni 6421f8fece Merge remote-tracking branch 'origin/beta' into beta 2024-06-19 11:12:15 +02:00
Miriam Baglioni ac270f795b [IrishFunderList] make changes according to 9635 comments 14, 15 and 16 2024-06-19 11:11:52 +02:00
Lampros Smyrnaios 236aed8954 Merge remote-tracking branch 'origin/beta' into beta 2024-06-18 17:12:35 +03:00
Claudio Atzori dd541f8cf5 Merge pull request 'Miscellaneous updates to the copying operation to Impala Cluster.' (#447) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #447
2024-06-18 15:52:30 +02:00
Lampros Smyrnaios ff335578ea Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-06-18 14:52:31 +03:00
Lampros Smyrnaios 285416c74e Merge branch 'beta' into beta 2024-06-18 13:50:38 +02:00
Lampros Smyrnaios 3095047e5e Miscellaneous updates to the copying operation to Impala Cluster:
- Fix not breaking out of the VIEWS-infinite-loop when the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" is set to "false".
- Exit the script when no HDFS-active-node was found, independently of the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR".
- Fix view_name-recognition in a log-message, by using the more advanced "Perl-Compatible Regular Expressions" in "grep".
- Add error-handling for "compute stats" errors.
2024-06-18 14:40:41 +03:00
Antonis Lempesis 0456f1b788 Merge remote-tracking branch 'origin/beta' into beta 2024-06-14 15:11:30 +03:00
Antonis Lempesis 38636942c7 filtering out deletedbyinference and invisible results from accessroute 2024-06-14 15:11:19 +03:00
Lampros Smyrnaios d942a1101b Miscellaneous updates to the copying operation to Impala Cluster:
- Show some counts and the elapsed time for various sub-tasks.
- Code polishing.
2024-06-14 12:14:38 +03:00
Giambattista Bloisi 9bf2bda1c6 Fix: next returned a null value at end of stream 2024-06-12 13:28:51 +02:00
Giambattista Bloisi d90cb099b8 Fix for paginationStart parameter management 2024-06-11 20:23:44 +02:00
Claudio Atzori 11fe3a4fe0 [graph resolution] use sparkExecutorMemory to define also the memoryOverhead 2024-06-11 14:21:17 +02:00
Claudio Atzori a8d68c9d29 avoid NPEs 2024-06-11 14:19:24 +02:00
Miriam Baglioni 8fe934810f Merge remote-tracking branch 'origin/beta' into beta 2024-06-11 10:28:51 +02:00
Miriam Baglioni 9da006e98c [SDGFoSActionSet] remove datainfo for the result. It is not needed (qualifier.classid = UPDATE), since subjects do not go at the level of the instance 2024-06-11 10:28:32 +02:00
Giambattista Bloisi 85c1eae7e0 Fixes for pagination strategy looping at end of download 2024-06-10 19:03:58 +02:00
Claudio Atzori b0eba210c0 [actionset promotion] use sparkExecutorMemory to define also the memoryOverhead 2024-06-10 16:15:24 +02:00
Claudio Atzori 3776327a8c hostedby patching to work with the updated Crossref contents, resolved conflict 2024-06-10 15:24:12 +02:00
Claudio Atzori 0139f23d66 Merge pull request 'organization type from OpenOrgs' (#445) from import_openorg_type into beta
Reviewed-on: #445
2024-06-07 12:17:31 +02:00
Michele Artini c726572418 changed some parameters in OSF test 2024-06-07 12:03:26 +02:00
Claudio Atzori ec79405cc9 [graph raw] set organization type from openorgs 2024-06-07 11:30:31 +02:00
Miriam Baglioni 1477406ecc [bulkTag] fixed issue that made project disappear in graph_10_enriched 2024-06-06 10:45:41 +02:00
Claudio Atzori 92c3abd5a4 [graph cleaning] use sparkExecutorMemory to define also the memoryOverhead 2024-06-06 10:44:33 +02:00
Claudio Atzori ce2364743a applying changes from PR#442: Fix for missing collectedfrom after dedup 2024-06-06 10:43:43 +02:00
Claudio Atzori f70dc76b61 minor 2024-06-06 10:43:10 +02:00
Claudio Atzori 73bd1938a5 [graph2hive] use sparkExecutorMemory to define also the memoryOverhead 2024-06-05 12:17:35 +02:00
Claudio Atzori da5c1e73a4 Merge pull request 'Irish oaipmh exporter' (#443) from irish-oaipmh-exporter into beta
Reviewed-on: #443
2024-06-05 10:55:09 +02:00
Claudio Atzori 81090ad593 [IE OAIPMH] added oozie workflow, minor changes, code formatting 2024-06-05 10:03:33 +02:00
Claudio Atzori a02f3f0d2b code formatting 2024-05-30 10:21:18 +02:00
Alessia Bardi eadfd8d71d Merge pull request 'Updated XMLIterator for splitting on different nodes' (#436) from dblp_collection_plugin into beta
Reviewed-on: #436
2024-05-29 16:05:06 +02:00
Alessia Bardi 05ee783c07 Merge branch 'beta' into dblp_collection_plugin 2024-05-29 16:04:39 +02:00
Alessia Bardi fe9fb59c90 Merge pull request 'Rest collector plugin on hadoop supports a new param to pass request headers' (#441) from rest-collector-request-header-map into beta
Reviewed-on: #441
2024-05-29 15:54:39 +02:00
Claudio Atzori c272c4ad68 code formatting 2024-05-29 15:50:07 +02:00
Alessia Bardi c5f4da16a4 Merge branch 'beta' into rest-collector-request-header-map 2024-05-29 15:46:23 +02:00
Alessia 1b165a14a0 Rest collector plugin on hadoop supports a new param to pass request headers 2024-05-29 15:41:36 +02:00
Michele Artini e996787be2 OSF test 2024-05-29 15:05:17 +02:00
Claudio Atzori 62716141c5 Merge pull request 'Miscellaneous updates to the copying operation to Impala Cluster' (#440) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #440
2024-05-29 14:34:51 +02:00
Miriam Baglioni 5d85b70e1f [NOAMI] removed Ireland funder id 501100011103. ticket 9635 2024-05-29 11:55:00 +02:00
Lampros Smyrnaios a644a6f4fe Catch Spark-sql errors and show a log with the statement that failed. 2024-05-29 12:10:11 +03:00
Lampros Smyrnaios e3f28338c1 Miscellaneous updates to the copying operation to Impala Cluster:
- Assign the WRITE and EXECUTE permissions to the DBs' HDFS-directories, in order to be able to create tables on top of them, in the Impala Cluster.
- Make sure the "copydb" function returns early, when it encounters a fatal error, while respecting the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" config.
2024-05-28 17:51:45 +03:00