Claudio Atzori
1180d78b71
make entity-level pids unique by pidType:pidValue
2024-07-04 09:41:12 +02:00
Lampros Smyrnaios
e9686365a2
Improve the performance of creating the "result_fos" table by using a temp table to cache data that is requested multiple times.
2024-07-03 20:24:36 +03:00
Lampros Smyrnaios
ce0aee21cc
Improve the performance of transferring the stats DBs to another cluster and of querying their tables, by instructing Spark to create up to 100 files per table instead of thousands.
2024-07-03 20:15:33 +03:00
Lampros Smyrnaios
7b7dd32ad5
- Fix the placement of some "set mapred.job.queue.name=analytics" statements and remove their unused "/*EOS*/" indicators.
...
- Add stack-trace info to failed actions.
2024-07-03 19:53:24 +03:00
Lampros Smyrnaios
7ce051d766
- Update the remaining hive-actions to spark-actions.
...
- Update the version of shell-actions.
- Fix missing "/*EOS*/" indicators.
2024-07-03 19:49:19 +03:00
Lampros Smyrnaios
aa4d7d5e20
Prioritize the remaining stats queries over other tasks on the cluster by putting them in the "analytics" queue.
2024-07-03 19:14:25 +03:00
Claudio Atzori
bb12d0b4df
removed legacy actionmanager dependencies
2024-07-03 16:26:39 +02:00
Lampros Smyrnaios
54e11b6a43
Improve performance and efficiency by rewriting the creation process of the "publication", "project", "dataset", "datasource", "software", "otherresearchproduct" and "result" tables, so that each one is created in a single query.
2024-07-03 13:03:15 +03:00
Claudio Atzori
7d3292551b
ignore dates containing 'null's
2024-07-02 15:44:31 +02:00
Claudio Atzori
c7634c55c7
Merge pull request '[beta] implementation of countryMatch and addition of workflow parameters' (#451) from openorgs_fixes into beta
...
Reviewed-on: #451
2024-07-01 09:22:56 +02:00
Miriam Baglioni
a2b708bb71
[AffiliationIngestion] refactoring
2024-06-29 18:36:47 +02:00
Miriam Baglioni
9cbe966b4a
[AffiliationIngestion] refactoring
2024-06-29 18:35:49 +02:00
Miriam Baglioni
236b64d830
[AffiliationIngestion] Extended the ingestion of affiliations from OpenAIRE to also include links derived from Web Crawl. Extended the test. Added the id and name of the Web Crawl datasource to Constants, to be used here and also in the ingestion of links from Web Crawl.
2024-06-29 18:29:20 +02:00
Miriam Baglioni
67ff783e65
[Person] First implementation to include the Person entity in the graph
2024-06-29 17:13:01 +02:00
Michele De Bonis
a10e8d9f05
implementation of countryMatch and addition of workflow parameters
2024-06-28 16:46:52 +02:00
Claudio Atzori
14539f9c8b
[graph provision] publicFormat workflow parameter defined as optional
2024-06-28 14:55:18 +02:00
Claudio Atzori
1bc8c5d173
[graph provision] fixed serialization of the instancetypes
2024-06-28 14:54:28 +02:00
Claudio Atzori
1ccf01cdb8
Using the updated Solr JSON payload model classes
2024-06-28 12:38:07 +02:00
Claudio Atzori
b79cb155ba
Merge pull request 'Fix permissions-issue in Stats-workflow, step22a-createPDFsAggregated.' (#450) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #450
2024-06-26 10:11:34 +02:00
Claudio Atzori
33a02c5b9e
Merge pull request 'Change the selection criteria for the pivot record of a group so that the best pid type becomes the first criterion. This will have the effect of converging to records having a DOI pid' (#446) from pivotselectionbypid into beta
...
Reviewed-on: #446
2024-06-26 10:10:13 +02:00
Claudio Atzori
1182bca9eb
Merge pull request 'Add support to create/update solr collection aliases' (#449) from 9872-create-solr-collection-aliases into beta
...
Reviewed-on: #449
2024-06-26 10:09:51 +02:00
Lampros Smyrnaios
fe2275a9b0
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
...
# Conflicts:
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
2024-06-25 20:17:47 +03:00
Claudio Atzori
1c30eacac2
updated index feeding procedure to exploit the collection aliases
2024-06-25 15:27:38 +02:00
Claudio Atzori
6055212f77
merged from the json_payload branch
2024-06-25 12:39:02 +02:00
Claudio Atzori
0031cf849e
Merge branch 'beta' into 9872-create-solr-collection-aliases
2024-06-25 09:58:01 +02:00
Serafeim Chatzopoulos
9f6e16a03c
Add support to create/update solr collection aliases
2024-06-20 16:03:15 +03:00
Lampros Smyrnaios
66cd28f70a
- Fix the missing "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission issues when creating tables with Impala.
...
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-20 14:33:46 +03:00
Lampros Smyrnaios
c6b1ab2a18
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-06-20 14:33:05 +03:00
Miriam Baglioni
d35edac212
[IrishFunderList] make changes according to 9635 comments 20, 21, 22 and 23
2024-06-20 12:28:28 +02:00
Miriam Baglioni
6421f8fece
Merge remote-tracking branch 'origin/beta' into beta
2024-06-19 11:12:15 +02:00
Miriam Baglioni
ac270f795b
[IrishFunderList] make changes according to 9635 comments 14, 15 and 16
2024-06-19 11:11:52 +02:00
Lampros Smyrnaios
236aed8954
Merge remote-tracking branch 'origin/beta' into beta
2024-06-18 17:12:35 +03:00
Claudio Atzori
dd541f8cf5
Merge pull request 'Miscellaneous updates to the copying operation to Impala Cluster.' (#447) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #447
2024-06-18 15:52:30 +02:00
Lampros Smyrnaios
ff335578ea
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-06-18 14:52:31 +03:00
Lampros Smyrnaios
285416c74e
Merge branch 'beta' into beta
2024-06-18 13:50:38 +02:00
Lampros Smyrnaios
3095047e5e
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Fix the VIEWS loop never terminating (infinite loop) when "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" is set to "false".
- Exit the script when no active HDFS node is found, regardless of the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" setting.
- Fix view_name recognition in a log message by using the more advanced "Perl-Compatible Regular Expressions" in "grep".
- Add error-handling for "compute stats" errors.
2024-06-18 14:40:41 +03:00
Antonis Lempesis
0456f1b788
Merge remote-tracking branch 'origin/beta' into beta
2024-06-14 15:11:30 +03:00
Antonis Lempesis
38636942c7
filtering out deletedbyinference and invisible results from accessroute
2024-06-14 15:11:19 +03:00
Lampros Smyrnaios
d942a1101b
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Show some counts and the elapsed time for various sub-tasks.
- Code polishing.
2024-06-14 12:14:38 +03:00
Giambattista Bloisi
9bf2bda1c6
Fix: next returned a null value at the end of the stream
2024-06-12 13:28:51 +02:00
Giambattista Bloisi
d90cb099b8
Fix for paginationStart parameter management
2024-06-11 20:23:44 +02:00
Giambattista Bloisi
4f2a61e10f
Change the selection criteria for the pivot record of a group so that the best pid type becomes the first criterion. This will have the effect of slowly converging to records having a DOI pid
2024-06-11 15:33:56 +02:00
Claudio Atzori
11fe3a4fe0
[graph resolution] use sparkExecutorMemory to define also the memoryOverhead
2024-06-11 14:21:17 +02:00
Claudio Atzori
a8d68c9d29
avoid NPEs
2024-06-11 14:19:24 +02:00
Miriam Baglioni
8fe934810f
Merge remote-tracking branch 'origin/beta' into beta
2024-06-11 10:28:51 +02:00
Miriam Baglioni
9da006e98c
[SDGFoSActionSet] remove the datainfo for the result. It is not needed (qualifier.classid = UPDATE), since subjects do not go at the level of the instance
2024-06-11 10:28:32 +02:00
Giambattista Bloisi
85c1eae7e0
Fixes for the pagination strategy looping at the end of the download
2024-06-10 19:03:58 +02:00
Claudio Atzori
b0eba210c0
[actionset promotion] use sparkExecutorMemory to define also the memoryOverhead
2024-06-10 16:15:24 +02:00
Claudio Atzori
3776327a8c
hostedby patching to work with the updated Crossref contents, resolved conflict
2024-06-10 15:24:12 +02:00
Claudio Atzori
0139f23d66
Merge pull request 'organization type from OpenOrgs' (#445) from import_openorg_type into beta
...
Reviewed-on: #445
2024-06-07 12:17:31 +02:00