Claudio Atzori
908ed9da7a
Merge pull request 'Various fixes in the stats wf' ( #430 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #430
2024-05-08 13:41:02 +02:00
Antonis Lempesis
0cada3cc8f
every step is run in the analytics queue. Hardcoded for now, will make a parameter later
2024-05-08 13:42:53 +03:00
Antonis Lempesis
90a4fb3547
fixed typos
2024-05-08 13:17:58 +03:00
Claudio Atzori
26363060ed
fixed id prefix creation for the fosnodoi records, again
2024-05-03 15:53:52 +02:00
Claudio Atzori
0486227185
[cleaning] deactivating the cleaning of FOS subjects found in the metadata provided by repositories
2024-05-03 14:31:12 +02:00
Claudio Atzori
e1a0fb8933
fixed id prefix creation for the fosnodoi records
2024-05-03 14:14:18 +02:00
Claudio Atzori
4355f64810
reverted to version 1.2.5-SNAPSHOT
2024-05-02 11:23:53 +02:00
Claudio Atzori
66680b8b9a
refactoring of common utilities
2024-05-02 11:16:58 +02:00
Claudio Atzori
dcf23b3d06
Merge branch 'beta' into beta-release-1.2.5
2024-05-02 10:01:49 +02:00
Claudio Atzori
11bd89e132
[enrichment] use sparkExecutorMemory to also define the memoryOverhead
2024-05-01 08:32:59 +02:00
Claudio Atzori
e96c2c1606
[ranking wf] set spark.executor.memoryOverhead to fine tune the resource consumption
2024-04-30 16:23:25 +02:00
Claudio Atzori
50c18f7a0b
[dedup wf] revised memory settings to address the increased volume of input contents
2024-04-30 12:34:16 +02:00
Claudio Atzori
e2937db385
Merge branch 'beta' into misc_fixes_merge_entities
2024-04-24 08:55:28 +02:00
Giambattista Bloisi
1878199dae
Miscellaneous fixes:
- in Merge By ID pick by preference those records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Lampros Smyrnaios
49af2e5740
Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views; overcome some bash-instabilities.
- Upon any error, fail the whole process, not just the current DB-creation, as those errors usually indicate a bug in the initial DB-creation, which should be fixed immediately.
- Enhance parallel-copy of large files with the "hadoop distcp" command.
- Reduce the "invalidate metadata" commands to just the current DB's tables, in order to eliminate the general overhead on Impala.
- Show the number of tables and views in the logs.
- Fix some log-messages.
2024-04-23 17:15:04 +03:00
Antonis Lempesis
d2649a1429
increased the jvm ram
2024-04-23 16:03:16 +03:00
Claudio Atzori
c3053ef34d
using version 1.2.5-beta for the release
2024-04-23 14:52:32 +02:00
Claudio Atzori
b5bcab13ec
using version 1.2.5-beta for the release
2024-04-23 14:36:39 +02:00
Claudio Atzori
425c9afc36
using version 1.2.5-beta for the release
2024-04-23 14:30:04 +02:00
Claudio Atzori
93dd9cc639
code formatting
2024-04-23 11:28:00 +02:00
Miriam Baglioni
6189879643
[NOAMI] removed entry for Irish Research eLibrary (IReL) Care Board from the list of funders.
2024-04-23 11:09:18 +02:00
Miriam Baglioni
7de114bda0
[WebCrawl] addressing comments from PR
2024-04-22 13:52:50 +02:00
Miriam Baglioni
776c898c4b
[WebCrawl] adding affiliation relations from web information
2024-04-22 11:04:17 +02:00
Claudio Atzori
0656ab2838
code formatting
2024-04-20 08:10:58 +02:00
Claudio Atzori
ab7f0855af
fixed query reading projects from the aggregator DB
2024-04-20 08:10:32 +02:00
Claudio Atzori
e5879b68c7
[transformative agreement] including result-funder relations in the information imported from the TRs
2024-04-19 17:14:18 +02:00
Claudio Atzori
3a027e97a7
[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the executor memory
2024-04-19 16:59:58 +02:00
Sandro La Bruzzo
b72c3139e2
replaced the deprecated Ignore annotation with Disabled
2024-04-19 14:52:40 +02:00
Antonis Lempesis
b52a5a753b
Merge remote-tracking branch 'upstream/beta' into beta
2024-04-19 15:28:28 +03:00
Antonis Lempesis
c3fe9662b2
all indicator tables are now stored as parquet
2024-04-19 12:45:36 +03:00
Claudio Atzori
57c678d904
integrating changes from PR#424
2024-04-18 11:38:35 +02:00
Claudio Atzori
5ab8cd1794
Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql
2024-04-18 11:28:18 +02:00
Antonis Lempesis
0c71c58df6
fixed the definition of gold_oa
2024-04-18 12:01:27 +03:00
Antonis Lempesis
43d05dbebb
fixed the definition of result_country
2024-04-18 11:53:50 +03:00
Antonis Lempesis
e728a0897c
fixed the definition of indi_pub_bronze_oa
2024-04-18 11:07:55 +03:00
Antonis Lempesis
308ae580a9
slight optimization in indi_pub_gold_oa definition
2024-04-18 10:57:52 +03:00
Antonis Lempesis
27d22bd8f9
slight optimization in indi_pub_gold_oa definition
2024-04-17 23:59:52 +03:00
Antonis Lempesis
1f5aba12fa
slight optimization in indi_pub_gold_oa definition
2024-04-17 23:54:23 +03:00
Claudio Atzori
ac8747582c
Merge branch 'beta' into doidoost_dismiss
2024-04-17 12:01:01 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Miriam Baglioni
0625b9061f
removed the funder id 100011062 (Asian Spinal Cord Network), wrongly associated with Ireland
2024-04-16 15:26:53 +02:00
Miriam Baglioni
9eeb9f5d32
merging with branch beta
2024-04-16 15:24:40 +02:00
Claudio Atzori
589bce3520
Merge pull request '[pBETA] Improvements to copying data from ocean to impala' ( #421 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #421
2024-04-16 14:22:32 +02:00
Sandro La Bruzzo
a5ddd8dfbb
Added Action set generation for the MAG organization
2024-04-16 13:39:15 +02:00
Giambattista Bloisi
da333e9f4d
Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' ( #419 ) from dedup_authorsmatch_bytoken into beta
Reviewed-on: #419
2024-04-16 10:24:11 +02:00
Michele Artini
78b9d84e4a
test
2024-04-16 09:41:16 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in the matchOrderedTokenAndAbbreviations algorithm, where tokens with the same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names into tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Lampros Smyrnaios
d7da4f814b
Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios
14719dcd62
Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Sandro La Bruzzo
41a42dde64
code formatted
2024-04-11 17:43:48 +02:00