Miriam Baglioni
6189879643
[NOAMI] removed entry for Irish Research eLibrary (IReL) Care Board from the list of funders.
2024-04-23 11:09:18 +02:00
Claudio Atzori
c57cff2d6d
Merge pull request '[WebCrawl] adding affiliation relations from web information' ( #428 ) from WebCrowlBeta into beta
Reviewed-on: D-Net/dnet-hadoop#428
2024-04-23 09:36:15 +02:00
Lampros Smyrnaios
69a9ac7393
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
2024-04-22 17:07:11 +03:00
Miriam Baglioni
7de114bda0
[WebCrawl] addressing comments from PR
2024-04-22 13:52:50 +02:00
Claudio Atzori
eb4692e4ee
Merge branch 'beta' into WebCrowlBeta
2024-04-22 11:40:24 +02:00
Claudio Atzori
24a83fc24f
avoid NPEs in common Oaf merge utilities
2024-04-22 11:39:44 +02:00
Sandro La Bruzzo
073f320c6a
Added module containing all the dependencies, useful for Spark deployment on k8s.
2024-04-22 11:32:31 +02:00
Miriam Baglioni
776c898c4b
[WebCrawl] adding affiliation relations from web information
2024-04-22 11:04:17 +02:00
Claudio Atzori
5857fd38c1
avoid NPEs in common Oaf merge utilities
2024-04-21 08:29:09 +02:00
Claudio Atzori
0656ab2838
code formatting
2024-04-20 08:10:58 +02:00
Claudio Atzori
ab7f0855af
fixed query reading projects from the aggregator DB
2024-04-20 08:10:32 +02:00
Claudio Atzori
7a7e313157
updated schema version
2024-04-19 17:30:25 +02:00
Claudio Atzori
e5879b68c7
[transformative agreement] including result-funder relations to the information imported from the TRs
2024-04-19 17:14:18 +02:00
Claudio Atzori
3a027e97a7
[graph indexing] sets the Spark memoryOverhead in the join operations to the same value used for the executor memory
2024-04-19 16:59:58 +02:00
Sandro La Bruzzo
b72c3139e2
replaced the deprecated Ignore annotation with Disabled
2024-04-19 14:52:40 +02:00
Sandro La Bruzzo
b84ad0c06e
merged beta
2024-04-19 14:39:59 +02:00
Antonis Lempesis
b52a5a753b
Merge remote-tracking branch 'upstream/beta' into beta
2024-04-19 15:28:28 +03:00
Sandro La Bruzzo
8dd9cf84e2
code formatted
2024-04-19 12:30:59 +02:00
Lampros Smyrnaios
342223f75c
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
2024-04-19 13:18:34 +03:00
Sandro La Bruzzo
342cb6189b
fixed problem caused by the changed signature of RowEncoder
removed property dhp.schema.artifact
2024-04-19 12:13:26 +02:00
Antonis Lempesis
c3fe9662b2
all indicator tables are now stored as parquet
2024-04-19 12:45:36 +03:00
Lampros Smyrnaios
2616971e2b
dhp-stats-update: remove leftover duplicate line
2024-04-18 16:18:16 +03:00
Lampros Smyrnaios
ba533d9f34
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
2024-04-18 15:47:56 +03:00
Lampros Smyrnaios
d46b78b659
dhp-stats-update:
- Set Steps 2-7 and 9 to limit the number of files generated by Spark from 8000 down to 100, to improve file-transfer and querying performance.
- Allow the workflow to run up to Step10. Step11 seems to have some issues even when using hive-action.
2024-04-18 15:40:27 +03:00
Lampros Smyrnaios
6f2ebb2a52
Revert Step8 and Step11 to use Hive again, since their "UPDATE" statements are not supported by Spark.
2024-04-18 15:35:03 +03:00
Claudio Atzori
57c678d904
integrating changes from PR#424
2024-04-18 11:38:35 +02:00
Claudio Atzori
5ab8cd1794
Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql
2024-04-18 11:28:18 +02:00
Antonis Lempesis
0c71c58df6
fixed the definition of gold_oa
2024-04-18 12:01:27 +03:00
Antonis Lempesis
43d05dbebb
fixed the definition of result_country
2024-04-18 11:53:50 +03:00
Antonis Lempesis
e728a0897c
fixed the definition of indi_pub_bronze_oa
2024-04-18 11:07:55 +03:00
Antonis Lempesis
308ae580a9
slight optimization in indi_pub_gold_oa definition
2024-04-18 10:57:52 +03:00
Antonis Lempesis
27d22bd8f9
slight optimization in indi_pub_gold_oa definition
2024-04-17 23:59:52 +03:00
Antonis Lempesis
1f5aba12fa
slight optimization in indi_pub_gold_oa definition
2024-04-17 23:54:23 +03:00
Lampros Smyrnaios
ca091c0f1e
dhp-stats-update:
- Fix not passing some parameters to some Spark actions.
- Allow the workflow to run up to Step7. The first 7 steps seem to work out of the box.
2024-04-17 14:03:59 +03:00
Claudio Atzori
b554c41cc7
Merge pull request 'doidoost_dismiss' ( #418 ) from doidoost_dismiss into beta
Reviewed-on: D-Net/dnet-hadoop#418
2024-04-17 12:01:11 +02:00
Claudio Atzori
ac8747582c
Merge branch 'beta' into doidoost_dismiss
2024-04-17 12:01:01 +02:00
Claudio Atzori
0db7e4ae9a
Merge pull request 'Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common' ( #422 ) from revised_merge_logic into beta
Reviewed-on: D-Net/dnet-hadoop#422
2024-04-17 11:58:26 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Lampros Smyrnaios
0b897f2f66
Fix and add missing "DROP TABLE" statements, in "dhp-stats-update" sql-scripts.
2024-04-16 18:17:54 +03:00
Miriam Baglioni
0625b9061f
removed the funder id 100011062 (Asian Spinal Cord Network), wrongly associated with Ireland
2024-04-16 15:26:53 +02:00
Miriam Baglioni
9eeb9f5d32
merging with branch beta
2024-04-16 15:24:40 +02:00
Claudio Atzori
589bce3520
Merge pull request '[pBETA] Improvements to copying data from ocean to impala' ( #421 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#421
2024-04-16 14:22:32 +02:00
Sandro La Bruzzo
a5ddd8dfbb
Added Action set generation for the MAG organization
2024-04-16 13:39:15 +02:00
Giambattista Bloisi
da333e9f4d
Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' ( #419 ) from dedup_authorsmatch_bytoken into beta
Reviewed-on: D-Net/dnet-hadoop#419
2024-04-16 10:24:11 +02:00
Claudio Atzori
43fd1de681
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-04-16 09:42:05 +02:00
Claudio Atzori
d070db4a32
added a couple more invalid author names
2024-04-16 09:41:59 +02:00
Michele Artini
78b9d84e4a
test
2024-04-16 09:41:16 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names into tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Lampros Smyrnaios
db33f7727c
Update "dhp-stats-update" workflow to use "spark"-actions, instead of "hive" ones.
Note: Currently the code is set to test only "Step1".
2024-04-15 16:22:40 +03:00
Lampros Smyrnaios
d7da4f814b
Minor updates to the operation copying data to the Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00