Miriam Baglioni
7de114bda0
[WebCrawl] addressing comments from PR
2024-04-22 13:52:50 +02:00
Claudio Atzori
eb4692e4ee
Merge branch 'beta' into WebCrowlBeta
2024-04-22 11:40:24 +02:00
Claudio Atzori
24a83fc24f
avoid NPEs in common Oaf merge utilities
2024-04-22 11:39:44 +02:00
Miriam Baglioni
776c898c4b
[WebCrawl] adding affiliation relations from web information
2024-04-22 11:04:17 +02:00
Claudio Atzori
5857fd38c1
avoid NPEs in common Oaf merge utilities
2024-04-21 08:29:09 +02:00
Claudio Atzori
0656ab2838
code formatting
2024-04-20 08:10:58 +02:00
Claudio Atzori
ab7f0855af
fixed query reading projects from the aggregator DB
2024-04-20 08:10:32 +02:00
Claudio Atzori
7a7e313157
updated schema version
2024-04-19 17:30:25 +02:00
Claudio Atzori
e5879b68c7
[transformative agreement] including result-funder relations to the information imported from the TRs
2024-04-19 17:14:18 +02:00
Claudio Atzori
3a027e97a7
[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the executor memory
2024-04-19 16:59:58 +02:00
Sandro La Bruzzo
b72c3139e2
replaced the deprecated Ignore annotation with Disabled
2024-04-19 14:52:40 +02:00
Claudio Atzori
57c678d904
integrating changes from PR#424
2024-04-18 11:38:35 +02:00
Claudio Atzori
5ab8cd1794
Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql
2024-04-18 11:28:18 +02:00
Claudio Atzori
b554c41cc7
Merge pull request 'doidoost_dismiss' ( #418 ) from doidoost_dismiss into beta
Reviewed-on: #418
2024-04-17 12:01:11 +02:00
Claudio Atzori
ac8747582c
Merge branch 'beta' into doidoost_dismiss
2024-04-17 12:01:01 +02:00
Claudio Atzori
0db7e4ae9a
Merge pull request 'Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common' ( #422 ) from revised_merge_logic into beta
Reviewed-on: #422
2024-04-17 11:58:26 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Miriam Baglioni
0625b9061f
removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland
2024-04-16 15:26:53 +02:00
Miriam Baglioni
9eeb9f5d32
merging with branch beta
2024-04-16 15:24:40 +02:00
Claudio Atzori
589bce3520
Merge pull request '[pBETA] Improvements to copying data from ocean to impala' ( #421 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #421
2024-04-16 14:22:32 +02:00
Sandro La Bruzzo
a5ddd8dfbb
Added Action set generation for the MAG organization
2024-04-16 13:39:15 +02:00
Giambattista Bloisi
da333e9f4d
Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' ( #419 ) from dedup_authorsmatch_bytoken into beta
Reviewed-on: #419
2024-04-16 10:24:11 +02:00
Claudio Atzori
43fd1de681
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-04-16 09:42:05 +02:00
Claudio Atzori
d070db4a32
added a couple more invalid author names
2024-04-16 09:41:59 +02:00
Michele Artini
78b9d84e4a
test
2024-04-16 09:41:16 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Lampros Smyrnaios
d7da4f814b
Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios
14719dcd62
Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Sandro La Bruzzo
41a42dde64
code formatted
2024-04-11 17:43:48 +02:00
Sandro La Bruzzo
843dc95340
resolved conflict
2024-04-11 17:38:16 +02:00
Sandro La Bruzzo
1e30454ee0
added vocabulary to instanceTypeMapping of MAG
2024-04-11 17:32:30 +02:00
Sandro La Bruzzo
2581672c11
updated wf of MAG and crossref to use transaction
2024-04-11 17:27:49 +02:00
Lampros Smyrnaios
22745027c8
Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates".
2024-04-11 17:46:33 +03:00
Lampros Smyrnaios
abf0b69f29
Upgrade the copying operation to Impala Cluster:
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free-up resources.
- Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
Claudio Atzori
3cad4a415d
fixed duplicated property dhp-schemas.version
2024-04-11 15:44:12 +02:00
Sandro La Bruzzo
a0642bd190
added instanceTypeMapping field on MAG
2024-04-11 13:10:12 +02:00
Sandro La Bruzzo
98dc042db5
mapping generated for MAG,
missing generation of Organization Action set
2024-04-05 18:12:53 +02:00
Sandro La Bruzzo
ef582948a7
Updated mapping
2024-04-05 11:10:44 +02:00
Sandro La Bruzzo
5142f462b5
completed mapping from paper to OAF, not tested
2024-04-04 21:06:04 +02:00
Miriam Baglioni
0794e0667b
Merge branch 'doidoost_dismiss' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doidoost_dismiss
2024-04-04 09:16:18 +02:00
Miriam Baglioni
4b1de076ac
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:16:14 +02:00
Miriam Baglioni
c8a88b2187
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:14:58 +02:00
Sandro La Bruzzo
31e152d2bb
Merge remote-tracking branch 'origin/doidoost_dismiss' into doidoost_dismiss
2024-04-03 17:08:35 +02:00
Sandro La Bruzzo
6f3e925cae
Implemented first part of the new MAG mapping
2024-04-03 17:07:14 +02:00
Miriam Baglioni
f0f6abf892
[MapToFunderLink] added references for HFRI and Erasmus+ for the creation of links for funders
2024-04-03 14:59:09 +02:00
Claudio Atzori
26b97aa5ed
Merge pull request '[BETA] fixed the result_country definition and updated the stats DB copy procedure' ( #416 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #416
2024-04-03 12:36:03 +02:00
Lampros Smyrnaios
b7c8acc563
- Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
- Fix typos.
2024-04-03 13:15:37 +03:00
Miriam Baglioni
50fbebf186
[NOAMI] removed entry for Health and Social Care Board from the list of funders. Modified IRC putting 1596 and 1597 as synonyms, as required in ticket 9635
2024-04-03 11:45:40 +02:00
Michele Artini
71d6e02886
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2024-04-03 09:50:41 +02:00
Michele Artini
02c9a311c8
base datainfo with trust=0.89
2024-04-03 09:50:21 +02:00