Antonis Lempesis
308ae580a9
slight optimization in indi_pub_gold_oa definition
2024-04-18 10:57:52 +03:00
Antonis Lempesis
27d22bd8f9
slight optimization in indi_pub_gold_oa definition
2024-04-17 23:59:52 +03:00
Antonis Lempesis
1f5aba12fa
slight optimization in indi_pub_gold_oa definition
2024-04-17 23:54:23 +03:00
Claudio Atzori
43e123c624
added column alias
2024-04-17 16:40:29 +02:00
Claudio Atzori
62a07b7add
added missing end of statement /*EOS*/
2024-04-17 15:13:28 +02:00
Claudio Atzori
96bddcc921
revised query implementation for indi_pub_gold_oa
2024-04-17 15:06:50 +02:00
Claudio Atzori
b554c41cc7
Merge pull request 'doidoost_dismiss' (#418) from doidoost_dismiss into beta
...
Reviewed-on: #418
2024-04-17 12:01:11 +02:00
Claudio Atzori
ac8747582c
Merge branch 'beta' into doidoost_dismiss
2024-04-17 12:01:01 +02:00
Claudio Atzori
0db7e4ae9a
Merge pull request 'Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common' (#422) from revised_merge_logic into beta
...
Reviewed-on: #422
2024-04-17 11:58:26 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Miriam Baglioni
0486cea4c4
removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland
2024-04-16 15:36:40 +02:00
Miriam Baglioni
0625b9061f
removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland
2024-04-16 15:26:53 +02:00
Miriam Baglioni
9eeb9f5d32
merging with branch beta
2024-04-16 15:24:40 +02:00
Claudio Atzori
589bce3520
Merge pull request '[pBETA] Improvements to copying data from ocean to impala' (#421) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #421
2024-04-16 14:22:32 +02:00
Claudio Atzori
013935c593
Merge pull request 'Improvements to copying data from ocean to impala' (#420) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: #420
2024-04-16 14:17:47 +02:00
Sandro La Bruzzo
a5ddd8dfbb
Added Action set generation for the MAG organization
2024-04-16 13:39:15 +02:00
Giambattista Bloisi
da333e9f4d
Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' (#419) from dedup_authorsmatch_bytoken into beta
...
Reviewed-on: #419
2024-04-16 10:24:11 +02:00
Claudio Atzori
43fd1de681
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-04-16 09:42:05 +02:00
Claudio Atzori
d070db4a32
added a couple more invalid author names
2024-04-16 09:41:59 +02:00
Michele Artini
78b9d84e4a
test
2024-04-16 09:41:16 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
...
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names into tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
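The matching strategy named in the commit above (split names into tokens, order the tokens, compare full tokens or abbreviations) can be illustrated with a minimal sketch. This is not the dnet-hadoop implementation: the function names `tokenize`, `token_matches` and `names_match` are hypothetical, the normalization rules are an assumption, and only single-letter abbreviations are handled. The comment in `token_matches` marks the case the bug fix refers to, where two full tokens share only their initial character.

```python
import re
import unicodedata


def tokenize(name):
    """Strip accents, lowercase, split on non-letters, and sort the tokens.

    Assumption: names are reduced to a-z after accent stripping; letters
    that do not decompose under NFKD act as separators in this sketch.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return sorted(t for t in re.split(r"[^a-z]+", stripped.lower()) if t)


def token_matches(a, b):
    """Tokens match if equal, or if one is a one-letter abbreviation of the other."""
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    # Two full tokens that merely share an initial are different tokens.
    return False


def names_match(name1, name2):
    """Compare two author names token by token after ordering the tokens."""
    t1, t2 = tokenize(name1), tokenize(name2)
    if not t1 or len(t1) != len(t2):
        return False
    return all(token_matches(a, b) for a, b in zip(t1, t2))
```

With these definitions, "Smith, J." and "John Smith" match through the abbreviation rule, while "Jane Smith" and "John Smith" do not, because "jane" and "john" are both full tokens.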
Lampros Smyrnaios
d7da4f814b
Minor updates to the copying operation to Impala Cluster:
...
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios
14719dcd62
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Sandro La Bruzzo
41a42dde64
code formatted
2024-04-11 17:43:48 +02:00
Sandro La Bruzzo
843dc95340
resolved conflict
2024-04-11 17:38:16 +02:00
Sandro La Bruzzo
1e30454ee0
added vocabulary to instanceTypeMapping of MAG
2024-04-11 17:32:30 +02:00
Sandro La Bruzzo
2581672c11
updated wf of MAG and crossref to use transaction
2024-04-11 17:27:49 +02:00
Lampros Smyrnaios
22745027c8
Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates".
2024-04-11 17:46:33 +03:00
Lampros Smyrnaios
abf0b69f29
Upgrade the copying operation to Impala Cluster:
...
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free-up resources.
- Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
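The view-creation retry described in the commit above (retry only the "create view" statements, and only while they still depend on views that have not been created) can be sketched as a loop that stops as soon as a full pass makes no progress. This is a hedged illustration in Python, not the logic of "copyDataToImpalaCluster.sh": `create_views`, `make_fake_engine` and the `try_create` callback are hypothetical names, and the fake engine only simulates a database that rejects a view whose dependencies are missing.

```python
def create_views(view_names, try_create):
    """Create views in passes, retrying only those that failed.

    try_create(name) is a caller-supplied callback that returns True when
    the view was created. The loop ends when nothing is pending or when a
    whole pass creates nothing, which means the rest cannot succeed.
    """
    pending = list(view_names)
    created = []
    while pending:
        still_pending = []
        for name in pending:
            if try_create(name):
                created.append(name)
            else:
                still_pending.append(name)
        if len(still_pending) == len(pending):
            break  # no progress in this pass: give up on the remainder
        pending = still_pending
    else:
        still_pending = []
    return created, still_pending


def make_fake_engine(dependencies):
    """Simulated database: a view is created only if its dependencies exist."""
    existing = set()

    def try_create(name):
        if all(dep in existing for dep in dependencies.get(name, ())):
            existing.add(name)
            return True
        return False

    return try_create
```

Compared with recreating everything a fixed number of times, the number of passes here is bounded by the depth of the dependency chain, and a view with an unsatisfiable dependency is reported instead of being retried indefinitely.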
Claudio Atzori
3cad4a415d
fixed duplicated property dhp-schemas.version
2024-04-11 15:44:12 +02:00
Sandro La Bruzzo
a0642bd190
added instanceTypeMapping field on MAG
2024-04-11 13:10:12 +02:00
Claudio Atzori
6132bd028e
Merge pull request 'Extend Crossref-funders mapping and datacite hostedbymap' (#417) from CrossrefFundersMap into master
...
Reviewed-on: #417
2024-04-09 10:30:53 +02:00
Miriam Baglioni
519db1ddef
Extended mapping of funders from Crossref (#9169, #9277) and changed the correspondence files for the Irish funders (#9635). Extended the Datacite map to include the association between metadata and the EBRAINS datasource (SciLake)
2024-04-09 09:33:09 +02:00
Sandro La Bruzzo
98dc042db5
mapping generated for MAG,
...
missing generation of Organization Action set
2024-04-05 18:12:53 +02:00
Sandro La Bruzzo
ef582948a7
Updated mapping
2024-04-05 11:10:44 +02:00
Sandro La Bruzzo
5142f462b5
completed mapping from paper to OAF, not tested
2024-04-04 21:06:04 +02:00
Miriam Baglioni
0794e0667b
Merge branch 'doidoost_dismiss' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doidoost_dismiss
2024-04-04 09:16:18 +02:00
Miriam Baglioni
4b1de076ac
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:16:14 +02:00
Miriam Baglioni
c8a88b2187
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:14:58 +02:00
Sandro La Bruzzo
31e152d2bb
Merge remote-tracking branch 'origin/doidoost_dismiss' into doidoost_dismiss
2024-04-03 17:08:35 +02:00
Sandro La Bruzzo
6f3e925cae
Implemented first part of the new MAG mapping
2024-04-03 17:07:14 +02:00
Miriam Baglioni
f0f6abf892
[MapToFunderLink] added references for HFRI and Erasmus+ for the creation of links for funders
2024-04-03 14:59:09 +02:00
Claudio Atzori
26b97aa5ed
Merge pull request '[BETA] fixed the result_country definition and updated the stats DB copy procedure' (#416) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #416
2024-04-03 12:36:03 +02:00
Claudio Atzori
5add51f38c
Merge pull request 'fixed the result_country definition and updated the stats DB copy procedure' (#412) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: #412
2024-04-03 12:34:17 +02:00
Lampros Smyrnaios
b7c8acc563
- Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
...
- Fix typos.
2024-04-03 13:15:37 +03:00
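The idea in the commit above (probe each candidate node and retry to ride out transient file-system failures) can be sketched as follows. This is only an illustration under stated assumptions: `pick_active_node` and the `probe` callback are hypothetical names, the retry count and delay are arbitrary defaults, and the real script tests the "tmp" directory through Hadoop commands rather than through a Python callback.

```python
import time


def pick_active_node(candidates, probe, retries=3, delay_seconds=0.0):
    """Return the first candidate for which probe(node) succeeds.

    All candidates are probed in order; if none responds, the whole round
    is repeated up to `retries` times with a pause in between. Returns
    None when every round fails.
    """
    for attempt in range(1, retries + 1):
        for node in candidates:
            if probe(node):
                return node
        if attempt < retries:
            time.sleep(delay_seconds)
    return None
```

Returning None rather than raising leaves the caller free to decide whether a missing node should abort only the current step or the whole run.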
Miriam Baglioni
50fbebf186
[NOAMI] removed entry for Health and Social Care Board from the list of funders. Modified IRC putting 1596 and 1597 as synonyms, as required in ticket 9635
2024-04-03 11:45:40 +02:00
Michele Artini
71d6e02886
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2024-04-03 09:50:41 +02:00
Michele Artini
02c9a311c8
base datainfo with trust=0.89
2024-04-03 09:50:21 +02:00
Miriam Baglioni
42846d3b91
[OpenCitation] add compression option when writing the sequence file
2024-04-03 09:25:00 +02:00
Miriam Baglioni
4f0a044245
Merge pull request 'Add action set creation for Datacite affiliations' (#413) from 9647_datacite_affiliations into beta
...
Reviewed-on: #413
2024-04-02 17:33:38 +02:00