Commit Graph

5429 Commits

Author SHA1 Message Date
Antonis Lempesis e728a0897c fixed the definition of indi_pub_bronze_oa 2024-04-18 11:07:55 +03:00
Antonis Lempesis 308ae580a9 slight optimization in indi_pub_gold_oa definition 2024-04-18 10:57:52 +03:00
Antonis Lempesis 27d22bd8f9 slight optimization in indi_pub_gold_oa definition 2024-04-17 23:59:52 +03:00
Antonis Lempesis 1f5aba12fa slight optimization in indi_pub_gold_oa definition 2024-04-17 23:54:23 +03:00
Claudio Atzori 43e123c624 added column alias 2024-04-17 16:40:29 +02:00
Claudio Atzori 62a07b7add added missing end of statement /*EOS*/ 2024-04-17 15:13:28 +02:00
Claudio Atzori 96bddcc921 revised query implementation for indi_pub_gold_oa 2024-04-17 15:06:50 +02:00
Claudio Atzori b554c41cc7 Merge pull request 'doidoost_dismiss' (#418) from doidoost_dismiss into beta
Reviewed-on: #418
2024-04-17 12:01:11 +02:00
Claudio Atzori ac8747582c Merge branch 'beta' into doidoost_dismiss 2024-04-17 12:01:01 +02:00
Claudio Atzori 0db7e4ae9a Merge pull request 'Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common' (#422) from revised_merge_logic into beta
Reviewed-on: #422
2024-04-17 11:58:26 +02:00
Giambattista Bloisi 8ac167e420 Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common 2024-04-16 17:18:28 +02:00
Miriam Baglioni 0486cea4c4 removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland 2024-04-16 15:36:40 +02:00
Miriam Baglioni 0625b9061f removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland 2024-04-16 15:26:53 +02:00
Miriam Baglioni 9eeb9f5d32 mergin with branch beta 2024-04-16 15:24:40 +02:00
Claudio Atzori 589bce3520 Merge pull request '[pBETA] Improvements to copying data from ocean to impala' (#421) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #421
2024-04-16 14:22:32 +02:00
Claudio Atzori 013935c593 Merge pull request 'Improvements to copying data from ocean to impala' (#420) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #420
2024-04-16 14:17:47 +02:00
Sandro La Bruzzo a5ddd8dfbb Added Action set generation for the MAG organization 2024-04-16 13:39:15 +02:00
Giambattista Bloisi da333e9f4d Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' (#419) from dedup_authorsmatch_bytoken into beta
Reviewed-on: #419
2024-04-16 10:24:11 +02:00
Claudio Atzori 43fd1de681 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-04-16 09:42:05 +02:00
Claudio Atzori d070db4a32 added a couple more invalid author names 2024-04-16 09:41:59 +02:00
Michele Artini 78b9d84e4a test 2024-04-16 09:41:16 +02:00
Giambattista Bloisi 43b454399f - Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Lampros Smyrnaios d7da4f814b Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios 14719dcd62 Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Sandro La Bruzzo 41a42dde64 code formatted 2024-04-11 17:43:48 +02:00
Sandro La Bruzzo 843dc95340 resolved conflict 2024-04-11 17:38:16 +02:00
Sandro La Bruzzo 1e30454ee0 added vocabulary tu instanceTypeMApping of Mag 2024-04-11 17:32:30 +02:00
Sandro La Bruzzo 2581672c11 updated wf of MAG and crossref to use transaction 2024-04-11 17:27:49 +02:00
Lampros Smyrnaios 22745027c8 Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates". 2024-04-11 17:46:33 +03:00
Lampros Smyrnaios abf0b69f29 Upgrade the copying operation to Impala Cluster:
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free-up resources.
- Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
Claudio Atzori 3cad4a415d fixed duplicated property dhp-schemas.version 2024-04-11 15:44:12 +02:00
Sandro La Bruzzo a0642bd190 added instanceTypeMapping field on MAG 2024-04-11 13:10:12 +02:00
Claudio Atzori 6132bd028e Merge pull request 'Extend Crossref-funders mapping and datacite hostedbymap' (#417) from CrossrefFundersMap into master
Reviewed-on: #417
2024-04-09 10:30:53 +02:00
Miriam Baglioni 519db1ddef Extended mapping of funder from crossref (#9169, #9277) and change the correspondece files for the irish fundrs (#9635). Extended the datacite map to include the association between metadata and the EBRAINS datasource (SciLake) 2024-04-09 09:33:09 +02:00
Sandro La Bruzzo 98dc042db5 mapping generated for MAG,
missing generation of Organization Action set
2024-04-05 18:12:53 +02:00
Sandro La Bruzzo ef582948a7 Updated mapping 2024-04-05 11:10:44 +02:00
Sandro La Bruzzo 5142f462b5 completed mapping from paper to OAF, not tested 2024-04-04 21:06:04 +02:00
Miriam Baglioni 0794e0667b Merge branch 'doidoost_dismiss' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doidoost_dismiss 2024-04-04 09:16:18 +02:00
Miriam Baglioni 4b1de076ac [DataciteHostedByMap] added entry for EBRAINS 2024-04-04 09:16:14 +02:00
Miriam Baglioni c8a88b2187 [DataciteHostedByMap] added entry for EBRAINS 2024-04-04 09:14:58 +02:00
Sandro La Bruzzo 31e152d2bb Merge remote-tracking branch 'origin/doidoost_dismiss' into doidoost_dismiss 2024-04-03 17:08:35 +02:00
Sandro La Bruzzo 6f3e925cae Implemented first part of the new MAG mapping 2024-04-03 17:07:14 +02:00
Miriam Baglioni f0f6abf892 [MapToFunderLink]added references for HFRI and Erasmus+ for the creation of links for funders 2024-04-03 14:59:09 +02:00
Claudio Atzori 26b97aa5ed Merge pull request '[BETA] fixed the result_country definition and updated the stats DB copy procedure' (#416) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #416
2024-04-03 12:36:03 +02:00
Claudio Atzori 5add51f38c Merge pull request 'fixed the result_country definition and updated the stats DB copy procedure' (#412) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #412
2024-04-03 12:34:17 +02:00
Lampros Smyrnaios b7c8acc563 - Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
- Fix typos.
2024-04-03 13:15:37 +03:00
Miriam Baglioni 50fbebf186 [NOAMI] removed entry for Health and Social Care Board from the list of funders. Modified IRC putting 1596 and 1597 as synonyms, as required in ticket 9635 2024-04-03 11:45:40 +02:00
Michele Artini 71d6e02886 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2024-04-03 09:50:41 +02:00
Michele Artini 02c9a311c8 base datainfo with trust=0.89 2024-04-03 09:50:21 +02:00
Miriam Baglioni 42846d3b91 [OpenCitation] add compression option when writing the sequence file 2024-04-03 09:25:00 +02:00