9cd3bc0f10Added a new generation of the dump for scholexplorer tested with last version of spark, and strongly refactoredSandro La Bruzzo2024-04-26 16:02:07 +0200
c08a58bba8Merge pull request 'Miscellaneous related to changes in MergeUtils' (#429) from misc_fixes_merge_entities into betaClaudio Atzori2024-04-24 08:55:37 +0200
1878199daeMiscellaneous fixes: - in Merge By ID pick by preference those records coming from delegated Authorities - fix various tests - close spark session in SparkCreateSimRelsGiambattista Bloisi2024-04-24 08:12:45 +0200
49af2e5740Miscellaneous updates to the copying operation to Impala Cluster: - Update the algorithm for creating views that depend on other views; overcome some bash-instabilities. - Upon any error, fail the whole process, not just the current DB-creation, as those errors usually indicate a bug in the initial DB-creation, that should be fixed immediately. - Enhance parallel-copy of large files by "hadoop distcp" command. - Reduce the "invalidate metadata" commands to just the current DB's tables, in order to eliminate the general overhead on Impala. - Show the number of tables and views in the logs. - Fix some log-messages.
#423
#248
#238
Lampros Smyrnaios2024-04-23 17:15:04 +0300
6189879643[NOAMI] removed entry for Irish Research eLibrary (IReL) Care Board from the list of funders.Miriam Baglioni2024-04-23 11:09:18 +0200
c57cff2d6dMerge pull request '[WebCrawl] adding affiliation relations from web information' (#428) from WebCrowlBeta into betaClaudio Atzori2024-04-23 09:36:15 +0200
3a027e97a7[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executorClaudio Atzori2024-04-19 16:57:55 +0200
795e1b2629Merge pull request '[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor' (#426) from provision_memoryOverhead into master
#59
Claudio Atzori2024-04-19 16:59:45 +0200
5ab8cd1794Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sqlClaudio Atzori2024-04-18 11:28:18 +0200
8fdd0244adMerge pull request 'Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql' (#425) from stats_step16_fix into masterClaudio Atzori2024-04-18 11:25:24 +0200
0db7e4ae9aMerge pull request 'Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common' (#422) from revised_merge_logic into betaClaudio Atzori2024-04-17 11:58:26 +0200
589bce3520Merge pull request '[pBETA] Improvements to copying data from ocean to impala' (#421) from antonis.lempesis/dnet-hadoop:beta into betaClaudio Atzori2024-04-16 14:22:32 +0200
013935c593Merge pull request 'Improvements to copying data from ocean to impala' (#420) from antonis.lempesis/dnet-hadoop:beta into masterClaudio Atzori2024-04-16 14:17:47 +0200
a5ddd8dfbbAdded Action set generation for the MAG organization
Sandro La Bruzzo
2024-04-16 13:39:15 +0200
da333e9f4dMerge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' (#419) from dedup_authorsmatch_bytoken into betaGiambattista Bloisi2024-04-16 10:24:11 +0200
43b454399f- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal - AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
#419
Giambattista Bloisi2024-04-15 18:19:29 +0200
d7da4f814bMinor updates to the copying operation to Impala Cluster: - Improve logging. - Code optimization/polishing.
#421
#420
Lampros Smyrnaios2024-04-12 18:12:06 +0300
14719dcd62Miscellaneous updates to the copying operation to Impala Cluster: - Update the algorithm for creating views that depend on other views. - Add check for successful execution of the "hadoop distcp" command. - Add a check for successful copy operation of all entities. - Upon facing an error in a DB, exit the method, instead of the whole script. - Improve logging. - Code polishing.Lampros Smyrnaios2024-04-12 15:36:13 +0300
22745027c8Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates".Lampros Smyrnaios2024-04-11 17:46:33 +0300
abf0b69f29Upgrade the copying operation to Impala Cluster: - Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free-up resources. - Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times. - Add error-checks for the creation of tables and views.Lampros Smyrnaios2024-04-11 17:12:12 +0300
6132bd028eMerge pull request 'Extend Crossref-funders mapping and datacite hostedbymap' (#417) from CrossrefFundersMap into masterClaudio Atzori2024-04-09 10:30:53 +0200
519db1ddefExtended mapping of funders from Crossref (#9169, #9277) and changed the correspondence files for the Irish funders (#9635). Extended the datacite map to include the association between metadata and the EBRAINS datasource (SciLake)
#417
CrossrefFundersMap
Miriam Baglioni2024-04-09 09:33:09 +0200
98dc042db5mapping generated for MAG, missing generation of Organization Action set
Sandro La Bruzzo
2024-04-05 18:12:53 +0200
ef582948a7Updated mapping
Sandro La Bruzzo
2024-04-05 11:10:44 +0200
31e152d2bbMerge remote-tracking branch 'origin/doidoost_dismiss' into doidoost_dismiss
Sandro La Bruzzo
2024-04-03 17:08:35 +0200
6f3e925caeImplemented first part of the new MAG mapping
Sandro La Bruzzo
2024-04-03 17:07:14 +0200
f0f6abf892[MapToFunderLink] added references for HFRI and Erasmus+ for the creation of links for fundersMiriam Baglioni2024-04-03 14:59:09 +0200
26b97aa5edMerge pull request '[BETA] fixed the result_country definition and updated the stats DB copy procedure' (#416) from antonis.lempesis/dnet-hadoop:beta into betaClaudio Atzori2024-04-03 12:36:03 +0200
5add51f38cMerge pull request 'fixed the result_country definition and updated the stats DB copy procedure' (#412) from antonis.lempesis/dnet-hadoop:beta into masterClaudio Atzori2024-04-03 12:34:17 +0200
b7c8acc563- Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski". - Fix typos.
#416
#412
Lampros Smyrnaios2024-04-03 13:15:37 +0300
50fbebf186[NOAMI] removed entry for Health and Social Care Board from the list of funders. Modified IRC putting 1596 and 1597 as synonyms, as required in ticket 9635Miriam Baglioni2024-04-03 11:45:40 +0200
71d6e02886Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into betaMichele Artini2024-04-03 09:50:41 +0200
42846d3b91[OpenCitation] add compression option when writing the sequence fileMiriam Baglioni2024-04-03 09:25:00 +0200
4f0a044245Merge pull request 'Add action set creation for Datacite affiliations' (#413) from 9647_datacite_affiliations into betaMiriam Baglioni2024-04-02 17:33:38 +0200
4bb504e693Merge pull request '[UsageCount] fixed error' (#415) from UsageStatsRecordDS into betaMiriam Baglioni2024-04-02 17:06:12 +0200
2c4440951fMerge pull request '[UsageCount] add check in case the datasource is not matched against those present in the graph' (#414) from UsageStatsRecordDS into betaMiriam Baglioni2024-04-02 16:30:39 +0200