Commit Graph

5234 Commits

Author SHA1 Message Date
Lampros Smyrnaios 49af2e5740 Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views; overcome some bash-instabilities.
- Upon any error, fail the whole process, not just the current DB-creation, as those errors usually indicate a bug in the initial DB-creation, that should be fixed immediately.
- Enhance parallel-copy of large files by "hadoop distcp" command.
- Reduce the "invalidate metadata" commands to just the current DB's tables, in order to eliminate the general overhead on Impala.
- Show the number of tables and views in the logs.
- Fix some log-messages.
2024-04-23 17:15:04 +03:00
Antonis Lempesis d2649a1429 increased the jvm ram 2024-04-23 16:03:16 +03:00
Claudio Atzori c3053ef34d using version 1.2.5-beta for the release 2024-04-23 14:52:32 +02:00
Claudio Atzori b5bcab13ec using version 1.2.5-beta for the release 2024-04-23 14:36:39 +02:00
Claudio Atzori 425c9afc36 using version 1.2.5-beta for the release 2024-04-23 14:30:04 +02:00
Claudio Atzori 93dd9cc639 code formatting 2024-04-23 11:28:00 +02:00
Miriam Baglioni 6189879643 [NOAMI] removed entry for Irish Research eLibray (IReL) Care Board from the list of funders. 2024-04-23 11:09:18 +02:00
Claudio Atzori c57cff2d6d Merge pull request '[WebCrawl] adding affiliation relations from web information' (#428) from WebCrowlBeta into beta
Reviewed-on: #428
2024-04-23 09:36:15 +02:00
Miriam Baglioni 7de114bda0 [WebCrawl] addressing comments from PR 2024-04-22 13:52:50 +02:00
Claudio Atzori eb4692e4ee Merge branch 'beta' into WebCrowlBeta 2024-04-22 11:40:24 +02:00
Claudio Atzori 24a83fc24f avoid NPEs in common Oaf merge utilities 2024-04-22 11:39:44 +02:00
Sandro La Bruzzo 073f320c6a Added module containing all the dependencies, useful for spark deploy on k8. 2024-04-22 11:32:31 +02:00
Miriam Baglioni 776c898c4b [WebCrawl] adding affiliation relations from web information 2024-04-22 11:04:17 +02:00
Claudio Atzori 5857fd38c1 avoid NPEs in common Oaf merge utilities 2024-04-21 08:29:09 +02:00
Claudio Atzori 0656ab2838 code formatting 2024-04-20 08:10:58 +02:00
Claudio Atzori ab7f0855af fixed query reading projects from the aggregator DB 2024-04-20 08:10:32 +02:00
Claudio Atzori 7a7e313157 updated schema version 2024-04-19 17:30:25 +02:00
Claudio Atzori e5879b68c7 [transformative agreement] including reuslt-funder relations to the information imported from the TRs 2024-04-19 17:14:18 +02:00
Claudio Atzori 3a027e97a7 [graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor 2024-04-19 16:59:58 +02:00
Sandro La Bruzzo b72c3139e2 updated Ignore annotation that is deprecated to Disabled 2024-04-19 14:52:40 +02:00
Sandro La Bruzzo b84ad0c06e merged beta 2024-04-19 14:39:59 +02:00
Antonis Lempesis b52a5a753b Merge remote-tracking branch 'upstream/beta' into beta 2024-04-19 15:28:28 +03:00
Sandro La Bruzzo 8dd9cf84e2 code formatted 2024-04-19 12:30:59 +02:00
Sandro La Bruzzo 342cb6189b fixed problem on changed signature on RowEncoder
removed property dhp.schema.artifact
2024-04-19 12:13:26 +02:00
Antonis Lempesis c3fe9662b2 all indicator tables are now stored as parquet 2024-04-19 12:45:36 +03:00
Claudio Atzori 57c678d904 integrating changes from PR#424 2024-04-18 11:38:35 +02:00
Claudio Atzori 5ab8cd1794 Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql 2024-04-18 11:28:18 +02:00
Antonis Lempesis 0c71c58df6 fixed the definition of gold_oa 2024-04-18 12:01:27 +03:00
Antonis Lempesis 43d05dbebb fixed the definition of result_country 2024-04-18 11:53:50 +03:00
Antonis Lempesis e728a0897c fixed the definition of indi_pub_bronze_oa 2024-04-18 11:07:55 +03:00
Antonis Lempesis 308ae580a9 slight optimization in indi_pub_gold_oa definition 2024-04-18 10:57:52 +03:00
Antonis Lempesis 27d22bd8f9 slight optimization in indi_pub_gold_oa definition 2024-04-17 23:59:52 +03:00
Antonis Lempesis 1f5aba12fa slight optimization in indi_pub_gold_oa definition 2024-04-17 23:54:23 +03:00
Claudio Atzori b554c41cc7 Merge pull request 'doidoost_dismiss' (#418) from doidoost_dismiss into beta
Reviewed-on: #418
2024-04-17 12:01:11 +02:00
Claudio Atzori ac8747582c Merge branch 'beta' into doidoost_dismiss 2024-04-17 12:01:01 +02:00
Claudio Atzori 0db7e4ae9a Merge pull request 'Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common' (#422) from revised_merge_logic into beta
Reviewed-on: #422
2024-04-17 11:58:26 +02:00
Giambattista Bloisi 8ac167e420 Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common 2024-04-16 17:18:28 +02:00
Miriam Baglioni 0625b9061f removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland 2024-04-16 15:26:53 +02:00
Miriam Baglioni 9eeb9f5d32 mergin with branch beta 2024-04-16 15:24:40 +02:00
Claudio Atzori 589bce3520 Merge pull request '[pBETA] Improvements to copying data from ocean to impala' (#421) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #421
2024-04-16 14:22:32 +02:00
Sandro La Bruzzo a5ddd8dfbb Added Action set generation for the MAG organization 2024-04-16 13:39:15 +02:00
Giambattista Bloisi da333e9f4d Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' (#419) from dedup_authorsmatch_bytoken into beta
Reviewed-on: #419
2024-04-16 10:24:11 +02:00
Claudio Atzori 43fd1de681 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2024-04-16 09:42:05 +02:00
Claudio Atzori d070db4a32 added a couple more invalid author names 2024-04-16 09:41:59 +02:00
Michele Artini 78b9d84e4a test 2024-04-16 09:41:16 +02:00
Giambattista Bloisi 43b454399f - Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Lampros Smyrnaios d7da4f814b Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios 14719dcd62 Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Sandro La Bruzzo 41a42dde64 code formatted 2024-04-11 17:43:48 +02:00
Sandro La Bruzzo 843dc95340 resolved conflict 2024-04-11 17:38:16 +02:00