Commit Graph

5159 Commits

Author SHA1 Message Date
Miriam Baglioni aecf3b4f2e [WebCrawl] first implementation 2024-04-19 17:06:41 +02:00
Claudio Atzori 8fdd0244ad Merge pull request 'Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql' (#425) from stats_step16_fix into master
Reviewed-on: #425
2024-04-18 11:25:24 +02:00
Claudio Atzori 18fdaaf548 integrating suggestion from #9699 to improve the result_country table construction 2024-04-18 11:23:43 +02:00
Claudio Atzori 43e123c624 added column alias 2024-04-17 16:40:29 +02:00
Claudio Atzori 62a07b7add added missing end of statement /*EOS*/ 2024-04-17 15:13:28 +02:00
Claudio Atzori 96bddcc921 revised query implementation for indi_pub_gold_oa 2024-04-17 15:06:50 +02:00
Miriam Baglioni 0486cea4c4 removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland 2024-04-16 15:36:40 +02:00
Claudio Atzori 013935c593 Merge pull request 'Improvements to copying data from ocean to impala' (#420) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #420
2024-04-16 14:17:47 +02:00
Lampros Smyrnaios d7da4f814b Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios 14719dcd62 Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Lampros Smyrnaios 22745027c8 Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates". 2024-04-11 17:46:33 +03:00
Lampros Smyrnaios abf0b69f29 Upgrade the copying operation to Impala Cluster:
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free-up resources.
- Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
Claudio Atzori 6132bd028e Merge pull request 'Extend Crossref-funders mapping and datacite hostedbymap' (#417) from CrossrefFundersMap into master
Reviewed-on: #417
2024-04-09 10:30:53 +02:00
Miriam Baglioni 519db1ddef Extended mapping of funder from crossref (#9169, #9277) and change the correspondece files for the irish fundrs (#9635). Extended the datacite map to include the association between metadata and the EBRAINS datasource (SciLake) 2024-04-09 09:33:09 +02:00
Claudio Atzori 5add51f38c Merge pull request 'fixed the result_country definition and updated the stats DB copy procedure' (#412) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #412
2024-04-03 12:34:17 +02:00
Lampros Smyrnaios b7c8acc563 - Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
- Fix typos.
2024-04-03 13:15:37 +03:00
Antonis Lempesis df6e3bda04 added new orgs in monitor 2024-04-01 22:45:29 +03:00
Antonis Lempesis 573b081f1d added new orgs in monitor 2024-04-01 22:24:46 +03:00
Antonis Lempesis 0bf2a7a359 fixed the result_country definition 2024-04-01 15:23:22 +03:00
Claudio Atzori f01390702e Merge pull request 'fixed typo in indicator query' (#410) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #410
2024-03-27 13:42:07 +01:00
Antonis Lempesis 9ff44eed96 fixed typo in indicator query
added more institutions
2024-03-27 14:39:01 +02:00
Claudio Atzori 5592ccc37a Merge pull request 'added missing EOS, Generate tables with parquet-files, instead of csv in the contexts.sh script' (#408) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #408
2024-03-27 12:02:57 +01:00
Antonis Lempesis 1fee4124e0 added missing EOS 2024-03-27 12:58:25 +02:00
Claudio Atzori d16c15da8d adjusted pom files 2024-03-26 14:00:44 +01:00
Lampros Smyrnaios 036ba03fcd Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script. 2024-03-26 13:29:04 +02:00
Claudio Atzori 09a6d17059 Merge pull request '[Stats wf] #372, #405 to production' (#406) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #406
2024-03-26 12:18:26 +01:00
Claudio Atzori d70793847d resolving conflicts on step16-createIndicatorsTables.sql 2024-03-26 12:17:52 +01:00
Lampros Smyrnaios bc8c97182d Automatically select the ACTIVE HDFS NODE for Impala cluster, in all "copyDataToImpalaCluster.sh" scripts. 2024-03-26 13:01:12 +02:00
Lampros Smyrnaios 92cc27e7eb Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script. 2024-03-26 12:34:11 +02:00
Michele De Bonis f6601ea7d1 default parameters for openorgs updated 2024-03-25 13:07:04 +01:00
Michele De Bonis cd4c3c934d openorgs wf updated 2024-03-22 15:42:37 +01:00
Antonis Lempesis 4c40c96e30 code cleanup 2024-03-22 10:16:49 +02:00
Antonis Lempesis 459167ac2f Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta 2024-03-21 12:44:58 +02:00
Antonis Lempesis 07f634a46d code cleanup 2024-03-21 12:44:30 +02:00
Antonis Lempesis 9521625a07 code cleanup 2024-03-21 11:45:08 +02:00
Antonis Lempesis 67a5aa0a38 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta 2024-03-19 11:24:54 +02:00
dimitrispie a3a570e9a0 Commit monitor-updates-wf 2024-03-19 09:42:21 +02:00
Michele Artini a99942f7cf filter by base types 2024-03-13 12:12:42 +01:00
Michele Artini 7f7083f53e updated sql query for filtering BASE records 2024-03-13 11:57:26 +01:00
Michele Artini d9b23a76c5 comments 2024-03-12 14:53:34 +01:00
Michele Artini 841ca92246 Merge pull request 'new plugin to collect from a dump of BASE' (#400) from base-collector-plugin into master
Reviewed-on: #400
2024-03-12 12:22:42 +01:00
Michele Artini 3bcfc40293 new plugin to collect from a dump of BASE 2024-03-12 12:17:58 +01:00
Antonis Lempesis f74c7e8689 selecting distinct peer_reviewed 2024-03-12 02:13:04 +02:00
Antonis Lempesis 3c79720342 fixed the irish result subset 2024-03-07 14:08:57 +02:00
Antonis Lempesis 5ae4b4286c Merge branch 'beta' of https://code-repo.d3science.org/antonis.lempesis/dnet-hadoop into beta 2024-03-07 12:15:19 +02:00
Antonis Lempesis 316d585c8a using distinct apcs per publication to avoid huge sums 2024-03-07 02:07:59 +02:00
Giambattista Bloisi 3067ea390d Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf 2024-03-04 11:13:34 +01:00
Miriam Baglioni c94d94035c [BulkTagging] added check to verify if field is present in the pathMap 2024-02-28 09:41:42 +01:00
Michele Artini 4374d7449e mapping of project PIDs 2024-02-22 14:44:35 +01:00
Claudio Atzori 07d009007b Merge pull request 'Fixed problem on missing author in crossref Mapping' (#384) from crossref_missing_author_fix_master into master
Reviewed-on: #384
2024-02-15 15:06:17 +01:00