Commit Graph

4282 Commits (master)

Author SHA1 Message Date
Claudio Atzori 0c05abe50b [graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor 1 week ago
Claudio Atzori 18fdaaf548 integrating suggestion from #9699 to improve the result_country table construction 1 week ago
Claudio Atzori 43e123c624 added column alias 1 week ago
Claudio Atzori 62a07b7add added missing end of statement /*EOS*/ 1 week ago
Claudio Atzori 96bddcc921 revised query implementation for indi_pub_gold_oa 1 week ago
Miriam Baglioni 0486cea4c4 removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland 1 week ago
Claudio Atzori 013935c593 Merge pull request 'Improvements to copying data from ocean to impala' (#420) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #420
1 week ago
Lampros Smyrnaios d7da4f814b Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2 weeks ago
Lampros Smyrnaios 14719dcd62 Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2 weeks ago
Lampros Smyrnaios 22745027c8 Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates". 2 weeks ago
Lampros Smyrnaios abf0b69f29 Upgrade the copying operation to Impala Cluster:
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free up resources.
- Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2 weeks ago
Miriam Baglioni 519db1ddef Extended mapping of funder from crossref (#9169, #9277) and change the correspondence files for the Irish funders (#9635). Extended the datacite map to include the association between metadata and the EBRAINS datasource (SciLake) 3 weeks ago
Claudio Atzori 5add51f38c Merge pull request 'fixed the result_country definition and updated the stats DB copy procedure' (#412) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #412
3 weeks ago
Lampros Smyrnaios b7c8acc563 - Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
- Fix typos.
3 weeks ago
Antonis Lempesis df6e3bda04 added new orgs in monitor 4 weeks ago
Antonis Lempesis 573b081f1d added new orgs in monitor 4 weeks ago
Antonis Lempesis 0bf2a7a359 fixed the result_country definition 4 weeks ago
Claudio Atzori f01390702e Merge pull request 'fixed typo in indicator query' (#410) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #410
1 month ago
Antonis Lempesis 9ff44eed96 fixed typo in indicator query
added more institutions
1 month ago
Claudio Atzori 5592ccc37a Merge pull request 'added missing EOS, Generate tables with parquet-files, instead of csv in the contexts.sh script' (#408) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #408
1 month ago
Antonis Lempesis 1fee4124e0 added missing EOS 1 month ago
Claudio Atzori d16c15da8d adjusted pom files 1 month ago
Lampros Smyrnaios 036ba03fcd Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script. 1 month ago
Claudio Atzori 09a6d17059 Merge pull request '[Stats wf] #372, #405 to production' (#406) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #406
1 month ago
Claudio Atzori d70793847d resolving conflicts on step16-createIndicatorsTables.sql 1 month ago
Lampros Smyrnaios bc8c97182d Automatically select the ACTIVE HDFS NODE for Impala cluster, in all "copyDataToImpalaCluster.sh" scripts. 1 month ago
Lampros Smyrnaios 92cc27e7eb Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script. 1 month ago
Michele De Bonis f6601ea7d1 default parameters for openorgs updated 1 month ago
Michele De Bonis cd4c3c934d openorgs wf updated 1 month ago
Antonis Lempesis 4c40c96e30 code cleanup 1 month ago
Antonis Lempesis 459167ac2f Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta 1 month ago
Antonis Lempesis 07f634a46d code cleanup 1 month ago
Antonis Lempesis 9521625a07 code cleanup 1 month ago
Antonis Lempesis 67a5aa0a38 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta 1 month ago
dimitrispie a3a570e9a0 Commit monitor-updates-wf 1 month ago
Michele Artini a99942f7cf filter by base types 1 month ago
Michele Artini 7f7083f53e updated sql query for filtering BASE records 1 month ago
Michele Artini d9b23a76c5 comments 2 months ago
Michele Artini 3bcfc40293 new plugin to collect from a dump of BASE 2 months ago
Antonis Lempesis f74c7e8689 selecting distinct peer_reviewed 2 months ago
Antonis Lempesis 3c79720342 fixed the irish result subset 2 months ago
Antonis Lempesis 5ae4b4286c Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta 2 months ago
Antonis Lempesis 316d585c8a using distinct apcs per publication to avoid huge sums 2 months ago
Giambattista Bloisi 3067ea390d Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf 2 months ago
Miriam Baglioni c94d94035c [BulkTagging] added check to verify if field is present in the pathMap 2 months ago
Michele Artini 4374d7449e mapping of project PIDs 2 months ago
Claudio Atzori b3ddbaed58 fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite) 2 months ago
Claudio Atzori 1416f16b35 [graph raw] fixed mapping of the original resource type from the Datacite format 3 months ago
Giambattista Bloisi 079085286c Merge branch 'master' into fix_dedupaliases_deletedbyinference 3 months ago
Giambattista Bloisi 8dd666aedd Dedup aliases, created when a dedup in a previous build has been merged in a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup 3 months ago