Claudio Atzori
57c678d904
integrating changes from PR#424
2024-04-18 11:38:35 +02:00
Claudio Atzori
5ab8cd1794
Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql
2024-04-18 11:28:18 +02:00
Lampros Smyrnaios
d7da4f814b
Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios
14719dcd62
Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Lampros Smyrnaios
abf0b69f29
Upgrade the copying operation to Impala Cluster:
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free up resources.
- Hugely improve the performance in every aspect of the copying process: a) speed up file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements, and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
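The view-creation strategy described in point c) above can be sketched roughly as follows. This is only an illustration in Python, not the code from the actual script: the function name `create_views_with_retries`, the `view_deps` mapping and the `try_create` callback are all hypothetical, and the real implementation works on "create view" statements executed against the cluster.

```python
from typing import Callable, Dict, List, Set


def create_views_with_retries(
    view_deps: Dict[str, Set[str]],
    try_create: Callable[[str], bool],
) -> List[str]:
    """Create views, retrying only those still waiting on other views.

    view_deps (hypothetical input) maps each view name to the names it
    depends on. try_create(name) is a hypothetical callback that returns
    True when the view was created. Returns the views in creation order.
    """
    created: List[str] = []
    pending = dict(view_deps)
    while pending:
        progressed = False
        for name in list(pending):
            # Only dependencies that are themselves not-yet-created views
            # block creation; tables are assumed to exist already.
            blocked = any(dep in pending for dep in pending[name] if dep != name)
            if blocked:
                continue
            if try_create(name):
                created.append(name)
                del pending[name]
                progressed = True
        if not progressed:
            # A full pass created nothing: circular dependency or a real error.
            raise RuntimeError(f"could not create views: {sorted(pending)}")
    return created
```

Each pass attempts only the views whose dependencies already exist, so nothing is recreated needlessly and the loop stops as soon as a pass makes no progress.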
Lampros Smyrnaios
b7c8acc563
- Update the code which acquires the "IMPALA_HDFS_NODE" to test the "tmp"-dir instead of the base-dir, and introduce retries to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
- Fix typos.
2024-04-03 13:15:37 +03:00
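The retry idea behind the "IMPALA_HDFS_NODE" lookup above could look roughly like the sketch below. It is an assumption-laden illustration, not the script's code: `find_active_node`, its parameters and the `probe` callback are hypothetical names, and the probe stands in for whatever check the script runs against the "tmp"-dir of each candidate node.

```python
import time
from typing import Callable, Iterable, Optional


def find_active_node(
    candidates: Iterable[str],
    probe: Callable[[str], bool],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[str]:
    """Return the first candidate node for which probe() succeeds.

    probe(node) is a hypothetical callback that reports whether the node
    answered the check. All candidates are retried up to max_attempts
    times, so a transient file-system failure does not abort the lookup.
    """
    nodes = list(candidates)
    for attempt in range(1, max_attempts + 1):
        for node in nodes:
            if probe(node):
                return node
        # Wait before the next round, but not after the final one.
        if attempt < max_attempts and delay_seconds > 0:
            time.sleep(delay_seconds)
    return None
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing active node is fatal.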
Antonis Lempesis
df6e3bda04
added new orgs in monitor
2024-04-01 22:45:29 +03:00
Antonis Lempesis
573b081f1d
added new orgs in monitor
2024-04-01 22:24:46 +03:00
Antonis Lempesis
0bf2a7a359
fixed the result_country definition
2024-04-01 15:23:22 +03:00
Antonis Lempesis
9ff44eed96
fixed typo in indicator query
added more institutions
2024-03-27 14:39:01 +02:00
Antonis Lempesis
1fee4124e0
added missing EOS
2024-03-27 12:58:25 +02:00
Lampros Smyrnaios
036ba03fcd
Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script.
2024-03-26 13:29:04 +02:00
Lampros Smyrnaios
92cc27e7eb
Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script.
2024-03-26 12:34:11 +02:00
Antonis Lempesis
4c40c96e30
code cleanup
2024-03-22 10:16:49 +02:00
Antonis Lempesis
459167ac2f
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-21 12:44:58 +02:00
Antonis Lempesis
07f634a46d
code cleanup
2024-03-21 12:44:30 +02:00
Antonis Lempesis
9521625a07
code cleanup
2024-03-21 11:45:08 +02:00
Antonis Lempesis
f74c7e8689
selecting distinct peer_reviewed
2024-03-12 02:13:04 +02:00
Antonis Lempesis
5ae4b4286c
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-07 12:15:19 +02:00
Antonis Lempesis
316d585c8a
using distinct apcs per publication to avoid huge sums
2024-03-07 02:07:59 +02:00
Antonis Lempesis
dd4c27f4f3
added 2 new institutions in monitor
2024-02-08 12:57:57 +02:00
Antonis Lempesis
a512ead447
changed orcid ids to all capital
2024-01-30 16:54:47 +02:00
Antonis Lempesis
bb10a22290
merged changes from dnet-hadoop
2024-01-29 21:51:47 +02:00
Giambattista Bloisi
078df0b4d1
Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf
2024-01-26 21:56:55 +01:00
Antonis Lempesis
c548796463
Changed step16-createIndicatorsTables to use a spark oozie action instead of hive
2024-01-26 02:04:48 +02:00
Antonis Lempesis
a7115cfa9e
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:13:16 +01:00
Antonis Lempesis
fd43b0e84a
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:06:34 +01:00
Antonis Lempesis
e024718f73
creating result_instances even when no pids exist for the instance
2024-01-10 22:25:50 +01:00
dimitrispie
b920307bdd
Changes to indicators
2024-01-09 00:47:09 +02:00
Antonis Lempesis
2e4cab026c
fixed the result_country definition
2024-01-08 16:01:26 +02:00
dimitrispie
40b98d8182
Changes to indicators and funders definition
- Changes result_refereed definition
- Added result_country indicator
- Added indi_pub_green_with_license indicator
- Added country from jurisdiction to funders
2023-12-22 10:29:20 +02:00
Claudio Atzori
93a700742a
Merge pull request 'Changes for tables and creation of the new indicator indi_is_result_accessible' (#363) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#363
2023-12-01 15:05:23 +01:00
dimitrispie
c9d995dde0
New institutions added
2023-12-01 15:44:35 +02:00
dimitrispie
a397112cb8
Add new indicator
Add indi_pub_publicly_funded
2023-12-01 15:00:18 +02:00
dimitrispie
76594ded23
Changes to indicators
Fixes on open access colours indicators
- indi_pub_green_oa
- indi_pub_gold_oa
- indi_pub_hybrid
- indi_pub_bronze_oa
- indi_pub_diamond
2023-12-01 13:38:19 +02:00
dimitrispie
a94a54a2d0
Changes for tables and creation of the new indicator indi_is_result_accessible
- Drop table statements for all tables to avoid duplicates in case of wf rerun
- Add pdfsaggregated step to create the indi_is_result_accessible table. This step is executed on the new impala cluster only, since the pdfaggregation_i is updated on this cluster.
2023-11-15 14:32:18 +02:00
Claudio Atzori
4e6fccf4f6
Merge pull request 'Beta stats wf updated' (#332) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#332
2023-10-10 09:35:32 +02:00
dimitrispie
17586f0ff8
Update step20-createMonitorDB.sql
Add result_orcid table to monitor dbs
2023-10-09 14:21:31 +03:00
dimitrispie
489a082f04
Update step16-createIndicatorsTables.sql
Change scripts for gold, hybrid, bronze indicators
2023-10-09 14:00:50 +03:00
dimitrispie
9ef971a146
Update step16-createIndicatorsTables.sql
Fix int year for:
indi_org_openess_year
indi_org_fairness_year
indi_org_findable_year
2023-09-19 14:25:42 +03:00
dimitrispie
5f90cc11e9
Update step16-createIndicatorsTables.sql
Fix indi_pub_bronze_oa
2023-09-06 14:14:38 +03:00
dimitrispie
964c2f553e
Changes in indicators step, monitor step
- graduatedoctorates for observatory
- result_apc_affiliations table
- new indicators
indi_is_funder_plan_s
indi_funder_fairness
indi_ris_fairness
indi_funder_openess
indi_ris_openess
indi_funder_findable
indi_ris_findable
indi_is_project_result_after
- cast year to int in composite indicators
- new institutions
-- Universidade Católica Portuguesa
-- Iscte - Instituto Universitário de Lisboa
-- Munster Technological University
-- Cardiff University
-- Leibniz Institute of Ecological Urban and Regional Development
2023-09-01 10:57:02 +03:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
dimitrispie
be4856ef35
Update step15.sql
2023-07-17 15:33:58 +03:00
dimitrispie
163b2ee2a8
Changes
1. Monitor updates
2. Bug fixes during copy to impala cluster
2023-07-13 15:25:00 +03:00
Claudio Atzori
b0ebf56367
Merge pull request 'Update step15_5.sql' (#314) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#314
2023-06-21 10:33:22 +02:00
dimitrispie
2b6370eaee
Update step15_5.sql
Bug fix
2023-06-21 11:31:10 +03:00
Claudio Atzori
35e42a86ed
Merge pull request 'Update step15_5.sql' (#313) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#313
2023-06-21 10:26:16 +02:00
dimitrispie
74cb060bfe
Update step15_5.sql
Add "if not exists" clause
2023-06-21 11:24:06 +03:00
Claudio Atzori
85e016df17
Merge pull request 'Update step16-createIndicatorsTables.sql' (#312) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#312
2023-06-21 09:52:33 +02:00