Miriam Baglioni
aecf3b4f2e
[WebCrawl] first implementation
2024-04-19 17:06:41 +02:00
Claudio Atzori
8fdd0244ad
Merge pull request 'Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql' ( #425 ) from stats_step16_fix into master
Reviewed-on: #425
2024-04-18 11:25:24 +02:00
Claudio Atzori
18fdaaf548
integrating suggestion from #9699 to improve the result_country table construction
2024-04-18 11:23:43 +02:00
Claudio Atzori
43e123c624
added column alias
2024-04-17 16:40:29 +02:00
Claudio Atzori
62a07b7add
added missing end of statement /*EOS*/
2024-04-17 15:13:28 +02:00
Claudio Atzori
96bddcc921
revised query implementation for indi_pub_gold_oa
2024-04-17 15:06:50 +02:00
Miriam Baglioni
0486cea4c4
removed funder id 100011062 (Asian Spinal Cord Network), wrongly associated with Ireland
2024-04-16 15:36:40 +02:00
Claudio Atzori
013935c593
Merge pull request 'Improvements to copying data from ocean to impala' ( #420 ) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #420
2024-04-16 14:17:47 +02:00
Lampros Smyrnaios
d7da4f814b
Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios
14719dcd62
Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Lampros Smyrnaios
22745027c8
Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates".
2024-04-11 17:46:33 +03:00
Lampros Smyrnaios
abf0b69f29
Upgrade the copying operation to Impala Cluster:
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free up resources.
- Greatly improve the performance of every part of the copying process: a) speed up file transfer and DB deletion, b) eliminate permissions assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements, and only while they depend on other views not yet created, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
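The view-creation retry described in the commit above can be sketched roughly as follows. This is a hypothetical Python illustration, not the code from "copyDataToImpalaCluster.sh" (which is a shell script and is not shown here). The function name `create_views_with_retry`, the `try_create` callback, the fake backend and the sample view names are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def create_views_with_retry(
    statements: Dict[str, str],
    try_create: Callable[[str], bool],
) -> List[str]:
    """Run "create view" statements, retrying only the ones that failed.

    `statements` maps a view name to its SQL text; `try_create` is a
    hypothetical callback that returns True when the statement succeeds.
    Returns the names of the views that could not be created.
    """
    pending = dict(statements)
    while pending:
        created = [name for name, sql in pending.items() if try_create(sql)]
        if not created:
            # No progress in this pass: the remaining failures are not
            # caused by dependencies that a later pass could resolve.
            break
        for name in created:
            del pending[name]
    return list(pending)


# Fake backend (assumption): a view can be created only once all the
# views it depends on already exist.
deps = {"v_a": [], "v_b": ["v_a"], "v_c": ["v_b"], "v_bad": ["missing"]}
existing = set()


def fake_try_create(sql: str) -> bool:
    name = sql.split()[2]  # "create view <name> as ..."
    if all(dep in existing for dep in deps[name]):
        existing.add(name)
        return True
    return False


statements = {
    name: f"create view {name} as select 1"
    for name in ["v_c", "v_b", "v_a", "v_bad"]
}
failed = create_views_with_retry(statements, fake_try_create)
```

The loop stops as soon as a full pass creates nothing, so a view with a genuinely broken definition is reported once rather than retried a fixed number of times.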
Claudio Atzori
6132bd028e
Merge pull request 'Extend Crossref-funders mapping and datacite hostedbymap' ( #417 ) from CrossrefFundersMap into master
Reviewed-on: #417
2024-04-09 10:30:53 +02:00
Miriam Baglioni
519db1ddef
Extended the mapping of funders from Crossref ( #9169 , #9277 ) and changed the correspondence files for the Irish funders ( #9635 ). Extended the datacite map to include the association between metadata and the EBRAINS datasource (SciLake)
2024-04-09 09:33:09 +02:00
Claudio Atzori
5add51f38c
Merge pull request 'fixed the result_country definition and updated the stats DB copy procedure' ( #412 ) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #412
2024-04-03 12:34:17 +02:00
Lampros Smyrnaios
b7c8acc563
- Update the code which acquires the "IMPALA_HDFS_NODE" to test the "tmp"-dir instead of the base-dir, and introduce retries to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
- Fix typos.
2024-04-03 13:15:37 +03:00
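The retry idea in the commit above (probe each candidate node, and repeat the whole probe a few times before giving up) might look something like the sketch below. It is a hypothetical Python illustration only: the real logic lives in a shell script that is not shown here, and `find_active_node`, the `is_active` probe, the node names and the retry defaults are assumptions.

```python
import time
from typing import Callable, Optional, Sequence


def find_active_node(
    candidates: Sequence[str],
    is_active: Callable[[str], bool],
    retries: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[str]:
    """Return the first candidate for which `is_active` succeeds.

    The whole candidate list is probed up to `retries` times, so that a
    transient file-system failure does not abort the search immediately.
    """
    for _ in range(retries):
        for node in candidates:
            if is_active(node):
                return node
        time.sleep(delay_seconds)  # wait before the next round of probes
    return None


# Flaky probe (assumption): "node2" is the active node, but the first
# two probe calls fail, as a transient failure would.
calls = {"n": 0}


def flaky_probe(node: str) -> bool:
    calls["n"] += 1
    return node == "node2" and calls["n"] > 2


active = find_active_node(["node1", "node2"], flaky_probe)
```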
Antonis Lempesis
df6e3bda04
added new orgs in monitor
2024-04-01 22:45:29 +03:00
Antonis Lempesis
573b081f1d
added new orgs in monitor
2024-04-01 22:24:46 +03:00
Antonis Lempesis
0bf2a7a359
fixed the result_country definition
2024-04-01 15:23:22 +03:00
Claudio Atzori
f01390702e
Merge pull request 'fixed typo in indicator query' ( #410 ) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #410
2024-03-27 13:42:07 +01:00
Antonis Lempesis
9ff44eed96
fixed typo in indicator query
added more institutions
2024-03-27 14:39:01 +02:00
Claudio Atzori
5592ccc37a
Merge pull request 'added missing EOS, Generate tables with parquet-files, instead of csv in the contexts.sh script' ( #408 ) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #408
2024-03-27 12:02:57 +01:00
Antonis Lempesis
1fee4124e0
added missing EOS
2024-03-27 12:58:25 +02:00
Claudio Atzori
d16c15da8d
adjusted pom files
2024-03-26 14:00:44 +01:00
Lampros Smyrnaios
036ba03fcd
Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script.
2024-03-26 13:29:04 +02:00
Claudio Atzori
09a6d17059
Merge pull request '[Stats wf] #372 , #405 to production' ( #406 ) from antonis.lempesis/dnet-hadoop:beta into master
Reviewed-on: #406
2024-03-26 12:18:26 +01:00
Claudio Atzori
d70793847d
resolving conflicts on step16-createIndicatorsTables.sql
2024-03-26 12:17:52 +01:00
Lampros Smyrnaios
bc8c97182d
Automatically select the ACTIVE HDFS NODE for Impala cluster, in all "copyDataToImpalaCluster.sh" scripts.
2024-03-26 13:01:12 +02:00
Lampros Smyrnaios
92cc27e7eb
Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script.
2024-03-26 12:34:11 +02:00
Michele De Bonis
f6601ea7d1
default parameters for openorgs updated
2024-03-25 13:07:04 +01:00
Michele De Bonis
cd4c3c934d
openorgs wf updated
2024-03-22 15:42:37 +01:00
Antonis Lempesis
4c40c96e30
code cleanup
2024-03-22 10:16:49 +02:00
Antonis Lempesis
459167ac2f
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-21 12:44:58 +02:00
Antonis Lempesis
07f634a46d
code cleanup
2024-03-21 12:44:30 +02:00
Antonis Lempesis
9521625a07
code cleanup
2024-03-21 11:45:08 +02:00
Antonis Lempesis
67a5aa0a38
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-19 11:24:54 +02:00
dimitrispie
a3a570e9a0
Commit monitor-updates-wf
2024-03-19 09:42:21 +02:00
Michele Artini
a99942f7cf
filter by base types
2024-03-13 12:12:42 +01:00
Michele Artini
7f7083f53e
updated sql query for filtering BASE records
2024-03-13 11:57:26 +01:00
Michele Artini
d9b23a76c5
comments
2024-03-12 14:53:34 +01:00
Michele Artini
841ca92246
Merge pull request 'new plugin to collect from a dump of BASE' ( #400 ) from base-collector-plugin into master
Reviewed-on: #400
2024-03-12 12:22:42 +01:00
Michele Artini
3bcfc40293
new plugin to collect from a dump of BASE
2024-03-12 12:17:58 +01:00
Antonis Lempesis
f74c7e8689
selecting distinct peer_reviewed
2024-03-12 02:13:04 +02:00
Antonis Lempesis
3c79720342
fixed the Irish result subset
2024-03-07 14:08:57 +02:00
Antonis Lempesis
5ae4b4286c
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-07 12:15:19 +02:00
Antonis Lempesis
316d585c8a
using distinct apcs per publication to avoid huge sums
2024-03-07 02:07:59 +02:00
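Why distinct APCs per publication matter can be shown with a small, hypothetical example. The actual query is SQL and is not reproduced here; the rows and amounts below are invented for illustration. When a join repeats a publication's APC row, a plain sum counts the same amount several times, while summing over distinct (publication, amount) pairs does not.

```python
# (publication id, APC amount) rows as they might come out of a join;
# the second row is a duplicate produced by the join, not a second payment.
rows = [
    ("pub1", 1000.0),
    ("pub1", 1000.0),
    ("pub2", 500.0),
]

# Summing every row inflates the total.
naive_total = sum(amount for _, amount in rows)

# Summing distinct (publication, amount) pairs counts each APC once.
distinct_total = sum(amount for _, amount in set(rows))
```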
Giambattista Bloisi
3067ea390d
Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf
2024-03-04 11:13:34 +01:00
Miriam Baglioni
c94d94035c
[BulkTagging] added check to verify if field is present in the pathMap
2024-02-28 09:41:42 +01:00
Michele Artini
4374d7449e
mapping of project PIDs
2024-02-22 14:44:35 +01:00
Claudio Atzori
07d009007b
Merge pull request 'Fixed problem on missing author in crossref Mapping' ( #384 ) from crossref_missing_author_fix_master into master
Reviewed-on: #384
2024-02-15 15:06:17 +01:00