Lampros Smyrnaios
d7da4f814b
Minor updates to the copying operation to Impala Cluster:
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios
14719dcd62
Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Sandro La Bruzzo
41a42dde64
code formatted
2024-04-11 17:43:48 +02:00
Sandro La Bruzzo
843dc95340
resolved conflict
2024-04-11 17:38:16 +02:00
Sandro La Bruzzo
1e30454ee0
added vocabulary to instanceTypeMapping of MAG
2024-04-11 17:32:30 +02:00
Sandro La Bruzzo
2581672c11
updated workflows of MAG and Crossref to use transaction
2024-04-11 17:27:49 +02:00
Lampros Smyrnaios
22745027c8
Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates".
2024-04-11 17:46:33 +03:00
Lampros Smyrnaios
abf0b69f29
Upgrade the copying operation to Impala Cluster:
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free up resources.
- Hugely improve the performance in every aspect of the copying process: a) speed up file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
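The retry strategy named in point c) of the commit above can be sketched as follows. This is a minimal illustration, not the code of "copyDataToImpalaCluster.sh": the function `create_views_with_retry`, the `try_create` callback and the view names are hypothetical, and "still depends on a non-created view" is approximated by "the last pass made no progress".

```python
from typing import Callable, Iterable, List


def create_views_with_retry(
    statements: Iterable[str],
    try_create: Callable[[str], bool],
) -> List[str]:
    """Run each "create view" statement, retrying only the ones that failed.

    Returns the statements that could not be created (empty list on success).
    """
    pending = list(statements)
    while pending:
        # Keep only the statements that failed in this pass.
        still_failing = [stmt for stmt in pending if not try_create(stmt)]
        if len(still_failing) == len(pending):
            # No progress in this pass: another pass cannot resolve
            # the remaining failures, so stop retrying.
            return still_failing
        pending = still_failing
    return []


# Hypothetical dependency map: view name -> views it selects from.
deps = {"v3": {"v2"}, "v2": {"v1"}, "v1": set()}
created = set()


def fake_try_create(view: str) -> bool:
    """Stand-in for running one statement; succeeds once all dependencies exist."""
    if deps[view] <= created:
        created.add(view)
        return True
    return False


failed = create_views_with_retry(["v3", "v2", "v1"], fake_try_create)
print(failed)  # prints []
```

Each pass retries only the views that failed in the previous one, so a chain of N dependent views needs at most N passes, and a pass with no progress ends the loop instead of repeating a fixed number of full attempts.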
Claudio Atzori
3cad4a415d
fixed duplicated property dhp-schemas.version
2024-04-11 15:44:12 +02:00
Sandro La Bruzzo
a0642bd190
added instanceTypeMapping field on MAG
2024-04-11 13:10:12 +02:00
Sandro La Bruzzo
98dc042db5
mapping generated for MAG, missing generation of Organization Action set
2024-04-05 18:12:53 +02:00
Sandro La Bruzzo
ef582948a7
Updated mapping
2024-04-05 11:10:44 +02:00
Sandro La Bruzzo
5142f462b5
completed mapping from paper to OAF, not tested
2024-04-04 21:06:04 +02:00
Miriam Baglioni
0794e0667b
Merge branch 'doidoost_dismiss' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doidoost_dismiss
2024-04-04 09:16:18 +02:00
Miriam Baglioni
4b1de076ac
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:16:14 +02:00
Miriam Baglioni
c8a88b2187
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:14:58 +02:00
Sandro La Bruzzo
31e152d2bb
Merge remote-tracking branch 'origin/doidoost_dismiss' into doidoost_dismiss
2024-04-03 17:08:35 +02:00
Sandro La Bruzzo
6f3e925cae
Implemented first part of the new MAG mapping
2024-04-03 17:07:14 +02:00
Miriam Baglioni
f0f6abf892
[MapToFunderLink] added references for HFRI and Erasmus+ for the creation of links for funders
2024-04-03 14:59:09 +02:00
Claudio Atzori
26b97aa5ed
Merge pull request '[BETA] fixed the result_country definition and updated the stats DB copy procedure' (#416) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#416
2024-04-03 12:36:03 +02:00
Lampros Smyrnaios
b7c8acc563
- Update the code which acquires the "IMPALA_HDFS_NODE" to test the "tmp"-dir instead of the base-dir, and introduce retries to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
- Fix typos.
2024-04-03 13:15:37 +03:00
Miriam Baglioni
50fbebf186
[NOAMI] removed entry for Health and Social Care Board from the list of funders. Modified IRC putting 1596 and 1597 as synonyms, as required in ticket 9635
2024-04-03 11:45:40 +02:00
Michele Artini
71d6e02886
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2024-04-03 09:50:41 +02:00
Michele Artini
02c9a311c8
base datainfo with trust=0.89
2024-04-03 09:50:21 +02:00
Miriam Baglioni
42846d3b91
[OpenCitation] add compression option when writing the sequence file
2024-04-03 09:25:00 +02:00
Miriam Baglioni
4f0a044245
Merge pull request 'Add action set creation for Datacite affiliations' (#413) from 9647_datacite_affiliations into beta
Reviewed-on: D-Net/dnet-hadoop#413
2024-04-02 17:33:38 +02:00
Miriam Baglioni
4bb504e693
Merge pull request '[UsageCount] fixed error' (#415) from UsageStatsRecordDS into beta
Reviewed-on: D-Net/dnet-hadoop#415
2024-04-02 17:06:12 +02:00
Serafeim Chatzopoulos
cbe13a5c61
Fix datacite input path in properties file
2024-04-02 18:00:35 +03:00
Miriam Baglioni
9c9a9562ae
[UsageCount] fixed error
2024-04-02 16:56:37 +02:00
Miriam Baglioni
2c4440951f
Merge pull request '[UsageCount] add check in case the datasource is not matched against those present in the graph' (#414) from UsageStatsRecordDS into beta
Reviewed-on: D-Net/dnet-hadoop#414
2024-04-02 16:30:39 +02:00
Miriam Baglioni
b42bdd5fb3
[UsageCount] add check in case the datasource is not matched against those present in the graph
2024-04-02 16:28:27 +02:00
Miriam Baglioni
64cbd8abe9
Merge pull request '[UsageCount] Usage count per result split by datasource' (#318) from UsageStatsRecordDS into beta
Reviewed-on: D-Net/dnet-hadoop#318
2024-04-02 10:21:39 +02:00
Antonis Lempesis
df6e3bda04
added new orgs in monitor
2024-04-01 22:45:29 +03:00
Antonis Lempesis
573b081f1d
added new orgs in monitor
2024-04-01 22:24:46 +03:00
Serafeim Chatzopoulos
0eb0701b26
Add action set creation for Datacite affiliations
2024-04-01 17:23:26 +03:00
Antonis Lempesis
0bf2a7a359
fixed the result_country definition
2024-04-01 15:23:22 +03:00
Claudio Atzori
24227ab598
Merge pull request '[BETA] fixed typo in indicator query' (#411) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#411
2024-03-27 13:56:43 +01:00
Antonis Lempesis
9ff44eed96
fixed typo in indicator query
added more institutions
2024-03-27 14:39:01 +02:00
Claudio Atzori
cff6040424
Merge pull request '[BETA] added missing EOS, Generate tables with parquet-files, instead of csv in the contexts.sh script' (#409) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#409
2024-03-27 12:04:04 +01:00
Antonis Lempesis
1fee4124e0
added missing EOS
2024-03-27 12:58:25 +02:00
Sandro La Bruzzo
73a67c0e4a
Improved Crossref mapping to also include Unpaywall, tested
2024-03-26 17:26:47 +01:00
Claudio Atzori
9e700a8b0d
Merge pull request 'adding context information to projects and datasources' (#407) from taggingProjects into beta
Reviewed-on: D-Net/dnet-hadoop#407
2024-03-26 14:53:38 +01:00
Claudio Atzori
75551ad4ec
code formatting
2024-03-26 14:53:16 +01:00
Miriam Baglioni
94b931f7bd
[BulkTagging - tag datasource and projects] merging with branch beta
2024-03-26 14:25:19 +01:00
Miriam Baglioni
3b209261f2
[BulkTagging - tag datasource and projects] merging with branch beta
2024-03-26 14:21:27 +01:00
Lampros Smyrnaios
036ba03fcd
Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script.
2024-03-26 13:29:04 +02:00
Claudio Atzori
730eaffc85
Merge pull request 'correctly selecting the active hdfs node for the impala cluster' (#405) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#405
2024-03-26 12:07:46 +01:00
Lampros Smyrnaios
bc8c97182d
Automatically select the ACTIVE HDFS NODE for Impala cluster, in all "copyDataToImpalaCluster.sh" scripts.
2024-03-26 13:01:12 +02:00
Lampros Smyrnaios
92cc27e7eb
Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script.
2024-03-26 12:34:11 +02:00
Claudio Atzori
ef52128c55
included new stats* workflows in parent pom list of modules, code formatting
2024-03-26 10:42:10 +01:00