Commit Graph

5482 Commits

Author SHA1 Message Date
Claudio Atzori 55f39f7850 [graph provision] adds the possibility to validate the XML records before storing them via the validateXML parameter 2024-05-09 14:06:04 +02:00
Claudio Atzori 39a2afe8b5 [graph provision] fixed XML serialization of the usage counts measures, renamed workflow actions to better reflect their role 2024-05-09 13:54:42 +02:00
Claudio Atzori 908ed9da7a Merge pull request 'Various fixes in the stats wf' (#430) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #430
2024-05-08 13:41:02 +02:00
Antonis Lempesis 0cada3cc8f every step is run in the analytics queue. Hardcoded for now, will make a parameter later 2024-05-08 13:42:53 +03:00
Antonis Lempesis 90a4fb3547 fixed typos 2024-05-08 13:17:58 +03:00
Claudio Atzori 18aa323ee9 cleanup unused classes, adjustments in the oozie wf definition 2024-05-08 11:36:46 +02:00
Michele Artini c9a327bc50 refactoring of gzip method 2024-05-08 11:34:08 +02:00
Michele Artini e234848af8 oaf record: xpath for root 2024-05-08 10:00:53 +02:00
Claudio Atzori b4e3389432 fixed property mapping creating the RelatedEntity transient objects. spark cores & memory adjustments. Code formatting 2024-05-07 16:25:17 +02:00
Giambattista Bloisi 711048ceed PrepareRelationsJob rewritten to use Spark Dataframe API and Windowing functions 2024-05-07 15:44:33 +02:00
Michele Artini 70bf6ac415 oai exporter tests 2024-05-07 09:36:26 +02:00
Michele Artini aa40e53c19 oai exporter parameters 2024-05-07 08:01:19 +02:00
Michele Artini ed052a3476 job for the population of the oai database 2024-05-06 16:08:33 +02:00
Claudio Atzori 26363060ed fixed id prefix creation for the fosnodoi records, again 2024-05-03 15:53:52 +02:00
Claudio Atzori 0486227185 [cleaning] deactivating the cleaning of FOS subjects found in the metadata provided by repositories 2024-05-03 14:31:12 +02:00
Claudio Atzori a5d13d5d27 code formatting 2024-05-03 14:14:34 +02:00
Claudio Atzori e1a0fb8933 fixed id prefix creation for the fosnodoi records 2024-05-03 14:14:18 +02:00
Giambattista Bloisi 69c5efbd8b Fix: when applying enrichments with no instance information the resulting merge entity was generated with no instance instead of keeping the original information 2024-05-03 13:57:56 +02:00
Sandro La Bruzzo db358ad0d2 code formatted 2024-05-02 15:25:57 +02:00
Sandro La Bruzzo 26bf8e763a merged from beta 2024-05-02 15:20:23 +02:00
Sandro La Bruzzo a860c57bbc updated .gitignore 2024-05-02 15:16:00 +02:00
Sandro La Bruzzo 0646d0d064 Updated main sparkApplication to avoid to require master variable 2024-05-02 15:15:03 +02:00
Claudio Atzori 00ad21d814 Merge pull request 'preparations for dhp-common beta release 1.2.5' (#433) from beta-release-1.2.5 into beta
Reviewed-on: #433
2024-05-02 11:28:19 +02:00
Claudio Atzori 4355f64810 reverted to version 1.2.5-SNAPSHOT 2024-05-02 11:23:53 +02:00
Claudio Atzori 66680b8b9a refactoring of common utilities 2024-05-02 11:16:58 +02:00
Claudio Atzori dcf23b3d06 Merge branch 'beta' into beta-release-1.2.5 2024-05-02 10:01:49 +02:00
Michele Artini f4068de298 code reindent + tests 2024-05-02 09:51:33 +02:00
Claudio Atzori 11bd89e132 [enrichment] use sparkExecutorMemory to define also the memoryOverhead 2024-05-01 08:32:59 +02:00
Claudio Atzori e96c2c1606 [ranking wf] set spark.executor.memoryOverhead to fine tune the resource consumption 2024-04-30 16:23:25 +02:00
Claudio Atzori 50c18f7a0b [dedup wf] revised memory settings to address the increased volume of input contents 2024-04-30 12:34:16 +02:00
Michele Artini 2615136efc added a retry mechanism 2024-04-30 11:58:42 +02:00
Sandro La Bruzzo 133ead1e3e updated new version of scholexplorer Generation 2024-04-29 09:00:30 +02:00
Sandro La Bruzzo 052c6aac9d formatted code 2024-04-26 16:03:04 +02:00
Sandro La Bruzzo 9cd3bc0f10 Added a new generation of the dump for scholexplorer tested with last version of spark, and strongly refactored 2024-04-26 16:02:07 +02:00
Claudio Atzori c08a58bba8 Merge pull request 'Miscellaneous related to changes in MergeUtils' (#429) from misc_fixes_merge_entities into beta
Reviewed-on: #429
2024-04-24 08:55:37 +02:00
Claudio Atzori e2937db385 Merge branch 'beta' into misc_fixes_merge_entities 2024-04-24 08:55:28 +02:00
Giambattista Bloisi 1878199dae Miscellaneous fixes:
- in Merge By ID pick by preference those records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Sandro La Bruzzo 0d628cd62b merged again from beta 2024-04-23 17:34:55 +02:00
Lampros Smyrnaios 3c17183d10 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions 2024-04-23 17:18:16 +03:00
Lampros Smyrnaios 49af2e5740 Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views; overcome some bash-instabilities.
- Upon any error, fail the whole process, not just the current DB-creation, as those errors usually indicate a bug in the initial DB-creation, that should be fixed immediately.
- Enhance parallel-copy of large files by "hadoop distcp" command.
- Reduce the "invalidate metadata" commands to just the current DB's tables, in order to eliminate the general overhead on Impala.
- Show the number of tables and views in the logs.
- Fix some log-messages.
2024-04-23 17:15:04 +03:00
Antonis Lempesis d2649a1429 increased the jvm ram 2024-04-23 16:03:16 +03:00
Claudio Atzori c3053ef34d using version 1.2.5-beta for the release 2024-04-23 14:52:32 +02:00
Claudio Atzori b5bcab13ec using version 1.2.5-beta for the release 2024-04-23 14:36:39 +02:00
Claudio Atzori 425c9afc36 using version 1.2.5-beta for the release 2024-04-23 14:30:04 +02:00
Claudio Atzori 93dd9cc639 code formatting 2024-04-23 11:28:00 +02:00
Miriam Baglioni 6189879643 [NOAMI] removed entry for Irish Research eLibray (IReL) Care Board from the list of funders. 2024-04-23 11:09:18 +02:00
Claudio Atzori c57cff2d6d Merge pull request '[WebCrawl] adding affiliation relations from web information' (#428) from WebCrowlBeta into beta
Reviewed-on: #428
2024-04-23 09:36:15 +02:00
Lampros Smyrnaios 69a9ac7393 Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions 2024-04-22 17:07:11 +03:00
Miriam Baglioni 7de114bda0 [WebCrawl] addressing comments from PR 2024-04-22 13:52:50 +02:00
Claudio Atzori eb4692e4ee Merge branch 'beta' into WebCrowlBeta 2024-04-22 11:40:24 +02:00