Sandro La Bruzzo
a87f9ea643
fixed scholexplorer bug
2024-05-17 14:16:43 +02:00
Sandro La Bruzzo
6efab4d88e
fixed scholexplorer bug
2024-05-16 16:19:18 +02:00
Claudio Atzori
92f018d196
[graph provision] fixed path pointing to an intermediate data store in the working directory
2024-05-15 15:39:18 +02:00
Claudio Atzori
0611c81a2f
[graph provision] using Qualifier.classNames to populate the corresponding fields in the JSON payload
2024-05-15 15:33:10 +02:00
Michele Artini
2b3b5fe9a1
oai finalization and test
2024-05-15 14:13:16 +02:00
Claudio Atzori
1efe7f7e39
[graph provision] upgrade to dhp-schema:6.1.2, included project.oamandatepublications in the JSON payload mapping, fixed serialisation of the usageCounts measures
2024-05-14 12:39:31 +02:00
Claudio Atzori
53e7bb4336
Merge pull request 'rest-collector-plugin-with-retry' ( #432 ) from rest-collector-plugin-with-retry into beta
Reviewed-on: #432
2024-05-10 09:02:33 +02:00
Claudio Atzori
f7d56e2ef2
Merge branch 'beta' into rest-collector-plugin-with-retry
2024-05-10 09:02:21 +02:00
Claudio Atzori
c1237ab39e
Merge pull request 'Fixes in Graph Provision' ( #434 ) from beta_provision_relation into beta
Reviewed-on: #434
2024-05-09 14:15:05 +02:00
Claudio Atzori
dc3a5858f7
Merge branch 'beta' into beta_provision_relation
2024-05-09 14:14:43 +02:00
Claudio Atzori
55f39f7850
[graph provision] adds the possibility to validate the XML records before storing them via the validateXML parameter
2024-05-09 14:06:04 +02:00
Claudio Atzori
39a2afe8b5
[graph provision] fixed XML serialization of the usage counts measures, renamed workflow actions to better reflect their role
2024-05-09 13:54:42 +02:00
Claudio Atzori
908ed9da7a
Merge pull request 'Various fixes in the stats wf' ( #430 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: #430
2024-05-08 13:41:02 +02:00
Antonis Lempesis
0cada3cc8f
every step is run in the analytics queue. Hardcoded for now, will make a parameter later
2024-05-08 13:42:53 +03:00
Antonis Lempesis
90a4fb3547
fixed typos
2024-05-08 13:17:58 +03:00
Claudio Atzori
18aa323ee9
cleanup unused classes, adjustments in the oozie wf definition
2024-05-08 11:36:46 +02:00
Michele Artini
c9a327bc50
refactoring of gzip method
2024-05-08 11:34:08 +02:00
Michele Artini
e234848af8
oaf record: xpath for root
2024-05-08 10:00:53 +02:00
Claudio Atzori
b4e3389432
fixed property mapping creating the RelatedEntity transient objects. spark cores & memory adjustments. Code formatting
2024-05-07 16:25:17 +02:00
Giambattista Bloisi
711048ceed
PrepareRelationsJob rewritten to use Spark Dataframe API and Windowing functions
2024-05-07 15:44:33 +02:00
Michele Artini
70bf6ac415
oai exporter tests
2024-05-07 09:36:26 +02:00
Michele Artini
aa40e53c19
oai exporter parameters
2024-05-07 08:01:19 +02:00
Michele Artini
ed052a3476
job for the population of the oai database
2024-05-06 16:08:33 +02:00
Claudio Atzori
26363060ed
fixed id prefix creation for the fosnodoi records, again
2024-05-03 15:53:52 +02:00
Claudio Atzori
0486227185
[cleaning] deactivating the cleaning of FOS subjects found in the metadata provided by repositories
2024-05-03 14:31:12 +02:00
Claudio Atzori
a5d13d5d27
code formatting
2024-05-03 14:14:34 +02:00
Claudio Atzori
e1a0fb8933
fixed id prefix creation for the fosnodoi records
2024-05-03 14:14:18 +02:00
Giambattista Bloisi
69c5efbd8b
Fix: when applying enrichments with no instance information, the resulting merged entity was generated with no instance instead of keeping the original information
2024-05-03 13:57:56 +02:00
Sandro La Bruzzo
db358ad0d2
code formatted
2024-05-02 15:25:57 +02:00
Sandro La Bruzzo
26bf8e763a
merged from beta
2024-05-02 15:20:23 +02:00
Sandro La Bruzzo
a860c57bbc
updated .gitignore
2024-05-02 15:16:00 +02:00
Sandro La Bruzzo
0646d0d064
Updated main sparkApplication so that it no longer requires the master variable
2024-05-02 15:15:03 +02:00
Claudio Atzori
00ad21d814
Merge pull request 'preparations for dhp-common beta release 1.2.5' ( #433 ) from beta-release-1.2.5 into beta
Reviewed-on: #433
2024-05-02 11:28:19 +02:00
Claudio Atzori
4355f64810
reverted to version 1.2.5-SNAPSHOT
2024-05-02 11:23:53 +02:00
Claudio Atzori
66680b8b9a
refactoring of common utilities
2024-05-02 11:16:58 +02:00
Claudio Atzori
dcf23b3d06
Merge branch 'beta' into beta-release-1.2.5
2024-05-02 10:01:49 +02:00
Michele Artini
f4068de298
code reindent + tests
2024-05-02 09:51:33 +02:00
Claudio Atzori
11bd89e132
[enrichment] use sparkExecutorMemory to define also the memoryOverhead
2024-05-01 08:32:59 +02:00
Claudio Atzori
e96c2c1606
[ranking wf] set spark.executor.memoryOverhead to fine tune the resource consumption
2024-04-30 16:23:25 +02:00
Claudio Atzori
50c18f7a0b
[dedup wf] revised memory settings to address the increased volume of input contents
2024-04-30 12:34:16 +02:00
Michele Artini
2615136efc
added a retry mechanism
2024-04-30 11:58:42 +02:00
Sandro La Bruzzo
133ead1e3e
updated new version of scholexplorer Generation
2024-04-29 09:00:30 +02:00
Sandro La Bruzzo
052c6aac9d
formatted code
2024-04-26 16:03:04 +02:00
Sandro La Bruzzo
9cd3bc0f10
Added a new generation of the scholexplorer dump, tested with the latest version of Spark and heavily refactored
2024-04-26 16:02:07 +02:00
Claudio Atzori
c08a58bba8
Merge pull request 'Miscellaneous related to changes in MergeUtils' ( #429 ) from misc_fixes_merge_entities into beta
Reviewed-on: #429
2024-04-24 08:55:37 +02:00
Claudio Atzori
e2937db385
Merge branch 'beta' into misc_fixes_merge_entities
2024-04-24 08:55:28 +02:00
Giambattista Bloisi
1878199dae
Miscellaneous fixes:
- in Merge By ID, prefer the records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Sandro La Bruzzo
0d628cd62b
merged again from beta
2024-04-23 17:34:55 +02:00
Lampros Smyrnaios
3c17183d10
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into convert_hive_to_spark_actions
2024-04-23 17:18:16 +03:00
Lampros Smyrnaios
49af2e5740
Miscellaneous updates to the copying operation to Impala Cluster:
- Update the algorithm for creating views that depend on other views; overcome some bash-instabilities.
- Upon any error, fail the whole process, not just the current DB-creation, as those errors usually indicate a bug in the initial DB-creation, which should be fixed immediately.
- Enhance the parallel copy of large files with the "hadoop distcp" command.
- Reduce the "invalidate metadata" commands to just the current DB's tables, in order to eliminate the general overhead on Impala.
- Show the number of tables and views in the logs.
- Fix some log-messages.
2024-04-23 17:15:04 +03:00