Giambattista Bloisi
|
6b1c05d118
|
Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set, it defaults to 1 GB
|
2023-08-29 16:04:19 +02:00 |
Giambattista Bloisi
|
d012aec0b3
|
Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow (#8964)
|
2023-08-28 22:44:54 +02:00 |
Giambattista Bloisi
|
95cd2b9b1e
|
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
|
2023-08-10 11:53:48 +02:00 |
Giambattista Bloisi
|
fab9920271
|
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
|
2023-08-09 15:41:43 +02:00 |
Giambattista Bloisi
|
af49424b59
|
Add a "CleanRelation" action after PropagateRelation to filter out all relations that have been deleted by inference or that point to dangling entities
|
2023-08-04 14:27:39 +02:00 |
Claudio Atzori
|
44a937f4ed
|
factored out entity grouping implementation, extended to consider results from delegated authorities rather than identical records from other sources
|
2022-01-19 12:24:52 +01:00 |
Claudio Atzori
|
0a727d325d
|
[dedup] increased number of partitions in the consistency phase
|
2021-11-16 08:43:41 +01:00 |
Claudio Atzori
|
8f309b72ff
|
[dedup] using node names consistently across the workflow
|
2021-04-21 17:54:51 +02:00 |
Claudio Atzori
|
e208b03755
|
renamed workflow
|
2020-11-25 14:55:50 +01:00 |
Claudio Atzori
|
dfd6205b95
|
Consistency graph workflow merges all the entities by ID
|
2020-11-25 14:55:32 +01:00 |
Claudio Atzori
|
71813795f6
|
various refactorings on the dnet-dedup-openaire workflow
|
2020-04-18 12:06:23 +02:00 |
Claudio Atzori
|
038ac7afd7
|
relation consistency workflow separated from dedup scan and creation of CCs
|
2020-04-17 13:12:44 +02:00 |
Claudio Atzori
|
376efd67de
|
removed prepare statement in spark action
|
2020-04-16 12:14:16 +02:00 |
Claudio Atzori
|
011b342bc9
|
trying to avoid OOM in SparkPropagateRelation
|
2020-04-16 11:13:51 +02:00 |
Claudio Atzori
|
069ef5eaed
|
trying to avoid OOM in SparkPropagateRelation
|
2020-04-15 21:23:21 +02:00 |
Claudio Atzori
|
673e744649
|
moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa
|
2020-03-27 10:42:17 +01:00 |