DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag #329

Merged
miriam.baglioni merged 2 commits from dispatch_filter_invisible_entities into beta 2023-08-10 12:56:19 +02:00

This PR implements point 5 of issue 8898 (https://support.openaire.eu/issues/8898):

  • Get rid of invisible records after deduplication. They already contributed so we don't need them anymore after the deduplication workflow (graph consistency)

DispatchEntitiesSparkJob now accepts a "--filterInvisible" flag that controls whether invisible records (those with the dataInfo.invisible flag set) are filtered out.
DispatchEntitiesSparkJob has also been refactored to dispatch all entity types as parallel Spark jobs within the same application, instead of as parallel Oozie jobs.

Note: the graph/group and dedup/consistency workflows now require the filterInvisible parameter to be set; it is mandatory.
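
To illustrate the behaviour described above, here is a minimal sketch in plain Java, without Spark. It is not the code in this PR: the `DispatchSketch` class, the `Entity` type and the `dispatch` method are hypothetical names introduced only for illustration, and a thread pool stands in for the parallel Spark jobs. It shows the two ideas together: every entity type is handled inside a single application, and records marked invisible are dropped only when `filterInvisible` is true.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class DispatchSketch {

    // Hypothetical stand-in for a graph entity; "invisible" mirrors dataInfo.invisible.
    public static final class Entity {
        final String type;
        final String id;
        final boolean invisible;

        public Entity(String type, String id, boolean invisible) {
            this.type = type;
            this.id = id;
            this.invisible = invisible;
        }
    }

    // Groups the mixed input by entity type, processes each type as its own
    // task in one shared pool, and returns the kept ids per type.
    public static Map<String, List<String>> dispatch(List<Entity> entities, boolean filterInvisible) {
        Map<String, List<Entity>> byType =
            entities.stream().collect(Collectors.groupingBy(e -> e.type));

        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, byType.size()));
        try {
            // Submit one task per entity type so that all types run side by side.
            Map<String, Future<List<String>>> tasks = new TreeMap<>();
            for (Map.Entry<String, List<Entity>> group : byType.entrySet()) {
                tasks.put(group.getKey(), pool.submit(() ->
                    group.getValue().stream()
                        // Drop invisible records only when the flag asks for it.
                        .filter(e -> !filterInvisible || !e.invisible)
                        .map(e -> e.id)
                        .collect(Collectors.toList())));
            }

            // Wait for every task and collect the results, sorted by type name.
            Map<String, List<String>> result = new TreeMap<>();
            for (Map.Entry<String, Future<List<String>>> task : tasks.entrySet()) {
                result.put(task.getKey(), task.getValue().get());
            }
            return result;
        } catch (InterruptedException | ExecutionException ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Entity> input = Arrays.asList(
            new Entity("publication", "p1", false),
            new Entity("publication", "p2", true),
            new Entity("dataset", "d1", false));
        System.out.println(dispatch(input, true));
        System.out.println(dispatch(input, false));
    }
}
```

In this sketch the flag is an explicit argument with no default, which mirrors the choice to make filterInvisible mandatory: each caller has to state whether invisible records should survive the dispatch.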

giambattista.bloisi added 1 commit 2023-08-09 15:54:24 +02:00
giambattista.bloisi requested review from claudio.atzori 2023-08-09 15:54:36 +02:00
giambattista.bloisi requested review from miriam.baglioni 2023-08-09 15:54:44 +02:00
giambattista.bloisi added 1 commit 2023-08-10 11:54:00 +02:00
95cd2b9b1e Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
miriam.baglioni merged commit 35b8deb2c6 into beta 2023-08-10 12:56:19 +02:00