[Aggregator graph|master] Discard invalid records #245

Merged
claudio.atzori merged 6 commits from discard-non-wellformed into master 2022-09-19 09:48:20 +02:00

This PR modifies the behavior of the oozie workflow responsible for creating the aggregator graph.

Background

Transformation rules might output XML records that are either not well-formed XML or do not satisfy the characteristics required by the mapping layer that transforms them into the internal graph data model, which causes them to be excluded. So far this exclusion happened silently while the mapping was applied, so the problem rarely surfaced and the invalid records could not be inspected to take corrective measures.

Proposed change

The proposed modification alters the oozie workflow by adding an extra spark action (VerifyRecords) that implements the verification. It applies the mapping and, if the mapping returns an error, stores the input record aside for further inspection in the path configured as:

<arg>--invalidPath</arg><arg>${workingDir}/invalid_records</arg>
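
The verification idea can be illustrated with a minimal sketch. This is not the code of the VerifyRecords action: `apply_mapping`, `verify_records` and the mandatory `<title>` rule are hypothetical names and assumptions introduced only for illustration, and plain Python lists stand in for the spark datasets. It shows the general pattern of applying the mapping to each record and setting aside the inputs for which it fails.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Tuple


def apply_mapping(record_xml: str) -> dict:
    """Hypothetical stand-in for the mapping layer.

    Assumption: a record is mappable when it is well-formed XML
    and carries a non-empty <title> child element.
    """
    root = ET.fromstring(record_xml)  # raises ET.ParseError on malformed XML
    title = root.findtext("title")
    if not title:
        raise ValueError("missing mandatory <title>")
    return {"title": title}


def verify_records(
    records: Iterable[str],
    mapping: Callable[[str], dict] = apply_mapping,
) -> Tuple[List[str], List[str]]:
    """Split records into (valid, invalid) by trying the mapping on each one."""
    valid: List[str] = []
    invalid: List[str] = []
    for record in records:
        try:
            mapping(record)
        except Exception:
            # The mapping failed: keep the original input for later inspection.
            invalid.append(record)
        else:
            valid.append(record)
    return valid, invalid


if __name__ == "__main__":
    sample = [
        "<record><title>A</title></record>",  # well-formed and mappable
        "<record><title>",                    # not well-formed XML
        "<record/>",                          # well-formed, but no title
    ]
    ok, bad = verify_records(sample)
    print(len(ok), "valid,", len(bad), "invalid")
```

In the actual workflow the invalid inputs would be written under the configured `--invalidPath` rather than returned in memory.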
claudio.atzori added the enhancement label 2022-09-19 09:44:58 +02:00
alessia.bardi was assigned by claudio.atzori 2022-09-19 09:44:58 +02:00
miriam.baglioni was assigned by claudio.atzori 2022-09-19 09:44:59 +02:00
claudio.atzori self-assigned this 2022-09-19 09:44:59 +02:00
claudio.atzori added 6 commits 2022-09-19 09:45:10 +02:00
claudio.atzori changed title from [Aggregator graph] Discard invalid records to [Aggregator graph|master] Discard invalid records 2022-09-19 09:48:08 +02:00
claudio.atzori merged commit 96062164f9 into master 2022-09-19 09:48:19 +02:00