[Aggregator graph|master] Discard invalid records #245
Reference: D-Net/dnet-hadoop#245
This PR modifies the behavior of the Oozie workflow responsible for creating the aggregator graph.
Background
Transformation rules might output XML records that are either invalid XML or lack characteristics required by the mapping layer that transforms them into the internal graph data model, which causes them to be excluded. So far this exclusion happened silently while the mapping was applied, so the problem rarely surfaced and the invalid records could not be inspected or acted upon.
Proposed change
The proposed change adds an extra Spark action (VerifyRecords) to the Oozie workflow that implements the verification. It applies the mapping and, if the mapping returns an error, stores the input record aside for further inspection in the path configured as:
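
To illustrate the verification logic described above, here is a minimal sketch in plain Python. It is not the code of this PR: `map_record`, `verify_records`, and the requirement of an `<id>` child element are hypothetical stand-ins for the real mapping layer and its rules, and the Spark and Oozie wiring is left out entirely. The sketch only shows the idea of applying the mapping to each record and setting aside the input whenever the mapping fails.

```python
import xml.etree.ElementTree as ET


def map_record(xml_text):
    """Hypothetical stand-in for the mapping layer.

    Raises ET.ParseError when the XML is not well-formed and ValueError
    when a required element (here, an assumed <id> child) is missing or empty.
    """
    root = ET.fromstring(xml_text)
    identifier = root.findtext("id")
    if not identifier:
        raise ValueError("record has no <id> element")
    return {"id": identifier}


def verify_records(records):
    """Apply the mapping to every record and split the inputs.

    Returns (valid, invalid): valid holds the records that mapped without
    error, invalid holds (record, error message) pairs kept for inspection.
    """
    valid, invalid = [], []
    for record in records:
        try:
            map_record(record)
        except (ET.ParseError, ValueError) as error:
            invalid.append((record, str(error)))
        else:
            valid.append(record)
    return valid, invalid


if __name__ == "__main__":
    sample = [
        "<record><id>r1</id></record>",  # maps correctly
        "<record><id>r2</record>",       # not well-formed XML
        "<record><title>t</title></record>",  # well-formed, but no <id>
    ]
    ok, rejected = verify_records(sample)
    print(len(ok), "valid,", len(rejected), "set aside for inspection")
```

Keeping the error message next to each rejected record is a design choice of this sketch; it makes it easier to tell malformed XML apart from records that fail the mapping for other reasons.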