Blacklist filtering moved before the cleanup phase to allow case-sensitive regexes #485
Reference: D-Net/dnet-hadoop#485
This PR fixes an error in the blacklist application stage of deduplication. The blacklist regexes are now applied before the fields are cleaned (lower-casing, normalization, etc.), so that case-sensitive patterns can match against the original text.
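To illustrate why the ordering matters, here is a minimal sketch in Python. It is not the code from this PR: the patterns in `BLACKLIST`, the `clean` function, and the sample titles are all hypothetical stand-ins, assumed only for the purpose of showing that a case-sensitive regex cannot match once the field has been lower-cased.

```python
import re

# Hypothetical case-sensitive blacklist patterns (illustrative, not from the PR).
BLACKLIST = [
    re.compile(r"^Editorial$"),   # exact, capitalized title
    re.compile(r"^[A-Z ]+$"),     # titles written entirely in upper case
]


def clean(value: str) -> str:
    """Stand-in for the cleanup phase: lower-casing plus whitespace normalization."""
    return " ".join(value.lower().split())


def is_blacklisted(value: str) -> bool:
    """True if any blacklist pattern matches the value as given."""
    return any(pattern.search(value) for pattern in BLACKLIST)


def filter_then_clean(values):
    """Ordering described in this PR: blacklist on raw values, then clean."""
    return [clean(v) for v in values if not is_blacklisted(v)]


def clean_then_filter(values):
    """Previous ordering: clean first, so case-sensitive patterns never match."""
    cleaned = (clean(v) for v in values)
    return [c for c in cleaned if not is_blacklisted(c)]


titles = ["TABLE OF CONTENTS", "Editorial", "A Study of editorial Policies"]
kept_new = filter_then_clean(titles)
kept_old = clean_then_filter(titles)
```

In this sketch, `filter_then_clean` drops the first two titles because the patterns see the original casing, while `clean_then_filter` keeps all three because lower-casing has already erased the distinction the patterns rely on.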