implementation of the whitelist for similarity relations #144
Reference: D-Net/dnet-hadoop#144
Implements a new job for the Scan WF (de-duplication).
The job takes the path of a whitelist file and adds the whitelisted similarity relations to the relations computed by the dedup algorithm.
File format: source_id####target_id (one relation per line)
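To make the file format concrete, here is a minimal sketch in plain Java of how whitelist lines could be parsed and merged with already computed relations. It is not the code in this PR: the class name WhitelistSketch, the methods parseLine and merge, and the use of in-memory collections instead of Spark are all assumptions made for illustration. It also assumes that malformed or blank lines are skipped and that two relations are duplicates only when both source and target match exactly.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; names and behavior are assumptions, not the PR's code.
public class WhitelistSketch {

    // Separator taken from the file format described in the PR text.
    static final String SEPARATOR = "####";

    /** Parses one whitelist line into (source, target); empty for blank or malformed lines. */
    static Optional<Map.Entry<String, String>> parseLine(String line) {
        if (line == null) {
            return Optional.empty();
        }
        // A limit of -1 keeps trailing empty strings, so "a####" is detected as malformed.
        String[] parts = line.trim().split(SEPARATOR, -1);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new SimpleImmutableEntry<>(parts[0], parts[1]));
    }

    /** Returns the computed relations plus every valid whitelisted relation, without duplicates. */
    static Set<Map.Entry<String, String>> merge(
            Collection<Map.Entry<String, String>> computed, List<String> whitelistLines) {
        Set<Map.Entry<String, String>> merged = new LinkedHashSet<>(computed);
        for (String line : whitelistLines) {
            parseLine(line).ifPresent(merged::add);
        }
        return merged;
    }

    public static void main(String[] args) {
        // One relation already produced by the dedup algorithm (made-up identifiers).
        List<Map.Entry<String, String>> computed =
                Collections.singletonList(new SimpleImmutableEntry<>("id_a", "id_b"));
        // Whitelist content: one duplicate, one new relation, one malformed line.
        List<String> whitelist = Arrays.asList("id_a####id_b", "id_c####id_d", "not a relation");
        for (Map.Entry<String, String> rel : merge(computed, whitelist)) {
            System.out.println(rel.getKey() + " -> " + rel.getValue());
        }
    }
}
```

In a Spark job the same idea would more likely be expressed as reading the file as a text dataset, mapping each line to a relation, and taking the union with the computed relations; the sketch above only shows the per-line logic.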
Note for updating the dnet workflow: the only parameter we need to introduce is whiteListPath, pointing to the HDFS location of the whitelist file.