implementation of the whitelist for similarity relations #144

Merged
claudio.atzori merged 3 commits from dedup_whitelist into beta 2021-09-27 16:47:41 +02:00

Implementation of a new job for the Scan WF (de-duplication).
The job takes the path of a whitelist file and adds the whitelisted similarity relations to the relations calculated by the dedup algorithm.
File format: source_id####target_id (one relation per line)
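
To make the file format concrete, here is a minimal sketch in plain Python of how such a whitelist could be parsed and merged with the computed relations. It is not the code from this PR (the actual job runs on Hadoop/Spark). The names `parse_whitelist` and `merge_relations` are hypothetical, and skipping blank lines, rejecting malformed lines, and treating each relation as a directed (source, target) pair are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Separator taken from the file format described above.
SEPARATOR = "####"

Relation = Tuple[str, str]


def parse_whitelist(lines: Iterable[str]) -> Set[Relation]:
    """Parse 'source_id####target_id' lines into (source, target) pairs.

    Assumption: blank lines are skipped and malformed lines raise an error.
    """
    relations: Set[Relation] = set()
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: empty lines are ignored
        parts = line.split(SEPARATOR)
        if len(parts) != 2 or not parts[0] or not parts[1]:
            raise ValueError(
                f"line {line_number}: expected source_id{SEPARATOR}target_id, got {line!r}"
            )
        relations.add((parts[0], parts[1]))
    return relations


def merge_relations(computed: Iterable[Relation], whitelist: Iterable[Relation]) -> Set[Relation]:
    """Add whitelisted relations to the computed ones, dropping exact duplicates."""
    return set(computed) | set(whitelist)
```

For example, merging a computed relation `("a", "b")` with a whitelist containing `a####b` and `c####d` yields the two relations `("a", "b")` and `("c", "d")`.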

michele.debonis added 1 commit 2021-09-20 16:27:22 +02:00
claudio.atzori added 1 commit 2021-09-22 11:31:20 +02:00
claudio.atzori added 1 commit 2021-09-27 16:41:26 +02:00
claudio.atzori merged commit 35619b93ee into beta 2021-09-27 16:47:41 +02:00

Note for updating the dnet workflow: the only parameter we need to introduce is the `whiteListPath` pointing to the HDFS location of the whitelist file.
Reference: D-Net/dnet-hadoop#144