Implementation of the whitelist for similarity relations #144

Merged
claudio.atzori merged 3 commits from dedup_whitelist into beta 3 years ago

Implementation of a new job for the Scan workflow (deduplication).
The job takes the path of a whitelist file and adds the whitelisted similarity relations to those computed by the dedup algorithm.
File format: `source_id####target_id` (one pair per line).
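The format above can be illustrated with a small shell sketch. This is not the job's actual implementation, which runs inside the Scan workflow. It assumes a hypothetical local whitelist file and a hypothetical tab-separated file of dedup-computed relations. It only shows how `source_id####target_id` lines split into pairs and how the whitelisted pairs can be added to the computed ones without duplicates.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: NOT the Scan workflow job itself.
# - whitelist file: one "source_id####target_id" pair per line (format from this PR)
# - computed relations file: hypothetical "source_id<TAB>target_id" lines
set -euo pipefail

# Emit "source_id<TAB>target_id" for every well-formed whitelist line.
# Blank lines are ignored; malformed lines are reported on stderr and skipped.
parse_whitelist() {
  awk -F'####' '
    { sub(/\r$/, "") }                                # tolerate CRLF line endings
    NF == 0 { next }                                  # blank line
    NF == 2 && $1 != "" && $2 != "" { print $1 "\t" $2; next }
    { printf "skipping malformed line %d: %s\n", NR, $0 > "/dev/stderr" }
  ' "$1"
}

# Union of computed and whitelisted relations, with duplicates removed.
merge_relations() {
  { cat "$1"; parse_whitelist "$2"; } | LC_ALL=C sort -u
}
```

For example, `merge_relations computed.tsv whitelist.txt > merged.tsv` (both file names are hypothetical). A line with a missing side or with more than one `####` separator is reported and skipped rather than being added as a bogus relation.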

michele.debonis added 1 commit 3 years ago
claudio.atzori added 1 commit 3 years ago
claudio.atzori added 1 commit 3 years ago
claudio.atzori merged commit 35619b93ee into beta 3 years ago

Note for updating the dnet workflow: the only new parameter to introduce is `whiteListPath`, which points to the HDFS location of the whitelist file.
The pull request has been merged as 35619b93ee.

Step 1:

From your project repository, check out a new branch and test the changes.

```shell
git checkout -b dedup_whitelist beta
git pull origin dedup_whitelist
```

Step 2:

Merge the changes and update on Gitea.

```shell
git checkout beta
git merge --no-ff dedup_whitelist
git push origin beta
```
Reference: D-Net/dnet-hadoop#144