deduplication workflow to consider pre-existing (dis)equality relationships #32
Reference: D-Net/dnet-hadoop#32
The current deduplication workflow implementation does not exploit pre-existing (dis)equality relationships provided to the graph by stateful subsystems.
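To make the request concrete, here is a minimal sketch of how known (dis)equalities could constrain the grouping step. It is not the dnet-hadoop implementation: the class `DedupGroups`, its method names, and the record ids are hypothetical, and it assumes the dedup step yields candidate pairs that are merged with a union-find structure. Known equalities seed the groups; a candidate pair is rejected when merging it would join two records already asserted to be different.

```python
from collections import defaultdict


class DedupGroups:
    """Union-find over record ids that honours known (dis)equalities.

    Hypothetical illustration only; names and ids are not from dnet-hadoop.
    """

    def __init__(self, known_equal, known_different):
        self.parent = {}
        # For each group root: the roots it must never be merged with.
        self.forbidden = defaultdict(set)
        # Seed the groups with the equalities asserted upstream.
        for a, b in known_equal:
            self._union(a, b)
        # Record the disequalities between the resulting groups.
        for a, b in known_different:
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                raise ValueError(f"{a} and {b} are asserted both equal and different")
            self.forbidden[ra].add(rb)
            self.forbidden[rb].add(ra)

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        self.parent[rb] = ra
        # The surviving root inherits the constraints of the absorbed root.
        for other in self.forbidden.pop(rb, set()):
            self.forbidden[other].discard(rb)
            self.forbidden[other].add(ra)
            self.forbidden[ra].add(other)

    def add_candidate(self, a, b):
        """Merge a similarity match unless it violates a known disequality."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if rb in self.forbidden[ra]:
            return False
        self._union(a, b)
        return True

    def groups(self):
        out = defaultdict(set)
        for x in list(self.parent):
            out[self.find(x)].add(x)
        return list(out.values())
```

For example, with `DedupGroups(known_equal=[("a", "b")], known_different=[("b", "c")])`, a later similarity match between `a` and `c` is rejected, because `a` is already in the same group as `b`.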
The use case built around the OpenOrgs DB is one such stateful subsystem and will provide:
I created a graph that contains:
The graph is available in /tmp/graph_openorgs_and_corda.
I have also updated the dnet:pid_types vocabulary, adding the terms ROR and GRID.
@michele.debonis In the tests performed some months ago, the final result of the dedup process was a TSV with these fields:
In addition to this TSV, you should produce another TSV containing the Corda organizations that have not been related to an OpenOrgs organization.
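As a rough sketch of that second output, the unrelated Corda organizations can be obtained with an anti-join against the dedup relations. The function names, the `corda_id` column, and the id formats below are placeholders, since the real field list is not given here; it also assumes each relation is a (Corda id, OpenOrgs id) pair.

```python
import csv
import io


def unrelated_corda_ids(corda_ids, relations):
    """Return the Corda ids that appear in no (corda_id, openorgs_id) relation."""
    related = {corda_id for corda_id, _openorgs_id in relations}
    return [cid for cid in corda_ids if cid not in related]


def write_tsv(rows, header, out):
    """Write a header and rows as tab-separated values to a file-like object."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Usage with made-up ids; the column name is a placeholder.
corda = ["corda::1", "corda::2", "corda::3"]
relations = [("corda::2", "openorgs::9")]
buf = io.StringIO()
write_tsv([[cid] for cid in unrelated_corda_ids(corda, relations)], ["corda_id"], buf)
```

On the full graph the same anti-join would more likely be done in the Spark job itself; the logic is the same.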