Optimizations for the Openorgs Dedup: normalization and inference of strings and implementation of new general-purpose comparators #455

Merged
claudio.atzori merged 1 commits from openorgs_optimization into beta 2024-07-17 10:25:21 +02:00

This PR introduces the following changes to the Openorgs deduplication:

  • inference of attributes from an existing field (e.g. inferring the country when it is UNKNOWN, and inferring the city and keywords)
  • normalization moved from the comparators to the pre-processing step, so it is computed once per record instead of once per comparison
  • new dedup flags to drive the inference and the normalization
  • updated configuration for the Openorgs dedup
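The PR description does not include code. The sketch below illustrates the idea of normalizing once during pre-processing and inferring a missing country, city and keywords before any comparator runs. It assumes hypothetical record fields (`legalname`, `country`, `name_norm`, `city`, `keywords`), toy lookup tables, and a plain Jaccard token comparator standing in for a general-purpose comparator; none of these names come from the dnet-hadoop code or its configuration.

```python
import re
import unicodedata

# Hypothetical lookup tables, for illustration only.
CITY_TO_COUNTRY = {"rome": "IT", "pisa": "IT", "paris": "FR", "madrid": "ES"}
KEYWORDS = {"university", "institute", "hospital"}


def normalize(text: str) -> str:
    """Strip accents, lowercase, replace punctuation with spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def preprocess(record: dict) -> dict:
    """Normalize the name once and infer city, country and keywords from it."""
    name = normalize(record["legalname"])
    tokens = name.split()
    cities = [t for t in tokens if t in CITY_TO_COUNTRY]
    country = record.get("country", "UNKNOWN")
    # Infer the country only when it is UNKNOWN; a known value is never overwritten.
    if country == "UNKNOWN" and cities:
        country = CITY_TO_COUNTRY[cities[0]]
    return {
        **record,
        "name_norm": name,
        "city": cities[0] if cities else None,
        "country": country,
        "keywords": sorted(set(tokens) & KEYWORDS),
    }


def jaccard_names(a: dict, b: dict) -> float:
    """Comparator that reuses the pre-computed normalized name (no normalization here)."""
    ta, tb = set(a["name_norm"].split()), set(b["name_norm"].split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0
```

For example, `preprocess({"legalname": "University of Pisa", "country": "UNKNOWN"})` yields the country `IT`, the city `pisa` and the keyword `university`. Because normalization now runs once per record rather than once per compared pair, its cost grows with the number of records instead of the number of candidate pairs.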
michele.debonis added 1 commit 2024-07-10 09:46:11 +02:00
claudio.atzori merged commit 6665976604 into beta 2024-07-17 10:25:21 +02:00
Reference: D-Net/dnet-hadoop#455