Update documentation to describe dedup profile v4 #70

Merged
claudio.atzori merged 9 commits from dedup_v4 into main 2024-02-21 10:55:52 +01:00
1 changed file with 3 additions and 2 deletions
Showing only changes of commit cc17acb259

@@ -53,7 +53,8 @@ entity types:
 etc.), divided by 10, and (iii) a string obtained as an alternation of the
 function prefix(3) and suffix(3) (and vice versa) on the first 3 words (2
 words if the title only has 2).
-<br>For example, a product composed by 197 authors and
+<br />
+For example, a product composed by 197 authors and
 titled ``Search for the Standard Model Higgs Boson``
 becomes the two keys ``21-0-seaardmod`` and ``21-0-rchstadel``
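
A reading aid for the hunk above: the sketch below reproduces only part (iii) of the key, the alternation of prefix(3) and suffix(3) over the first three title words. It is an illustration under stated assumptions, not the project's implementation. The function name `alternating_keys` and the stopword set are hypothetical; the stopwords are inferred solely from the example keys, which skip "for" and "the". The leading `21-0-` components are copied from the example rather than computed, because their definition is cut off in this hunk.

```python
# Hypothetical stopword set: inferred only from the example keys in the diff,
# which skip the words "for" and "the". The real list is not shown here.
ASSUMED_STOPWORDS = {"for", "the"}


def alternating_keys(title, n_words=3, width=3):
    """Return the two alternating prefix/suffix strings for a title.

    Uses the first `n_words` non-stopword words (fewer if the title is shorter).
    """
    words = [w for w in title.lower().split() if w not in ASSUMED_STOPWORDS]
    words = words[:n_words]

    def prefix(word):
        return word[:width]

    def suffix(word):
        return word[-width:]

    # First string starts with a prefix, second starts with a suffix ("vice versa").
    first = "".join(
        prefix(w) if i % 2 == 0 else suffix(w) for i, w in enumerate(words)
    )
    second = "".join(
        suffix(w) if i % 2 == 0 else prefix(w) for i, w in enumerate(words)
    )
    return first, second


keys = alternating_keys("Search for the Standard Model Higgs Boson")
# "21-0-" is taken verbatim from the example in the diff, not derived here.
full_keys = [f"21-0-{k}" for k in keys]
print(full_keys)
```

With the title from the example, the three retained words are "search", "standard" and "model", so the two strings match the trailing part of the keys quoted in the diff. A two-word title yields two-part strings, since the slice simply returns fewer words.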
@@ -67,7 +68,7 @@ Local information about matching records is kept and possibly used to prune
 unneeded comparisons, for example once it is known that A equals to both B and
 C, B will not be compared against C because the A,B,C group will be anyway
 discovered by the global transitive closure step later.
-<br>
+<br />
 A different decision tree is adopted depending on the type of the entity being
 processed.
 Similarity relations drawn in this stage will be consequently used to perform