Describe the usage of the pivot table to improve stability of “representative records” and how “non authoritative” PIDs are used to generate “representative records”

2024-05-13 14:41:27 +02:00 · 6b3533d29a
parent 6bb810a606
commit 6b3533d29a
1 changed file with 5 additions and 4 deletions
--- a/docs/graph-production-workflow/deduplication/research-products.md
+++ b/docs/graph-production-workflow/deduplication/research-products.md
@@ -163,18 +163,19 @@ are included in the group, while the remaining elements are kept ungrouped.
 #### Selection of the pivot record

 Each group of duplicate records needs to be identified in the final graph with
-an OpenAIRE identifier, derived from a record of the group known as the pivot
-record. The pivot record is determined after sorting by the following criteria:
+an OpenAIRE identifier, derived from a record of the group known as the _pivot
+record_. It is determined after sorting the group of duplicate records by the 
+following criteria:

 1. Records previously chosen as pivot records in the graph's previous
   generations.
-2. Records with identifiers from a "PID authority".
+2. Records with identifiers from a [PID authority](/data-model/pids-and-identifiers#pid-authorities).
 3. Publications from CrossRef or datasets from DataCite.
 4. Records with an earlier date of acceptance.
 5. Records with smaller IDs in lexicographical order.

 The first sorting criterion is possible because a state table, called "pivot
-history," is maintained across graph generations. It keeps track of which
+history", is maintained across graph generations. It keeps track of which
 records were used as pivot records in what graph, guaranteed to retain data for
 the last 12 months.
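
Review note: the five sorting criteria in the hunk above can be read as one composite sort key. The sketch below is only an illustration of that reading, not the code used in the deduplication workflow. The `Record` fields, the `pivot_sort_key` and `select_pivot` helpers, and the source and type labels are all hypothetical names. Placing records without a date of acceptance after dated ones is an assumption that the documented text does not state.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified view of a record in a duplicate group."""
    id: str
    was_pivot: bool           # present in the pivot history (criterion 1)
    has_authority_pid: bool   # has an identifier from a PID authority (criterion 2)
    source: str               # e.g. "crossref", "datacite" (labels are assumptions)
    kind: str                 # e.g. "publication", "dataset" (labels are assumptions)
    date_of_acceptance: Optional[date]


def pivot_sort_key(r: Record):
    """Build a tuple that sorts records in the documented priority order.

    False sorts before True, so each boolean is negated to put
    records that satisfy a criterion first.
    """
    preferred_source = (
        (r.kind == "publication" and r.source == "crossref")
        or (r.kind == "dataset" and r.source == "datacite")
    )
    return (
        not r.was_pivot,                    # 1. previously chosen as pivot
        not r.has_authority_pid,            # 2. PID authority identifiers
        not preferred_source,               # 3. CrossRef publications / DataCite datasets
        r.date_of_acceptance or date.max,   # 4. earlier date first (missing date last: assumption)
        r.id,                               # 5. lexicographically smaller ID
    )


def select_pivot(group: List[Record]) -> Record:
    """Return the record that comes first under the composite key."""
    return min(group, key=pivot_sort_key)
```

With this key, a record found in the pivot history wins even against a CrossRef record carrying an authoritative PID, which is the stabilizing effect the commit message describes. Without any history entry, ties on criteria 2 to 4 fall through to the ID comparison.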