diff --git a/docs/graph-production-workflow/deduplication/research-products.md b/docs/graph-production-workflow/deduplication/research-products.md
index 5942fb6..8cb1aef 100644
--- a/docs/graph-production-workflow/deduplication/research-products.md
+++ b/docs/graph-production-workflow/deduplication/research-products.md
@@ -163,18 +163,19 @@ are included in the group, while the remaining elements are kept ungrouped.
 
 #### Selection of the pivot record
 
 Each group of duplicate records needs to be identified in the final graph with
-an OpenAIRE identifier, derived from a record of the group known as the pivot
-record. The pivot record is determined after sorting by the following criteria:
+an OpenAIRE identifier, derived from a record of the group known as the _pivot
+record_. It is determined after sorting the group of duplicate records by the
+following criteria:
 
 1. Records previously chosen as pivot records in the graph's previous generations.
-2. Records with identifiers from a "PID authority".
+2. Records with identifiers from a [PID authority](/data-model/pids-and-identifiers#pid-authorities).
 3. Publications from CrossRef or datasets from DataCite.
 4. Records with an earlier date of acceptance.
 5. Records with smaller IDs in lexicographical order.
 
 The first sorting criterion is possible because a state table, called "pivot
-history," is maintained across graph generations. It keeps track of which
+history", is maintained across graph generations. It keeps track of which
 records were used as pivot records in what graph, guaranteed to retain data for
 the last 12 months.
 
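
The five sorting criteria in the changed section can be read as one composite sort key, with each criterion breaking ties left by the one before it. The sketch below is illustrative only and is not the actual deduplication code: the `Record` fields, the `pivot_history` set, and the `select_pivot` helper are hypothetical names, and placing records without a date of acceptance after dated ones is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    # All field names are hypothetical, chosen for illustration only.
    id: str
    result_type: str = "publication"          # e.g. "publication" or "dataset"
    collected_from: str = ""                  # e.g. "Crossref" or "DataCite"
    has_authority_pid: bool = False           # identifier issued by a PID authority
    date_of_acceptance: Optional[str] = None  # ISO 8601 date, e.g. "2020-05-01"


def pivot_sort_key(record, pivot_history):
    """Build a tuple that sorts the preferred pivot candidate first."""
    source = record.collected_from.lower()
    preferred_source = (
        (record.result_type == "publication" and source == "crossref")
        or (record.result_type == "dataset" and source == "datacite")
    )
    # False sorts before True, so each "good" property is negated.
    return (
        record.id not in pivot_history,     # 1. previously chosen as pivot
        not record.has_authority_pid,       # 2. identifier from a PID authority
        not preferred_source,               # 3. Crossref publication / DataCite dataset
        record.date_of_acceptance is None,  # assumption: undated records go last
        record.date_of_acceptance or "",    # 4. earlier date of acceptance
        record.id,                          # 5. smaller ID, lexicographically
    )


def select_pivot(group, pivot_history):
    """Return the record that ranks first under the composite key."""
    return min(group, key=lambda r: pivot_sort_key(r, pivot_history))


group = [
    Record("rec-b", "publication", "Crossref", True, "2020-05-01"),
    Record("rec-a", "publication", "Crossref", True, "2021-01-01"),
    Record("rec-c", "publication", "Some Repository", False, "2019-01-01"),
]

# With an empty history, the earlier Crossref record is selected.
first_pivot = select_pivot(group, pivot_history=set())
# A record that was a pivot in a previous generation outranks everything else.
stable_pivot = select_pivot(group, pivot_history={"rec-c"})
```

Here `first_pivot.id` is `"rec-b"` and `stable_pivot.id` is `"rec-c"`. A tuple key keeps the priority order explicit, so reordering or adding a criterion is a one-line change.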