From 6b3533d29abe050479090a8af5871af320118efd Mon Sep 17 00:00:00 2001
From: Giambattista Bloisi
Date: Mon, 13 May 2024 14:41:27 +0200
Subject: [PATCH] =?UTF-8?q?Describe=20the=20usage=20of=20the=20pivot=20tab?=
 =?UTF-8?q?le=20to=20improve=20stability=20of=20=E2=80=9Crepresentative=20?=
 =?UTF-8?q?records=E2=80=9D=20and=20how=20=E2=80=9Cnon=20authoritative?=
 =?UTF-8?q?=E2=80=9D=20PIDs=20are=20used=20to=20generate=20=E2=80=9Crepres?=
 =?UTF-8?q?entative=20records=E2=80=9D?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../deduplication/research-products.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/graph-production-workflow/deduplication/research-products.md b/docs/graph-production-workflow/deduplication/research-products.md
index 5942fb6..8cb1aef 100644
--- a/docs/graph-production-workflow/deduplication/research-products.md
+++ b/docs/graph-production-workflow/deduplication/research-products.md
@@ -163,18 +163,19 @@ are included in the group, while the remaining elements are kept ungrouped.
 #### Selection of the pivot record
 
 Each group of duplicate records needs to be identified in the final graph with
-an OpenAIRE identifier, derived from a record of the group known as the pivot
-record. The pivot record is determined after sorting by the following criteria:
+an OpenAIRE identifier, derived from a record of the group known as the _pivot
+record_. It is determined after sorting the group of duplicate records by the
+following criteria:
 
 1. Records previously chosen as pivot records in the graph's previous
    generations.
-2. Records with identifiers from a "PID authority".
+2. Records with identifiers from a [PID authority](/data-model/pids-and-identifiers#pid-authorities).
 3. Publications from CrossRef or datasets from DataCite.
 4. Records with an earlier date of acceptance.
 5. Records with smaller IDs in lexicographical order.
 
 The first sorting criterion is possible because a state table, called "pivot
-history," is maintained across graph generations. It keeps track of which
+history", is maintained across graph generations. It keeps track of which
 records were used as pivot records in what graph, guaranteed to retain data for
 the last 12 months.
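
The five sorting criteria can be read as a single composite sort key. The following is a minimal Python sketch of that idea, not the actual OpenAIRE implementation: the `Record` class, its field names, and the `select_pivot` helper are hypothetical, criterion 3 is reduced to one boolean flag, and placing records without a date of acceptance last is an assumption the text does not state.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified view of one record in a duplicate group."""
    id: str
    was_pivot: bool                  # listed in the pivot history table
    has_authority_pid: bool          # has an identifier from a PID authority
    from_crossref_or_datacite: bool  # CrossRef publication or DataCite dataset
    date_of_acceptance: Optional[date] = None


def pivot_sort_key(record: Record):
    """Build a tuple that sorts preferred records first.

    False sorts before True, so each boolean criterion is negated.
    Tuples compare element by element, which gives the criteria
    their priority order.
    """
    return (
        not record.was_pivot,                   # 1. previous pivot records
        not record.has_authority_pid,           # 2. PID authority identifiers
        not record.from_crossref_or_datacite,   # 3. CrossRef / DataCite
        record.date_of_acceptance or date.max,  # 4. earlier date (assumed: missing date last)
        record.id,                              # 5. smaller ID, lexicographically
    )


def select_pivot(group: Iterable[Record]) -> Record:
    """Return the record that comes first under the composite key."""
    return min(group, key=pivot_sort_key)
```

Because the key is a tuple, a later criterion only matters when all earlier ones tie. In this sketch a record that was already a pivot in a previous generation wins over a newer record with an authoritative PID, which is the stability the pivot history is meant to provide.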