Describe the usage of the pivot table to improve stability of “representative records” and how “non authoritative” PIDs are used to generate “representative records”

Giambattista Bloisi 2024-05-13 14:41:27 +02:00
parent 6bb810a606
commit 6b3533d29a
1 changed file with 5 additions and 4 deletions

@@ -163,18 +163,19 @@ are included in the group, while the remaining elements are kept ungrouped.
 #### Selection of the pivot record
 Each group of duplicate records needs to be identified in the final graph with
-an OpenAIRE identifier, derived from a record of the group known as the pivot
-record. The pivot record is determined after sorting by the following criteria:
+an OpenAIRE identifier, derived from a record of the group known as the _pivot
+record_. It is determined after sorting the group of duplicate records by the
+following criteria:
 1. Records previously chosen as pivot records in the graph's previous
 generations.
-2. Records with identifiers from a "PID authority".
+2. Records with identifiers from a [PID authority](/data-model/pids-and-identifiers#pid-authorities).
 3. Publications from CrossRef or datasets from DataCite.
 4. Records with an earlier date of acceptance.
 5. Records with smaller IDs in lexicographical order.
 The first sorting criterion is possible because a state table, called "pivot
-history," is maintained across graph generations. It keeps track of which
+history", is maintained across graph generations. It keeps track of which
 records were used as pivot records in what graph, guaranteed to retain data for
 the last 12 months.
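The five pivot-selection criteria in the diff amount to a lexicographic sort key, where each criterion only breaks ties left by the previous ones. The Python sketch below is a hypothetical illustration of that ordering, not the OpenAIRE implementation. The `Record` fields, the `previous_pivots` set (a stand-in for the pivot history table), and the choice to sort records without a date of acceptance last are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified record; field names are illustrative only."""
    id: str
    result_type: str                  # e.g. "publication" or "dataset"
    collected_from: str               # e.g. "CrossRef" or "DataCite"
    has_authority_pid: bool           # True if it carries a PID from a PID authority
    date_of_acceptance: Optional[date] = None


def from_crossref_or_datacite(r: Record) -> bool:
    # Criterion 3: publications from CrossRef or datasets from DataCite.
    return (r.result_type == "publication" and r.collected_from == "CrossRef") or (
        r.result_type == "dataset" and r.collected_from == "DataCite"
    )


def pivot_sort_key(r: Record, previous_pivots: set[str]) -> tuple:
    # Python sorts False before True, so each "preferred" flag is negated.
    return (
        r.id not in previous_pivots,        # 1. previously chosen as pivot
        not r.has_authority_pid,            # 2. identifier from a PID authority
        not from_crossref_or_datacite(r),   # 3. CrossRef publication / DataCite dataset
        r.date_of_acceptance is None,       # 4. earlier date of acceptance; records
        r.date_of_acceptance or date.max,   #    without a date sort last (assumption)
        r.id,                               # 5. smaller ID in lexicographical order
    )


def select_pivot(group: list[Record], previous_pivots: set[str]) -> Record:
    """Return the record that would come first after sorting by the criteria."""
    return min(group, key=lambda r: pivot_sort_key(r, previous_pivots))
```

Because the previous-pivot flag comes first in the key, a record that was already the pivot keeps winning even if a "better" record joins the group later. This is what keeps the OpenAIRE identifier stable across graph generations. The 12-month retention is a property of the pivot history store and is not modeled here.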