forked from D-Net/openaire-graph-docs
added merge by id description
parent 6e56aa1a4d
commit 099a500e88
@@ -1,3 +1,28 @@
# Merge by id

In the metadata aggregation system it is common to find the same record provided by
different datasources and, sometimes, even within the same datasource (especially in
the case of aggregators). Because the harmonisation processes are applied to the contents
of each datasource separately, the resulting records are the output of different mapping
implementations.
This approach has the advantage of being deeply customisable, so that it can capture
datasource-specific aspects, but it leaves room for inconsistencies between the mappings
applied to the various datasources.

This phase is therefore responsible for compensating for such inconsistencies and performs
a global grouping of every record available in the graph:

- entities are grouped by [`id`](../data-model/entities/result#id)
- relations are grouped by [`source`, `target`, `reltype`](../data-model/relationships#the-relationship-object)

This ensures that the same record, possibly assigned to different types by different
mappings, appears only once in the graph and under a single type. When identifiers
clash, the properties of the colliding records are merged (including the provenance
information), and the result type is chosen according to the following precedence order:

```
publication > dataset > software > other
```
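
The grouping of entities by `id` and the type precedence above can be sketched in a few lines of Python. This is only an illustration, not the actual implementation: the record layout (`id`, `type`, `properties`, `provenance`), the function name `merge_entities_by_id` and the rule that the first value seen wins when two records disagree on a property are all assumptions made for the example.

```python
from collections import defaultdict

# Precedence taken from the text: publication > dataset > software > other.
TYPE_PRECEDENCE = ["publication", "dataset", "software", "other"]


def merge_entities_by_id(records):
    """Group records by `id` and merge each group into a single record.

    Hypothetical record layout (not the actual graph schema):
    {"id": str, "type": str, "properties": dict, "provenance": list of str}
    """
    groups = defaultdict(list)
    for record in records:
        groups[record["id"]].append(record)

    merged = []
    for record_id, group in groups.items():
        # The winning type is the one ranked highest among the duplicates.
        best_type = min((r["type"] for r in group), key=TYPE_PRECEDENCE.index)
        properties = {}
        provenance = []
        for r in group:
            # Assumption: on conflicting property values, the first one seen wins.
            for key, value in r["properties"].items():
                properties.setdefault(key, value)
            # Keep every distinct provenance, preserving the order of arrival.
            for source in r["provenance"]:
                if source not in provenance:
                    provenance.append(source)
        merged.append({
            "id": record_id,
            "type": best_type,
            "properties": properties,
            "provenance": provenance,
        })
    return merged


# Two mappings produced the same id with different types (made-up values).
example_records = [
    {"id": "record-1", "type": "dataset",
     "properties": {"title": "A title"}, "provenance": ["datasource-A"]},
    {"id": "record-1", "type": "publication",
     "properties": {"title": "A title", "publisher": "X"},
     "provenance": ["datasource-B"]},
]
merged_entities = merge_entities_by_id(example_records)
```

In this example the two input records collapse into one record typed as `publication`, which carries the properties of both inputs and both provenances.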

The same holds for relationships: because the same relation (e.g. a DOI-to-DOI citation)
could be aggregated from multiple sources, this grouping phase collapses all the
duplicates into a single relation that nevertheless includes all the individual provenances.
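
As a rough sketch of how duplicate relations could be collapsed, the Python snippet below groups relations by the (`source`, `target`, `reltype`) triple and accumulates their provenances. The flat dictionary layout, the treatment of `reltype` as a plain string, the function name `merge_relations` and the sample values are assumptions made for illustration; they do not mirror the actual graph schema.

```python
def merge_relations(relations):
    """Collapse relations sharing (source, target, reltype) into one relation.

    Hypothetical relation layout (an assumption for this sketch):
    {"source": str, "target": str, "reltype": str, "provenance": list of str}
    """
    grouped = {}
    for rel in relations:
        key = (rel["source"], rel["target"], rel["reltype"])
        # Create the merged relation the first time a key is seen.
        merged = grouped.setdefault(key, {
            "source": rel["source"],
            "target": rel["target"],
            "reltype": rel["reltype"],
            "provenance": [],
        })
        # Keep every distinct provenance, preserving the order of arrival.
        for source in rel["provenance"]:
            if source not in merged["provenance"]:
                merged["provenance"].append(source)
    return list(grouped.values())


# The same citation relation aggregated from two sources (made-up values).
example_relations = [
    {"source": "record-1", "target": "record-2", "reltype": "cites",
     "provenance": ["datasource-A"]},
    {"source": "record-1", "target": "record-2", "reltype": "cites",
     "provenance": ["datasource-B"]},
    {"source": "record-2", "target": "record-1", "reltype": "cites",
     "provenance": ["datasource-A"]},
]
merged_relations = merge_relations(example_relations)
```

Here the first two relations share the same triple and end up as a single relation with two provenances, while the third one, which points in the opposite direction, is kept separate.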