Add placeholders for info to be updated regarding pids per entity && hasAuthorInstitution detailed page

Serafeim Chatzopoulos 2023-04-03 18:49:15 +03:00
parent 93e14c1754
commit a7c4daa10f
3 changed files with 60 additions and 18 deletions


@ -10,18 +10,10 @@ Not only, but the mappings applied to the original contents may also change and
One of the fronts regards the attribution of the identity to the objects populating the graph. The basic idea is to build the identifiers of the objects in the graph from the PIDs available in some authoritative sources while considering all the other sources as by definition “unstable”. Examples of authoritative sources are Crossref and DataCite. Examples of non-authoritative ones are institutional repositories, aggregators, etc. PIDs from the authoritative sources would form the stable OpenAIRE ID skeleton of the Graph, precisely because they are immutable by construction.
Such a policy defines a list of data sources that are considered authoritative for a specific type of PID they provide, whose effect is twofold:
* OpenAIRE IDs depend on persistent IDs when they are provided by the authority responsible for creating them;
* OpenAIRE IDs depend on persistent IDs when they are provided by the authority responsible for creating them
* PIDs are included in the graph according to a strict criterion: the PID types declared in the table below are mapped as PIDs only when they are collected from the corresponding PID authority data source.
| PID Type | Authority |
|-----------|-----------------------------------------------------------------------------------------------------|
| doi | [Crossref](https://www.crossref.org), [Datacite](https://datacite.org) |
| pmc, pmid | [Europe PubMed Central](https://europepmc.org/), [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc) |
| arXiv | [arXiv.org e-Print Archive](https://arxiv.org/) |
| uniprot | [Protein Data Bank](http://www.pdb.org/) |
| ena | [Protein Data Bank](http://www.pdb.org/) |
| pdb | [Protein Data Bank](http://www.pdb.org/) |
<span className="todo">[PID authorities table was removed from here]</span>
There is an exception though: Handle(s) are minted by several repositories; as listing them all would not be a viable option, to avoid losing them as PIDs, Handles bypass the PID authority filtering rule.
In all other cases, PIDs are included in the graph as alternate identifiers.
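The filtering rule described above can be sketched roughly as follows. This is only an illustration, not the actual OpenAIRE implementation: the `PID_AUTHORITIES` mapping is transcribed from the PID authorities table in this document, while the function name, the return labels, and the `"handle"` type name are assumptions made for the example.

```python
# Hypothetical mapping, transcribed from the PID authorities table in this document.
PID_AUTHORITIES = {
    "doi": {"Crossref", "Datacite"},
    "pmc": {"Europe PubMed Central", "PubMed Central"},
    "pmid": {"Europe PubMed Central", "PubMed Central"},
    "arXiv": {"arXiv.org e-Print Archive"},
    "uniprot": {"Protein Data Bank"},
    "ena": {"Protein Data Bank"},
    "pdb": {"Protein Data Bank"},
}

# Assumed type name for Handles, which bypass the authority filter.
HANDLE_TYPE = "handle"


def classify_identifier(pid_type: str, datasource: str) -> str:
    """Return "pid" or "alternateIdentifier" (labels chosen for this sketch)."""
    if pid_type == HANDLE_TYPE:
        # Handles are minted by many repositories, so they are always kept as PIDs.
        return "pid"
    if datasource in PID_AUTHORITIES.get(pid_type, set()):
        # Collected from the authority for this PID type.
        return "pid"
    # Every other case is recorded as an alternate identifier.
    return "alternateIdentifier"
```

With this sketch, a DOI collected from Crossref is classified as a PID, while the same DOI collected from an institutional repository is classified as an alternate identifier.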
@ -35,10 +27,7 @@ assigns PIDs to their scientific products from a given PID minter.
This "selection" can be performed when the entities in the graph sharing the same identifier are grouped together. The list of the delegated authorities currently includes
| Datasource delegated | Datasource delegating | Pid Type |
|--------------------------------------|----------------------------------|-----------|
| [Zenodo](https://zenodo.org) | [Datacite](https://datacite.org) | doi |
| [RoHub](https://reliance.rohub.org/) | [W3ID](https://w3id.org/) | w3id |
<span className="todo">[delegated authorities table was removed from here]</span>
## Identifiers in the Graph
@ -47,7 +36,8 @@ OpenAIRE assigns internal identifiers for each object it collects.
By default, the internal identifier is generated as `sourcePrefix::md5(localId)` where:
* `sourcePrefix` is a namespace prefix of 12 chars assigned to the data source at registration time
* `localid` is the identifier assigned to the object by the data source
* `localId` is the identifier assigned to the object by the data source
<span className="todo">[so, the openaire id of objects with no pid is based on this local id; is this always available?]</span>
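As a minimal sketch of the default scheme `sourcePrefix::md5(localId)`, assuming the MD5 digest is rendered as lowercase hexadecimal over the UTF-8 bytes of the local identifier (the text above does not state the encoding or any normalisation step, and the function name is made up for this example):

```python
import hashlib


def openaire_id(source_prefix: str, local_id: str) -> str:
    """Build an identifier of the form sourcePrefix::md5(localId)."""
    if len(source_prefix) != 12:
        # The namespace prefix is described as being 12 characters long.
        raise ValueError("sourcePrefix must be exactly 12 characters")
    # Assumption: hex digest of the UTF-8 encoded local identifier.
    digest = hashlib.md5(local_id.encode("utf-8")).hexdigest()
    return f"{source_prefix}::{digest}"


# Made-up prefix and local identifier, for illustration only.
made_up_prefix = "example".ljust(12, "_")
print(openaire_id(made_up_prefix, "oai:example.org:123"))
```

The result is deterministic: the same data source prefix and local identifier always produce the same OpenAIRE identifier.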
After years of operation, we can say that:
@ -66,6 +56,33 @@ When the record is collected from a source which is not authoritative for any ty
Currently, the following data sources are used as "PID authorities":
<span className="todo">[PID authorities table was removed from here]</span>
OpenAIRE also performs duplicate identification (see the [dedicated section for details](/graph-production-workflow/deduplication)).
All duplicates are **merged** together in a **representative record** which must be assigned a dedicated OpenAIRE identifier (i.e. it cannot have the identifier of one of the aggregated records).
## PID authorities per entity
### Result
| PID Type | Authority |
|-----------|-----------------------------------------------------------------------------------------------------|
| doi | [Crossref](https://www.crossref.org), [Datacite](https://datacite.org) |
| pmc, pmid | [Europe PubMed Central](https://europepmc.org/), [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc) |
| arXiv | [arXiv.org e-Print Archive](https://arxiv.org/) |
| uniprot | [Protein Data Bank](http://www.pdb.org/) |
| ena | [Protein Data Bank](http://www.pdb.org/) |
| pdb | [Protein Data Bank](http://www.pdb.org/) |
| Datasource delegated | Datasource delegating | Pid Type |
|--------------------------------------|----------------------------------|-----------|
| [Zenodo](https://zenodo.org) | [Datacite](https://datacite.org) | doi |
| [RoHub](https://reliance.rohub.org/) | [W3ID](https://w3id.org/) | w3id |
| PID Type | Prefix (12 chars) | Authority |
|-----------|------------------------|-------------------------------------------|
| doi | `doi_________` | Crossref, Datacite, Zenodo |
@ -77,5 +94,15 @@ Currently, the following data sources are used as "PID authorities":
| pdb | `pdb_________` | EMBL-EBI |
| uniprot | `uniprot_____` | EMBL-EBI |
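Judging only from the rows shown in this table, the 12-character prefix looks like the PID type right-padded with underscores. A small sketch under that assumption (the helper name is hypothetical, and the rule is inferred from the table rather than stated in the text):

```python
def pid_prefix(pid_type: str, width: int = 12) -> str:
    """Right-pad a PID type with underscores to the fixed prefix width."""
    if len(pid_type) > width:
        raise ValueError(f"PID type longer than {width} characters: {pid_type!r}")
    return pid_type.ljust(width, "_")


# Print the prefix derived for each PID type that appears in the table rows above.
for pid_type in ("doi", "pdb", "uniprot"):
    print(pid_type, pid_prefix(pid_type))
```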
OpenAIRE also performs duplicate identification (see the [dedicated section for details](/graph-production-workflow/deduplication)).
All duplicates are **merged** together in a **representative record** which must be assigned a dedicated OpenAIRE identifier (i.e. it cannot have the identifier of one of the aggregated records).
### Data source
### Organization
<div className="todo">* how we use OpenOrgs?</div>
<div className="todo">* explain what is "pending" in the openaire id of some organizations</div>
### Project
### Community


@ -152,7 +152,7 @@ Note: the labels used to specify the semantic of the relationships are (for the
| 19 | [Result](entities/result) | [Result](entities/result) | IsPreviousVersionOf / IsNewVersionOf | Harvested |
| 20 | [Result](entities/result) | [Result](entities/result) | IsContinuedBy / Continues | Harvested |
| 21 | [Result](entities/result) | [Result](entities/result) | IsDescribedBy / Describes | Harvested |
| 22 | [Result](entities/result) | [Organization](entities/organization) | hasAuthorInstitution / isAuthorInstitutionOf | Harvested, Inferred by OpenAIRE |
| 22 | [Result](entities/result) | [Organization](entities/organization) | hasAuthorInstitution / isAuthorInstitutionOf | Harvested, Inferred by OpenAIRE [(more)](relationships/hasAuthorInstitution) |
| 23 | [Result](entities/result) | [Data source](entities/data-source) | isHostedBy / hosts | Harvested, Inferred by OpenAIRE |
| 24 | [Result](entities/result) | [Data source](entities/data-source) | isProvidedBy / provides | Harvested |
| 25 | [Result](entities/result) | [Community](entities/community) | IsRelatedTo / IsRelatedTo | Harvested, Inferred by OpenAIRE, Linked by user |


@ -0,0 +1,15 @@
# hasAuthorInstitution
#### Inverse relationship type: `isAuthorInstitutionOf`
This relationship connects [Results](/data-model/entities/result) with the [Organizations](/data-model/entities/organization) their authors are affiliated with.
<span className="todo">
Specifically, we collect those relations from the following data sources:
[TODO: add more details and enrich the following list]
</span>
* MAG
* Institutional repositories
Last but not least, the final graph is also enriched with `hasAuthorInstitution` relationships through the Propagation process; you can find more details
[here](/graph-production-workflow/deduction-and-propagation/propagation).