WIP: added ORCID enrichment alternative
parent
f0d9b74ba5
commit
f187b1aafb
|
@ -58,6 +58,6 @@ For a more extensive description of the different fields and the schema of the r
|
|||
## Process
|
||||
|
||||
The information obtained from ORCID is used to enrich the Graph, in particular to add author identifiers to results that do not provide one.
|
||||
This process is described in the [enrichment by PID](/graph-production-workflow/enrichment-by-pid/orcid-enrichment) section.
|
||||
This process is described in the [enrichment by PID](../../enrichment-by-pid/orcid-enrichment) section.
|
||||
|
||||
|
||||
|
|
|
@ -0,0 +1,55 @@
|
|||
## Enrichment from ORCID
|
||||
|
||||
OpenAIRE enhances publication metadata by incorporating author information from ORCID. This involves adding persistent identifiers to authors and leveraging ORCID data to improve author disambiguation.
|
||||
|
||||
### Enrichment Process
|
||||
|
||||
The following steps outline how ORCID information is integrated into the OpenAIRE Graph:
|
||||
|
||||
#### Extracting Author and Work Information
|
||||
|
||||
1. **Data Collection:** OpenAIRE extracts the following from ORCID profiles:
|
||||
* Author information: ORCID, family name, given name, other names, credit name
|
||||
* Work information: Persistent identifiers (DOI, PMC, PMID, arXiv, handle)
|
||||
|
||||
2. **ORCID-Work Pair Creation:** For each work identified by a persistent identifier (PID), an ORCID-Work pair is created. For example:
|
||||
* `<orcid1, doi1>`
|
||||
* `<orcid1, pmc1>`
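
The pair creation step can be sketched as follows. This is a minimal illustration, not the OpenAIRE implementation: the dictionary layout of `profile` and the names `Pid` and `orcid_work_pairs` are assumptions made for the example.

```python
from typing import NamedTuple


class Pid(NamedTuple):
    """Hypothetical holder for a persistent identifier."""
    scheme: str  # e.g. "doi", "pmc", "pmid", "arxiv", "handle"
    value: str


def orcid_work_pairs(profile: dict) -> list[tuple[str, Pid]]:
    """Emit one (orcid, pid) pair for every PID of every work in a profile."""
    pairs = []
    for work in profile["works"]:
        for scheme, value in work["pids"]:
            pairs.append((profile["orcid"], Pid(scheme.lower(), value)))
    return pairs


# Assumed input shape: one profile with a single work carrying two PIDs.
profile = {
    "orcid": "orcid1",
    "works": [{"pids": [("doi", "doi1"), ("pmc", "pmc1")]}],
}
pairs = orcid_work_pairs(profile)
```

A work with several identifiers yields several pairs, which is why `orcid1` appears once per PID in the list above.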
|
||||
|
||||
#### Grouping by Work Persistent Identifier
|
||||
|
||||
ORCID-Work pairs are grouped by the work's persistent identifier to identify multiple authors contributing to the same work. This results in structures like:
|
||||
* `<doi1, [orcid1, orcid2]>`
|
||||
|
||||
**Note:**
|
||||
* `orcidx`: ORCID identifier with associated author name information.
|
||||
* `doix`: Persistent identifier scheme and value (e.g., `<"doi", "10....">`).
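
The grouping can be illustrated with a short sketch. It assumes pairs are plain `(orcid, pid)` tuples where `pid` is itself a `(scheme, value)` tuple; the function name `group_by_pid` is hypothetical, and a production workflow would more likely run this as a distributed group-by.

```python
from collections import defaultdict


def group_by_pid(pairs):
    """Collect the ORCID authors that claim the same persistent identifier."""
    groups = defaultdict(list)
    for orcid, pid in pairs:
        # Skip duplicates so each author appears once per work.
        if orcid not in groups[pid]:
            groups[pid].append(orcid)
    return dict(groups)


pairs = [
    ("orcid1", ("doi", "doi1")),
    ("orcid2", ("doi", "doi1")),
    ("orcid1", ("pmc", "pmc1")),
]
grouped = group_by_pid(pairs)
```

Here `grouped[("doi", "doi1")]` holds both `orcid1` and `orcid2`, matching the `<doi1, [orcid1, orcid2]>` structure shown above.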
|
||||
|
||||
#### Matching with Graph and Enriching Author Metadata
|
||||
|
||||
1. **Graph Search:** For each ORCID-Work pair, OpenAIRE searches the Graph for a corresponding result based on the persistent identifier.
|
||||
2. **Author Matching:** Potential authors within the graph result are compared to ORCID profile authors using an *author name disambiguation* algorithm.
|
||||
3. **Metadata Enrichment:** When a match succeeds, the ORCID identifier is added to the matched author in the Graph result.
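
The Graph search step amounts to a join on the persistent identifier. The sketch below assumes Graph results are dictionaries with a `pids` list and an `authors` list; these field names and the helper functions are illustrative only and do not reflect the actual Graph schema.

```python
def index_by_pid(results):
    """Build a lookup from each PID to the Graph result that carries it."""
    index = {}
    for result in results:
        for pid in result["pids"]:
            index[pid] = result
    return index


def match_candidates(grouped, results):
    """Yield (graph_result, orcid_authors) for every PID found in the Graph."""
    index = index_by_pid(results)
    for pid, orcid_authors in grouped.items():
        result = index.get(pid)
        if result is not None:
            yield result, orcid_authors


# Hypothetical data: one Graph result and two grouped ORCID entries.
results = [{"pids": [("doi", "doi1")], "authors": [{"name": "Marek Kowalski"}]}]
grouped = {("doi", "doi1"): ["orcid1", "orcid2"], ("pmc", "pmc9"): ["orcid3"]}
candidates = list(match_candidates(grouped, results))
```

Only the `doi1` group produces a candidate, because no Graph result in this example carries `pmc9`. Each candidate is then handed to the author matching step.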
|
||||
|
||||
#### Author Name Disambiguation Algorithm
|
||||
|
||||
The algorithm compares authors from the graph and ORCID profiles for the same persistent identifier. It employs the following matching strategies in decreasing order of confidence:
|
||||
|
||||
1. **Exact Full Name Match:** Matches full names (given name + family name) directly.
|
||||
2. **Exact Reversed Full Name Match:** Matches full names with reversed order (family name + given name).
|
||||
3. **Ordered Token Match:** Compares author names tokenized into individual words, considering word order and allowing for variations (e.g., abbreviations).
|
||||
4. **Exact Credit Name Match:** Matches the graph author's full name with the ORCID author's credit name.
|
||||
5. **Exact Other Names Match:** Matches the graph author's full name with ORCID author's other names.
|
||||
|
||||
Upon finding a match, the graph author's information is enriched with ORCID data, and the matched ORCID author is removed from the comparison list. This process continues until no more matches are found.
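
A simplified sketch of this loop is shown below. It is not the OpenAIRE code: the record fields (`given`, `family`, `credit`, `other`), the placeholder identifiers, and in particular the rule used for the ordered token match (same number of tokens, position by position, where an initial may stand for a full token and at least one token must be identical) are assumptions made for illustration.

```python
import re


def tokens(name):
    """Lowercase a name and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", name.lower())


def norm(name):
    return " ".join(tokens(name))


def _compatible(x, y):
    """Tokens agree if equal, or if one is an initial of the other."""
    short, long_ = sorted((x, y), key=len)
    return x == y or (len(short) == 1 and long_.startswith(short))


def exact_full(name, o):
    return norm(name) == norm(f"{o['given']} {o['family']}")


def exact_reversed(name, o):
    return norm(name) == norm(f"{o['family']} {o['given']}")


def ordered_tokens(name, o):
    a, b = tokens(name), tokens(f"{o['given']} {o['family']}")
    if len(a) != len(b):
        return False
    aligned = list(zip(a, b))
    return all(_compatible(x, y) for x, y in aligned) and any(x == y for x, y in aligned)


def credit_name(name, o):
    return bool(o.get("credit")) and norm(name) == norm(o["credit"])


def other_names(name, o):
    return norm(name) in {norm(n) for n in o.get("other", [])}


# Strategies in decreasing order of confidence, as listed above.
STRATEGIES = [exact_full, exact_reversed, ordered_tokens, credit_name, other_names]


def enrich(graph_authors, orcid_authors):
    """Attach ORCID iDs to graph authors; each ORCID author is used at most once."""
    remaining = list(orcid_authors)
    for strategy in STRATEGIES:
        for author in graph_authors:
            if "orcid" in author:
                continue  # already enriched by a higher-confidence strategy
            for o in remaining:
                if strategy(author["name"], o):
                    author["orcid"] = o["orcid"]
                    remaining.remove(o)  # drop from the comparison list
                    break
    return graph_authors


# Placeholder identifiers, not real ORCID iDs.
orcid_list = [
    {"orcid": "orcid1", "given": "Marek", "family": "Kowalski"},
    {"orcid": "orcid2", "given": "Itai", "family": "Sfaradi"},
]
graph_list = [{"name": "Robert Stein"}, {"name": "Marek Kowalski"}]
enriched = enrich(graph_list, orcid_list)
```

Running every author through one strategy before moving to the next is one way to honor the confidence ordering: a weaker strategy cannot claim an ORCID author that a stronger strategy would have assigned elsewhere.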
|
||||
|
||||
**Example:**
|
||||
|
||||
Consider the following author lists:
|
||||
|
||||
* **Graph List:** Robert Stein, Sjoert van Velzen, Marek Kowalski, ...
|
||||
* **ORCID List:** Marek Kowalski, Itai Sfaradi, James Carl Miller-Jones, ...
|
||||
|
||||
The algorithm applies matching strategies sequentially, starting with exact full name matches and progressing to ordered token matching. For instance, "Marek Kowalski" would be matched using the exact full name strategy.
|
||||
|
||||
**By combining these approaches, OpenAIRE improves the accuracy of author identification and linking.**
|