From 0884de87bfed66c3d501e63aaf64cd3e9da87b4f Mon Sep 17 00:00:00 2001
From: Claudio Atzori
Date: Wed, 24 Jul 2024 15:27:54 +0200
Subject: [PATCH] WIP: ORCID content acquisition

---
 .../non-compatible-sources/orcid.md | 48 ++++++++++++++++---
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/docs/graph-production-workflow/aggregation/non-compatible-sources/orcid.md b/docs/graph-production-workflow/aggregation/non-compatible-sources/orcid.md
index c6a8f25..0a10acb 100644
--- a/docs/graph-production-workflow/aggregation/non-compatible-sources/orcid.md
+++ b/docs/graph-production-workflow/aggregation/non-compatible-sources/orcid.md
@@ -1,17 +1,17 @@
 # Open Researcher and Contributor ID (ORCID)
+
 ORCID (Open Researcher and Contributor ID) is a non-profit organization that provides a unique identifier for researchers. ORCID iDs are used to connect researchers with their contributions, such as publications, grants, and affiliations.
-This document describes how to collect ORCID data from the ORCID datasource.
+This document describes how OpenAIRE collects information about researcher profiles and their works from ORCID.
+
 ## Data acquisition
-### Full ORCID Dump
+The full ORCID dataset is publicly available for download from [Figshare](https://orcid.figshare.com/) and is described on the [ORCID website](https://support.orcid.org/hc/en-us/articles/360006897394-How-do-I-get-the-public-data-file).
+This dataset served as the initial import; to keep up with later updates, a scheduled process regularly retrieves the delta.
-The ORCID dump can be downloaded from the ORCID website https://support.orcid.org/hc/en-us/articles/360006897394-How-do-I-get-the-public-data-file.
-The ORCID dump consists in different compressed files that needs to be extracted.
-This compressed file contains information on researchers in XML format. Once extracted, they will be parsed to populate the three tables described below.
+The ORCID dataset consists of several compressed files containing information about researchers in XML format. Once the files were uncompressed, the information extracted from the XML records was used to populate the three tables described below.
-### Incremental Updates
-ORCID provides an API to get incremental updates,the parsed incremental data can be used to update the three tables with the latest changes.
+ORCID provides an API to get incremental updates; the parsed incremental data can be used to update the three tables with the latest changes.
 ### OpenAIRE ORCID Data model
@@ -19,7 +19,41 @@ ORCID provides an API to get incremental updates,the parsed incremental data can
 - **Employments**: This table contains information about the employments of ORCID authors, including their ORCID ID, organization, start date, end date, and ROAR ID.
 - **Works**: This table contains information about the works of ORCID authors, including te paper PID and ORCID ID.
+**Authors**
+| Column name          | Type                                         |
+|----------------------|----------------------------------------------|
+| `biography`          | `string`                                     |
+| `creditName`         | `string`                                     |
+| `familyName`         | `string`                                     |
+| `givenName`          | `string`                                     |
+| `orcid`              | `string`                                     |
+| `otherNames`         | `array[string]`                              |
+| `otherPids`          | `array[struct[schema:string, value:string]]` |
+| `visibility`         | `string`                                     |
+| `lastModifiedDate`   | `string`                                     |
+
+
+**Employments**
+
+| Column name      | Type                                  |
+|------------------|---------------------------------------|
+| `affiliationId`  | `struct[schema:string, value:string]` |
+| `departmentName` | `string`                              |
+| `endDate`        | `string`                              |
+| `orcid`          | `string`                              |
+| `roleTitle`      | `string`                              |
+| `startDate`      | `string`                              |
+
+**Works**
+
+| Column name | Type                                         |
+|-------------|----------------------------------------------|
+| `orcid`     | `string`                                     |
+| `pids`      | `array[struct[schema:string, value:string]]` |
+| `title`     | `string`                                     |
+
+For a more extensive description of the different fields and the schema of the record model, please refer to the [ORCID project on GitHub](https://github.com/ORCID/orcid-model).
 ## Process