diff --git a/docs/changelog.md b/docs/changelog.md
index 8635e3b..44b9875 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -18,6 +18,41 @@ In our case, given a version `MAJOR.MINOR.PATCH`, we increment the:
This section documents all notable changes for each graph version.
---
+### v9.0.0
+_Start Date: 2024-10-03 • Release Date: 2024-10-23 • Dataset release: **no**_
+
+#### Added
+
+- ~2.5% increase (+6.7Mi) in the number of research products
+- ~6.35% increase (+11.9Mi) in the number of affiliations
+- ~7.3% increase (+311K) in the number of funded research products
+- Import of SDG classifications without a DOI
+- Introduced plugins for collecting research results from the OSF preprints server and the UKRI registry
+
+#### Changed
+
+- Updated Crossref publications to include contents until Aug 2024 and updated the mapping so that:
+  - records with an "is-review-of" relationship are mapped as publications of type "Review";
+  - the hostedby of Crossref records with DOI prefix `10.3410` or `10.12703` is forced to the H1 Connect data source.
+- Updated ORCID contents until Sept 2024
+- Updated Datacite contents until Sept 2024
+- Improved the comparators used in organization deduplication.
+- Changed the selection criteria for the pivot record of a group so that the best PID type becomes the first criterion; as a consequence, pivots will converge to records having a DOI PID.
+- Added community tags to all entity types.
+
+### v8.0.1
+_Start Date: 2024-08-09 • Release Date: 2024-09-12 • Dataset release: **no**_
+
+#### Added
+
+- Introduced mapping of affiliations from publisher websites
+
+#### Changed
+
+- Updated Crossref publications to include contents until June 2024
+- Updated ORCID contents until July 2024
+- Updated Datacite contents until July 2024
+- Included only FoS levels L1 and L2 in the record serialization
+
### v8.0.0
_Start Date: 2024-07-03 • Release Date: 2024-07-15 • Dataset release: **yes**_
diff --git a/versioned_docs/version-8.0.1/apis/_category_.json b/versioned_docs/version-8.0.1/apis/_category_.json
new file mode 100644
index 0000000..36617e4
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/_category_.json
@@ -0,0 +1,8 @@
+{
+ "label": "Public API",
+ "position": 4,
+ "link": {
+ "type": "doc",
+ "id": "api"
+ }
+}
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/authentication.md b/versioned_docs/version-8.0.1/apis/authentication.md
new file mode 100644
index 0000000..9051b5e
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/authentication.md
@@ -0,0 +1,308 @@
+# Guide for authenticated requests
+
+The OpenAIRE APIs can be accessed over HTTPS by both authenticated and non-authenticated requests.
+You can use authenticated requests to increase the rate limit of your requests (please refer [here](./terms#authentication--limits) for the current API rate limits).
+There are two main modes that you can use to authenticate API requests:
+
+* [Personal access tokens](#personal-access-token)
+* [Registered services](#registered-services)
+
+
+In the following, we elaborate on these modes.
+
+## Personal access token
+
+To access the OpenAIRE APIs with better rate limits, you can use your personal access token. To access the following functionalities, you need to log in to OpenAIRE. If you are not already a member, you will need to register first and provide your [Personal information](https://develop.openaire.eu/personal-info).
+
+:::info New!
+The registration process has been updated! In order to visit the Personal Token and Registered Services functionalities you need to fill in the Personal Information form available [here](https://develop.openaire.eu/personal-info). This update will not affect the operation of your existing services. However, if you want to register a new service or access/modify an existing one, you will need to provide your personal information first.
+:::
+
+### How to create your personal access token
+
+To create your personal access token go to [your personal access token page](https://develop.openaire.eu/personal-token) and copy it!
+
+:::info
+Your access token is valid for an hour.
+:::
+
+:::caution
+Do not share your personal access token. Send your personal access token over HTTPS.
+:::
+
+### How to use your personal access token
+
+To access the OpenAIRE APIs send your personal access token using the Authorization header.
+```http
+GET https://api.openaire.eu/{resourceServicePath}
+Authorization: Bearer {ACCESS_TOKEN}
+```
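+
+For example, a minimal Python sketch using only the standard library builds the same request. `{resourceServicePath}` stays a placeholder, and the `build_authenticated_request` helper is a hypothetical name chosen for illustration:
+
+```python
+import urllib.request
+
+API_BASE = "https://api.openaire.eu"
+
+
+def build_authenticated_request(resource_service_path: str, access_token: str) -> urllib.request.Request:
+    """Build a GET request that carries the personal access token as a Bearer token."""
+    url = f"{API_BASE}/{resource_service_path.lstrip('/')}"
+    return urllib.request.Request(
+        url,
+        headers={"Authorization": f"Bearer {access_token}"},
+        method="GET",
+    )
+
+
+# Hypothetical placeholder path: replace it with the resource you actually want to query.
+req = build_authenticated_request("{resourceServicePath}", "YOUR_ACCESS_TOKEN")
+# Nothing is sent yet; to send it:
+# with urllib.request.urlopen(req) as resp:
+#     print(resp.status)
+```
+
+The request is only sent when passed to `urllib.request.urlopen`. Keep the token out of source control, e.g. by reading it from an environment variable.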
+
+### An hour is not enough? What to do.
+
+To prolong your access to our APIs you can use a **refresh token** that allows you to programmatically issue a new access token.
+
+To get your refresh token, go to [your personal access token page](https://develop.openaire.eu/personal-token) and click the **"Get a refresh token"** button.
+
+The OpenAIRE refresh token expires after one month.
+
+If you already have a refresh token, a new one will be issued and the old one will no longer be valid.
+
+Please copy your refresh token and store it confidentially. You will not be able to retrieve it. Do not share your refresh token. Send your refresh token over HTTPS.
+
+Since the OpenAIRE refresh token expires after one month, when a client gets a refresh token, this token must be stored securely to keep it from being used by potential attackers. If a refresh token is leaked, it may be used to obtain new access tokens and access protected resources until a new one is issued or it expires.
+
+To get a personal access token using your refresh token you need to make the following request:
+```http
+GET https://services.openaire.eu/uoa-user-management/api/users/getAccessToken?refreshToken={your_refresh_token}
+```
+
+The response has the following format:
+```json
+{
+ "access_token": "...",
+ "token_type": "Bearer",
+ "refresh_token": "...",
+ "expires_in": "...",
+ "scope": "...",
+ "id_token": "..."
+}
+```
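+
+The refresh flow can be sketched in Python as follows, assuming the response has the format shown above. The helper names are hypothetical; `urlencode` takes care of escaping the refresh token in the query string.
+
+```python
+import json
+import urllib.parse
+import urllib.request
+
+TOKEN_ENDPOINT = "https://services.openaire.eu/uoa-user-management/api/users/getAccessToken"
+
+
+def refresh_token_url(refresh_token: str) -> str:
+    """Return the getAccessToken URL with the refresh token URL-encoded as a query parameter."""
+    return f"{TOKEN_ENDPOINT}?{urllib.parse.urlencode({'refreshToken': refresh_token})}"
+
+
+def parse_token_response(body: str) -> tuple[str, str]:
+    """Extract the access_token and refresh_token fields from the JSON response."""
+    data = json.loads(body)
+    return data["access_token"], data["refresh_token"]
+
+
+def fetch_access_token(refresh_token: str) -> tuple[str, str]:
+    """Perform the GET request (network call) and parse the result."""
+    with urllib.request.urlopen(refresh_token_url(refresh_token)) as resp:
+        return parse_token_response(resp.read().decode("utf-8"))
+```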
+
+## Registered services
+
+If you have a service (client) that needs to interact with the OpenAIRE APIs, you need to register it.
+
+:::info
+You can register up to 5 services.
+:::
+
+We offer two ways of authenticating your service: Basic Authentication and Advanced Authentication.
+
+### Which one is for me?
+
+| | How | Client Credential Issuer | Authentication Method |
+| --- | --- | --- | --- |
+| **Basic** | Client ID & Client Secret | OpenAIRE AAI server | Client Secret (Basic) |
+| **Advanced** | Private Key signed JWT | Service owner | Private Key JWT Client Authentication |
+
+For the **Basic Authentication** method, the OpenAIRE AAI server generates a pair of _Client ID_ and _Client Secret_ credentials for your service upon its registration. The service sends the client ID and client secret when authenticating to the OpenAIRE AAI server to obtain the access token for the OpenAIRE APIs. The OpenAIRE AAI server checks whether the client ID and client secret sent are valid. [Continue reading for the Basic Authentication](#basic-service-authentication-and-registration).
+
+For the **Advanced Authentication** method, your service does not send a client secret; instead, it uses a _self-signed client assertion_ to authenticate to the OpenAIRE AAI server in order to obtain the access token for the OpenAIRE APIs. The client assertion is a JWT that must be signed with RSASSA using the SHA-256 hash algorithm. The OpenAIRE AAI server validates the client assertion using the public key that you provided upon service registration. [Continue reading for the Advanced Authentication](#advanced-service-authentication-and-registration).
+
+:::info
+The Advanced Authentication method allows the OpenAIRE AAI server to verify that the client authentication request at the token endpoint was signed by your service and not altered in any way. It is more computationally intensive than Basic Authentication, but it ensures non-repudiation. Basic Authentication, on the other hand, is more lightweight and easier to deploy, but it does not provide signature verification, and there is always a risk of the Client ID/Client Secret credentials being stolen. Note that the Advanced Authentication method provides a higher level of security only when used correctly, i.e. when the signed JWT has a short lifetime. When the JWT is long-lived, the process offers no more protection than the Basic one.
+:::
+
+### Basic service authentication and registration
+
+To access the following functionalities, you need to log in to OpenAIRE. If you are not already a member, you will need to register first and provide your [Personal information](https://develop.openaire.eu/personal-info).
+
+:::info New!
+The registration process has been updated! In order to visit the Personal Token and Registered Services functionalities you need to fill in the Personal Information form available [here](https://develop.openaire.eu/personal-info). This update will not affect the operation of your existing services. However, if you want to register a new service or access/modify an existing one, you will need to provide your personal information first.
+:::
+
+For the **Basic Authentication** method, the OpenAIRE AAI server generates a pair of _Client ID_ and _Client Secret_ for your service upon its registration. The service uses the client ID and client secret to obtain the access token for the OpenAIRE APIs. The OpenAIRE AAI server checks whether the client ID and client secret sent are valid.
+
+#### How to register your service
+
+To register your service you need to:
+
+1. Go to your [Registered Services](https://develop.openaire.eu/apis) page and click the **\+ New Service** button.
+2. Provide the mandatory information for your service.
+3. Select the **Basic** Security level.
+4. Click the **Create** button.
+
+Once your service is created, the _Client ID_ and _Client Secret_ will appear on your screen. Click "OK" and your new service will appear in the list on your [Registered Services](https://develop.openaire.eu/apis) page.
+
+#### How to make a request
+
+##### Step 1. Request for an access token
+
+To make an access token request use the _Client ID_ and _Client Secret_ of your service.
+```bash
+curl -u {CLIENT_ID}:{CLIENT_SECRET} \
+-X POST 'https://aai.openaire.eu/oidc/token' \
+-d 'grant_type=client_credentials'
+```
+
+where **{CLIENT_ID}** and **{CLIENT_SECRET}** are the _Client ID_ and _Client Secret_ assigned to your service upon registration.
+
+The response is:
+```json
+{
+ "access_token": ...,
+ "token_type": "Bearer",
+ "expires_in": ...
+}
+```
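+
+The same token request can be sketched in Python with the standard library. It mirrors the `curl -u` call: the credentials go into an HTTP Basic `Authorization` header and `grant_type=client_credentials` is sent as a form body. The helper names are hypothetical.
+
+```python
+import base64
+import json
+import urllib.parse
+import urllib.request
+
+TOKEN_URL = "https://aai.openaire.eu/oidc/token"
+
+
+def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
+    """Build the POST equivalent of `curl -u CLIENT_ID:CLIENT_SECRET -d grant_type=client_credentials`."""
+    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
+    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
+    return urllib.request.Request(
+        TOKEN_URL,
+        data=body,
+        headers={
+            "Authorization": f"Basic {credentials}",
+            "Content-Type": "application/x-www-form-urlencoded",
+        },
+        method="POST",
+    )
+
+
+def request_access_token(client_id: str, client_secret: str) -> str:
+    """Send the request (network call) and return the access_token field of the response."""
+    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as resp:
+        return json.loads(resp.read().decode("utf-8"))["access_token"]
+```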
+
+Store the access token confidentially on the service side.
+
+##### Step 2. Make a request
+
+To access the OpenAIRE APIs send the access token returned in **Step 1**.
+```http
+GET https://api.openaire.eu/{resourceServicePath}
+Authorization: Bearer {ACCESS_TOKEN}
+```
+
+### Advanced service authentication and registration
+
+
+To access the following functionalities, you need to log in to OpenAIRE. If you are not already a member, you will need to register first and provide your [Personal information](https://develop.openaire.eu/personal-info).
+
+:::info New!
+The registration process has been updated! In order to visit the Personal Token and Registered Services functionalities you need to fill in the Personal Information form available [here](https://develop.openaire.eu/personal-info). This update will not affect the operation of your existing services. However, if you want to register a new service or access/modify an existing one, you will need to provide your personal information first.
+:::
+
+For the **Advanced Authentication** method, your service does not send a client secret; instead, it uses a _self-signed client assertion_ to obtain the access token for the OpenAIRE APIs. The client assertion is a JWT that must be signed with RSASSA using the SHA-256 hash algorithm. The OpenAIRE AAI server validates the client assertion using the public key that you provided upon service registration.
+
+#### Prepare to register your service
+
+Before you register your service, you need to prepare a private/public key pair on your side.
+
+:::info
+We accept RSA keys used for signing with RSASSA using the SHA-256 hash algorithm (RS256).
+:::
+
+To create the key pair you have the following options:
+
+* Use the built-in tool of the OpenAIRE authorization server, available here: [https://aai.openaire.eu/oidc/generate-oidc-keystore](https://aai.openaire.eu/oidc/generate-oidc-keystore).
+ The response is your **Public and Private Keypair** and has the following format:
+ ```json
+ {
+ "p" : ...,
+ "kty" : "RSA",
+ "q" : ...,
+ "d" : ...,
+ "e" : "AQAB",
+ "kid" : ...,
+ "qi" : ...,
+ "dp" : ...,
+ "alg" : "RS256",
+ "dq" : ...,
+ "n" : ...
+ }
+ ```
+
+ Use the public key parameters (kty, e, kid, alg, n) to create your **Public Key** in the following format:
+ ```json
+ {
+ "kty": "RSA",
+ "e": "AQAB",
+ "kid": ...,
+ "alg": "RS256",
+ "n": ...
+ }
+ ```
+
+:::info
+Store both the **Public and Private keypair** and the **Public key**. You will need them to register your service.
+:::
+
+:::caution
+Store the **Public and Private keypair** confidentially on the service side.
+:::
+
+* Use OpenSSL and then convert the keys to JWK format using a PEM-to-JWK script such as [https://github.com/danedmunds/pem-to-jwk](https://github.com/danedmunds/pem-to-jwk). Alternatively, the client application can read the key pair in PEM format and convert it using a JWK library. Use the public key parameters (kty, e, kid, alg, n) for the service registration.
+
+:::info
+You can also provide a public key in JWK format that can be accessed using a link.
+:::
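+
+Whichever option you use, the **Public Key** is just the keypair restricted to the public parameters (kty, e, kid, alg, n). A small Python sketch of that step follows; the key values are placeholders, not a real key, and the helper name is hypothetical.
+
+```python
+import json
+
+# Public parameters of an RSA JWK, as listed in this guide.
+PUBLIC_PARAMS = ("kty", "e", "kid", "alg", "n")
+
+
+def public_jwk_from_keypair(keypair: dict) -> dict:
+    """Keep only the public parameters; private ones (d, p, q, dp, dq, qi) are dropped."""
+    missing = [name for name in PUBLIC_PARAMS if name not in keypair]
+    if missing:
+        raise ValueError(f"keypair is missing public parameters: {missing}")
+    return {name: keypair[name] for name in PUBLIC_PARAMS}
+
+
+# Placeholder values for illustration only.
+keypair = {"p": "...", "kty": "RSA", "q": "...", "d": "...", "e": "AQAB",
+           "kid": "my-key-id", "qi": "...", "dp": "...", "alg": "RS256",
+           "dq": "...", "n": "..."}
+print(json.dumps(public_jwk_from_keypair(keypair), indent=2))
+```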
+
+#### How to register your service
+
+To register your service you need to:
+
+1. Go to your [Registered Services](https://develop.openaire.eu/apis) page and click the **\+ New Service** button.
+2. Provide the mandatory information for your service.
+3. Select the **Advanced** Security level.
+4. Use the public key parameters (kty, e, kid, alg, n) you previously produced to declare your **"Public Key"** **"By value"** in the following format:
+ ```json
+ {
+ "kty": "RSA",
+ "e": "AQAB",
+ "kid": ...,
+ "alg": "RS256",
+ "n": ...
+ }
+ ```
+ **\- OR -**
+
+ If your service has a public key in JWK format that can be accessed using a link, you can set **“Public Key”** to **“By URL”**.
+
+5. Click the **Create** button.
+
+Once your service is created, it will appear in the list on your [Registered Services](https://develop.openaire.eu/apis) page, with the **Service Id** automatically assigned to it by the OpenAIRE AAI service.
+
+#### How to make a request
+
+##### Step 1. Create and sign a JWT
+
+Your service must create and sign a JWT and include it in the request to the token endpoint, as described in [OpenID Connect Core 1.0, 9. Client Authentication](https://openid.net/specs/openid-connect-core-1_0.html#ClientAuthentication).
+
+To create a JWT you can use [https://mkjose.org/](https://mkjose.org/). To do so you need to create a **payload** that should contain the following claims:
+
+```json
+{
+ "iss": "{SERVICE_ID}",
+ "sub": "{SERVICE_ID}",
+ "aud": "https://aai.openaire.eu/oidc/token",
+ "jti": "{RANDOM_STRING}",
+ "exp": {EXPIRATION_TIME_OF_SIGNED_JWT}
+}
+```
+
+* **iss**, _(required)_ the “issuer” claim identifies the principal that issued the JWT. The value is the **Service Id** that was created when you registered your service.
+* **sub**, _(required)_ the “subject” claim identifies the principal that is the subject of the JWT. The value is the **Service Id** that was created when you registered your service.
+* **aud**, _(required)_ the “audience” claim identifies the recipients that the JWT is intended for. The value is **https://aai.openaire.eu/oidc/token**.
+* **jti**, _(required)_ the “JWT ID” claim provides a unique identifier for the JWT. The value is a random string.
+* **exp**, _(required)_ the “expiration time” claim identifies the expiration time on or after which the JWT **MUST NOT** be accepted for processing. The value is a timestamp in **epoch format**.
+
+Fill in the payload in the form available at [https://mkjose.org/](https://mkjose.org/), select the Signing Algorithm to be **RS256 using SHA-256** and paste the **Public and Private Keypair** previously created.
+
+To check your JWT you can go to [https://jwt.io/](https://jwt.io/). The **header** should contain the following claims:
+```json
+{
+ "alg": "RS256",
+ "kid": ...
+}
+```
+
+where **kid** is the key ID of the **Public and Private Keypair** you used to sign the JWT.
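+
+If you prefer to build the assertion in code rather than with mkjose.org, the following Python sketch assembles the header and payload described above. The 60-second default lifetime and the helper names are assumptions for illustration. It stops at the signing input: the final JWT is the signing input followed by `.` and the base64url-encoded signature, where the signature is RSASSA-PKCS1-v1_5 with SHA-256 (RS256) computed with your private key. The Python standard library has no RSA signer, so use mkjose.org or a JOSE library you trust for that step.
+
+```python
+import base64
+import json
+import time
+import uuid
+from typing import Optional
+
+AUDIENCE = "https://aai.openaire.eu/oidc/token"
+
+
+def b64url(data: bytes) -> str:
+    """Base64url-encode without '=' padding, as used in JWT compact serialization."""
+    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
+
+
+def compact_json(obj: dict) -> bytes:
+    """Serialize a JSON object without extra whitespace."""
+    return json.dumps(obj, separators=(",", ":")).encode("utf-8")
+
+
+def build_claims(service_id: str, lifetime_seconds: int = 60, now: Optional[int] = None) -> dict:
+    """Build the iss/sub/aud/jti/exp claims described above for a given Service Id."""
+    issued_at = int(time.time()) if now is None else now
+    return {
+        "iss": service_id,
+        "sub": service_id,
+        "aud": AUDIENCE,
+        "jti": uuid.uuid4().hex,              # random, unique per assertion
+        "exp": issued_at + lifetime_seconds,  # epoch seconds; keep it short-lived
+    }
+
+
+def signing_input(kid: str, claims: dict) -> str:
+    """Return base64url(header) + '.' + base64url(payload): the string an RS256 signer signs."""
+    header = {"alg": "RS256", "kid": kid}
+    return f"{b64url(compact_json(header))}.{b64url(compact_json(claims))}"
+```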
+
+:::caution
+Store the signed JWT confidentially on the service side. You will need it in Step 2.
+:::
+
+##### Step 2. Request for an access token
+
+To make an access token request use the _signed JWT_ that you created in **Step 1**. The OpenAIRE AAI server will check if the signed JWT is valid using the public key that you declared in the **"How to register your service"** process.
+```bash
+ curl -k -X POST "https://aai.openaire.eu/oidc/token" \
+ -d "grant_type=client_credentials" \
+ -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
+ -d "client_assertion={signedJWT}"
+```
+where **{signedJWT}** is the signed JWT created in **Step 1**.
+
+The response is:
+```json
+{
+ "access_token": {ACCESS_TOKEN},
+ "token_type":"Bearer",
+ "expires_in": ...,
+ "scope":"openid"
+}
+```
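+
+For reference, a Python sketch of the same token request posts the three form fields from the `curl` command above. The helper names are hypothetical. Unlike `curl -k`, `urllib.request.urlopen` verifies the server's TLS certificate by default.
+
+```python
+import json
+import urllib.parse
+import urllib.request
+
+TOKEN_URL = "https://aai.openaire.eu/oidc/token"
+ASSERTION_TYPE = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"
+
+
+def build_assertion_request(signed_jwt: str) -> urllib.request.Request:
+    """POST the signed JWT as a client assertion, mirroring the curl command above."""
+    body = urllib.parse.urlencode({
+        "grant_type": "client_credentials",
+        "client_assertion_type": ASSERTION_TYPE,
+        "client_assertion": signed_jwt,
+    }).encode("ascii")
+    return urllib.request.Request(
+        TOKEN_URL,
+        data=body,
+        headers={"Content-Type": "application/x-www-form-urlencoded"},
+        method="POST",
+    )
+
+
+def exchange_assertion(signed_jwt: str) -> str:
+    """Send the request (network call) and return the access token from the JSON response."""
+    with urllib.request.urlopen(build_assertion_request(signed_jwt)) as resp:
+        return json.loads(resp.read().decode("utf-8"))["access_token"]
+```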
+
+Store the access token confidentially on the service side.
+
+##### Step 3. Make a request
+
+To access the OpenAIRE APIs send the access token returned in **Step 2**.
+```http
+GET https://api.openaire.eu/{resourceServicePath}
+Authorization: Bearer {ACCESS_TOKEN}
+```
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/broker-api.md b/versioned_docs/version-8.0.1/apis/broker-api.md
new file mode 100644
index 0000000..20c9589
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/broker-api.md
@@ -0,0 +1,50 @@
+# Broker API
+
+
+## Introduction
+
+The Broker Service is available to use via the OpenAIRE Content Provider Dashboard. Thanks to the Broker, repositories, publishers or aggregators can exchange metadata and enrich their local metadata collection by subscribing to notifications of different types. The Broker is able to notify providers when the OpenAIRE Graph contains information that is not available in the original collection of the data source. In particular, the data source manager can subscribe via the [Content Provider Dashboard](https://provide.openaire.eu) and be notified about:
+
+* Additional PIDs of its publications (e.g. DOIs)
+* Links to projects
+* ORCID iDs that can be associated with authors of the data source's publications
+* Links to Open Access versions
+* Additional classification subjects (e.g. subjects from standard schemes like ACM, JEL and DDC)
+* Abstracts identified in duplicate publications
+* Missing publication dates
+
+All repository managers accessing the Content Provider Dashboard can preview a set of enrichments for their repository that OpenAIRE can derive from the Graph. Enrichments are organized into categories, called topics, that represent the different types of enrichment OpenAIRE can build. For each topic, the preview shows 100 “enrichment events”, a subset of all the possible enrichments pertinent to the repository in the OpenAIRE Graph, which the user can explore by filtering on different criteria; the UI also highlights the total number of events that could potentially be built.
+
+Repository managers can create subscriptions for specific topics, including the filtering criteria they used to analyze the enrichment preview, or subscribe to all available topics at once with no restrictions. Once a subscription is created, the algorithm analyzing the OpenAIRE Graph produces the full set of enrichments for the manager's repository, possibly far beyond the 100 enrichments shown in the preview. The enrichments are made available as notifications in a dedicated section of the Content Provider Dashboard UI, for further checking, and through the Broker service API for programmatic access. Notifications are sent to subscribers every time the OpenAIRE Graph is updated and analyzed to derive the enrichments.
+
+## Usage Example
+
+The following commands show how the Broker API documented at [api.openaire.eu/broker](https://api.openaire.eu/broker/swagger-ui/index.html) can be used to access the set of enrichments:
+
+1. Get the list of subscriptions for a given subscriber, e.g.
+
+ ```bash
+ curl -X GET --header 'Accept: application/json' 'https://api.openaire.eu/broker/subscriptions?email=[subscriber_email]'
+ ```
+
+2. Extract the subscription ID and use it to access the 1st page of enrichment notification records
+
+ ```bash
+ curl -X GET --header 'Accept: application/json' 'https://api.openaire.eu/broker/scroll/notifications/bySubscriptionId/[sub-1234]'
+ ```
+
+3. Extract the scroll ID from the response to request subsequent pages
+
+ ```bash
+ curl -X GET --header 'Accept: application/json' 'https://api.openaire.eu/broker/scroll/notifications/[scroll_id]'
+ ```
+
+To simplify accessing the enrichment notification records, please check the OpenAIRE broker cmdline client available on [GitHub](https://github.com/openaire/broker-cmdline-client).
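+
+The three steps above can also be chained in a short Python sketch that follows the scroll ID until all notifications have been read. The endpoint paths are the ones listed above; the field names of each scroll page (`id`, `completed`, `values`) are an **assumption** for illustration, so check them against the [Swagger documentation](https://api.openaire.eu/broker/swagger-ui/index.html) before relying on them. The HTTP function is injectable so the loop can be exercised without network access.
+
+```python
+import json
+import urllib.parse
+import urllib.request
+from typing import Callable, Iterator
+
+BROKER = "https://api.openaire.eu/broker"
+
+
+def http_get_json(url: str) -> dict:
+    """GET a URL with Accept: application/json and decode the JSON body (network call)."""
+    req = urllib.request.Request(url, headers={"Accept": "application/json"})
+    with urllib.request.urlopen(req) as resp:
+        return json.loads(resp.read().decode("utf-8"))
+
+
+def iter_notifications(subscription_id: str,
+                       fetch: Callable[[str], dict] = http_get_json) -> Iterator[dict]:
+    """Yield every notification of a subscription, page by page.
+
+    ASSUMPTION: each page looks like {"id": <scroll id>, "completed": <bool>, "values": [...]}.
+    """
+    sub = urllib.parse.quote(subscription_id, safe="")
+    page = fetch(f"{BROKER}/scroll/notifications/bySubscriptionId/{sub}")
+    while True:
+        yield from page.get("values", [])
+        # Stop when the page says so, or when the assumed flag is missing.
+        if page.get("completed", True):
+            break
+        scroll_id = urllib.parse.quote(page["id"], safe="")
+        page = fetch(f"{BROKER}/scroll/notifications/{scroll_id}")
+```
+
+Usage, with a real subscription ID: `for notification in iter_notifications("sub-1234"): print(notification)`.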
+
+## Terms of Use and SLA
+
+APIs are free-to-use (no sign-up needed) by any third-party service.
+
+**Metadata license is CC-BY**: the metadata records returned by the service can be freely re-used by commercial and non-commercial partners under the CC-BY license, as long as OpenAIRE is acknowledged as the data source.
+
+**Quality of Service**: all API services are running in production 24/7 within the OpenAIRE infrastructure premises deployed at the [data center](http://icm.edu.pl/en/centre-of-technology/) facilities of the [Interdisciplinary Centre for Mathematical and Computational Modelling](http://icm.edu.pl/en/) (ICM).
+
+**APIs rate limits**: please check [here](./authentication).
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/dspace-eprints-api.md b/versioned_docs/version-8.0.1/apis/dspace-eprints-api.md
new file mode 100644
index 0000000..93f15f8
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/dspace-eprints-api.md
@@ -0,0 +1,61 @@
+# DSpace & EPrints API
+
+
+The APIs offer custom access to metadata about projects funded by a selection of international funders for the **DSpace** and **EPrints** platforms. The currently supported funders and their codes are:
+
+* **FP7:** The 7th Framework Programme funded by the European Commission
+* **H2020:** Horizon2020 Programme funded by the European Commission
+* **HE:** Horizon Europe Programme funded by the European Commission
+* **AKA:** Academy of Finland
+* **ARC:** Australian Research Council
+* **FWF:** Austrian Science Foundation
+* **CHISTERA:** CHIST-ERA
+* **CIHR:** Canadian Institutes of Health Research
+* **HRZZ:** Croatian Science Foundation
+* **EEA:** European Environment Agency
+* **ANR:** French National Research Agency
+* **FCT:** The funding programme of Fundação para a Ciência e a Tecnologia, the national funding agency of Portugal
+* **MESTD:** The Ministry of Education, Science and Technological Development of Serbia
+* **MZOS:** Ministry of Science, Education and Sports of the Republic of Croatia
+* **NHMRC:** Australian National Health and Medical Research Council
+* **NIH:** US National Institutes of Health
+* **NSF:** US National Science Foundation
+* **NSERC:** Natural Sciences and Engineering Research Council of Canada
+* **NWO:** The Netherlands Organisation for Scientific Research
+* **SFI:** Science Foundation Ireland
+* **SSHRC:** Social Sciences and Humanities Research Council
+* **SNSF:** Swiss National Science Foundation
+* **TARA:** Tara Expeditions Foundation
+* **TUBITAK:** The National funder of Turkey
+* **UKRI:** United Kingdom Research and Innovation
+* **WT:** Wellcome Trust
+
+## DSpace/EPrints
+
+DSpace endpoint: `http://api.openaire.eu/projects/dspace/$fundingStream/ALL/ALL`
+
+EPrints endpoint: `http://api.openaire.eu/projects/eprints/$fundingStream/ALL/ALL`
+
+The URLs embed the parameters needed to collect projects funded by a specific funding stream, following the pattern `FundingStream/FundingSubStream/FundingSubSubStream`.
+Additional parameters can be concatenated to the URL to refine the results by date (date must be in the form `YYYY-MM-DD`):
+
+* startFrom
+* startUntil
+* endFrom
+* endUntil
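+
+As a sketch, the endpoint pattern and date filters above can be combined into a URL in Python. The `projects_url` helper is hypothetical; it only builds URLs (it does not call the API) and rejects filter names other than the four listed.
+
+```python
+import datetime
+import urllib.parse
+
+BASE = "http://api.openaire.eu/projects"
+DATE_FILTERS = ("startFrom", "startUntil", "endFrom", "endUntil")
+
+
+def projects_url(platform: str, funding_stream: str, sub_stream: str = "ALL",
+                 sub_sub_stream: str = "ALL", **dates: str) -> str:
+    """Build a DSpace/EPrints projects URL, optionally refined by date filters."""
+    if platform not in ("dspace", "eprints"):
+        raise ValueError("platform must be 'dspace' or 'eprints'")
+    for name, value in dates.items():
+        if name not in DATE_FILTERS:
+            raise ValueError(f"unsupported date filter: {name}")
+        # Raises ValueError if the value does not parse with the %Y-%m-%d format.
+        datetime.datetime.strptime(value, "%Y-%m-%d")
+    url = f"{BASE}/{platform}/{funding_stream}/{sub_stream}/{sub_sub_stream}"
+    return f"{url}?{urllib.parse.urlencode(dates)}" if dates else url
+```
+
+For instance, `projects_url("dspace", "FP7", startFrom="2011-01-01")` yields the third example URL below.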
+
+## Examples
+
+* Get Wellcome Trust projects for EPrints: [http://api.openaire.eu/projects/eprints/WT/ALL/ALL](http://api.openaire.eu/projects/eprints/WT/ALL/ALL)
+* Get EC-FP7 projects of the specific programme “SP2-IDEAS” for EPrints: [http://api.openaire.eu/projects/eprints/FP7/SP2/ALL](http://api.openaire.eu/projects/eprints/FP7/SP2/ALL)
+* Get EC-FP7 projects for DSpace that started after the given date: [http://api.openaire.eu/projects/dspace/FP7/ALL/ALL?startFrom=2011-01-01](http://api.openaire.eu/projects/dspace/FP7/ALL/ALL?startFrom=2011-01-01)
+
+## Terms of Use and SLA
+
+APIs are free-to-use (no sign-up needed) by any third-party service.
+
+**Metadata license is CC-BY**: the metadata records returned by the service can be freely re-used by commercial and non-commercial partners under the CC-BY license, as long as OpenAIRE is acknowledged as the data source.
+
+**Quality of Service**: all API services are running in production 24/7 within the OpenAIRE infrastructure premises deployed at the [data center](http://icm.edu.pl/en/centre-of-technology/) facilities of the [Interdisciplinary Centre for Mathematical and Computational Modelling](http://icm.edu.pl/en/) (ICM).
+
+**APIs rate limits**: please check [here](./authentication).
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/graph-api/getting-a-single-entity.md b/versioned_docs/version-8.0.1/apis/graph-api/getting-a-single-entity.md
new file mode 100644
index 0000000..432bb68
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/graph-api/getting-a-single-entity.md
@@ -0,0 +1,50 @@
+# Getting a single entity
+
+This is a guide on how to retrieve detailed information on a single entity using the OpenAIRE Graph API.
+
+## Endpoints
+Currently, the Graph API supports the following entity types:
+
+- Research products - endpoint: `GET /researchProducts/{id}`
+- Organizations - endpoint: `GET /organizations/{id}`
+- Data sources - endpoint: `GET /dataSources/{id}`
+- Projects - endpoint: `GET /projects/{id}`
+
+You can retrieve the data of a single entity by providing the entity's OpenAIRE identifier (id) in the corresponding endpoint.
+The OpenAIRE id is the primary key of an entity in the OpenAIRE Graph.
+
+:::note
+Note that if you want to retrieve multiple entities based on their OpenAIRE ids, you can use the [search endpoints and filter](./searching-entities/filtering-search-results.md#or-operator) by the `id` field using `OR`.
+:::
+
+## Response
+The response of the Graph API is a [Research product](../../data-model/entities/research-product.md), [Organization](../../data-model/entities/organization.md), [Data Source](../../data-model/entities/data-source.md), or [Project](../../data-model/entities/project.md), depending on the endpoint used.
+
+## Example
+
+To retrieve the research product with OpenAIRE id `doi_dedup___::a55b42c0d32a4a24cf99e621623d110e`,
+perform the following API call:
+
+[https://api-beta.openaire.eu/graph/researchProducts/doi_dedup___::a55b42c0d32a4a24cf99e621623d110e](https://api-beta.openaire.eu/graph/researchProducts/doi_dedup___::a55b42c0d32a4a24cf99e621623d110e)
+
+This will return all the data of the research product with the provided identifier:
+
+```js
+{
+ id: "doi_dedup___::a55b42c0d32a4a24cf99e621623d110e",
+ mainTitle: "OpenAIRE Graph Dataset",
+ description: [
+ "The OpenAIRE Graph is exported as several dataseta, so you can download the parts you are interested into. publication_[part].tar: metadata records about research literature (includes types of publications listed here) dataset_[part].tar: metadata records about research data (includes the subtypes listed here) software.tar: metadata records about research software (includes the subtypes listed here) otherresearchproduct_[part].tar: metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed here) organization.tar: metadata records about organizations involved in the research life-cycle, such as universities, research organizations, funders. datasource.tar: metadata records about data sources whose content is available in the OpenAIRE Graph. They include institutional and thematic repositories, journals, aggregators, funders' databases. project.tar: metadata records about project grants. relation_[part].tar: metadata records about relations between entities in the graph. communities_infrastructures.tar: metadata records about research communities and research infrastructures Each file is a tar archive containing gz files, each with one json per line. Each json is compliant to the schema available at http://doi.org/10.5281/zenodo.8238874. The documentation for the model is available at https://graph.openaire.eu/docs/data-model/ Learn more about the OpenAIRE Graph at https://graph.openaire.eu. Discover the graph's content on OpenAIRE EXPLORE and our API for developers."
+ ],
+ type: "dataset",
+ publicationDate: "2023-08-08",
+ publisher: "Zenodo",
+ pids: [
+ {
+ scheme: "Digital Object Identifier",
+ value: "10.5281/zenodo.8217359"
+ }
+ ],
+ // for brevity, the rest of the fields are omitted
+}
+```
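+
+A minimal Python sketch for the same call, using the endpoint names listed above and the beta base URL. The helper names are hypothetical, and the `:` separator in OpenAIRE ids is left unescaped, as in the example link.
+
+```python
+import json
+import urllib.parse
+import urllib.request
+
+GRAPH_API = "https://api-beta.openaire.eu/graph"
+ENDPOINTS = {"researchProducts", "organizations", "dataSources", "projects"}
+
+
+def entity_url(endpoint: str, openaire_id: str) -> str:
+    """Build the single-entity URL, escaping the id but keeping ':' as-is."""
+    if endpoint not in ENDPOINTS:
+        raise ValueError(f"unsupported endpoint: {endpoint}")
+    return f"{GRAPH_API}/{endpoint}/{urllib.parse.quote(openaire_id, safe=':')}"
+
+
+def get_entity(endpoint: str, openaire_id: str) -> dict:
+    """Fetch one entity as JSON (network call)."""
+    req = urllib.request.Request(entity_url(endpoint, openaire_id),
+                                 headers={"accept": "application/json"})
+    with urllib.request.urlopen(req) as resp:
+        return json.loads(resp.read().decode("utf-8"))
+```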
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/graph-api/graph-api.md b/versioned_docs/version-8.0.1/apis/graph-api/graph-api.md
new file mode 100644
index 0000000..ebcef6e
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/graph-api/graph-api.md
@@ -0,0 +1,32 @@
+# Graph API beta
+
+
+The OpenAIRE Graph API provides a comprehensive way for developers to explore the [OpenAIRE Graph](https://graph.openaire.eu/), a vast interconnected dataset that aggregates metadata from a wide range of scholarly resources.
+The Graph API offers endpoints for accessing and querying this interconnected dataset, enabling users to retrieve detailed information on research products, data sources, organizations, and projects.
+
+## Base URL and Swagger documentation
+
+The base URL of the Graph API is:
+```
+https://api-beta.openaire.eu/graph/
+```
+
+You can access the API Swagger documentation at [https://api-beta.openaire.eu/graph/swagger-ui/index.html#/](https://api-beta.openaire.eu/graph/swagger-ui/index.html#/).
+
+## Notes
+Please note that the Graph API:
+
+- is intended for data discovery and exploration, hence you are not allowed to navigate the full result set: you are limited to the first 10,000 results of a search query. If you are interested in accessing the whole graph, we encourage you to download the [OpenAIRE full Graph dataset](../../downloads/full-graph.md).
+
+- adheres to the [terms of use](../terms.md) of the OpenAIRE public APIs; certain (rate limit) restrictions apply.
+
+## Learn more
+
+Please use the following links to learn more about the Graph API:
+
+- [Getting a single entity](./getting-a-single-entity.md) - Retrieve detailed information on a single entity.
+- [Searching entities](./searching-entities/searching-entities.md) - Retrieve a list of entities based on specific search criteria.
+ - [Filtering results](./searching-entities/filtering-search-results.md) - Filter search results based on specific criteria.
+ - [Sorting results](./searching-entities/sorting-and-paging.md#sorting) - Sort search results based on specific criteria.
+ - [Paging](./searching-entities/sorting-and-paging.md#paging) - Retrieve a subset of search results.
+- [Making requests](./making-requests.md) - Learn how to make requests with different programming languages.
diff --git a/versioned_docs/version-8.0.1/apis/graph-api/making-requests.md b/versioned_docs/version-8.0.1/apis/graph-api/making-requests.md
new file mode 100644
index 0000000..7f1be3c
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/graph-api/making-requests.md
@@ -0,0 +1,41 @@
+# Making requests
+
+This guide provides examples of how to make requests to the OpenAIRE Graph API using different programming languages.
+
+## Using `curl`
+
+```bash
+curl -X GET "https://api-beta.openaire.eu/graph/researchProducts?search=OpenAIRE%20Graph&type=publication&page=1&pageSize=10&sortBy=relevance%20DESC" -H "accept: application/json"
+```
+
+
+## Using Python (with `requests` library)
+
+```python
+import requests
+
+url = "https://api-beta.openaire.eu/graph/researchProducts"
+params = {
+ "search": "OpenAIRE Graph",
+ "type": "publication",
+ "page": 1,
+ "pageSize": 10,
+ "sortBy": "relevance DESC"
+}
+headers = {
+ "accept": "application/json"
+}
+
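+# requests URL-encodes `params`; the timeout keeps the call from hanging indefinitely
+response = requests.get(url, headers=headers, params=params, timeout=30)
+
+if response.status_code == 200:
+    data = response.json()  # parsed JSON: {"header": {...}, "results": [...]}
+    print(data)
+else:
+    print(f"Failed to retrieve data: {response.status_code}")
+```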
+
+:::note
+When using `curl`, make sure the URL is properly percent-encoded, especially if the query parameters contain spaces or special characters. By contrast, the Python `requests` library encodes the `params` dictionary automatically.
+:::
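If you build the URL yourself (for `curl` or any client that does not encode parameters), Python's standard `urllib.parse.urlencode` can produce it. Passing `quote_via=quote` encodes spaces as `%20`, matching the `curl` example above; the default (`quote_plus`) would use `+` instead.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api-beta.openaire.eu/graph/researchProducts"

params = {
    "search": "OpenAIRE Graph",
    "type": "publication",
    "page": 1,
    "pageSize": 10,
    "sortBy": "relevance DESC",
}

# quote_via=quote percent-encodes spaces as %20 rather than "+"
url = f"{BASE_URL}?{urlencode(params, quote_via=quote)}"
print(url)
```

The printed URL can be pasted directly into the `curl` command.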
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/filtering-search-results.md b/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/filtering-search-results.md
new file mode 100644
index 0000000..c455fb7
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/filtering-search-results.md
@@ -0,0 +1,222 @@
+# Filtering search results
+
+Filters can be used to narrow down the search results based on specific criteria.
+Filters are provided as query parameters in the request URL (see [Searching entities](./searching-entities.md) for the available search endpoints).
+
+Multiple filters can be provided in a single request; they should be formatted as follows:
+`param1=value1&param2=value2&...&paramN=valueN`.
+
+:::note
+Filters are combined using the logical `AND` operator.
+If a filter is provided multiple times, its values are combined using the logical `OR` operator.
+For more information on how to use logical operators when searching and filtering, see [Using logical operators](#using-logical-operators).
+:::
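A sketch of how these rules map onto query strings, using only the Python standard library: each dictionary key becomes one filter (filters are combined with `AND`), and a list value repeats the parameter so that its values are combined with `OR`. The helper name `build_query` is hypothetical.

```python
from urllib.parse import quote, urlencode


def build_query(filters: dict) -> str:
    """Encode filters as a query string.

    A list value repeats the parameter (values combined with OR);
    distinct parameters are combined with AND by the API.
    """
    return urlencode(filters, doseq=True, quote_via=quote)


# Publications from 2019 onwards whose country code is GR or NL
query = build_query({
    "type": "publication",
    "fromPublicationDate": "2019-01-01",
    "countryCode": ["GR", "NL"],
})
print(query)
```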
+
+Examples:
+
+- Get all research products that contain the word `"covid"`, sorted by popularity in descending order:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?search=covid&sortBy=popularity DESC](https://api-beta.openaire.eu/graph/researchProducts?search=covid&sortBy=popularity%20DESC)
+
+- Get all publications that are published after `2019-01-01`:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?type=publication&fromPublicationDate=2019-01-01](https://api-beta.openaire.eu/graph/researchProducts?type=publication&fromPublicationDate=2019-01-01)
+
+- Get the organization with the ROR id `https://ror.org/0576by029`:
+
+ [https://api-beta.openaire.eu/graph/organizations?pid=https://ror.org/0576by029](https://api-beta.openaire.eu/graph/organizations?pid=https://ror.org/0576by029)
+
+## Available parameters
+
+This section provides an overview of the available parameters for each entity type.
+
+### Research products
+
+The following query parameters are available for research products:
+
+
+| **Parameter** | **Description** |
+|-------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **search** | Search in the content of the research product. |
+| **mainTitle** | Search in the research product's main title. |
+| **description** | Search in the research product's description. |
+| **id** | The OpenAIRE id of the research product. |
+| **pid** | The persistent identifier of the research product. |
+| **originalId** | The identifier of the record at the original sources. |
+| **type** | The type of the research product. One of `publication`, `dataset`, `software`, or `other` |
+| **fromPublicationDate** | Gets the research products whose publication date is greater than or equal to the given date. A date formatted as `YYYY` or `YYYY-MM-DD` |
+| **toPublicationDate** | Gets the research products whose publication date is less than or equal to the given date. A date formatted as `YYYY` or `YYYY-MM-DD` |
+| **subjects** | List of subjects associated with the research product. |
+| **countryCode** | The country code for the country associated with the research product. |
+| **authorFullName** | The full name of the authors involved in producing this research product. |
+| **authorOrcid** | The ORCiD of the authors involved in producing this research product. |
+| **publisher** | The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. |
+| **bestOpenAccessRightLabel** | The best open access rights among the research product's instances. One of `OPEN SOURCE`, `OPEN`, `EMBARGO`, `RESTRICTED`, `CLOSED`, `UNKNOWN` |
+| **influenceClass** | Citation-based indicator that reflects the overall impact of a research product. Choose a class among `C1`, `C2`, `C3`, `C4`, or `C5` for top 0.01%, top 0.1%, top 1%, top 10%, and average in terms of influence, respectively. |
+| **impulseClass** | Citation-based indicator that reflects the initial momentum of a research product directly after its publication. Choose a class among `C1`, `C2`, `C3`, `C4`, or `C5` for top 0.01%, top 0.1%, top 1%, top 10%, and average in terms of impulse, respectively. |
+| **popularityClass** | Citation-based indicator that reflects the current impact or attention of a research product. Choose a class among `C1`, `C2`, `C3`, `C4`, or `C5` for top 0.01%, top 0.1%, top 1%, top 10%, and average in terms of popularity, respectively. |
+| **citationCountClass** | Citation-based indicator that reflects the overall impact of a research product by summing all its citations. Choose a class among `C1`, `C2`, `C3`, `C4`, or `C5` for top 0.01%, top 0.1%, top 1%, top 10%, and average in terms of citation count, respectively. |
+| **instanceType** `[Only for publications]` | Retrieve publications of the given instance type. Check [here](http://api.openaire.eu/vocabularies/dnet:publication_resource) for all possible instance type values. |
+| **sdg** `[Only for publications]` | Retrieves publications classified with the respective Sustainable Development Goal number. Integer in the range [1, 17] |
+| **fos** `[Only for publications]` | Retrieves publications classified with a given Field of Science (FOS). A FOS classification identifier (see [here](https://explore.openaire.eu/assets/common-assets/vocabulary/fos.json) for details). |
+| **isPeerReviewed** `[Only for publications]` | Indicates whether the publications are peer-reviewed or not. (Boolean) |
+| **isInDiamondJournal** `[Only for publications]` | Indicates whether the publication was published in a diamond journal or not. (Boolean) |
+| **isPubliclyFunded** `[Only for publications]` | Indicates whether the publication was publicly funded or not. (Boolean) |
+| **isGreen** `[Only for publications]` | Indicates whether the publication was published following the green open access model. (Boolean) |
+| **openAccessColor** `[Only for publications]` | Specifies the Open Access color of the publication. One of `bronze`, `gold`, or `hybrid` |
+| **relOrganizationId** | Retrieve research products connected to the organization (with OpenAIRE id). |
+| **relCommunityId** | Retrieve research products connected to the community (with OpenAIRE id). |
+| **relProjectId** | Retrieve research products connected to the project (with OpenAIRE id). |
+| **relProjectCode** | Retrieve research products connected to the project with code. |
+| **hasProjectRel** | Retrieve research products that are connected to a project. (Boolean) |
+| **relProjectFundingShortName**| Retrieve research products connected to a project that has a funder with the given short name. |
+| **relProjectFundingStreamId** | Retrieve research products connected to a project that has the given funding identifier. |
+| **relHostingDataSourceId** | Retrieve research products hosted by the data source (with OpenAIRE id). |
+| **relCollectedFromDatasourceId**| Retrieve research products collected from the data source (with OpenAIRE id). |
+| **debugQuery** | Retrieve debug information for the search query. (Boolean) |
+| **page** | Page number of the results. (Integer) |
+| **pageSize** | Number of results per page. Integer in the range [1, 100] |
+| **cursor** | Cursor-based pagination. Initial value: `cursor=*` |
+| **sortBy** | The field(s) used to sort the results, in the format `fieldname sortDirection`, where `sortDirection` is either `ASC` (ascending) or `DESC` (descending) and `fieldname` is one of `relevance`, `publicationDate`, `dateOfCollection`, `influence`, `popularity`, `citationCount`, `impulse`. Multiple sorting criteria should be comma-separated. |
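
The date parameters accept `YYYY` or `YYYY-MM-DD`. The hypothetical helper below checks a value against those two shapes before a request is sent; it is a client-side convenience under that assumption, and the API's own validation may be more or less lenient. The same shapes are documented for the project date parameters further down.

```python
import re
from datetime import date

# The two shapes documented for date parameters: YYYY or YYYY-MM-DD
_DATE_SHAPE = re.compile(r"[0-9]{4}(-[0-9]{2}-[0-9]{2})?")


def is_valid_date_filter(value: str) -> bool:
    """Client-side check for fromPublicationDate / toPublicationDate values."""
    if not _DATE_SHAPE.fullmatch(value):
        return False
    if len(value) == 4:  # a bare year
        return True
    try:
        date.fromisoformat(value)  # also rejects impossible dates such as 2019-02-30
    except ValueError:
        return False
    return True


print(is_valid_date_filter("2019-01-01"), is_valid_date_filter("01/01/2019"))
```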
+
+
+### Organizations
+
+The following query parameters are available for organizations:
+
+| **Parameter** | **Description** |
+|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+|**search** | Search in the content of the organization. |
+|**legalName** | The legal name of the organization. |
+|**legalShortName** | The legal name of the organization in short form. |
+|**id** | The OpenAIRE id of the organization. |
+|**pid** | The persistent identifier of the organization. |
+|**countryCode** | The country code of the organization. |
+|**relCommunityId** | Retrieve organizations connected to the community (with OpenAIRE id). |
+|**relCollectedFromDatasourceId**| Retrieve organizations collected from the data source (with OpenAIRE id). |
+|**debugQuery** | Retrieve debug information for the search query. |
+|**page** | Page number of the results. |
+|**pageSize** | Number of results per page. |
+| **cursor** | Cursor-based pagination. Initial value: `cursor=*` |
+|**sortBy** | The field to set the sorting order of the results. Should be provided in the format `fieldname sortDirection`, where the `sortDirection` can be either `ASC` for ascending order or `DESC` for descending order - organizations can only be sorted by `relevance`. |
+
+
+### Data sources
+
+The following query parameters are available for data sources:
+
+| **Parameter** | **Description** |
+|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+|**search** | Search in the content of the data source. |
+|**officialName** |The official name of the data source. |
+|**englishName** |The English name of the data source. |
+|**legalShortName** |The legal name of the organization in short form. |
+|**id** |The OpenAIRE id of the data source. |
+|**pid** |The persistent identifier of the data source. |
+|**subjects** |List of subjects associated with the data source. |
+|**dataSourceTypeName** |The data source type; see all possible values here . |
+|**contentTypes** |Types of content in the data source, as defined by OpenDOAR. |
+|**relOrganizationId** |Retrieve data sources connected to the organization (with OpenAIRE id). |
+|**relCommunityId** |Retrieve data sources connected to the community (with OpenAIRE id). |
+|**relCollectedFromDatasourceId**|Retrieve data sources collected from the data source (with OpenAIRE id). |
+|**debugQuery** |Retrieve debug information for the search query. |
+|**page** |Page number of the results. |
+|**pageSize** |Number of results per page. |
+| **cursor** | Cursor-based pagination. Initial value: `cursor=*` |
+|**sortBy** |The field to set the sorting order of the results. Should be provided in the format `fieldname sortDirection`, where the `sortDirection` can be either `ASC` for ascending order or `DESC` for descending order - data sources can only be sorted by `relevance`.|
+
+
+### Projects
+
+The following query parameters are available for projects:
+
+| **Parameter** | **Description** |
+|----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+|**search** | Search in the content of the projects. |
+|**title** | Search in the project's title. |
+|**keywords** | The project's keywords. |
+|**id** | The OpenAIRE id of the project. |
+|**code** | The grant agreement (GA) code of the project. |
+|**acronym** | Project's acronym. |
+|**callIdentifier** | The identifier of the research call. |
+|**fundingShortName** | The short name of the funder. |
+|**fundingStreamId** | The identifier of the funding stream. |
+|**fromStartDate** | Gets the projects with start date greater than or equal to the given date. Please provide a date formatted as `YYYY` or `YYYY-MM-DD`. |
+|**toStartDate** | Gets the projects with start date less than or equal to the given date. Please provide a date formatted as `YYYY` or `YYYY-MM-DD`. |
+|**fromEndDate** | Gets the projects with end date greater than or equal to the given date. Please provide a date formatted as `YYYY` or `YYYY-MM-DD`. |
+|**toEndDate** | Gets the projects with end date less than or equal to the given date. Please provide a date formatted as `YYYY` or `YYYY-MM-DD`. |
+|**relOrganizationName** | The name or short name of the related organization. |
+|**relOrganizationId** | The organization identifier of the related organization. |
+|**relCommunityId** | Retrieve projects connected to the community (with OpenAIRE id). |
+|**relOrganizationCountryCode** | The country code of the related organizations. |
+|**relCollectedFromDatasourceId**| Retrieve projects collected from the data source (with OpenAIRE id). |
+|**debugQuery** | Retrieve debug information for the search query. |
+|**page** | Page number of the results. |
+|**pageSize** | Number of results per page. |
+| **cursor** | Cursor-based pagination. Initial value: `cursor=*` |
+|**sortBy** | The field(s) used to sort the results, in the format `fieldname sortDirection`, where `sortDirection` is either `ASC` (ascending) or `DESC` (descending) and `fieldname` is one of `relevance`, `startDate`, `endDate`. Multiple sorting criteria should be comma-separated. |
+
+
+## Using logical operators
+
+The API supports the use of logical operators `AND`, `OR`, and `NOT` to refine your search queries.
+These operators help you combine or exclude one or more values for a specific filter.
+
+
+### `AND` operator
+
+Use the `AND` operator to retrieve results that include all specified values. This narrows your search.
+
+Examples:
+
+- Get research products that contain both `"climate"` and `"change"`:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?search=climate AND change](https://api-beta.openaire.eu/graph/researchProducts?search=climate%20AND%20change)
+
+- Get research products that are classified with both Fields of Study (FOS) `"03 medical and health sciences"` and `"0502 economics and business"`:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?fos="03 medical and health sciences" AND "0502 economics and business"](https://api-beta.openaire.eu/graph/researchProducts?fos=%2203%20medical%20and%20health%20sciences%22%20AND%20%220502%20economics%20and%20business%22)
+
+:::note
+Note that when multiple tokens denote a single filter value, you should enclose them in double quotes, as in the FOS example above.
+:::
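When filter values are generated programmatically, the quoting rule above is easy to forget. A minimal sketch (hypothetical `combine` helper) that joins values with `AND` or `OR` and wraps multi-token values in double quotes; it assumes the values do not themselves contain double quotes.

```python
from typing import List


def combine(values: List[str], operator: str = "OR") -> str:
    """Join filter values with AND/OR, quoting values that contain spaces."""
    if operator not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator!r}")
    quoted = [f'"{value}"' if " " in value else value for value in values]
    return f" {operator} ".join(quoted)


# Reproduces the FOS filter value from the example above
fos = combine(["03 medical and health sciences", "0502 economics and business"], "AND")
print(fos)
```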
+
+### `OR` operator
+
+Use the `OR` operator to retrieve results that include any of the specified terms. This broadens your search.
+The same result can be achieved by providing the same query parameter multiple times, or by separating the values with commas.
+
+Examples:
+
+- Get research products with the OpenAIRE ids `r3730f562f9e::539da48b3796663b17e6166bb966e5b1` or `pmid_dedup__::1591ebf0e0698ed4a99455ff2ba4adc0`:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?id=r3730f562f9e::539da48b3796663b17e6166bb966e5b1 OR pmid_dedup__::1591ebf0e0698ed4a99455ff2ba4adc0](https://api-beta.openaire.eu/graph/researchProducts?id=r3730f562f9e::539da48b3796663b17e6166bb966e5b1%20OR%20pmid_dedup__::1591ebf0e0698ed4a99455ff2ba4adc0)
+
+- Get projects that are connected to organizations in the US or Greece:
+
+ [https://api-beta.openaire.eu/graph/projects?relOrganizationCountryCode=US OR GR](https://api-beta.openaire.eu/graph/projects?relOrganizationCountryCode=US%20OR%20GR)
+
+ or by using the same query parameter multiple times: [https://api-beta.openaire.eu/graph/projects?relOrganizationCountryCode=US&relOrganizationCountryCode=GR](https://api-beta.openaire.eu/graph/projects?relOrganizationCountryCode=US&relOrganizationCountryCode=GR)
+
+ or just using comma: [https://api-beta.openaire.eu/graph/projects?relOrganizationCountryCode=US,GR](https://api-beta.openaire.eu/graph/projects?relOrganizationCountryCode=US,GR)
+
+### `NOT` operator
+
+Use the `NOT` operator to exclude specific terms from your search results. This refines your search by filtering out unwanted results.
+
+Examples:
+
+- Get research products that contain `"semantic"` but not `"web"`:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?search=semantic NOT web](https://api-beta.openaire.eu/graph/researchProducts?search=semantic%20NOT%20web)
+
+- Get all data sources that are not journals:
+
+ [https://api-beta.openaire.eu/graph/dataSources?dataSourceTypeName=NOT Journal](https://api-beta.openaire.eu/graph/dataSources?dataSourceTypeName=NOT%20Journal)
+
+
+:::note
+All the above operators can be combined, along with parentheses and quotes, to create more complex queries.
+For example, to get research products that contain the phrase "semantic web" but not "ontology" or "linked data":
+
+[https://api-beta.openaire.eu/graph/researchProducts?search="semantic web" AND NOT (ontology OR "linked data")](https://api-beta.openaire.eu/graph/researchProducts?search=%22semantic%20web%22%20AND%20NOT%20(ontology%20OR%20%22linked%20data%22))
+:::
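Such expressions contain quotes, spaces, and parentheses, so they need percent-encoding when the URL is built by hand. The sketch below encodes the example query with the standard library. Note that `quote` also encodes the parentheses (`%28`, `%29`); a server that decodes query parameters should treat these the same as the unencoded parentheses in the link above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "https://api-beta.openaire.eu/graph/researchProducts"

query = '"semantic web" AND NOT (ontology OR "linked data")'
url = f"{BASE_URL}?{urlencode({'search': query}, quote_via=quote)}"
print(url)

# Decoding the query string recovers the original boolean expression
decoded = parse_qs(urlsplit(url).query)["search"][0]
print(decoded)
```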
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/searching-entities.md b/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/searching-entities.md
new file mode 100644
index 0000000..a1c3df7
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/searching-entities.md
@@ -0,0 +1,44 @@
+# Searching entities
+
+This is a guide on how to search for specific entities using the OpenAIRE Graph API.
+
+## Endpoints
+
+Currently, the Graph API supports the following entity types:
+* Research products - endpoint: [`GET /researchProducts`](https://api-beta.openaire.eu/graph/researchProducts)
+* Organizations - endpoint: [`GET /organizations`](https://api-beta.openaire.eu/graph/organizations)
+* Data sources - endpoint: [`GET /dataSources`](https://api-beta.openaire.eu/graph/dataSources)
+* Projects - endpoint: [`GET /projects`](https://api-beta.openaire.eu/graph/projects)
+
+Each of these endpoints can be used to list all entities of the corresponding type.
+Listing is most useful when combined with the [filtering](./filtering-search-results.md),
+[sorting](./sorting-and-paging.md#sorting), and [paging](./sorting-and-paging.md#paging) capabilities of the Graph API.
+
+## Response
+
+The response of the aforementioned endpoints is an object of the following type:
+
+```json
+{
+  "header": {
+    "numFound": 36818386,
+    "maxScore": 1,
+    "queryTime": 21,
+    "page": 1,
+    "pageSize": 10
+  },
+  "results": [
+    ...
+  ]
+}
+```
+
+It contains a `header` object with the following fields:
+- `numFound`: the total number of entities found
+- `maxScore`: the maximum relevance score of the search results
+- `queryTime`: the time in milliseconds that the search took
+- `page`: the current page of the search results (when using basic pagination)
+- `pageSize`: the number of entities per page
+- `nextCursor`: the cursor for the next page (when using cursor-based pagination; see [paging](./sorting-and-paging.md#paging))
+
+Finally, the `results` field contains an array of entities of the corresponding type (i.e., [Research product](../../../data-model/entities/research-product.md), [Organization](../../../data-model/entities/organization.md), [Data Source](../../../data-model/entities/data-source.md), or [Project](../../../data-model/entities/project.md)).
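As an illustration of reading this structure, the snippet below parses an abbreviated response body (the numbers are copied from the example above; `results` is left empty) and derives the total number of pages from `numFound` and `pageSize`. Checking for `nextCursor` is one way to tell which paging mode produced the response. Keep in mind that offset-based paging can only reach the first 10,000 results.

```python
import json
import math

# Abbreviated response body with the shape shown above (results omitted)
body = """{
  "header": {"numFound": 36818386, "maxScore": 1, "queryTime": 21,
             "page": 1, "pageSize": 10},
  "results": []
}"""

data = json.loads(body)
header = data["header"]

# Number of pages needed to cover every match at this page size
pages = math.ceil(header["numFound"] / header["pageSize"])
mode = "cursor-based" if "nextCursor" in header else "page-based"
print(f"{header['numFound']} matches, {pages} pages of {header['pageSize']} ({mode})")
```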
diff --git a/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/sorting-and-paging.md b/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/sorting-and-paging.md
new file mode 100644
index 0000000..0766ba1
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/graph-api/searching-entities/sorting-and-paging.md
@@ -0,0 +1,87 @@
+# Sorting and paging
+
+The OpenAIRE Graph API allows you to sort and page through the results of your search queries.
+This enables you to retrieve the most relevant results and manage large result sets more effectively.
+
+## Sorting
+Sorting on specific fields lets you retrieve data in the preferred order.
+Sorting is achieved using the `sortBy` parameter, which specifies the field and the direction (ascending or descending) for sorting.
+
+* `sortBy`: Defines the field and the sort direction. The format should be `fieldname sortDirection`, where the `sortDirection` can be either `ASC` for ascending order or `DESC` for descending order.
+
+The field names that can be used for sorting are specific to each entity type and are listed in the `sortBy` row of the [available parameters](../searching-entities/filtering-search-results.md#available-parameters).
+
+Note that the default sorting is based on the `relevance` score of the search results.
+
+Examples:
+
+- Get research products published after `2020-01-01` and sort them by the publication date in descending order:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?fromPublicationDate=2020-01-01&sortBy=publicationDate DESC](https://api-beta.openaire.eu/graph/researchProducts?fromPublicationDate=2020-01-01&sortBy=publicationDate%20DESC)
+
+- Get research products with the keyword `"COVID-19"` and sort them by their (citation-based) popularity:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?search=COVID-19&sortBy=popularity DESC](https://api-beta.openaire.eu/graph/researchProducts?search=COVID-19&sortBy=popularity%20DESC)
+
+Note that you can combine multiple sorting conditions by separating them with a comma.
+
+Example:
+
+- Get research products with the keyword `"COVID-19"` and sort them by their publication date in ascending order and then by their popularity in descending order:
+
+ [https://api-beta.openaire.eu/graph/researchProducts?search=COVID-19&sortBy=publicationDate ASC,popularity DESC](https://api-beta.openaire.eu/graph/researchProducts?search=COVID-19&sortBy=publicationDate%20ASC,popularity%20DESC)
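When `sortBy` values are assembled in code, a small validator catches typos before the request is sent. The allowed field set below is copied from the research-products parameter table; `build_sort_by` is a hypothetical helper, and for organizations and data sources only `relevance` would be allowed.

```python
from typing import FrozenSet, Iterable, Tuple

# Sortable fields for research products, per the parameters table
RESEARCH_PRODUCT_SORT_FIELDS = frozenset({
    "relevance", "publicationDate", "dateOfCollection",
    "influence", "popularity", "citationCount", "impulse",
})


def build_sort_by(criteria: Iterable[Tuple[str, str]],
                  allowed: FrozenSet[str] = RESEARCH_PRODUCT_SORT_FIELDS) -> str:
    """Format (field, direction) pairs as a comma-separated sortBy value."""
    parts = []
    for field, direction in criteria:
        if field not in allowed:
            raise ValueError(f"cannot sort by {field!r}")
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"direction must be ASC or DESC, got {direction!r}")
        parts.append(f"{field} {direction}")
    return ",".join(parts)


# The multi-criteria example above
print(build_sort_by([("publicationDate", "ASC"), ("popularity", "DESC")]))
```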
+
+## Paging
+
+The OpenAIRE Graph API supports two kinds of pagination: offset-based (basic) and cursor-based. Offset-based pagination uses the `page` and `pageSize` parameters to specify which part of the result set to retrieve and how many results each page contains.
+
+### Offset-based paging
+Use offset-based paging only for small result sets: it can reach at most the first 10,000 records.
+* `page`: Specifies the page number of the results you want to retrieve. Page numbering starts from 1.
+
+* `pageSize`: Defines the number of results to be returned per page. This helps limit the amount of data returned in a single request, making it easier to process.
+
+Example:
+- Get the top 10 most influential research products that contain the phrase "knowledge graphs":
+
+ [https://api-beta.openaire.eu/graph/researchProducts?search="knowledge graphs"&page=1&pageSize=10&sortBy=influence DESC](https://api-beta.openaire.eu/graph/researchProducts?search=%22knowledge%20graphs%22&page=1&pageSize=10&sortBy=influence%20DESC)
+
+Response:
+```json
+{
+  "header": {
+    "numFound": 36818386,
+    "maxScore": 1,
+    "queryTime": 21,
+    "page": 1,
+    "pageSize": 10
+  },
+  "results": [
+    ...
+  ]
+}
+```
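Because offset-based paging stops at 10,000 records, it can help to compute up front which page numbers are reachable. The hypothetical helper below does that; the `[1, 100]` bound on `pageSize` is taken from the research-products parameter table.

```python
import math

MAX_OFFSET_RESULTS = 10_000  # offset-based paging limit stated above


def reachable_pages(num_found: int, page_size: int) -> range:
    """1-based page numbers that offset-based paging can reach."""
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be in the range [1, 100]")
    reachable = min(num_found, MAX_OFFSET_RESULTS)
    return range(1, math.ceil(reachable / page_size) + 1)


# For the response above (numFound = 36818386) with pageSize = 10
pages = reachable_pages(36818386, 10)
print(pages[0], pages[-1])
```

With `pageSize=10`, pages beyond 1000 cannot be reached this way; switch to cursor-based paging instead.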
+
+### Cursor-based paging
+Use cursor-based paging to retrieve large result sets (more than 10,000 records).
+* `cursor`: Cursor-based pagination. Initial value: `cursor=*`.
+
+Example:
+- [https://api-beta.openaire.eu/graph/researchProducts?search="knowledge graphs"&pageSize=10&cursor=*&sortBy=influence DESC](https://api-beta.openaire.eu/graph/researchProducts?search=%22knowledge%20graphs%22&pageSize=10&cursor=*&sortBy=influence%20DESC)
+
+Response:
+```json
+{
+  "header": {
+    "numFound": 36818386,
+    "maxScore": 1,
+    "queryTime": 21,
+    "pageSize": 10,
+    "nextCursor": "AoI/D2M2NGU1YjVkNTQ4Nzo6NjlmZTBmNjljYzM4YTY1MjI5YjM3ZDRmZmIyMTU1NDAIP4AAAA=="
+  },
+  "results": [
+    ...
+  ]
+}
+```
+To get the next page of results, pass the returned `nextCursor` value as the `cursor` parameter of the next request.
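The loop below is a minimal sketch of following the cursor from page to page. `fetch` is a hypothetical function that sends one request with the given parameters and returns the decoded JSON. The stopping rule (an empty page, or a missing or unchanged `nextCursor`) is an assumption, not documented API behaviour.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(fetch: Callable[[Dict], Dict], params: Dict) -> Iterator:
    """Yield every result, following nextCursor from page to page.

    Stopping on an empty page or a missing/unchanged nextCursor is an
    assumption about the API, not documented behaviour.
    """
    cursor = "*"  # documented initial cursor value
    while True:
        page = fetch({**params, "cursor": cursor})
        results = page.get("results", [])
        yield from results
        next_cursor = page["header"].get("nextCursor")
        if not results or not next_cursor or next_cursor == cursor:
            return
        cursor = next_cursor
```

With `requests`, `fetch` could be `lambda p: requests.get("https://api-beta.openaire.eu/graph/researchProducts", params=p, timeout=30).json()`. Because the function is a generator, results are fetched lazily, one page at a time.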
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/home.md b/versioned_docs/version-8.0.1/apis/home.md
new file mode 100644
index 0000000..24198f2
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/home.md
@@ -0,0 +1,10 @@
+# Public APIs
+
+The OpenAIRE Graph data are accessible through various public APIs. More specifically, the following APIs are currently provided:
+* [Graph API](./graph-api/graph-api.md) - an API to explore the OpenAIRE Graph
+* [Search API](./search-api/search-api.md) - an API to search for research products and projects
+* [ScholeXplorer API](https://api.scholexplorer.openaire.eu/swagger-ui/index.html?urls.primaryName=Scholexplorer%20API%20V2.0) - an API offering dataset-publication & dataset-dataset links
+* [DSpace & EPrints API](./dspace-eprints-api.md) - an API to offer custom access to metadata for projects funded by a selection of international funders for DSpace and EPrints platforms
+* [Broker API](./broker-api.md) - an API to enrich metadata for repositories, publishers, and aggregators
+
+Between 2015 and 2023, a LOD API was also provided; that service has since been discontinued. The old LOD datasets are available on [Zenodo](https://zenodo.org/records/4587369).
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/search-api/projects.md b/versioned_docs/version-8.0.1/apis/search-api/projects.md
new file mode 100644
index 0000000..686debf
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/search-api/projects.md
@@ -0,0 +1,31 @@
+# Searching for projects
+
+## Endpoints
+
+For research projects: http://api.openaire.eu/search/projects
+
+## Parameters
+
+| Parameter | Option | Description |
+| --- | --- | --- |
+| page | integer | Page number of the search results. |
+| size | integer | Number of results per page. |
+| format | json \| xml \| csv \| tsv | The format of the response. The default is xml. |
+| model | openaire \| sygma | The data model of the response. The default is openaire. The sygma model is a simplified version of the openaire model and is only available in xml format. The corresponding XML schema is available [here](https://www.openaire.eu/schema/sygma/oaf_sygma_v2.1.xsd). |
+| sortBy | `sortBy=field,[ascending\|descending]`; **'field'** is one of: `projectstartdate`, `projectstartyear`, `projectenddate`, `projectendyear`, `projectduration` | The sorting order of the specified field. |
+| hasECFunding | true \| false | If hasECFunding is true gets the entities funded by the EC. If hasECFunding is false gets the entities related to projects not funded by the EC. |
+| hasWTFunding | true \| false | If hasWTFunding is true gets the entities funded by Wellcome Trust. The results are the same as those obtained with `funder=wt`. If hasWTFunding is false gets the entities related to projects not funded by Wellcome Trust. |
+| funder | WT \| EC \| ARC \| ANDS \| NSF \| FCT \| NHMRC | Search for entities by funder. |
+| fundingStream | ... | Search for entities by funding stream. |
+| FP7scientificArea | ... | Search for FP7 entities by scientific area. |
+| keywords | White-space separated list of keywords. | N/A |
+| grantID | Comma separated list of grant identifiers. | Gets the project with the given grant identifier, if any. |
+| openairePublicationID | Comma separated list of OpenAIRE identifiers. | Gets the publication with the given openaire identifier, if any. |
+| name | White-space separated list of keywords. | Gets the projects whose names contain the given list of keywords. Using double quotes `"` you get an exact match, if any. |
+| acronym | N/A | Gets the project with the given acronym, if any. |
+| callID | N/A | Search for projects by call identifier. |
+| startYear | Year formatted as `YYYY` | Gets the projects that started in the given year. |
+| endYear | Year formatted as `YYYY`. | Gets the projects that ended in the given year. |
+| participantCountries | Comma-separated list of two-letter country codes. | Search for projects by participant countries. |
+| participantAcronyms | White-space separated list of institution acronyms. | Search for projects by participant institutions. |
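
As an illustration of combining these parameters, the sketch below builds a Search API request URL for EC-funded projects with participant countries `GR,NL`, returned as JSON and sorted by start date, newest first. The parameter names and values come from the table above; `safe=","` keeps the commas of the comma-separated values unencoded.

```python
from urllib.parse import urlencode

SEARCH_PROJECTS = "http://api.openaire.eu/search/projects"

params = {
    "funder": "EC",
    "participantCountries": "GR,NL",
    "format": "json",
    "size": 10,
    "page": 1,
    "sortBy": "projectstartdate,descending",
}

# safe="," leaves commas readable instead of encoding them as %2C
url = f"{SEARCH_PROJECTS}?{urlencode(params, safe=',')}"
print(url)
```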
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/apis/search-api/research-products.md b/versioned_docs/version-8.0.1/apis/search-api/research-products.md
new file mode 100644
index 0000000..c2d7e17
--- /dev/null
+++ b/versioned_docs/version-8.0.1/apis/search-api/research-products.md
@@ -0,0 +1,98 @@
+# Searching for research products
+
+## Endpoints
+
+For research products: https://api.openaire.eu/search/researchProducts
+
+By specific type:
+* publications: https://api.openaire.eu/search/publications
+* research data: https://api.openaire.eu/search/datasets
+* research software: https://api.openaire.eu/search/software
+* other research products: https://api.openaire.eu/search/other
+
+
+## General parameters
+
+Endpoint: https://api.openaire.eu/search/researchProducts
+
+| Parameter | Option | Description |
+| --- | --- | --- |
+| page | integer | Page number of the search results. |
+| size | integer | Number of results per page. |
+| format | json \| xml \| csv \| tsv | The format of the response. The default is xml. |
+| model | openaire \| sygma | The data model of the response. The default is openaire. The sygma model is a simplified version of the openaire model and is only available in xml format. The corresponding XML schema is available [here](https://www.openaire.eu/schema/sygma/oaf_sygma_v2.1.xsd). |
+| sortBy | `sortBy=field,[ascending\|descending]`
**'field'** can one of:
+ +
+ +The figure above, presents the graph's data model. +Its main entities are described in brief below: + +* [Research products](./entities/research-product) represent the outcomes (or products) of research activities. +* [Data sources](./entities/data-source) are the sources from which the metadata of graph objects are collected. +* [Organizations](./entities/organization) correspond to companies or research institutions involved in projects, +responsible for operating data sources or consisting the affiliations of Product creators. +* [Projects](./entities/project) are research project grants funded by a Funding Stream of a Funder. +* [Communities](./entities/community) are groups of people with a common research intent (e.g. research infrastructures, university alliances). +* Persons correspond to individual researchers who are involved in the design, creation or maintenance of research products. Currently, this is a non-materialized entity type in the Graph, which means that the respective metadata (and relationships) are encapsulated in the author field of the respective research products. + +:::note Further reading + +A detailed report on the OpenAIRE Graph Data Model can be found on [Zenodo](https://zenodo.org/record/2643199). +::: + diff --git a/versioned_docs/version-8.0.1/data-model/entities/_category_.json b/versioned_docs/version-8.0.1/data-model/entities/_category_.json new file mode 100644 index 0000000..8161451 --- /dev/null +++ b/versioned_docs/version-8.0.1/data-model/entities/_category_.json @@ -0,0 +1,8 @@ +{ + "label": "Entities", + "position": 1, + "link": { + "type": "generated-index", + "description": "The main entities of the OpenAIRE Graph are listed below." 
+  }
+}
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/data-model/entities/community.md b/versioned_docs/version-8.0.1/data-model/entities/community.md
new file mode 100644
index 0000000..bf057cf
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/entities/community.md
@@ -0,0 +1,82 @@
+---
+sidebar_position: 6
+---
+
+# Communities
+
+Research communities and research initiatives are groups of people with a common research intent. They come in two types:
+
+* Research initiatives capture a view of the information space that is "research impact"-oriented, i.e. all products generated due to the research initiative;
+* Research communities capture a view that is "research activity"-oriented, i.e. all products that may be of interest or related to the research community.
+
+For example, the organizations supporting a research infrastructure fall in the first category, while the researchers involved in a discipline fall in the second.
+
+## The `Community` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE id for the community/research infrastructure, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "context_____::5b7f9fa40bdc12072249204cedfa7808"
+```
+
+### acronym
+_Type: String • Cardinality: ONE_
+
+The acronym of the community.
+
+```json
+"acronym": "covid-19"
+```
+
+### description
+_Type: String • Cardinality: ONE_
+
+Description of the research community/research infrastructure.
+
+```json
+"description": "This portal provides access to publications, research data, projects and software that may be relevant to the Corona Virus Disease (COVID-19). The OpenAIRE COVID-19 Gateway aggregates COVID-19 related records, links them and provides a single access point for discovery and navigation. We tag content from the OpenAIRE Graph (10,000+ data sources) and additional sources. All COVID-19 related research results are linked to people, organizations and projects, providing a contextualized navigation."
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+The long name of the community.
+
+```json
+"name": "Corona Virus Disease"
+```
+
+### subject
+_Type: String • Cardinality: MANY_
+
+The list of subjects associated with the research community (only applies to research communities).
+
+```json
+"subject": [
+  "COVID19",
+  "SARS-CoV",
+  "HCoV-19",
+  ...
+]
+```
+
+### type
+_Type: String • Cardinality: ONE_
+
+The type of the community; one of `{ Research Community, Research infrastructure }`.
+
+```json
+"type": "Research Community"
+```
+
+### zenodoCommunity
+_Type: String • Cardinality: ONE_
+
+The URL of the Zenodo community associated with the research community/research infrastructure.
+
+```json
+"zenodoCommunity": "https://zenodo.org/communities/covid-19"
+```
diff --git a/versioned_docs/version-8.0.1/data-model/entities/data-source.md b/versioned_docs/version-8.0.1/data-model/entities/data-source.md
new file mode 100644
index 0000000..e01ea7c
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/entities/data-source.md
@@ -0,0 +1,294 @@
+---
+sidebar_position: 2
+---
+
+# Data sources
+
+OpenAIRE entity instances are created out of data collected from various data sources of different kinds, such as publication repositories, research data archives, CRIS systems, funder databases, etc. Data sources export information packages (e.g., XML records, HTTP responses, RDF data, JSON) that may contain information on one or more of such entities and possibly relationships between them.
+
+For example, a metadata record about a project carries information for the creation of a Project entity and its participants (as Organization entities). Once each piece of information is extracted from such packages and inserted into the OpenAIRE information space as an entity, it is important for it to keep provenance information relative to the originating data source. This gives visibility to the data source and also enables the reconstruction of the very same piece of information if problems arise.
+
+---
+
+## The `DataSource` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE id of the data source, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "issn___print::22c514d022b199c346e7f29ca06efc95"
+```
+
+### originalId
+_Type: String • Cardinality: MANY_
+
+The list of original identifiers associated with the datasource.
+
+```json
+"originalId": [
+  "issn___print::2451-8271",
+  ...
+]
+```
+
+### pid
+
+_Type: [ControlledField](other#controlledfield) • Cardinality: MANY_
+
+The persistent identifiers for the datasource.
+
+```json
+"pid": [
+  {
+    "scheme": "DOI",
+    "value": "10.5281/zenodo.4707307"
+  },
+  ...
+]
+```
+
+### type
+_Type: [ControlledField](other#controlledfield) • Cardinality: ONE_
+
+The datasource type; see the vocabulary [dnet:datasource_typologies](https://api.openaire.eu/vocabularies/dnet:datasource_typologies).
+
+```json
+"type": {
+  "scheme": "pubsrepository::journal",
+  "value": "Journal"
+}
+```
+
+### openaireCompatibility
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE compatibility of the ingested research products: it indicates which guidelines they comply with, according to the vocabulary [dnet:datasourceCompatibilityLevel](https://api.openaire.eu/vocabularies/dnet:datasourceCompatibilityLevel).
+
+```json
+"openaireCompatibility": "collected from a compatible aggregator"
+```
+
+### officialName
+_Type: String • Cardinality: ONE_
+
+The official name of the datasource.
+
+```json
+"officialName": "Recent Patents and Topics on Medical Imaging"
+```
+
+### englishName
+_Type: String • Cardinality: ONE_
+
+The English name of the datasource.
+
+```json
+"englishName": "Recent Patents and Topics on Medical Imaging"
+```
+
+### websiteUrl
+_Type: String • Cardinality: ONE_
+
+The URL of the website of the datasource.
+
+```json
+"websiteUrl": "http://dspace.unict.it/"
+```
+
+### logoUrl
+_Type: String • Cardinality: ONE_
+
+The URL of the logo for the datasource.
+
+```json
+"logoUrl": "https://impactum-journals.uc.pt/public/journals/26/pageHeaderLogoImage_en_US.png"
+```
+
+### dateOfValidation
+_Type: String • Cardinality: ONE_
+
+The date of validation against the OpenAIRE guidelines for the datasource records.
+
+```json
+"dateOfValidation": "2016-10-10"
+```
+
+### description
+_Type: String • Cardinality: ONE_
+
+The description for the datasource.
+
+```json
+"description": "Recent Patents on Medical Imaging publishes review and research articles, and guest edited single-topic issues on recent patents in the field of medical imaging. It provides an important and reliable source of current information on developments in the field. The journal is essential reading for all researchers involved in Medical Imaging."
+```
+
+### subjects
+_Type: String • Cardinality: MANY_
+
+List of subjects associated with the datasource.
+
+```json
+"subjects": [
+  "Medicine",
+  "Imaging",
+  ...
+]
+```
+
+### languages
+_Type: String • Cardinality: MANY_
+
+The languages present in the data source's content, as defined by OpenDOAR.
+
+```json
+"languages": [
+  "eng",
+  ...
+]
+```
+
+### contentTypes
+_Type: String • Cardinality: MANY_
+
+Types of content in the data source, as defined by OpenDOAR.
+
+```json
+"contentTypes": [
+  "Journal articles",
+  ...
+]
+```
+
+### releaseStartDate
+_Type: String • Cardinality: ONE_
+
+Release date of the data source, as defined by re3data.org.
+
+```json
+"releaseStartDate": "2010-07-24"
+```
+
+### releaseEndDate
+_Type: String • Cardinality: ONE_
+
+Date when the data source went offline or stopped ingesting new research data, as defined by re3data.org.
+
+```json
+"releaseEndDate": "2016-03-28"
+```
+
+### accessRights
+_Type: String • Cardinality: ONE_
+
+Type of access to the data source, as defined by re3data.org. Possible values: `{ open, restricted, closed }`.
+
+```json
+"accessRights": "open"
+```
+
+### uploadRights
+_Type: String • Cardinality: ONE_
+
+Type of data upload, as defined by re3data.org; one of `{ open, restricted, closed }`.
+
+```json
+"uploadRights": "closed"
+```
+
+### databaseAccessRestriction
+_Type: String • Cardinality: ONE_
+
+Access restrictions to the research data repository. Allowed values are: `{ feeRequired, registration, other }`.
+
+This field only applies to re3data data sources; see the [re3data schema specification](https://gfzpublic.gfz-potsdam.de/rest/items/item_758898_6/component/file_775891/content) for more details.
+
+```json
+"databaseAccessRestriction": "registration"
+```
+
+### dataUploadRestriction
+_Type: String • Cardinality: ONE_
+
+Upload restrictions applied by the datasource, as defined by re3data.org. One of `{ feeRequired, registration, other }`.
+
+This field only applies to re3data data sources; see the [re3data schema specification](https://gfzpublic.gfz-potsdam.de/rest/items/item_758898_6/component/file_775891/content) for more details.
+
+```json
+"dataUploadRestriction": "feeRequired registration"
+```
+
+### versioning
+_Type: Boolean • Cardinality: ONE_
+
+Whether the research data repository supports versioning:
+`true` if the data source supports versioning, `false` otherwise.
+
+This field only applies to re3data data sources; see the [re3data schema specification](https://gfzpublic.gfz-potsdam.de/rest/items/item_758898_6/component/file_775891/content) for more details.
+
+```json
+"versioning": true
+```
+
+### citationGuidelineUrl
+_Type: String • Cardinality: ONE_
+
+The URL of the data source page providing information on how to cite its items. The DataCite citation format is recommended (http://www.datacite.org/whycitedata).
+
+This field only applies to re3data data sources; see the [re3data schema specification](https://gfzpublic.gfz-potsdam.de/rest/items/item_758898_6/component/file_775891/content) for more details.
+
+```json
+"citationGuidelineUrl": "https://physionet.org/about/#citation"
+```
+
+### pidSystems
+_Type: String • Cardinality: ONE_
+
+The persistent identifier system used by the data source, as defined by re3data.org.
+
+```json
+"pidSystems": "hdl"
+```
+
+### certificates
+_Type: String • Cardinality: ONE_
+
+The certificate, seal or standard the data source complies with, as defined by re3data.org.
+
+```json
+"certificates": "WDS"
+```
+
+### policies
+_Type: String • Cardinality: MANY_
+
+Policies of the data source, as defined in OpenDOAR.
+
+### journal
+_Type: [Container](other#container) • Cardinality: ONE_
+
+Information about the journal, if this data source is of type Journal.
+
+```json
+"journal": {
+  "edition": "",
+  "iss": "5",
+  "issnLinking": "",
+  "issnOnline": "1873-7625",
+  "issnPrinted": "2451-8271",
+  "name": "Recent Patents and Topics on Imaging",
+  "sp": "12",
+  "ep": "22",
+  "vol": "50"
+}
+```
+
+### missionStatementUrl
+_Type: String • Cardinality: ONE_
+
+The URL of a mission statement describing the designated community of the data source, as defined by re3data.org.
+
+```json
+"missionStatementUrl": "https://www.sigma2.no/content/nird-research-data-archive"
+```
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/data-model/entities/organization.md b/versioned_docs/version-8.0.1/data-model/entities/organization.md
new file mode 100644
index 0000000..c0c8f6a
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/entities/organization.md
@@ -0,0 +1,93 @@
+---
+sidebar_position: 3
+---
+
+# Organizations
+
+Organizations include companies, research centers or institutions involved as project partners or responsible for operating data sources. Information about organizations is collected from funder databases like CORDA, registries of data sources like OpenDOAR and re3Data, and CRIS systems, as entities related to projects or data sources.
+
+
+---
+
+## The `Organization` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE id for the organization, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "openorgs____::b84450f9864182c67b8611b5593f4250"
+```
+
+### legalShortName
+_Type: String • Cardinality: ONE_
+
+The short legal name of the organization.
+
+```json
+"legalShortName": "ARC"
+```
+
+### legalName
+_Type: String • Cardinality: ONE_
+
+The legal name of the organization.
+
+```json
+"legalName": "Athena Research and Innovation Center In Information Communication & Knowledge Technologies"
+```
+
+### alternativeNames
+_Type: String • Cardinality: MANY_
+
+Alternative names that identify the organization.
+
+```json
+"alternativeNames": [
+  "Athena Research and Innovation Center In Information Communication & Knowledge Technologies",
+  "Athena RIC",
+  "ARC",
+  ...
+]
+```
+
+### websiteUrl
+_Type: String • Cardinality: ONE_
+
+The website URL of the organization.
+
+```json
+"websiteUrl": "https://www.athena-innovation.gr/el/announce/pressreleases.html"
+```
+
+### country
+_Type: [Country](other#country) • Cardinality: ONE_
+
+The country where the organization is located.
+
+```json
+"country": {
+  "code": "GR",
+  "label": "Greece"
+}
+```
+
+### pid
+_Type: [OrganizationPid](other#organizationpid) • Cardinality: MANY_
+
+The list of persistent identifiers for the organization.
+
+```json
+"pid": [
+  {
+    "scheme": "ISNI",
+    "value": "0000 0004 0393 5688"
+  },
+  {
+    "scheme": "GRID",
+    "value": "grid.19843.37"
+  },
+  ...
+]
+```
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/data-model/entities/other.md b/versioned_docs/version-8.0.1/data-model/entities/other.md
new file mode 100644
index 0000000..f0ed18e
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/entities/other.md
@@ -0,0 +1,831 @@
+---
+sidebar_position: 7
+---
+
+# Other component objects
+
+Here, we describe other component objects that are used as part of the main graph entities.
+
+## AccessRight
+
+Subclass of [BestAccessRight](#bestaccessright); it describes the rights held in and over the resource and the Open Access route.
+
+### openAccessRoute
+_Type: One of `{ gold, green, hybrid, bronze }` • Cardinality: ONE_
+
+Indicates the Open Access status. Values are set according to the [Unpaywall methodology](https://support.unpaywall.org/support/solutions/articles/44001777288-what-do-the-types-of-oa-status-green-gold-hybrid-and-bronze-mean-).
+
+```json
+"openAccessRoute": "gold"
+```
+
+## AlternateIdentifier
+Type used to represent persistent identifiers of the research product that have not been forged by an authority for that pid type. For example, metadata collected from an institutional repository may also provide the DOI of the research product as an identifier.
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+Vocabulary reference.
+
+```json
+"scheme": "doi"
+```
+
+### value
+_Type: String • Cardinality: ONE_
+
+Value from the given scheme/vocabulary.
+
+```json
+"value": "10.1016/j.respol.2021.104226"
+```
+
+## APC
+Indicates the money spent to make a book or article available in Open Access. Sources for this information include the OpenAPC initiative.
+
+### currency
+_Type: String • Cardinality: ONE_
+
+The currency in which the amount is expressed (e.g. EUR, USD).
+
+```json
+"currency": "EUR"
+```
+
+### amount
+_Type: String • Cardinality: ONE_
+
+The quantity of money.
+
+```json
+"amount": "1000"
+```
+
+## Author
+
+Represents the research product author.
+
+### fullName
+_Type: String • Cardinality: ONE_
+
+Author's full name.
+
+```json
+"fullName": "Turunen, Heidi"
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+Author's given name.
+
+```json
+"name": "Heidi"
+```
+
+### surname
+_Type: String • Cardinality: ONE_
+
+Author's family name.
+
+```json
+"surname": "Turunen"
+```
+
+### rank
+_Type: Integer • Cardinality: ONE_
+
+Author's position in the list of authors for the given research product.
+
+```json
+"rank": 1
+```
+
+### pid
+_Type: [AuthorPid](#authorpid) • Cardinality: ONE_
+
+Persistent identifier associated with this author.
+
+```json
+"pid": {
+  "id": {
+    "scheme": "orcid",
+    "value": "0000-0001-7169-1177"
+  },
+  "provenance": {
+    "provenance": "Harvested",
+    "trust": "0.9"
+  }
+}
+```
+
+## AuthorPid
+
+The author's persistent identifier.
+
+### id
+_Type: [AuthorPidSchemaValue](#authorpidschemavalue) • Cardinality: ONE_
+
+```json
+"id": {
+  "scheme": "orcid",
+  "value": "0000-0001-7169-1177"
+}
+```
+
+### provenance
+_Type: [Provenance](#provenance-2) • Cardinality: ONE_
+
+The reason why the pid was associated with the author.
+
+```json
+"provenance": {
+  "provenance": "Inferred by OpenAIRE",
+  "trust": "0.85"
+}
+```
+
+## AuthorPidSchemaValue
+Type used to represent the scheme and value for the author's pid.
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+The author's pid scheme. OpenAIRE currently supports ORCID.
+
+```json
+"scheme": "orcid"
+```
+
+### value
+_Type: String • Cardinality: ONE_
+
+The author's pid value in that scheme.
+
+```json
+"value": "0000-1111-2222-3333"
+```
+
+## BestAccessRight
+Indicates the most open access right \*available among the research product instances.
+
+\* Openness is defined by the following ordering of the access right terms:
+```
+OPEN SOURCE > OPEN > EMBARGO (6MONTHS) > EMBARGO (12MONTHS) > RESTRICTED > CLOSED > UNKNOWN
+```
+
+### code
+_Type: String • Cardinality: ONE_
+
+COAR access mode code: http://vocabularies.coar-repositories.org/documentation/access_rights/.
+
+```json
+"code": "c_16ec"
+```
+
+### label
+_Type: String • Cardinality: ONE_
+
+Label for the access mode.
+
+```json
+"label": "RESTRICTED"
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+Scheme of reference for the access right code. Currently, always set to the COAR access rights vocabulary: http://vocabularies.coar-repositories.org/documentation/access_rights/.
+
+```json
+"scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/"
+```
+
+## CitationImpact
+
+The different citation-based impact indicators as computed by [BIP!](https://bip.imsi.athenarc.gr/).
+
+
+### indicator
+_Type: String • Cardinality: ONE_
+
+The name of the indicator; one of:
+* `influence`: it reflects the overall/total (citation-based) impact of an article in the research community at large, based on the underlying citation network (diachronically).
+* `citationCount`: it is an alternative to the "Influence" indicator, which also reflects the overall/total (citation-based) impact of an article in the research community at large, based on the underlying citation network (diachronically).
+* `popularity`: it reflects the "current" (citation-based) impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
+* `impulse`: it reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
+
+For more details on how these indicators are calculated, please refer [here](/graph-production-workflow/indicators-ingestion/impact-indicators).
+
+```json
+"citationImpact": {
+  "influence": 123,
+  "influenceClass": "C2",
+  "citationCount": 456,
+  "citationClass": "C3",
+  "popularity": 234,
+  "popularityClass": "C1",
+  "impulse": 987,
+  "impulseClass": "C3"
+}
+```
+
+### class
+_Type: String • Cardinality: ONE_
+
+The impact class assigned based on the indicator score.
+
+To ease interpretation, BIP! also groups articles with similar impact into impact classes. The following 5 classes are provided:
+* `C1`: Top 0.01%
+* `C2`: Top 0.1%
+* `C3`: Top 1%
+* `C4`: Top 10%
+* `C5`: Bottom 90%
+
+## Container
+Information about the conference or journal where the research product has been presented or published.
+
+```json
+"container": {
+  "name": "Research Policy",
+  "edition": "xyz",
+  "issnLinking": "0048-7333",
+  "issnOnline": "1873-7625",
+  "issnPrinted": "1377-9655",
+  "sp": "xyz",
+  "ep": "xyz",
+  "iss": "xyz",
+  "vol": "xyz"
+}
+```
+
+```json
+"container": {
+  "name": "Research Policy",
+  "conferenceDate": "2022-09-22",
+  "conferencePlace": "Padua, Italy"
+}
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+Name of the journal or conference.
+
+### issnPrinted
+_Type: String • Cardinality: ONE_
+
+The printed ISSN of the journal.
+
+### issnOnline
+_Type: String • Cardinality: ONE_
+
+The online ISSN of the journal.
+
+### issnLinking
+_Type: String • Cardinality: ONE_
+
+The linking ISSN of the journal.
+
+### iss
+_Type: String • Cardinality: ONE_
+
+The journal issue.
+
+### sp
+_Type: String • Cardinality: ONE_
+
+The start page.
+
+### ep
+_Type: String • Cardinality: ONE_
+
+The end page.
+
+### vol
+_Type: String • Cardinality: ONE_
+
+The journal volume.
+
+### edition
+_Type: String • Cardinality: ONE_
+
+The edition of the journal or conference.
+
+### conferencePlace
+_Type: String • Cardinality: ONE_
+
+The place of the conference.
+
+### conferenceDate
+_Type: String • Cardinality: ONE_
+
+The date of the conference.
+
+## ControlledField
+
+
+Generic type used to represent the information described by a scheme and a value in that scheme (e.g. a pid).
+
+```json
+{
+  "scheme": "DOI",
+  "value": "10.5281/zenodo.4707307"
+}
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+Vocabulary reference.
+
+### value
+_Type: String • Cardinality: ONE_
+
+Value from the given scheme/vocabulary.
+
+## Country
+Represents a generic country code and label.
+
+```json
+{
+  "code": "IT",
+  "label": "Italy"
+}
+```
+
+### code
+_Type: String • Cardinality: ONE_
+
+ISO 3166-1 alpha-2 country code.
+
+### label
+_Type: String • Cardinality: ONE_
+
+The country label.
+
+## Funding
+Funding information for a project.
+
+### fundingStream
+_Type: [FundingStream](#fundingstream) • Cardinality: ONE_
+
+The funding stream through which the project is funded.
+
+```json
+"fundingStream": {
+  "description": "Horizon 2020 Framework Programme - Research and Innovation action",
+  "id": "EC::H2020::RIA"
+}
+```
+### jurisdiction
+_Type: String • Cardinality: ONE_
+
+Geographical jurisdiction (e.g. EU for the European Commission, HR for the Croatian Science Foundation).
+
+```json
+"jurisdiction": "EU"
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+The name of the funder.
+
+```json
+"name": "European Commission"
+```
+
+### shortName
+_Type: String • Cardinality: ONE_
+
+The short name of the funder.
+
+```json
+"shortName": "EC"
+```
+
+## FundingStream
+Description of a funding stream.
+
+### id
+_Type: String • Cardinality: ONE_
+
+The identifier of the funding stream.
+
+```json
+"id": "EC::H2020::RIA"
+```
+
+### description
+_Type: String • Cardinality: ONE_
+
+Short description of the funding stream.
+
+```json
+"description": "Horizon 2020 Framework Programme - Research and Innovation action"
+```
+
+## GeoLocation
+Represents the geolocation information.
+
+### point
+_Type: String • Cardinality: ONE_
+
+A point with latitude and longitude.
+
+```json
+"point": "7.72486 50.1084"
+```
+
+### box
+_Type: String • Cardinality: ONE_
+
+A bounding box defined by two longitudes (min and max) and two latitudes (min and max).
+
+
+```json
+"box": "18.569386 54.468973 18.066832 54.83707"
+```
+
+### place
+_Type: String • Cardinality: ONE_
+
+The name of a specific place.
+
+```json
+"place": "Tübingen, Baden-Württemberg, Southern Germany"
+```
+
+## Grant
+The money granted to a project.
+
+### currency
+_Type: String • Cardinality: ONE_
+
+The currency of the granted amount (e.g. EUR).
+
+```json
+"currency": "EUR"
+```
+
+### fundedAmount
+_Type: Number • Cardinality: ONE_
+
+The funded amount.
+
+```json
+"fundedAmount": 1.0E7
+```
+
+### totalCost
+_Type: Number • Cardinality: ONE_
+
+The total cost of the project.
+
+```json
+"totalCost": 1.0E7
+```
+
+## H2020Programme
+The H2020 programme funding a project.
+
+### code
+_Type: String • Cardinality: ONE_
+
+The code of the programme.
+
+```json
+"code": "H2020-EU.1.4.1.3."
+```
+
+### description
+_Type: String • Cardinality: ONE_
+
+The description of the programme.
+
+```json
+"description": "Development, deployment and operation of ICT-based e-infrastructures"
+```
+
+## Instance
+An instance is one specific materialization or version of the research product. For example, due to deduplication, a single research product can have three instances:
+
+* one is the pre-print
+* one is the post-print
+* one is the published version
+
+Each instance is characterized by the properties that follow.
+
+### accessRight
+_Type: [AccessRight](#accessright) • Cardinality: ONE_
+
+Maps [dc:rights](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/rights/) and describes the access rights of the web resources relative to this instance.
+
+```json
+"accessRight": {
+  "code": "c_abf2",
+  "label": "OPEN",
+  "openAccessRoute": "gold",
+  "scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/"
+}
+```
+
+### alternateIdentifier
+_Type: [AlternateIdentifier](#alternateidentifier) • Cardinality: MANY_
+
+All the identifiers associated with the research product other than the authoritative ones.
+
+```json
+"alternateIdentifier": [
+  {
+    "scheme": "doi",
+    "value": "10.1016/j.respol.2021.104226"
+  },
+  ...
+]
+```
+
+### articleProcessingCharge
+_Type: [APC](#apc) • Cardinality: ONE_
+
+The money spent to make this book or article available in Open Access. The source for this information is the OpenAPC initiative.
+
+```json
+"articleProcessingCharge": {
+  "currency": "EUR",
+  "amount": "1000"
+}
+```
+
+### license
+_Type: String • Cardinality: ONE_
+
+The license URL.
+
+```json
+"license": "http://creativecommons.org/licenses/by-nc/4.0"
+```
+
+### pid
+_Type: [ResultPid](#resultpid) • Cardinality: MANY_
+
+The set of persistent identifiers associated with this instance that have been collected from an authority for the pid type (e.g. Crossref/DataCite for DOIs). See the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers) for more information.
+
+```json
+"pid": [
+  {
+    "scheme": "pmc",
+    "value": "PMC8024784"
+  },
+  ...
+]
+```
+
+### publicationDate
+_Type: String • Cardinality: ONE_
+
+The publication date of the research product.
+
+```json
+"publicationDate": "2009-02-12"
+```
+
+### refereed
+_Type: String • Cardinality: ONE_
+
+Describes whether this instance has been peer-reviewed. Allowed values are `peerReviewed`, `nonPeerReviewed` and `UNKNOWN` (as defined in https://api.openaire.eu/vocabularies/dnet:review_levels). For example:
+
+* peerReviewed: https://api.openaire.eu/vocabularies/dnet:review_levels/0001
+* nonPeerReviewed: https://api.openaire.eu/vocabularies/dnet:review_levels/0002
+
+These values are based on the vocabularies of the following guidelines:
+
+* [DRIVER guidelines 2.0 - info:eu-repo/semantic](https://wiki.surfnet.nl/download/attachments/10851536/DRIVER_Guidelines_v2_Final_2008-11-13.pdf) (OpenAIRE v1.0 till v3.0 - Literature)
+* [COAR Vocabulary v2.0 and v3.0](https://vocabularies.coar-repositories.org/resource_types/) (OpenAIRE v4 - Inst.+Them.)
+
+```json
+"refereed": "UNKNOWN"
+```
+
+### type
+_Type: String • Cardinality: ONE_
+
+The specific sub-type of this instance (see https://api.openaire.eu/vocabularies/dnet:result_typologies and follow the links).
+
+```json
+"type": "Article"
+```
+
+### url
+_Type: String • Cardinality: MANY_
+
+URLs of the instance. They may link to the actual full text or to the landing page at the hosting source.
+
+```json
+"url": [
+  "https://periodicos2.uesb.br/index.php/folio/article/view/4296",
+  ...
+]
+```
+
+## Indicator
+
+These are indicators computed for a specific OpenAIRE research product.
+
+Each Indicator object is composed of the following properties:
+
+### citationImpact
+_Type: [CitationImpact](#citationimpact) • Cardinality: MANY_
+
+These indicators, provided by [BIP!](https://bip.imsi.athenarc.gr/), estimate the citation-based impact of a research product.
+
+For details about their calculation, please refer [here](/graph-production-workflow/indicators-ingestion/impact-indicators).
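The impact classes can also be interpreted programmatically. The following is a minimal Python sketch, not an OpenAIRE tool: it assumes a `citationImpact` object with the flat layout of the example in this section (each score key such as `influence` paired with a class key such as `influenceClass`, with `citationCount` paired with `citationClass`), and the helper name `summarize_citation_impact` is hypothetical. The percentile bands are the ones listed for the CitationImpact `class` property.

```python
import json

# Percentile bands of the BIP! impact classes, as listed for CitationImpact `class`.
CLASS_BANDS = {
    "C1": "Top 0.01%",
    "C2": "Top 0.1%",
    "C3": "Top 1%",
    "C4": "Top 10%",
    "C5": "Bottom 90%",
}

# Indicator score key -> class key, following the example record
# (assumption: citationCount is paired with "citationClass", as in the example).
INDICATOR_CLASS_KEYS = {
    "influence": "influenceClass",
    "citationCount": "citationClass",
    "popularity": "popularityClass",
    "impulse": "impulseClass",
}


def summarize_citation_impact(citation_impact: dict) -> dict:
    """Hypothetical helper: map each indicator present to (score, class, band)."""
    summary = {}
    for indicator, class_key in INDICATOR_CLASS_KEYS.items():
        if indicator not in citation_impact:
            continue  # skip indicators missing from this record
        impact_class = citation_impact.get(class_key)
        summary[indicator] = (
            citation_impact[indicator],
            impact_class,
            CLASS_BANDS.get(impact_class, "unknown class"),
        )
    return summary


record = json.loads("""
{"citationImpact": {"influence": 123, "influenceClass": "C2",
                    "citationCount": 456, "citationClass": "C3",
                    "popularity": 234, "popularityClass": "C1",
                    "impulse": 987, "impulseClass": "C3"}}
""")
for name, (score, cls, band) in summarize_citation_impact(record["citationImpact"]).items():
    print(f"{name}: score={score}, class={cls} ({band})")
```

Missing indicators are skipped rather than reported, and a missing class key is labeled "unknown class", so the sketch degrades gracefully on partial records.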
+
+```json
+"citationImpact": {
+  "influence": 123,
+  "influenceClass": "C2",
+  "citationCount": 456,
+  "citationClass": "C3",
+  "popularity": 234,
+  "popularityClass": "C1",
+  "impulse": 987,
+  "impulseClass": "C3"
+}
+```
+
+### usageCounts
+_Type: [UsageCounts](#usagecounts-1) • Cardinality: ONE_
+
+These measures, computed by the [UsageCounts Service](https://usagecounts.openaire.eu/), are based on usage statistics.
+
+Please refer [here](/graph-production-workflow/indicators-ingestion/usage-counts) for more details.
+
+```json
+"usageCounts": {
+  "downloads": "10",
+  "views": "20"
+}
+```
+## Language
+Represents the language of the research product.
+
+```json
+"language": {
+  "code": "eng",
+  "label": "English"
+}
+```
+
+### code
+_Type: String • Cardinality: ONE_
+
+Alpha-3/ISO 639-2 code of the language. Values are controlled by the [dnet:languages vocabulary](https://api.openaire.eu/vocabularies/dnet:languages).
+
+### label
+_Type: String • Cardinality: ONE_
+
+Language label in English.
+
+## OrganizationPid
+
+The scheme and value for identifiers of the organization.
+
+```json
+{
+  "scheme": "GRID",
+  "value": "grid.7119.e"
+}
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+Vocabulary reference (e.g. ISNI).
+
+### value
+_Type: String • Cardinality: ONE_
+
+Value from the given scheme/vocabulary (e.g. 0000000090326370).
+
+## Provenance
+Indicates the process that produced (or provided) the information, and the trust associated with the information.
+
+```json
+{
+  "provenance": "Harvested",
+  "trust": "0.9"
+}
+```
+
+### provenance
+_Type: String • Cardinality: ONE_
+
+Provenance term from the vocabulary [dnet:provenanceActions](https://api.openaire.eu/vocabularies/dnet:provenanceActions).
+
+### trust
+_Type: String • Cardinality: ONE_
+
+Trust, expressed as a number in the range [0, 1].
+
+## ResultCountry
+Indicates the country associated with the research product.
+It is a subclass of [Country](#country) and extends it with provenance information.
+
+### provenance
+_Type: [Provenance](#provenance-2) • Cardinality: ONE_
+
+Indicates the reason why this country is associated with this research product.
+
+```json
+{
+  "code": "IT",
+  "label": "Italy",
+  "provenance": {
+    "provenance": "inferred by OpenAIRE",
+    "trust": "0.85"
+  }
+}
+```
+
+## ResultPid
+Type used to represent persistent identifiers of the research product that have been forged by an authority for that pid type.
+
+
+
+```json
+{
+  "scheme": "doi",
+  "value": "10.21511/bbs.13(3).2018.13"
+}
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+The scheme of the persistent identifier for the research product (e.g. doi). A pid appears here only if its information has been collected from an authority for that pid type (e.g. Crossref/DataCite for DOIs). The authoritative pid types are: `doi` when collected from Crossref or DataCite, `pmid` when collected from EuroPubmed, `arxiv` when collected from arXiv, and `handle` when collected from the repositories.
+
+### value
+_Type: String • Cardinality: ONE_
+
+The value expressed in the scheme (e.g. 10.1000/182).
+
+## Subject
+Represents keywords associated with the research product.
+
+### subject
+_Type: [SubjectSchemeValue](#subjectschemevalue) • Cardinality: ONE_
+
+Contains the subject type (keyword, MeSH, etc.) and the subject term (medicine, chemistry, etc.).
+
+```json
+"subject": {
+  "scheme": "keyword",
+  "value": "SVOC",
+  "provenance": {
+    "provenance": "Harvested",
+    "trust": "0.9"
+  }
+}
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+OpenAIRE subject classification scheme (https://api.openaire.eu/vocabularies/dnet:subject_classification_typologies).
+
+```json
+"scheme": "keyword"
+```
+
+### value
+_Type: String • Cardinality: ONE_
+
+The value for the subject in the selected scheme. When the scheme is 'keyword', the subject is free text (i.e. not a term from a controlled vocabulary).
+
+### provenance
+_Type: [Provenance](#provenance-2) • Cardinality: ONE_
+
+Contains provenance information for the subject term.
+
+## UsageCounts
+
+The usage counts indicator computed for this research product.
+
+```json
+"usageCounts": {
+  "downloads": "10",
+  "views": "20"
+}
+```
+
+### views
+_Type: String • Cardinality: ONE_
+
+The number of views for this research product.
+
+### downloads
+_Type: String • Cardinality: ONE_
+
+The number of downloads for this research product.
diff --git a/versioned_docs/version-8.0.1/data-model/entities/project.md b/versioned_docs/version-8.0.1/data-model/entities/project.md
new file mode 100644
index 0000000..5476cce
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/entities/project.md
@@ -0,0 +1,171 @@
+---
+sidebar_position: 4
+---
+
+# Projects
+
+Of crucial interest to OpenAIRE is also the identification of the funders (e.g. European Commission, Wellcome Trust, FCT Portugal, NWO The Netherlands) that co-funded the projects that have led to a given research product. Projects are characterized by a list of funding streams (e.g. FP7, H2020 for the EC), which identify the strands of funding. Funding streams can be nested to form a tree of sub-funding streams.
+
+---
+
+## The `Project` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+Main entity identifier, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "corda__h2020::70ea22400fd890c5033cb31642c4ae68"
+```
+
+### code
+_Type: String • Cardinality: ONE_
+
+The grant agreement code of the project.
+
+```json
+"code": "777541"
+```
+
+### acronym
+_Type: String • Cardinality: ONE_
+
+Project's acronym.
+
+```json
+"acronym": "OpenAIRE-Advance"
+```
+
+### title
+_Type: String • Cardinality: ONE_
+
+Project's title.
+
+```json
+"title": "OpenAIRE Advancing Open Scholarship"
+```
+
+### callIdentifier
+_Type: String • Cardinality: ONE_
+
+The identifier of the research call.
+
+```json
+"callIdentifier": "H2020-EINFRA-2017"
+```
+
+### funding
+_Type: [Funding](other#funding) • Cardinality: MANY_
+
+Funding information for the project.
+
+```json
+"funding": [
+  {
+    "fundingStream": {
+      "description": "Horizon 2020 Framework Programme - Research and Innovation action",
+      "id": "EC::H2020::RIA"
+    },
+    "jurisdiction": "EU",
+    "name": "European Commission",
+    "shortName": "EC"
+  }
+]
+```
+### granted
+_Type: [Grant](other#grant) • Cardinality: ONE_
+
+The amount of money granted to the project.
+
+```json
+"granted": {
+  "currency": "EUR",
+  "fundedAmount": 1.0E7,
+  "totalCost": 1.0E7
+}
+```
+
+### h2020programme
+_Type: [H2020Programme](other#h2020programme) • Cardinality: MANY_
+
+The H2020 programme funding the project.
+
+```json
+"h2020programme":[
+  {
+    "code": "H2020-EU.1.4.1.3.",
+    "description": "Development, deployment and operation of ICT-based e-infrastructures"
+  }
+]
+```
+### keywords
+_Type: String • Cardinality: ONE_
+
+```json
+"keywords": [
+  "Open Science",
+  ...
+]
+```
+
+### openAccessMandateForDataset
+_Type: Boolean • Cardinality: ONE_
+
+```json
+"openAccessMandateForDataset": true
+```
+
+### openAccessMandateForPublications
+_Type: Boolean • Cardinality: ONE_
+
+```json
+"openAccessMandateForPublications": true
+```
+
+### startDate
+_Type: String • Cardinality: ONE_
+
+The start date of the project.
+
+```json
+"startDate": "2018-01-01"
+```
+
+### endDate
+_Type: String • Cardinality: ONE_
+
+The end date of the project.
+
+```json
+"endDate": "2021-02-28"
+```
+
+### subject
+_Type: String • Cardinality: MANY_
+
+The subjects of the project.
+
+```json
+"subject": [
+  "Data and Distributed Computing e-infrastructures for Open Science",
+  ...
+]
+```
+### summary
+_Type: String • Cardinality: ONE_
+
+Short summary of the project.
+ +```json +"summary": "OpenAIRE-Advance continues the mission of OpenAIRE to support the Open Access/Open Data mandates in Europe. By sustaining the current successful infrastructure, comprised of a human network and robust technical services, it consolidates its achievements while working to shift the momentum among its communities to Open Science, aiming to be a trusted e-Infrastructurewithin the realms of the European Open Science Cloud.In this next phase, OpenAIRE-Advance strives to empower its National Open Access Desks (NOADs) so they become a pivotal part within their own national data infrastructures, positioningOA and open science onto national agendas. The capacity building activities bring together experts ontopical task groups in thematic areas(open policies, RDM, legal issues, TDM), promoting a train the trainer approach, strengthening and expanding the pan-European Helpdesk with support and training toolkits, training resources and workshops.It examines key elements of scholarly communication, i.e., co-operative OA publishing and next generation repositories, to develop essential building blocks of the scholarly commons.On the technical level OpenAIRE-Advance focuses on the operation and maintenance of the OpenAIRE technical TRL8/9 services,and radically improvesthe OpenAIRE services on offer by: a) optimizing their performance and scalability, b) refining their functionality based on end-user feedback, c) repackagingthem into products, taking a professional marketing approach with well-defined KPIs, d)consolidating the range of services/products into a common e-Infra catalogue to enable a wider uptake.OpenAIRE-Advancesteps up its outreach activities with concrete pilots with three major RIs,citizen science initiatives, and innovators via a rigorous Open Innovation programme. Finally, viaits partnership with COAR, OpenAIRE-Advance consolidatesOpenAIRE’s global roleextending its collaborations with Latin America, US, Japan, Canada, and Africa." 
+```
+
+### websiteUrl
+_Type: String • Cardinality: ONE_
+
+The website of the project.
+
+```json
+"websiteUrl": "https://www.openaire.eu/advance/"
+```
diff --git a/versioned_docs/version-8.0.1/data-model/entities/research-product.md b/versioned_docs/version-8.0.1/data-model/entities/research-product.md
new file mode 100644
index 0000000..28c3b27
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/entities/research-product.md
@@ -0,0 +1,527 @@
+---
+sidebar_position: 1
+---
+
+# Research products
+
+Research products are digital objects, described by metadata, that result from a scientific process.
+This page describes the properties of the `ResearchProduct` object.
+
+A `ResearchProduct` also has the following sub-types, which inherit all its properties and extend them further:
+* [Publication](#publication)
+* [Data](#data)
+* [Software](#software)
+* [Other research product](#other-research-product)
+
+---
+
+## The `ResearchProduct` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+Main entity identifier, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "doi_dedup___::80f29c8c8ba18c46c88a285b7e739dc3"
+```
+
+### type
+_Type: String • Cardinality: ONE_
+
+Type of the research product. Possible types:
+
+* `publication`
+* `data`
+* `software`
+* `other`
+
+as declared in the terms from the [dnet:result_typologies vocabulary](https://api.openaire.eu/vocabularies/dnet:result_typologies).
+
+```json
+"type": "publication"
+```
+
+### originalId
+_Type: String • Cardinality: MANY_
+
+Identifiers of the record at the original sources.
+
+```json
+"originalId": [
+  "oai:pubmedcentral.nih.gov:8024784",
+  "S0048733321000305",
+  "10.1016/j.respol.2021.104226",
+  "3136742816"
+]
+```
+
+### mainTitle
+_Type: String • Cardinality: ONE_
+
+A name or title by which a research product is known. It may be the title of a publication or the name of a piece of software.
+
+```json
+"mainTitle": "The fall of the innovation empire and its possible rise through open science"
+```
+
+### subTitle
+
+_Type: String • Cardinality: ONE_
+
+Explanatory or alternative name by which a research product is known.
+
+```json
+"subTitle": "An analysis of cases from 1980 - 2020"
+```
+
+### author
+_Type: [Author](other#author) • Cardinality: MANY_
+
+The main researchers involved in producing the data, or the authors of the publication.
+
+```json
+"author": [
+  {
+    "fullName": "E. Richard Gold",
+    "rank": 1,
+    "name": "Richard",
+    "surname": "Gold",
+    "pid": {
+      "id": {
+        "scheme": "orcid",
+        "value": "0000-0002-3789-9238"
+      },
+      "provenance": {
+        "provenance": "Harvested",
+        "trust": "0.9"
+      }
+    }
+  },
+  ...
+]
+```
+### bestAccessRight
+_Type: [BestAccessRight](other#bestaccessright) • Cardinality: ONE_
+
+The most open access right associated with the manifestations of this research product.
+
+```json
+"bestAccessRight": {
+  "code": "c_abf2",
+  "label": "OPEN",
+  "scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/"
+}
+```
+
+### contributor
+_Type: String • Cardinality: MANY_
+
+The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource.
+
+```json
+"contributor": [
+  "University of Zurich",
+  "Wright, Aidan G C",
+  "Hallquist, Michael",
+  ...
+]
+```
+
+### country
+_Type: [ResultCountry](other#resultcountry) • Cardinality: MANY_
+
+Country associated with the research product: the country of the organisation that manages the institutional repository, national aggregator, or CRIS system from which this record was collected.
+The countries of the authors' affiliations can instead be found in the affiliation relation.
+
+```json
+"country": [
+  {
+    "code": "CH",
+    "label": "Switzerland",
+    "provenance": {
+      "provenance": "Inferred by OpenAIRE",
+      "trust": "0.85"
+    }
+  },
+  ...
+]
+```
+
+### coverage
+_Type: String • Cardinality: MANY_
+
+### dateOfCollection
+_Type: String • Cardinality: ONE_
+
+The last time OpenAIRE collected the record.
+
+```json
+"dateOfCollection": "2021-06-09T11:37:56.248Z"
+```
+
+### description
+_Type: String • Cardinality: MANY_
+
+A brief description of the resource and the context in which the resource was created.
+
+```json
+"description": [
+  "Open science partnerships (OSPs) are one mechanism to reverse declining efficiency. OSPs are public-private partnerships that openly share publications, data and materials.",
+  "There is growing concern that the innovation system's ability to create wealth and attain social benefit is declining in effectiveness. This article explores the reasons for this decline and suggests a structure, the open science partnership, as one mechanism through which to slow down or reverse this decline.",
+  "The article examines the empirical literature of the last century to document the decline. This literature suggests that the cost of research and innovation is increasing exponentially, that researcher productivity is declining, and, third, that these two phenomena have led to an overall flat or declining level of innovation productivity.",
+  ...
+]
+```
+
+### embargoEndDate
+_Type: String • Cardinality: ONE_
+
+Date when the embargo ends and this research product becomes Open Access.
+
+```json
+"embargoEndDate": "2017-01-01"
+```
+
+### indicators
+_Type: [Indicator](other#indicator-1) • Cardinality: ONE_
+
+The indicators computed for this research product.
+Currently, the following types of indicators are supported:
+
+* [Citation-based impact indicators by BIP!](other#citationimpact)
+* [Usage Statistics indicators](other#usagecounts)
+
+```json
+"indicators": {
+  "citationImpact": {
+    "influence": 123,
+    "influenceClass": "C2",
+    "citationCount": 456,
+    "citationClass": "C3",
+    "popularity": 234,
+    "popularityClass": "C1",
+    "impulse": 987,
+    "impulseClass": "C3"
+  },
+  "usageCounts": {
+    "downloads": "10",
+    "views": "20"
+  }
+}
+```
+
+### instance
+_Type: [Instance](other#instance) • Cardinality: MANY_
+
+Specific materialization or version of the research product. For example, a research product may have three instances: the pre-print, the post-print, and the published version.
+
+```json
+"instance": [
+  {
+    "accessRight": {
+      "code": "c_abf2",
+      "label": "OPEN",
+      "openAccessRoute": "gold",
+      "scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/"
+    },
+    "alternateIdentifier": [
+      {
+        "scheme": "doi",
+        "value": "10.1016/j.respol.2021.104226"
+      },
+      ...
+    ],
+    "articleProcessingCharge": {
+      "amount": "4063.93",
+      "currency": "EUR"
+    },
+    "license": "http://creativecommons.org/licenses/by-nc/4.0",
+    "pid": [
+      {
+        "scheme": "pmc",
+        "value": "PMC8024784"
+      },
+      ...
+    ],
+
+    "publicationDate": "2021-01-01",
+    "refereed": "UNKNOWN",
+    "type": "Article",
+    "url": [
+      "http://europepmc.org/articles/PMC8024784"
+    ]
+  },
+  ...
+]
+```
+
+### language
+_Type: [Language](other#language) • Cardinality: ONE_
+
+The alpha-3/ISO 639-2 code of the language. Values are controlled by the [dnet:languages vocabulary](https://api.openaire.eu/vocabularies/dnet:languages).
+
+```json
+"language": {
+  "code": "eng",
+  "label": "English"
+}
+```
+### lastUpdateTimeStamp
+_Type: Long • Cardinality: ONE_
+
+Timestamp of the last update of the record in OpenAIRE.
+
+```json
+"lastUpdateTimeStamp": 1652722279987
+```
+
+### pid
+_Type: [ResultPid](other#resultpid) • Cardinality: MANY_
+
+Persistent identifiers of the research product. See also the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers) to learn more.
+
+```json
+"pid": [
+  {
+    "scheme": "pmc",
+    "value": "PMC8024784"
+  },
+  {
+    "scheme": "doi",
+    "value": "10.1016/j.respol.2021.104226"
+  },
+  ...
+]
+```
+
+### publicationDate
+_Type: String • Cardinality: ONE_
+
+Main date of the research product: typically the publication or issued date. When a research product has several versions with different dates, its date is the most frequent well-formatted date; if no date is more frequent than the others, the most recent complete date among the well-formatted ones is used. For statistics, the year is extracted and the research product is counted only among the research products of that year. Example: given a pre-print date of 2019-02-03, an article date of 2020-02 provided by a repository, and an article date of 2020 provided by Crossref, OpenAIRE sets the date to 2019-02-03, because it is the most recent among the complete, well-formed dates. If the repository then updates the metadata with a complete date (e.g. 2020-02-12), that becomes the new date of the research product, because it is now the most recent complete date. However, if OpenAIRE then collects the pre-print from another repository with date 2019-02-03, that date becomes the “winning date”, because it is now the most frequent well-formatted date.
+
+```json
+"publicationDate": "2021-03-18"
+```
+
+### publisher
+_Type: String • Cardinality: ONE_
+
+The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource.
+
+```json
+"publisher": "Elsevier, North-Holland Pub. Co"
+```
+
+### source
+_Type: String • Cardinality: MANY_
+
+A related resource from which the described resource is derived. See the definition of the Dublin Core field [dc:source](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/source).
+
+```json
+"source": [
+  "Research Policy",
+  "Crossref",
+  ...
+]
+```
+
+### subjects
+_Type: [Subject](other#subject) • Cardinality: MANY_
+
+Subject, keyword, classification code, or key phrase describing the resource.
+
+OpenAIRE classifies research products according to the [Field of Science](../../graph-production-workflow/indicators-ingestion/fos-classification.md)
+and [Sustainable Development Goals](../../graph-production-workflow/indicators-ingestion/sdg-classification.md) taxonomies.
+See the relevant sections for more details.
+
+```json
+"subjects": [
+  {
+    "subject": {
+      "scheme": "FOS",
+      "value": "01 natural sciences"
+    },
+    "provenance": {
+      "provenance": "inferred by OpenAIRE",
+      "trust": "0.85"
+    }
+  },
+  {
+    "subject": {
+      "scheme": "SDG",
+      "value": "2. Zero hunger"
+    },
+    "provenance": {
+      "provenance": "inferred by OpenAIRE",
+      "trust": "0.83"
+    }
+  },
+  {
+    "subject": {
+      "scheme": "keyword",
+      "value": "Open science"
+    },
+    "provenance": {
+      "provenance": "Harvested",
+      "trust": "0.9"
+    }
+  },
+  ...
+]
+```
+
+### isGreen
+_Type: Boolean • Cardinality: ONE_
+
+Indicates whether or not the scientific result was published following the green open access model.
+
+### openAccessColor
+_Type: String • Cardinality: ONE_
+
+
+Indicates the specific open access model used for the publication; possible values are `bronze`, `gold`, and `hybrid`.
+
+### isInDiamondJournal
+_Type: Boolean • Cardinality: ONE_
+
+Indicates whether or not the publication was published in a diamond journal.
+
+### publiclyFunded
+_Type: String • Cardinality: ONE_
+
+Discloses whether the publication acknowledges grants from public sources.
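The `publicationDate` selection rule described above can be sketched in Python as follows. This is a minimal illustration, not OpenAIRE's implementation: the helper `select_publication_date` is hypothetical, and it assumes that a "well-formatted" date is a complete `YYYY-MM-DD` string and that frequency ties are broken by taking the most recent date. Under these assumptions it reproduces the three steps of the example.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumption: "well-formatted" means a complete ISO date (YYYY-MM-DD);
# partial dates such as "2020-02" or "2020" are ignored.
_FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def select_publication_date(candidates: Iterable[str]) -> Optional[str]:
    """Hypothetical helper sketching the publicationDate selection rule."""
    complete = [d for d in candidates if _FULL_DATE.match(d)]
    if not complete:
        return None  # no well-formatted date to choose from
    counts = Counter(complete)
    top = max(counts.values())
    # Keep the most frequent dates; if several tie, prefer the most recent one.
    # ISO dates sort lexicographically in chronological order.
    return max(d for d, n in counts.items() if n == top)


if __name__ == "__main__":
    print(select_publication_date(["2019-02-03", "2020-02", "2020"]))
    print(select_publication_date(["2019-02-03", "2020-02-12", "2020"]))
    print(select_publication_date(["2019-02-03", "2020-02-12", "2019-02-03"]))
```

Called on the dates of the three steps in the example, it returns `2019-02-03`, then `2020-02-12`, then `2019-02-03` again.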
+
+---
+
+## Sub-types
+
+`ResearchProduct` has the following sub-types. Each inherits all its fields and extends them with the fields listed below.
+
+### Publication
+
+Metadata records about research literature (includes types of publications listed [here](http://api.openaire.eu/vocabularies/dnet:result_typologies/publication)).
+
+#### container
+_Type: [Container](other#container) • Cardinality: ONE_
+
+Information about the conference or journal in which the research product was presented or published.
+
+```json
+"container": {
+  "edition": "",
+  "iss": "5",
+  "issnLinking": "",
+  "issnOnline": "1873-7625",
+  "issnPrinted": "0048-7333",
+  "name": "Research Policy",
+  "sp": "12",
+  "ep": "22",
+  "vol": "50"
+}
+```
+### Data
+
+Metadata records about research data (includes the subtypes listed [here](http://api.openaire.eu/vocabularies/dnet:result_typologies/dataset)).
+
+#### size
+_Type: String • Cardinality: ONE_
+
+The declared size of the research data.
+
+```json
+"size": "10129818"
+```
+
+#### version
+_Type: String • Cardinality: ONE_
+
+The version of the research data.
+
+```json
+"version": "v1.3"
+```
+
+#### geolocation
+_Type: [GeoLocation](other#geolocation) • Cardinality: MANY_
+
+The list of geolocations associated with the research data.
+
+```json
+"geolocation": [
+  {
+    "box": "18.569386 54.468973 18.066832 54.83707",
+    "place": "Tübingen, Baden-Württemberg, Southern Germany",
+    "point": "7.72486 50.1084"
+  },
+  ...
+]
+```
+
+### Software
+
+Metadata records about research software (includes the subtypes listed [here](http://api.openaire.eu/vocabularies/dnet:result_typologies/software)).
+
+#### documentationUrl
+_Type: String • Cardinality: MANY_
+
+The URLs of the software documentation.
+
+```json
+"documentationUrl": [
+  "https://github.com/openaire/iis/blob/master/README.markdown",
+  ...
+]
+```
+
+#### codeRepositoryUrl
+_Type: String • Cardinality: ONE_
+
+The URL of the repository containing the source code.
+
+```json
+"codeRepositoryUrl": "https://github.com/openaire/iis"
+```
+
+#### programmingLanguage
+_Type: String • Cardinality: ONE_
+
+The programming language of the software.
+
+```json
+"programmingLanguage": "Java"
+```
+
+### Other research product
+
+Metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed [here](http://api.openaire.eu/vocabularies/dnet:result_typologies/other)).
+
+#### contactPerson
+_Type: String • Cardinality: MANY_
+
+Information on the person responsible for providing further information regarding the resource.
+
+```json
+"contactPerson": [
+  "Noémie Dominguez",
+  ...
+]
+```
+
+#### contactGroup
+_Type: String • Cardinality: MANY_
+
+Information on the group responsible for providing further information regarding the resource.
+
+```json
+"contactGroup": [
+  "Networked Multimedia Information Systems (NeMIS)",
+  ...
+]
+```
+
+#### tool
+_Type: String • Cardinality: MANY_
+
+Information about tools useful for interpreting and/or re-using the research product.
+
diff --git a/versioned_docs/version-8.0.1/data-model/pids-and-identifiers.md b/versioned_docs/version-8.0.1/data-model/pids-and-identifiers.md
new file mode 100644
index 0000000..05e33ab
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/pids-and-identifiers.md
@@ -0,0 +1,80 @@
+# PIDs and identifiers
+
+One of the challenges in keeping the contents of the OpenAIRE Graph stable is making its identifiers and records stable over time.
+There are many obstacles, as the Graph keeps a map of data sources that is subject to constant variation: records in repositories vary in content,
+original IDs, and PIDs, and may disappear or reappear; the same holds for a repository or the metadata collection it exposes.
+Moreover, the mappings applied to the original contents may also change and improve over time to keep up with changes in the input records.
+
+## PID Authorities
+
+One aspect of this problem is how identity is attributed to the objects populating the graph. The basic idea is to build the identifiers of the objects in the graph from the PIDs available in some authoritative sources, while considering all other sources “unstable” by definition. Examples of authoritative sources are Crossref and DataCite. Examples of non-authoritative ones are institutional repositories, aggregators, etc. PIDs from the authoritative sources form the stable OpenAIRE ID skeleton of the Graph, precisely because they are immutable by construction.
+
+Such a policy defines a list of data sources that are considered authoritative for a specific type of PID they provide, and its effect is twofold:
+* OpenAIRE IDs depend on persistent IDs when these are provided by the authority responsible for creating them;
+* PIDs are included in the graph according to a strict criterion: the PID types declared in the table below are mapped as PIDs only when they are collected from the corresponding PID authority data source.
+
+| PID Type  | Authority |
+|-----------|-----------|
+| doi       | [Crossref](https://www.crossref.org), [Datacite](https://datacite.org) |
+| pmc, pmid | [Europe PubMed Central](https://europepmc.org/), [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc) |
+| arXiv     | [arXiv.org e-Print Archive](https://arxiv.org/) |
+| uniprot   | [Protein Data Bank](http://www.pdb.org/) |
+| ena       | [Protein Data Bank](http://www.pdb.org/) |
+| pdb       | [Protein Data Bank](http://www.pdb.org/) |
+
+
+There is one exception: Handles are minted by several repositories; since listing them all is not viable, Handles bypass the PID authority filtering rule so that they are not lost as PIDs.
+In all other cases, PIDs are included in the graph as alternate identifiers.
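As an illustration, the filtering rule above can be expressed as a small lookup. This is a sketch under stated assumptions, not OpenAIRE's code: the function `classify_pid` is hypothetical, data sources are matched here by display name (the real pipeline presumably works with data source identifiers), and the delegated authorities described below are not modelled.

```python
from typing import Dict, Set

# PID type -> data sources authoritative for it, transcribed from the table above.
PID_AUTHORITIES: Dict[str, Set[str]] = {
    "doi": {"Crossref", "Datacite"},
    "pmc": {"Europe PubMed Central", "PubMed Central"},
    "pmid": {"Europe PubMed Central", "PubMed Central"},
    "arXiv": {"arXiv.org e-Print Archive"},
    "uniprot": {"Protein Data Bank"},
    "ena": {"Protein Data Bank"},
    "pdb": {"Protein Data Bank"},
}


def classify_pid(pid_type: str, collected_from: str) -> str:
    """Hypothetical helper: return the field a collected PID is mapped to."""
    if pid_type == "handle":
        # Handles are minted by many repositories, so they bypass the filter.
        return "pid"
    if collected_from in PID_AUTHORITIES.get(pid_type, set()):
        return "pid"
    # Any other PID is kept, but only as an alternate identifier.
    return "alternateIdentifier"


if __name__ == "__main__":
    print(classify_pid("doi", "Crossref"))
    print(classify_pid("doi", "Some Institutional Repository"))
    print(classify_pid("handle", "Some Institutional Repository"))
```

Here "Some Institutional Repository" stands for any hypothetical non-authoritative source: the same DOI is a `pid` when it comes from Crossref but only an `alternateIdentifier` when it comes from such a repository, while a Handle is always kept as a `pid`.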
+
+## Delegated authorities
+
+When a record is aggregated from multiple sources considered authoritative for minting specific PIDs, different mappings could be applied to them and, depending on the case,
+this could result in inconsistencies in the attribution of the field values.
+To overcome this issue, such records are included only once in the graph. To do so, the concept of "delegated authorities" defines a list of data sources that
+assign PIDs to their scientific products from a given PID minter.
+
+This "selection" is performed when the entities in the graph sharing the same identifier are grouped together. The list of delegated authorities currently includes:
+
+| Datasource delegated | Datasource delegating | PID Type |
+|--------------------------------------|----------------------------------|----------|
+| [Zenodo](https://zenodo.org) | [Datacite](https://datacite.org) | doi |
+| [RoHub](https://reliance.rohub.org/) | [W3ID](https://w3id.org/) | w3id |
+
+
+## Identifiers in the Graph
+
+OpenAIRE assigns an internal identifier to each object it collects.
+By default, the internal identifier is generated as `sourcePrefix::md5(localId)` where:
+
+* `sourcePrefix` is a 12-character namespace prefix assigned to the data source at registration time
+* `localId` is the identifier assigned to the object by the data source
+
+After years of operation, we have observed that:
+
+* `localId`s are generally unstable
+* objects can disappear from sources
+* PIDs provided by sources that are not PID agencies (authoritative sources for a specific type of PID) are often wrong (e.g. a pre-print with the DOI of the published version, DOIs with typos)
+
+Therefore, when the record is collected from an authoritative source:
+
+* the identity of the record is forged using the PID, as `pidTypePrefix::md5(lowercase(doi))`
+* the PID is added to the `pid` element of the data model
+
+When the record is collected from a source that is not authoritative for any type of PID:
+* the identity of the record is forged as usual using the local identifier
+* the PID, if available, is added as `alternateIdentifier`
+
+Currently, the following data sources are used as "PID authorities":
+
+| PID Type | Prefix (12 chars) | Authority |
+|----------|-------------------|-----------|
+| doi | `doi_________` | Crossref, Datacite, Zenodo |
+| pmc | `pmc_________` | Europe PubMed Central, PubMed Central |
+| pmid | `pmid________` | Europe PubMed Central, PubMed Central |
+| arXiv | `arXiv_______` | arXiv.org e-Print Archive |
+| ena | `ena_________` | EMBL-EBI |
+| pdb | `pdb_________` | EMBL-EBI |
+| uniprot | `uniprot_____` | EMBL-EBI |
+
+OpenAIRE also performs duplicate identification (see the [dedicated section for details](/graph-production-workflow/deduplication)).
+All duplicates are **merged** together in a **representative record**, which must be assigned a [dedicated OpenAIRE identifier](/graph-production-workflow/deduplication/research-products#openaire-identifier-of-the-representative-record) (i.e. it cannot have the identifier of one of the aggregated records).
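The two identifier rules above can be sketched in Python using only `hashlib` from the standard library. This is an illustration under assumptions, not the production code: it assumes the 12-character prefixes are obtained by right-padding with underscores (as the prefixes in the table suggest) and that the MD5 digest is the lowercase hex digest of the UTF-8 bytes of the value. The real pipeline may apply further normalisation, so the output is not guaranteed to match identifiers in the Graph. The helper names, the source prefix `repo`, and the local identifier below are hypothetical.

```python
import hashlib

PREFIX_LENGTH = 12  # namespace prefixes are 12 characters long


def namespace_prefix(name: str) -> str:
    """Pad a PID type or source prefix with underscores, e.g. doi -> doi_________."""
    if len(name) > PREFIX_LENGTH:
        raise ValueError(f"prefix longer than {PREFIX_LENGTH} chars: {name!r}")
    return name.ljust(PREFIX_LENGTH, "_")


def md5_hex(value: str) -> str:
    """Lowercase hex MD5 digest of the UTF-8 encoded value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def id_from_local_id(source_prefix: str, local_id: str) -> str:
    """Default rule: sourcePrefix::md5(localId)."""
    return f"{namespace_prefix(source_prefix)}::{md5_hex(local_id)}"


def id_from_doi(doi: str) -> str:
    """Rule for records from a DOI authority: pidTypePrefix::md5(lowercase(doi))."""
    return f"{namespace_prefix('doi')}::{md5_hex(doi.lower())}"


if __name__ == "__main__":
    print(id_from_doi("10.1016/j.respol.2021.104226"))
    print(id_from_local_id("repo", "oai:example.org:123"))
```

Lowercasing the DOI before hashing means that `10.1016/J.RESPOL.2021.104226` and `10.1016/j.respol.2021.104226` yield the same identifier, which is what makes PID-based identities stable across sources that differ only in letter case.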
diff --git a/versioned_docs/version-8.0.1/data-model/relationships/relationship-object.md b/versioned_docs/version-8.0.1/data-model/relationships/relationship-object.md
new file mode 100644
index 0000000..1945717
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/relationships/relationship-object.md
@@ -0,0 +1,109 @@
+---
+title: The Relationship object
+---
+
+# The `Relationship` object
+
+A relationship in the Graph is represented with the data type presented in this page, which models a directed edge between two nodes and provides information about its semantics, provenance, and validation.
+
+### source
+_Type: String • Cardinality: ONE_
+
+OpenAIRE identifier of the source node in the graph.
+
+```json
+"source": "openorgs____::1cb75a3ad756e4c83e455e3e7347643b"
+```
+
+### sourceType
+_Type: String • Cardinality: ONE_
+
+Type of the source node.
+
+```json
+"sourceType": "organization"
+```
+
+### target
+_Type: String • Cardinality: ONE_
+
+OpenAIRE identifier of the target node in the graph.
+
+```json
+"target": "doajarticles::022409068174087a003647ff46070f7f"
+```
+
+### targetType
+_Type: String • Cardinality: ONE_
+
+Type of the target node.
+
+```json
+"targetType": "datasource"
+```
+
+### relType
+_Type: [RelType](#the-reltype-object) • Cardinality: ONE_
+
+Represents the semantics of the relationship between two nodes of the graph.
+
+```json
+"relType": {
+  "name": "provides",
+  "type": "provision"
+}
+```
+### provenance
+_Type: [Provenance](/data-model/entities/other#provenance-1) • Cardinality: ONE_
+
+Indicates the process that produced (or provided) the information.
+
+```json
+"provenance": {
+  "provenance": "Harvested",
+  "trust":"0.900"
+}
+```
+
+### validated
+_Type: Boolean • Cardinality: ONE_
+
+Indicates whether or not the relationship was validated.
+
+```json
+"validated": true
+```
+
+### validationDate
+_Type: String • Cardinality: ONE_
+
+Indicates the validation date of the relationship; applies only when the `validated` flag is set to true.
+
+```json
+"validationDate": "2022-09-02"
+```
+
+---
+
+## The `RelType` object
+
+The RelType data type models the semantics of the relationship between two nodes.
+
+### type
+_Type: String • Cardinality: ONE_
+
+The relationship category, e.g. affiliation, citation (see [relationship types](./relationship-types)).
+
+```json
+"type": "provision"
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+Further specifies the relationship semantics, indicating the relationship direction, e.g. Cites, isCitedBy.
+
+```json
+"name": "provides"
+```
+---
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/data-model/relationships/relationship-types.md b/versioned_docs/version-8.0.1/data-model/relationships/relationship-types.md
new file mode 100644
index 0000000..cc7e135
--- /dev/null
+++ b/versioned_docs/version-8.0.1/data-model/relationships/relationship-types.md
@@ -0,0 +1,37 @@
+# Relationship types
+
+The following table lists all the possible relation semantics found in the Graph Dataset.
+
+Note: the labels used to specify the semantics of the relationships are (for the most part) inherited from the [DataCite metadata kernel](https://schema.datacite.org/meta/kernel-4.4/doc/DataCite-MetadataKernel_v4.4.pdf), which provides a description for them.
+
+| # | Source entity type | Target entity type | Relation name / inverse | Provenance |
+|:--:|:--------------------------------------:|:--------------------------------------:|:----------------------------------------------------------:|:-----------------------------------------------:|
+| 1 | [Project](/data-model/entities/project) | [ResearchProduct](../../data-model/entities/research-product) | produces / isProducedBy | Harvested, Inferred by OpenAIRE, Linked by user |
+| 2 | [Project](/data-model/entities/project) | [Organization](/data-model/entities/organization) | hasParticipant / isParticipant | Harvested |
+| 3 | [Project](/data-model/entities/project) | [Community](/data-model/entities/community) | IsRelatedTo / IsRelatedTo | Linked by user |
+| 4 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsAmongTopNSimilarDocuments / HasAmongTopNSimilarDocuments | Inferred by OpenAIRE |
+| 5 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsSupplementTo / IsSupplementedBy | Harvested |
+| 6 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsRelatedTo / IsRelatedTo | Harvested, Inferred by OpenAIRE, Linked by user |
+| 7 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsPartOf / HasPart | Harvested |
+| 8 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsDocumentedBy / Documents | Harvested |
+| 9 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsObsoletedBy / Obsoletes | Harvested |
+| 10 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsSourceOf / IsDerivedFrom | Harvested |
+| 11 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsCompiledBy / Compiles | Harvested |
+| 12 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsRequiredBy / Requires | Harvested |
+| 13 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsCitedBy / Cites | Harvested, Inferred by OpenAIRE |
+| 14 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsReferencedBy / References | Harvested |
+| 15 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsReviewedBy / Reviews | Harvested |
+| 16 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsOriginalFormOf / IsVariantFormOf | Harvested |
+| 17 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsVersionOf / HasVersion | Harvested |
+| 18 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsIdenticalTo / IsIdenticalTo | Harvested |
+| 19 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsPreviousVersionOf / IsNewVersionOf | Harvested |
+| 20 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsContinuedBy / Continues | Harvested |
+| 21 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsDescribedBy / Describes | Harvested |
+| 22 | [ResearchProduct](../../data-model/entities/research-product) | [Organization](/data-model/entities/organization) | hasAuthorInstitution / isAuthorInstitutionOf | Harvested, Inferred by OpenAIRE |
+| 23 | [ResearchProduct](../../data-model/entities/research-product) | [Data source](/data-model/entities/data-source) | isHostedBy / hosts | Harvested, Inferred by OpenAIRE |
+| 24 | [ResearchProduct](../../data-model/entities/research-product) | [Data source](/data-model/entities/data-source) | isProvidedBy / provides | Harvested |
+| 25 | [ResearchProduct](../../data-model/entities/research-product) | [Community](/data-model/entities/community) | IsRelatedTo / IsRelatedTo | Harvested, Inferred by OpenAIRE, Linked by user |
+| 26 | [Organization](/data-model/entities/organization) | [Community](/data-model/entities/community) | IsRelatedTo / IsRelatedTo | Linked by user |
+| 27 | [Organization](/data-model/entities/organization) | [Organization](/data-model/entities/organization) | IsChildOf / IsParentOf | Linked by user |
+| 28 | [Data source](/data-model/entities/data-source) | [Community](/data-model/entities/community) | IsRelatedTo / IsRelatedTo | Linked by user |
+| 29 | [Data source](/data-model/entities/data-source) | [Organization](/data-model/entities/organization) | isProvidedBy / provides | Harvested |
diff --git a/versioned_docs/version-8.0.1/downloads/alternative-model/cfhb.md b/versioned_docs/version-8.0.1/downloads/alternative-model/cfhb.md
new file mode 100644
index 0000000..4d9863d
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/alternative-model/cfhb.md
@@ -0,0 +1,30 @@
+---
+
+sidebar_position: 1
+
+---
+
+# CfHbKeyValue
+
+Information about the sources from which the record has been collected.
+
+
+
+### key
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE identifier of the data source.
+
+```json
+"key":"openaire____::081b82f96300b6a6e3d282bad31cb6e2"
+```
+
+### value
+_Type: String • Cardinality: ONE_
+
+The name of the data source.
+
+```json
+"value":"Crossref"
+```
+
diff --git a/versioned_docs/version-8.0.1/downloads/alternative-model/communityInstance.md b/versioned_docs/version-8.0.1/downloads/alternative-model/communityInstance.md
new file mode 100644
index 0000000..e883626
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/alternative-model/communityInstance.md
@@ -0,0 +1,37 @@
+---
+
+sidebar_position: 1
+
+---
+
+# CommunityInstance
+
+It is a subclass of [Instance](../../data-model/entities/research-product#instance) extended with information about the collection and hosting sources of this materialization of the research product.
+
+### hostedby
+_Type: [CfHbKeyValue](./cfhb) • Cardinality: ONE_
+
+Information about the source from which the instance can be viewed or downloaded.
+
+```json
+
+"hostedby": {
+  "key": "issn___print::35ee75a5ad42581d604be113a8f56427",
+  "value": "New Phytologist"
+},
+
+```
+
+### collectedfrom
+_Type: [CfHbKeyValue](./cfhb) • Cardinality: ONE_
+
+Information about the source from which the record has been collected.
+
+
+```json
+
+"collectedfrom": {
+  "key": "openaire____::081b82f96300b6a6e3d282bad31cb6e2",
+  "value": "Crossref"
+}
+```
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/downloads/alternative-model/context.md b/versioned_docs/version-8.0.1/downloads/alternative-model/context.md
new file mode 100644
index 0000000..51cf14e
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/alternative-model/context.md
@@ -0,0 +1,46 @@
+---
+
+sidebar_position: 1
+
+---
+
+# Context
+
+Information about the research initiative/community (RI/RC) associated with the research product.
+
+### code
+_Type: String • Cardinality: ONE_
+
+Code identifying the RI/RC.
+
+```json
+"code":"sdsn-gr"
+
+```
+
+
+### label
+_Type: String • Cardinality: ONE_
+
+Label of the RI/RC.
+
+```json
+"label":"SDSN - Greece"
+```
+
+### provenance
+_Type: [Provenance](/data-model/entities/other#provenance-2) • Cardinality: MANY_
+
+The reason why the research product is associated with the RI/RC.
+
+```json
+
+"provenance":[{
+    "provenance":"Inferred by OpenAIRE",
+    "trust":"0.9"
+  },
+  ...
+]
+
+```
+
diff --git a/versioned_docs/version-8.0.1/downloads/alternative-model/extended-research-product.md b/versioned_docs/version-8.0.1/downloads/alternative-model/extended-research-product.md
new file mode 100644
index 0000000..51edaec
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/alternative-model/extended-research-product.md
@@ -0,0 +1,140 @@
+---
+
+sidebar_position: 1
+
+---
+
+
+# Extended Research Product
+
+
+It is a subclass of [ResearchProduct](../../data-model/entities/research-product) extended with information about projects (and funders), research communities/infrastructures and related data sources.
+
+
+### projects
+
+_Type: [Project](project.md) • Cardinality: MANY_
+
+
+List of projects (i.e. grants) that (co-)funded the production of the research product.
+
+
+```json
+
+
+"projects": [
+  {
+    "id": "corda__h2020::94c4a066401e22002c4811a301bb4655",
+    "code": "727929",
+    "acronym": "TomRes",
+    "title": "A NOVEL AND INTEGRATED APPROACH TO INCREASE MULTIPLE AND COMBINED STRESS TOLERANCE IN PLANTS USING TOMATO AS A MODEL",
+    "funder": {
+      "shortName": "EC",
+      "name": "European Commission",
+      "jurisdiction": "EU",
+      "fundingStream": "H2020"
+    },
+    "provenance": {
+      "provenance": "Harvested",
+      "trust": "0.900000000000000022"
+    },
+    "validated": {
+      "validationDate": "2021-01-01",
+      "validatedByFunder": true
+    }
+  },
+  ...
+]
+
+```
+
+### context
+
+_Type: [Context](./context) • Cardinality: MANY_
+
+
+References to the relevant research infrastructures, initiatives or communities (RI/RC) among those collaborating with OpenAIRE that are publicly visible. Please see https://connect.openaire.eu.
+
+
+```json
+
+
+"context":[
+  {
+    "code":"sdsn-gr",
+    "label":"SDSN - Greece",
+    "provenance":[
+      {
+        "provenance":"Inferred by OpenAIRE",
+        "trust":"0.9"
+      }
+    ]
+  },
+  ...
+]
+
+```
+
+
+
+### collectedfrom
+
+_Type: [CfHbKeyValue](./cfhb) • Cardinality: MANY_
+
+
+Information about the sources from which the record has been collected.
+
+
+```json
+
+"collectedfrom":[
+  {
+    "key":"openaire____::081b82f96300b6a6e3d282bad31cb6e2",
+    "value":"Crossref"
+  },
+  ...
+]
+
+```
+
+
+### instance
+
+_Type: [CommunityInstance](./communityInstance) • Cardinality: MANY_
+
+The materializations (instances) of the research product, each with information about the source from which it can be viewed or downloaded and the source from which it was collected.
+
+```json
+
+
+"instance": [
+  {
+    "license": "http://doi.wiley.com/10.1002/tdm_license_1.1",
+    "accessright": {
+      "code": "c_16ec",
+      "label": "RESTRICTED",
+      "scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/",
+      "openAccessRoute": null
+    },
+    "type": "Article",
+    "url": [
+      "https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Fnph.15014",
+      "http://onlinelibrary.wiley.com/wol1/doi/10.1111/nph.15014/fullpdf",
+      "http://dx.doi.org/10.1111/nph.15014"
+    ],
+    "publicationdate": "2018-02-09",
+    "refereed": "UNKNOWN",
+    "hostedby": {
+      "key": "issn___print::35ee75a5ad42581d604be113a8f56427",
+      "value": "New Phytologist"
+    },
+    "collectedfrom": {
+      "key": "openaire____::081b82f96300b6a6e3d282bad31cb6e2",
+      "value": "Crossref"
+    }
+  },
+  ...
+]
+
+
+```
diff --git a/versioned_docs/version-8.0.1/downloads/alternative-model/funder.md b/versioned_docs/version-8.0.1/downloads/alternative-model/funder.md
new file mode 100644
index 0000000..1da93a9
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/alternative-model/funder.md
@@ -0,0 +1,72 @@
+---
+
+sidebar_position: 1
+
+---
+
+# Funder
+
+
+Information about the funder of the project.
+
+
+### fundingStream
+
+_Type: String • Cardinality: ONE_
+
+
+The funding stream (e.g. the funding programme) through which the project is funded.
+
+
+```json
+
+"fundingStream": "H2020"
+
+
+```
+
+### jurisdiction
+
+_Type: String • Cardinality: ONE_
+
+
+Geographical jurisdiction of the funder (e.g. EU for the European Commission, HR for the Croatian Science Foundation).
+
+
+```json
+
+"jurisdiction": "EU"
+
+```
+
+
+### name
+
+_Type: String • Cardinality: ONE_
+
+
+The name of the funder.
+
+
+```json
+
+"name": "European Commission"
+
+```
+
+
+### shortName
+
+_Type: String • Cardinality: ONE_
+
+
+The short name of the funder.
+
+
+```json
+
+"shortName": "EC"
+
+```
+
+
diff --git a/versioned_docs/version-8.0.1/downloads/alternative-model/project.md b/versioned_docs/version-8.0.1/downloads/alternative-model/project.md
new file mode 100644
index 0000000..985eecd
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/alternative-model/project.md
@@ -0,0 +1,134 @@
+---
+
+sidebar_position: 1
+
+---
+
+
+
+# Project
+
+
+Information about a project related to the research product.
+
+
+### id
+
+_Type: String • Cardinality: ONE_
+
+
+Main entity identifier, created according to the [OpenAIRE entity identifier and PID mapping policy](../../data-model/pids-and-identifiers).
+
+
+```json
+
+"id": "corda__h2020::70ea22400fd890c5033cb31642c4ae68"
+
+```
+
+
+### code
+
+_Type: String • Cardinality: ONE_
+
+
+The grant agreement code of the project.
+
+
+```json
+
+"code": "777541"
+
+```
+
+
+### acronym
+
+_Type: String • Cardinality: ONE_
+
+
+The acronym of the project.
+
+
+```json
+
+"acronym": "OpenAIRE-Advance"
+
+```
+
+
+### title
+
+_Type: String • Cardinality: ONE_
+
+
+The title of the project.
+
+
+```json
+
+"title": "OpenAIRE Advancing Open Scholarship"
+
+```
+
+
+### funder
+
+_Type: [Funder](funder.md) • Cardinality: ONE_
+
+
+Information about the funder of the project.
+
+
+```json
+
+
+"funder": {
+  "shortName": "EC",
+  "name": "European Commission",
+  "jurisdiction": "EU",
+  "fundingStream": "H2020"
+}
+
+
+```
+
+### provenance
+
+
+_Type: [Provenance](../../data-model/entities/other#provenance-2) • Cardinality: ONE_
+
+
+The reason why the project is associated with the research product.
+
+
+```json
+
+
+"provenance": {
+  "provenance": "Harvested",
+  "trust": "0.900000000000000022"
+}
+
+```
+
+
+### validated
+
+
+_Type: [Validated](validated.md) • Cardinality: ONE_
+
+
+Specifies whether the association between the project and the research product was validated.
+
+
+```json
+
+
+"validated": {
+  "validationDate": "2021-01-01",
+  "validatedByFunder": true
+}
+
+```
+
diff --git a/versioned_docs/version-8.0.1/downloads/alternative-model/validated.md b/versioned_docs/version-8.0.1/downloads/alternative-model/validated.md
new file mode 100644
index 0000000..3dcb572
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/alternative-model/validated.md
@@ -0,0 +1,41 @@
+---
+
+sidebar_position: 1
+
+---
+
+# Validated
+
+
+Information about the validation of the association between the research product and the funding information.
+
+
+### validationDate
+
+_Type: String • Cardinality: ONE_
+
+
+The date on which OpenAIRE collected the association between the funding and the research product from an authoritative source (e.g. Sygma).
+
+
+```json
+
+"validationDate": "2021-01-01"
+
+```
+
+
+### validatedByFunder
+
+_Type: Boolean • Cardinality: ONE_
+
+
+Specifies whether the validation comes from the funder.
+
+
+```json
+
+
+"validatedByFunder": true
+
+```
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/downloads/beginners-kit.md b/versioned_docs/version-8.0.1/downloads/beginners-kit.md
new file mode 100644
index 0000000..0dea0a1
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/beginners-kit.md
@@ -0,0 +1,16 @@
+---
+sidebar_position: 2
+---
+
+# Beginner's kit
+
+The large size of the OpenAIRE Graph is a major obstacle for beginners who want to familiarise themselves with the underlying data model and explore its contents.
+Working with the Graph in its full size typically requires access to a large distributed computing infrastructure, which is not easily accessible to everyone.
+[The OpenAIRE Beginner’s Kit](https://doi.org/10.5281/zenodo.7490191) aims to address this issue. It consists of two components:
+
+
+
+* A subset of the Graph composed of the research products published between 2022-06-29 and 2022-12-29, all the entities connected to them and the respective relationships.
+* A Zeppelin notebook that demonstrates how you can use PySpark to analyse the Graph and get answers to some interesting research questions. A guide to Apache Zeppelin can be found [here](https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_zeppelin-component-guide/content/ch_overview.html).
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/downloads/full-graph.md b/versioned_docs/version-8.0.1/downloads/full-graph.md
new file mode 100644
index 0000000..0a1d399
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/full-graph.md
@@ -0,0 +1,50 @@
+---
+sidebar_position: 1
+---
+
+# Full graph dataset
+
+You can download the full OpenAIRE Graph Dataset as well as its schema from the following links:
+
+ Dataset: https://doi.org/10.5281/zenodo.3516917
+
+ Schema: https://doi.org/10.5281/zenodo.4238938
+
+The schema used to create this dataset mirrors the one described in the [Data Model](/data-model).
+This dataset is licensed under a Creative Commons Attribution 4.0 International License.
+It is composed of several files, so that you can download only the parts you are interested in. The files are named after the entity they store (e.g. publication, dataset). Each file is at most 10GB and is
+a tar archive containing gzip files, each with one JSON record per line.
+
+## How to acknowledge this work
+
+Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph datasets](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dataset's Zenodo page or as provided below.
+
+:::note How to cite
+
+Manghi P., Atzori C., Bardi A., Baglioni M., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Mannocci A., Horst M., Czerniak A., Iatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Lempesis A., Ioannidis A., Manola N., Principe P., Vergoulis T., Chatzopoulos S., Pierrakos D. (2022). "OpenAIRE Research Graph Dataset", *Dataset*, Zenodo. [doi:10.5281/zenodo.3516917](https://doi.org/10.5281/zenodo.3516917) ([BibTeX](/bibtex/OpenAIRE_Research_Graph_dataset.bib))
+:::
+
+Please also consider citing [other relevant research products](/publications#relevant-research-products) that may be of interest.
+
+Also consider adding one of the following badges to your service with the appropriate link to [our website](https://graph.openaire.eu); click on the badges below to download the respective badge image files.
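Since each dump file is a tar archive of gzip files with one JSON record per line, the records can be streamed straight out of the archive without unpacking it to disk. The following is a minimal standard-library sketch under that assumption, not an official OpenAIRE tool; the function name `iter_records` and the archive name `publication_1.tar` are hypothetical.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per line from every .gz member of a dump archive."""
    # mode="r" lets tarfile detect any outer compression transparently.
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".gz")):
                continue  # skip directories and unrelated files
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # gzip.open accepts an existing file object; "rt" decodes to text lines.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # ignore blank lines
                        yield json.loads(line)
```

For example, `sum(1 for _ in iter_records("publication_1.tar"))` would count the records of one archive while keeping only one line in memory at a time.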
+
+
+
diff --git a/versioned_docs/version-8.0.1/downloads/related-datasets.md b/versioned_docs/version-8.0.1/downloads/related-datasets.md
new file mode 100644
index 0000000..461fd2a
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/related-datasets.md
@@ -0,0 +1,34 @@
+---
+sidebar_position: 4
+---
+
+# Other related datasets
+
+On this page, we list other related datasets; please refer to their respective schema definitions for the data model they follow.
+
+## The dataset of ScholeXplorer
+
+ Dataset: https://zenodo.org/doi/10.5281/zenodo.1200252
+
+ Schema (Scholix version 3): https://doi.org/10.5281/zenodo.1120275
+
+ Schema (Scholix version 4): https://doi.org/10.5281/zenodo.6351557
+
+This dataset is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
+It contains the Scholix links exposed by the OpenAIRE ScholeXplorer service, as GZ-compressed files.
+
+## The OpenAIRE LOD dataset
+
+:::caution
+ The OpenAIRE LOD dataset has been discontinued. The SPARQL endpoint is no longer supported, but old LOD datasets can be found at the link below.
+:::
+
+Dataset (RDF): https://doi.org/10.5281/zenodo.609943
+
+
+
+
+The OpenAIRE Linked Open Data (LOD) Services and their integration with the OpenAIRE information space have been released as a beta version. The LOD exporting process started with a specification of the OpenAIRE data model as an RDF vocabulary, followed by the mapping of the OpenAIRE data to the graph-based RDF data model. To interlink the OpenAIRE data with related data on the Web, we identified a list of potential datasets to interlink with, including the DBpedia dataset extracted from Wikipedia and the publication databases DBLP and CiteSeer.
+
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/downloads/subgraphs.md b/versioned_docs/version-8.0.1/downloads/subgraphs.md
new file mode 100644
index 0000000..07d276d
--- /dev/null
+++ b/versioned_docs/version-8.0.1/downloads/subgraphs.md
@@ -0,0 +1,70 @@
+---
+sidebar_position: 3
+---
+
+# Sub-graph datasets
+
+To make the Graph easier to use, several datasets are available in the Zenodo community [OpenAIRE Graph](https://zenodo.org/communities/openaire-research-graph).
+This page lists all the alternative datasets currently available.
+
+
+## The OpenAIRE COVID-19 dataset
+
+Dataset: https://doi.org/10.5281/zenodo.3980490
+
+Schema: https://doi.org/10.5281/zenodo.3974225
+
+This dataset is licensed under a Creative Commons Attribution 4.0 International License.
+It contains metadata records of publications, research data, software and projects on the topic of coronavirus and COVID-19.
+This dataset is part of the activities of OpenAIRE to support the fight against COVID-19, together with the OpenAIRE COVID-19 Gateway.
+The dataset consists of a tar archive containing gzip files with one JSON record per line. Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dataset.
+
+## The dataset of funded products
+
+Dataset: https://doi.org/10.5281/zenodo.4559725
+
+Schema: https://doi.org/10.5281/zenodo.3974225
+
+This dataset is licensed under a Creative Commons Attribution 4.0 International License.
+It contains metadata records of research products (research literature, data, software, other types of research products) with funding
+information available in the OpenAIRE Graph. Records are grouped by funder, each funder in a dedicated archive file. Each tar archive contains
+gzip files, each with one JSON record per line. The model of this dataset differs from the one of the whole graph.
+Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dataset.
+
+## The dataset of delta projects
+
+Dataset: https://doi.org/10.5281/zenodo.6419021
+
+Schema: https://doi.org/10.5281/zenodo.4238938
+
+This dataset is licensed under a Creative Commons Attribution 4.0 International License.
+It contains the metadata records of projects collected by OpenAIRE in a given time frame. Usually, one deposition of collected projects is made for each release of the OpenAIRE Graph.
+Each deposition is one tar archive containing gzip files, each with one JSON record per line.
+
+## The datasets about research communities, initiatives and infrastructures
+
+Dataset: https://doi.org/10.5281/zenodo.3974604
+
+Schema: https://doi.org/10.5281/zenodo.3974225
+
+This dataset is licensed under a Creative Commons Attribution 4.0 International License.
+The dataset contains one file per community/initiative/infrastructure collaborating with OpenAIRE. See also their community gateways on
+CONNECT. Each file is a tar archive containing gzip files with one JSON record per line. Only publicly visible communities/research initiatives/infrastructures are included.
+The model of this dataset differs from the one of the whole graph.
+Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dataset.
+
+---
+
+## Alternative sub-graph data model
+
+Note that the datasets for research communities, infrastructures, and products related to projects do not strictly follow the main data model of the OpenAIRE Graph. In particular, they differ as follows:
+
+* only research products are included (no relations or other entities)
+* the research products are extended with information that can be inferred from the whole dataset, namely:
+  * funding information, if present
+  * associated research communities/infrastructures
+  * associated data sources
+
+So they have just one entity type, the [Extended Research Product](./alternative-model/extended-research-product.md).
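Because these sub-graph dumps contain a single entity type, most analyses come down to reading fields of the Extended Research Product. The sketch below assumes each JSON line has already been parsed into a Python dict using the field names shown on the Extended Research Product, Project, Context and CommunityInstance pages (`projects[].funder.shortName`, `context[].code`, `instance[].hostedby.value`); the helper names are hypothetical and this is not an OpenAIRE library.

```python
from collections import Counter
from typing import Iterable


def funders(record: dict) -> set[str]:
    """Short names of the funders of the projects linked to one Extended Research Product."""
    return {
        p["funder"]["shortName"]
        for p in record.get("projects") or []
        if p.get("funder") and p["funder"].get("shortName")
    }


def communities(record: dict) -> set[str]:
    """Codes of the RI/RC contexts the product is associated with."""
    return {c["code"] for c in record.get("context") or [] if c.get("code")}


def hosting_sources(record: dict) -> set[str]:
    """Names of the data sources hosting the product's instances."""
    return {
        i["hostedby"]["value"]
        for i in record.get("instance") or []
        if i.get("hostedby") and i["hostedby"].get("value")
    }


def count_by_funder(records: Iterable[dict]) -> Counter:
    """Number of products per funder; a co-funded product counts once for each funder."""
    counts = Counter()
    for record in records:
        counts.update(funders(record))
    return counts
```

Feeding `count_by_funder` the parsed lines of a dump file gives a per-funder tally of the products it contains. The `or []` guards cover records where a list field is missing or `null`.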
+
diff --git a/versioned_docs/version-8.0.1/faq.md b/versioned_docs/version-8.0.1/faq.md
new file mode 100644
index 0000000..ace8840
--- /dev/null
+++ b/versioned_docs/version-8.0.1/faq.md
@@ -0,0 +1,7 @@
+---
+sidebar_position: 10
+---
+
+# FAQ
+
+https://support.openaire.eu/projects/docs/wiki/FAQ
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/_category_.json b/versioned_docs/version-8.0.1/graph-production-workflow/_category_.json
new file mode 100644
index 0000000..8da8ce0
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/_category_.json
@@ -0,0 +1,8 @@
+{
+  "label": "Graph production workflow",
+  "position": 6,
+  "link": {
+    "type": "doc",
+    "id": "graph-production-workflow"
+  }
+}
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/aggregation.md b/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/aggregation.md
new file mode 100644
index 0000000..f64c397
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/aggregation.md
@@ -0,0 +1,58 @@
+---
+sidebar_position: 1
+---
+
+# Aggregation
+
+OpenAIRE materializes an open, participatory research graph (the OpenAIRE Graph) where products of the research life-cycle (e.g. scientific literature, research data, projects, software) are semantically linked to each other and carry information about their access rights (i.e. whether they are Open Access, Restricted, Embargoed, or Closed), the sources from which they have been collected, and where they are hosted. The OpenAIRE Graph is materialised via a set of autonomic, orchestrated workflows operating in a regimen of continuous data aggregation and integration. [1]
+
+## What does OpenAIRE collect?
+
+OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant with the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
+
+The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a field value matches one of the synonyms, it is replaced with the corresponding term.
+In addition, the OpenAIRE Graph is extended with other relevant scholarly communication sources that need special handling, either because they do not strictly follow the OpenAIRE Guidelines or because of the vast amount of data they offer; these include Crossref, ORCID, Microsoft Academic Graph and Unpaywall.
+
+
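The synonym-based cleaning step can be pictured as a lookup from every known synonym to its controlled term. The sketch below only illustrates that idea and is not OpenAIRE's implementation: the `Term` class, the case-insensitive matching and the sample terms are assumptions, and it returns the term label, whereas the real pipeline may store the code as well.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One controlled term: a code, a label and a set of synonyms (hypothetical shape)."""
    code: str
    label: str
    synonyms: set[str] = field(default_factory=set)


def build_lookup(terms: list[Term]) -> dict[str, Term]:
    """Map every code, label and synonym (normalised to lowercase) to its term."""
    lookup = {}
    for term in terms:
        for value in {term.code, term.label, *term.synonyms}:
            lookup[value.strip().lower()] = term
    return lookup


def clean(value: str, lookup: dict[str, Term]) -> str:
    """Replace a value with its controlled term label; unknown values pass through unchanged."""
    term = lookup.get(value.strip().lower())
    return term.label if term else value
```

Building the lookup once per vocabulary keeps each cleaning call to a single dictionary access, which matters when the same vocabulary is applied to millions of records.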
+
+The OpenAIRE aggregation system collects information about objects of the research life-cycle compliant with the [OpenAIRE acquisition policy](https://www.openaire.eu/content-acquisition-policy) from [different types of data sources](https://explore.openaire.eu/search/find/dataproviders):
+
+1. Scientific literature metadata and full-texts from institutional and thematic repositories, CRIS (Current Research Information Systems), Open Access journals and publishers;
+2. Dataset metadata from data repositories and data journals;
+3. Scientific literature, data and software metadata from Zenodo;
+4. Metadata about data sources, organizations, projects, and funding programs from entity registries, i.e. authoritative sources such as CORDA and other funder databases for projects, OpenDOAR for publication repositories, re3data for data repositories, DOAJ for Open Access journals;
+5. Metadata of open source research software from software repositories and Software Heritage;
+6. Metadata about other types of research products, like workflows, protocols, methods, research packages.
+
+Relationships between objects are collected from the data sources, but also automatically detected by [inference algorithms](https://www.openaire.eu/blogs/text-mining-services-in-openaire-1) and added by authenticated users, who can insert links between literature, datasets, software and projects via [the “Link” procedure available from the OpenAIRE EXPLORE portal](https://explore.openaire.eu). More information about the linking functionality can be found [here](https://www.openaire.eu/linking).
+
+## What kind of data sources are in OpenAIRE?
+
+Objects and relationships in the OpenAIRE Graph are extracted from information packages, i.e. metadata records, collected from data sources of the following kinds:
+
+- *Literature, institutional and thematic repositories*: information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. ArXiv, Europe PMC);
+- *Open Access publishers and journals*: information systems of Open Access publishers or their journals, which offer bibliographic metadata and PDFs of their published articles;
+- *Data archives*: information systems where scientists deposit descriptive metadata and files about their research data (also known as scientific data, datasets, etc.);
+- *Hybrid repositories/archives*: information systems where scientists deposit metadata and files of any kind of scientific product, including scientific literature, research data and research software (e.g. Zenodo);
+- *Aggregator services*: information systems that collect descriptive metadata about publications or datasets from multiple sources in order to enable cross-data source discovery of given research products. Examples are DataCite, BASE, DOAJ;
+- *Entity Registries*: information systems created with the intent of maintaining authoritative registries of given entities in scholarly communication, such as OpenDOAR for institutional repositories, re3data for data repositories, CORDA and other funder databases for projects and funding information;
+- *CRIS*: information systems adopted by research and academic organizations to keep track of their research administration records and related research products; examples of CRIS content are articles or datasets funded by projects, their principal investigators, facilities acquired thanks to funding, etc.;
+- *Research Graphs*: services that maintain an information space of (possibly interlinked) scholarly communication objects. Examples are Crossref, ScholeXplorer and OpenAIRE itself.
+
+## How does OpenAIRE collect metadata records?
+
+OpenAIRE collects metadata records describing objects of the research life-cycle from content providers compliant with the OpenAIRE guidelines and from entity registries (i.e. data sources offering authoritative lists of entities, like OpenDOAR, re3data, DOAJ, and funder databases).
+
+In the majority of cases, the OpenAIRE aggregator collects metadata records via [OAI-PMH](https://www.openarchives.org/pmh/), but it also supports other standard exchange protocols like FTP(S), SFTP, and some RESTful APIs.
+The full list of available collectors can be found in the [RedMine Wiki - API Protocols](https://support.openaire.eu/projects/openaire/wiki/API_protocols).
+
+For additional details about the aggregation workflows, please refer to [2].
+
+
+## References
+
+[1] Manghi, P., Artini, M., Atzori, C., Bardi, A., Mannocci, A., La Bruzzo, S., Candela, L., Castelli, D. and Pagano, P. (2014), “The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures”, Program: electronic library and information systems, Vol. 48 No. 4, pp. 322-354. [doi:10.1108/prog-08-2013-0045](https://doi.org/10.1108/prog-08-2013-0045)
+
+[2] Atzori, C., Bardi, A., Manghi, P., & Mannocci, A. (2017, January). "The OpenAIRE workflows for data management". In Italian Research Conference on Digital Libraries (pp. 95-107). Springer, Cham. [doi:10.1007/978-3-319-68130-6_8](https://doi.org/10.1007/978-3-319-68130-6_8)
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/compatible-sources.md b/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/compatible-sources.md
new file mode 100644
index 0000000..48d831e
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/compatible-sources.md
@@ -0,0 +1,11 @@
+---
+sidebar_position: 1
+---
+
+# OpenAIRE compatible sources
+
+The OpenAIRE aggregator collects metadata records from content providers compliant with the OpenAIRE guidelines.
+
+The OpenAIRE Guidelines help repository managers expose publications, datasets and CRIS metadata via the OAI-PMH protocol in order to integrate with the OpenAIRE infrastructure.
+
+You can find more information at https://guidelines.openaire.eu/en/latest/
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/non-compatible-sources/crossref_unpaywall.md b/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/non-compatible-sources/crossref_unpaywall.md
new file mode 100644
index 0000000..54ab378
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/aggregation/non-compatible-sources/crossref_unpaywall.md
@@ -0,0 +1,165 @@
+# Crossref & Unpaywall
+
+This section describes the procedure used to integrate the contents from [Crossref](https://www.crossref.org) and [Unpaywall](https://unpaywall.org) in the OpenAIRE Graph.
+
+## Data acquisition
+
+The dataset containing all the Crossref records is obtained via a complete data dump on a monthly basis.
+The Unpaywall dataset is no longer updated, but its latest snapshot (Dec 2021) is used to enrich the Crossref contents.
+
+## Process
+
+In the following, we describe the process applied to the Crossref and Unpaywall contents.
+
+### Crossref filtering
+
+Records in Crossref are discarded if they meet any of the following criteria:
+
+* have a blank title, e.g.:
+  * `10.1093/rheumatology/41.7.837`
+  * `10.1093/qjmed/95.7.430`
+  * `10.1371/journal.pone.0171434.g005`
+* have one of the following publishers: `"Test accounts"`, `"CrossRef Test Account"`
+  * Examples from https://api.crossref.org/works?query.publisher-name=%22Test%20accounts%22
+    * `10.1007/bf00344543`
+    * `10.1007/bf00186154`
+    * `10.1306/64ed947a-1724-11d7-8645000102c1865d`
+* have authors matching the following invalid names: `",", "none none", "none, none", "none &na;", "(:null)", "test test test", "test test", "test", "&na; &na"`
+  * Examples for `"none"` author from https://api.crossref.org/works?query.author=%22none%22
+    * `10.4007/annals.2016.184.3.11`
+    * `10.4007/annals.2012.176.1.6`
+    * `10.2172/6393585`
+  * Examples for `"test"` author from https://api.crossref.org/works?query.author=%22test%22
+    * `10.5116/ijme.54ca.a5ae`
+    * `10.5755/j01.ss.71.2.544`
+    * `10.5755/j01.ee.22.2.319`
+* have `"Addie Jackson"` as author and `"Elsevier BV"` as publisher (empirically, these are test records)
+  * Examples from https://api.crossref.org/works?query.author=Addie+Jackson&query.publisher-name=%22Elsevier%20BV%22
+    * `10.2139/ssrn.2082156`
+    * `10.2139/ssrn.2202300`
+    * `10.2139/ssrn.2255657`
+* do not have one of the following values in the `type` field: `"book-section"`, `"book"`, `"book-chapter"`, `"book-part"`, `"book-series"`, `"book-set"`, `"book-track"`, `"edited-book"`, `"reference-book"`, `"monograph"`, `"journal-article"`, `"dissertation"`, `"other"`, `"peer-review"`, `"proceedings"`, `"proceedings-article"`, `"reference-entry"`, `"report"`, `"report-series"`, `"standard"`, `"standard-series"`, `"posted-content"`, `"dataset"`
+  * Examples:
+    * `10.1371/journal.pone.0171434.g005`
+    * `10.7554/elife.21052.049`
+    * `10.1371/journal.pcbi.1005379.s006`
+
+Records with `type=dataset` are mapped into OpenAIRE research products of type dataset. All others are mapped as OpenAIRE research products of type publication.
+
+### Mapping Crossref properties into the OpenAIRE Graph
+
+Properties in OpenAIRE research products are set based on the logic described in the following table:
+
+| OpenAIRE Research Product field path | Crossref path(s) | Notes |
+|---|---|---|
+| `id` | `doi` | id in the form `doi_________::md5(doi)` |
+| `dateofcollection` | `indexed.datetime` | |
+| `lastupdatetimestamp` | `indexed.timestamp` | |
+| `type` | `type` | Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities:
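
The filtering rules and the `id` mapping above can be sketched in Python as follows. This is a minimal illustration, not the OpenAIRE implementation: it assumes records shaped like Crossref REST API `works` items (`title` and `author` lists, `publisher` and `type` strings), assumes author names are compared as lower-cased "given family" strings, and assumes the DOI is lower-cased before hashing.

```python
import hashlib

# Hypothetical constants transcribed from the filtering rules above.
TEST_PUBLISHERS = {"Test accounts", "CrossRef Test Account"}
INVALID_AUTHOR_NAMES = {",", "none none", "none, none", "none &na;", "(:null)",
                        "test test test", "test test", "test", "&na; &na"}
ACCEPTED_TYPES = {
    "book-section", "book", "book-chapter", "book-part", "book-series", "book-set",
    "book-track", "edited-book", "reference-book", "monograph", "journal-article",
    "dissertation", "other", "peer-review", "proceedings", "proceedings-article",
    "reference-entry", "report", "report-series", "standard", "standard-series",
    "posted-content", "dataset",
}


def author_name(author: dict) -> str:
    """Join Crossref "given" and "family" parts into a lower-cased full name."""
    parts = (author.get("given", ""), author.get("family", ""))
    return " ".join(p for p in parts if p).strip().lower()


def is_discarded(record: dict) -> bool:
    """Return True when a Crossref record matches one of the exclusion rules."""
    if not any(t.strip() for t in record.get("title", [])):
        return True  # blank title
    publisher = record.get("publisher", "")
    if publisher in TEST_PUBLISHERS:
        return True  # test publisher accounts
    names = [author_name(a) for a in record.get("author", [])]
    if any(n in INVALID_AUTHOR_NAMES for n in names):
        return True  # placeholder author names
    if "addie jackson" in names and publisher == "Elsevier BV":
        return True  # empirically identified test records
    return record.get("type") not in ACCEPTED_TYPES


def result_type(record: dict) -> str:
    """Crossref datasets become datasets; everything else becomes a publication."""
    return "dataset" if record.get("type") == "dataset" else "publication"


def openaire_id(doi: str) -> str:
    """Build an id of the form doi_________::md5(doi); lower-casing is assumed."""
    return "doi_________::" + hashlib.md5(doi.lower().encode("utf-8")).hexdigest()
```

Keeping every rule as an early `return True` mirrors the "any of the following criteria" wording: a record is dropped as soon as one rule fires.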
+
+* data sources: it is possible to list a set of data sources relevant for the RC/RI. All research products collected from these data sources are linked to the RC/RI.
+
+
+When only some of the research products collected from a datasource are relevant for the RC/RI, it is possible to specify a set of selection constraints (SC) that must be verified before linking a research product to the
+community. A selection constraint has the form SC = S1 or S2 or ... or Sn. Each Si has the form Si = si1 and si2 and ... and sim, where each sij is a condition on a specific field of the research product. The fields that can be specified are F={title, author, contributor, description, orcid},
+while the condition can be chosen among V={contains, equals, not_contains, not_equals, contains_ignorecase, equals_ignorecase, not_contains_ignorecase, not_equal_ignorecase}, and the value is free text.
+An example of a selection criterion is: “All the products whose contributor contains DARIAH”.
+
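
A selection constraint is a disjunction of conjunctions, so it can be evaluated with `any` over the Si and `all` over their conditions. The sketch below is illustrative only: the record layout (each field mapped to a list of string values) and the semantics of the negated verbs on multi-valued fields (they hold when no value satisfies the positive verb) are assumptions.

```python
from typing import Dict, List, Tuple

Record = Dict[str, List[str]]     # field name -> list of values (assumed layout)
Condition = Tuple[str, str, str]  # (field, verb, value), i.e. one s_ij

POSITIVE = {
    "contains": lambda v, x: x in v,
    "equals": lambda v, x: v == x,
    "contains_ignorecase": lambda v, x: x.lower() in v.lower(),
    "equals_ignorecase": lambda v, x: v.lower() == x.lower(),
}
# Assumption: a negated verb holds when no value satisfies its positive counterpart.
NEGATED = {
    "not_contains": "contains",
    "not_equals": "equals",
    "not_contains_ignorecase": "contains_ignorecase",
    "not_equal_ignorecase": "equals_ignorecase",
}


def holds(record: Record, cond: Condition) -> bool:
    """Evaluate a single condition s_ij against a record."""
    field, verb, value = cond
    values = record.get(field, [])
    if verb in POSITIVE:
        return any(POSITIVE[verb](v, value) for v in values)
    return not any(POSITIVE[NEGATED[verb]](v, value) for v in values)


def satisfies(record: Record, sc: List[List[Condition]]) -> bool:
    """SC = S1 or ... or Sn, where each Si = si1 and ... and sim."""
    return any(all(holds(record, c) for c in conj) for conj in sc)
```

The DARIAH example then reads `[[("contributor", "contains", "DARIAH")]]`: a single Si with a single condition.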
+
+* Zenodo community: it is possible to list a set of Zenodo communities relevant for the RC/RI. All the products collected from the listed Zenodo communities are linked to the RC/RI.
+
+
+The lists of subjects, Zenodo communities and data sources used to enrich the products are defined by the managers of the community gateway or infrastructure monitoring dashboard associated with the RC/RI.
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/deduction-and-propagation/propagation.md b/versioned_docs/version-8.0.1/graph-production-workflow/deduction-and-propagation/propagation.md
new file mode 100644
index 0000000..93e71e9
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/deduction-and-propagation/propagation.md
@@ -0,0 +1,55 @@
+# Propagation
+
+This process enriches the graph by adding new links and/or new properties. The new information is added by exploiting existing semantic
+relationships and values between the involved entities.
+
+As of November 2022, the following procedures are in place:
+
+* Country propagation: updates the property “country” of a research product. This happens when the research product is collected from an institutional datasource or when the datasource hosting the research product is in a whitelist. For all the research products whose hosting datasource satisfies one of the conditions above, the country of the organization providing the datasource is added to the countries of the research product: e.g. a publication collected from an institutional repository maintained by an Italian university is enriched with the property “country = IT”.
+
+
+* Project propagation: adds an "isProducedBy" relationship (and its inverse) between a project P and a research product R1 if R1 has a strong semantic relationship with another research product R2 and P produces R2: e.g. if a publication linked to project P “is supplemented by” a dataset D, then dataset D gets the link to project P. The relationships considered for this procedure are “isSupplementedBy” and “isSupplementTo”.
+
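
The project propagation rule can be sketched on an in-memory set of `(source, relation, target)` triples. The production job runs on the whole graph, so the hypothetical `propagate_projects` function below only illustrates the logic; the inverse relation name `produces` follows the wording "P produces R2".

```python
from collections import defaultdict
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (source, relation, target)
SUPPLEMENT = {"isSupplementedBy", "isSupplementTo"}


def propagate_projects(relations: Set[Triple]) -> Set[Triple]:
    """Return the new isProducedBy/produces links implied by supplement relations."""
    projects = defaultdict(set)  # research product -> projects producing it
    for src, rel, tgt in relations:
        if rel == "isProducedBy":
            projects[src].add(tgt)
    new_links = set()
    for r1, rel, r2 in relations:
        if rel in SUPPLEMENT:
            # R1 inherits every project of R2 that it is not already linked to
            for p in projects[r2] - projects[r1]:
                new_links.add((r1, "isProducedBy", p))
                new_links.add((p, "produces", r1))
    return new_links
```

Only relations present in the input are used, so the sketch performs a single propagation step, matching the example where dataset D inherits the project of the publication it supplements.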
+* Research product to RC/RI through organization propagation: the manager of the RC/RI can specify a set of organizations whose products are relevant for the
+community.
+Each research product with an affiliation relation to at least one of these organizations is linked to the RC/RI.
+
+
+* Research product to RC/RI through semantic relation: extends the set of products linked to an RC/RI by exploiting strong semantic relationships between research products;
+e.g. if a research product R1 is associated with the community C and is supplemented by a research product R2, then R2 is also linked to the community. The relationships considered for this procedure are “isSupplementedBy” and “supplements”.
+
+* ORCID identifiers to research product through semantic relation: this propagation enriches research products by adding ORCID identifiers to their authors. The added ORCID iDs are marked as "potential" since they have been inserted through propagation.
+The process considers the set of overlapping authors between research products (R1 and R2) linked by a strong semantic relationship (IsSupplementedBy, IsSupplementTo).
+For each author A in the overlapping set, if R1 provides the ORCID for A and R2 does not, then author A in R2 is enriched with the ORCID found in R1.
+
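
A minimal sketch of the ORCID propagation for one pair of semantically related products. How overlapping authors are matched is not specified above, so the name normalization used here (lower-cased, sorted name tokens) and the `potential_orcid` field are assumptions.

```python
from typing import Dict, List

Author = Dict[str, str]


def name_key(name: str) -> str:
    """Hypothetical matching key: lower-cased name tokens in sorted order."""
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def propagate_orcid(r1_authors: List[Author], r2_authors: List[Author]) -> List[Author]:
    """Copy ORCID iDs from R1 authors to matching R2 authors that lack one."""
    orcids = {name_key(a["name"]): a["orcid"] for a in r1_authors if a.get("orcid")}
    enriched = []
    for author in r2_authors:
        author = dict(author)  # do not mutate the input
        key = name_key(author["name"])
        if not author.get("orcid") and key in orcids:
            author["potential_orcid"] = orcids[key]  # flagged as "potential"
        enriched.append(author)
    return enriched
```

Calling the function a second time with the arguments swapped covers the case where R2, rather than R1, provides the ORCID.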
+
+* Affiliation to organization through institutional repository: this propagation adds a "hasAuthorInstitution" relationship (and its inverse)
+between a research product R and an organization O
+if R was collected from a datasource D of type institutional repository, and D is provided by O.
+
+
+* Affiliation to organization through semantic relation: this propagation adds a "hasAuthorInstitution" relationship (and its inverse) between a
+research product R and an organization O
+if R has an affiliation relation with an organization O1 that is in an "isChildOf" relation with O.
+
+  The algorithm only exploits the organization leaves that are in an "isChildOf" relation with another organization. So far, a single step is performed.
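
The single-step propagation to parent organizations can be sketched as below, assuming `hasAuthorInstitution` links are given as `(product, organization)` pairs and `isChildOf` links as a mapping from each organization to its parents; both layouts and the function name are hypothetical.

```python
from typing import Dict, Set, Tuple


def propagate_to_parents(affiliations: Set[Tuple[str, str]],
                         child_of: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """One-step propagation of (product, organization) affiliations to parents."""
    non_leaves = {p for parents in child_of.values() for p in parents}
    new_links = set()
    for product, org in affiliations:
        if org in non_leaves:
            continue  # only leaf organizations are exploited
        for parent in child_of.get(org, set()):
            if (product, parent) not in affiliations:
                new_links.add((product, parent))
    return new_links
```

Because only direct parents of leaves are considered, a department affiliation reaches its university but not any organization above the university.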
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/_category_.json b/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/_category_.json
new file mode 100644
index 0000000..c80249b
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/_category_.json
@@ -0,0 +1,8 @@
+{
+  "label": "Deduplication",
+  "position": 2,
+  "link": {
+    "type": "doc",
+    "id": "deduplication"
+  }
+}
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/clustering-functions.md b/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/clustering-functions.md
new file mode 100644
index 0000000..ded6c57
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/clustering-functions.md
@@ -0,0 +1,93 @@
+---
+sidebar_position: 3
+---
+# Clustering functions
+
+## Ngrams
+
+It creates ngrams from the input field.
+
+[//]: # (Link to the image: https://docs.google.com/drawings/d/1lLLSU3wsWighmxGQMNMZbgP3mg3BfDVAGVLwt4_OFA8/edit?usp=sharing)
+
+### Collection import
+
+The nodes in the graph represent entities of different types. This phase is responsible for identifying all the nodes of a given type and making them available to the subsequent phases, representing them in the deduplication record model.
+
+### Candidate identification (clustering)
+
+Clustering is a common heuristic used to overcome the N x N complexity required to match all pairs of objects to identify the equivalent ones. The challenge is to identify a [clustering function](./clustering-functions) that maximizes the chance of comparing only records that may lead to a match, while minimizing the number of equivalent records that end up in different clusters and are therefore never compared. Since the equivalence function tolerates minor errors (e.g. swapped characters in the title, or small spelling differences), the clustering function must be neither too precise (e.g. a hash of the title) nor too flexible (e.g. random ngrams of the title). On the other hand, in some cases the equivalence of two records can only be determined by their PIDs (e.g. DOI), as their metadata properties differ widely across versions and no [clustering function](./clustering-functions) will ever bring them into the same cluster.
+
+### Duplicates identification (pair-wise comparisons)
+
+Pair-wise comparisons are conducted over records in the same cluster following the strategy defined in the decision tree. A different decision tree is adopted depending on the type of the entity being processed.
+
+To further limit the number of comparisons, a sliding window mechanism is used: (i) records in the same cluster are lexicographically sorted by their title, (ii) a window of K records slides over the cluster, and (iii) records ending up in the same window are pair-wise compared. Each comparison produces a similarity relation when the pair of records matches. Such relations are then used as input for the duplicates grouping stage.
+
+### Duplicates grouping (transitive closure)
+
+Once the similarity relations between pairs of records are drawn, the groups of equivalent records are obtained (transitive closure, i.e. “mesh”). From each such group a new **representative record** is obtained, which inherits properties from the merged records and keeps track of their provenance.
+
+### Relation redistribution
+
+Relations involving nodes identified as duplicates are eventually marked as virtually deleted and used as templates for creating new relations pointing to the new representative record.
+Note that nodes and relationships marked as virtually deleted are not exported.
+
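
The sliding-window comparison and the transitive closure described above can be illustrated with a small Python sketch. The `match` predicate stands in for the entity-specific decision tree, and the union-find structure is one common way to compute the closure, not necessarily the one used in production.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Rec = Dict[str, str]


def window_pairs(cluster: List[Rec], k: int) -> Iterator[Tuple[Rec, Rec]]:
    """Sort the cluster by title and pair each record with the next k-1 records."""
    ordered = sorted(cluster, key=lambda r: r["title"])
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:i + k]:
            yield a, b


def closure(ids: List[str], pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Group ids connected by similarity relations (union-find)."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


def deduplicate(cluster: List[Rec], k: int,
                match: Callable[[Rec, Rec], bool]) -> List[List[str]]:
    """Compare records inside the sliding window, then group the matches."""
    pairs = [(a["id"], b["id"]) for a, b in window_pairs(cluster, k) if match(a, b)]
    return closure([r["id"] for r in cluster], pairs)
```

For example, with titles `"deep learning"`, `"deep learning!"` and `"deep learning."` and `k = 2`, the first and last records are never compared directly, yet they end up in the same group through the middle one: this is exactly what the transitive closure contributes.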
+
+[//]: # (Link to the image: https://docs.google.com/drawings/d/1cDEuVhWnSO8lUZs_Nd748vKfIPxg10jbwKSVZlv33Mg/edit?usp=sharing)
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/organizations.md b/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/organizations.md
new file mode 100644
index 0000000..c2c57e1
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/organizations.md
@@ -0,0 +1,70 @@
+---
+sidebar_position: 2
+---
+
+# Organizations
+
+The organizations in OpenAIRE are aggregated from different registries (e.g. CORDA, OpenDOAR, Re3data, ROR). In some cases, a registry provides organizations as entities with their own persistent identifier. In other cases, those organizations are extracted from other main entities provided by the registry (e.g. datasources, projects, etc.).
+
+The deduplication of organizations is enhanced by [OpenOrgs](https://orgs.openaire.eu), a tool that combines an automated approach for identifying duplicated instances
+of the same organization record with a "humans in the loop" approach, in which the equivalences produced by a duplicate identification algorithm are suggested to data curators, who are in charge of validating them.
+The data curation activity is twofold: on one hand, it pivots around the disambiguation task; on the other hand, it aims to improve the metadata describing the organization records
+(e.g. including the translated name, or a different PID) and to define the hierarchical structure of large organizations (e.g. universities comprising their departments, or large research centers with all their sub-units or sub-institutes).
+
+Duplicates among organizations are therefore managed through three different stages:
+ * *Creation of Suggestions*: executes an automatic workflow that performs the deduplication and prepares new suggestions to be processed by the curators;
+ * *Curation*: manual editing of the organization records performed by the data curators;
+ * *Creation of Representative Organizations*: executes an automatic workflow that creates curated organizations and exposes them in the OpenAIRE Graph by using the curators' feedback from the OpenOrgs underlying database.
+
+The next sections describe the above-mentioned stages.
+
+### Creation of Suggestions
+
+This stage executes an automatic workflow that covers the *candidate identification* and the *duplicates identification* stages of the deduplication to provide suggestions for the curators in OpenOrgs.
+
+#### Candidate identification (clustering)
+
+To limit the number of comparisons, OpenAIRE clustering for organizations aims at grouping records that are more likely to be comparable.
+It works with four functions:
+* *URL-based function*: the function generates the URL domain when this is provided as part of the record properties from the organization's `websiteurl` field;
+* *Title-based functions*:
+  * generate strings dependent on the keywords in the `legalname` field;
+  * generate strings obtained as an alternation of the function prefix(3) and suffix(3) (and vice versa) on the first 3 words of the `legalname` field;
+  * generate strings obtained as a concatenation of ngrams of the `legalname` field.
+
+#### Duplicates identification (pair-wise comparisons)
+
+For each pair of organizations in a cluster, the following strategy (depicted in the figure below) is applied.
+The comparison goes through the following decision tree:
+1. *grid id check*: comparison of the grid ids. If the grid ids are equivalent, then the similarity relation is drawn. If the grid id is not available, the comparison proceeds to the next stage;
+2. *early exits*: comparison of the numbers extracted from the `legalname`, the `country` and the `website` url. No similarity relation is drawn in this stage; the comparison proceeds only if the compared fields satisfy the equivalence conditions;
+3. *city check*: comparison of the city names in the `legalname`. The comparison proceeds only if the legalnames share at least 10% of cities;
+4. *keyword check*: comparison of the keywords in the `legalname`. The comparison proceeds only if the legalnames share at least 70% of keywords;
+5. *legalname check*: comparison of the normalized `legalnames` with the `Jaro-Winkler` similarity to determine if it is higher than `0.9`. If so, a similarity relation is drawn. Otherwise, no similarity relation is drawn.
+
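
The decision tree can be sketched as follows. The Jaro-Winkler implementation is the textbook one (prefix scale 0.1, common prefix of at most 4 characters); the city and keyword vocabularies, the overlap ratio `|A ∩ B| / |A ∪ B|`, the name normalization, and letting differing grid ids fall through to the next stage are assumptions made for illustration.

```python
import re

CITIES = {"rome", "milan", "paris", "athens"}  # hypothetical gazetteer
KEYWORDS = {"university", "institute", "hospital", "college", "school", "centre"}  # hypothetical


def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)
    used1, used2 = [False] * n1, [False] * n2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    mismatched, j = 0, 0
    for i in range(n1):  # count matched characters that appear in a different order
        if used1[i]:
            while not used2[j]:
                j += 1
            mismatched += s1[i] != s2[j]
            j += 1
    t = mismatched / 2  # number of transpositions
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def normalize(name: str) -> str:
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def share(a: set, b: set, threshold: float) -> bool:
    """Assumed overlap measure |A & B| / |A | B|; two empty sets pass."""
    return not (a | b) or len(a & b) / len(a | b) >= threshold


def org_match(o1: dict, o2: dict) -> bool:
    if o1.get("grid") and o1.get("grid") == o2.get("grid"):
        return True  # 1. grid id check (otherwise proceed)
    n1, n2 = normalize(o1["legalname"]), normalize(o2["legalname"])
    if set(re.findall(r"\d+", n1)) != set(re.findall(r"\d+", n2)):
        return False  # 2. early exit: numbers in the name differ
    for field in ("country", "website"):
        if o1.get(field) and o2.get(field) and o1[field] != o2[field]:
            return False  # 2. early exit: country or website differ
    w1, w2 = set(n1.split()), set(n2.split())
    if not share(w1 & CITIES, w2 & CITIES, 0.1):
        return False  # 3. city check
    if not share(w1 & KEYWORDS, w2 & KEYWORDS, 0.7):
        return False  # 4. keyword check
    return jaro_winkler(n1, n2) > 0.9  # 5. legalname check
```

The cheap checks come first, so the comparatively expensive Jaro-Winkler computation only runs for pairs that survived every early exit.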
+
+[//]: # (Link to the image: https://docs.google.com/drawings/d/1YKInGGtHu09QG4pT2gRLEum4LxU82d4nKkvGNvRQmrg/edit?usp=sharing)
+
+### Data Curation
+
+All the similarity relations drawn by the decision-tree algorithm are exposed in OpenOrgs, where they are made available to the data curators, who can give feedback and improve the organization metadata.
+A data curator can:
+ * *edit organization metadata*: legalname, pid, country, url, parent relations, etc.;
+ * *approve suggested duplicates*: establish if an equivalence relation is valid;
+ * *discard suggested duplicates*: establish if an equivalence relation is wrong;
+ * *create similarity relations*: add a new equivalence relation not drawn by the algorithm.
+
+Note that if a curator does not provide feedback on a similarity relation suggested by the algorithm, then that relation is considered valid.
+
+### Creation of Representative Organizations
+
+This stage executes an automatic workflow that covers the *duplicates grouping* stage to create representative organizations and to update them in the OpenAIRE Graph. Such organizations are obtained via transitive closure, and the relations used come from the curators' feedback gathered in the OpenOrgs underlying database.
+
+#### Duplicates grouping (transitive closure)
+
+Once the similarity relations between pairs of organizations have been gathered, the groups of equivalent organizations are obtained (transitive closure, i.e. “mesh”). From such sets a new representative organization is obtained, which inherits all properties from the merged records and keeps track of their provenance.
+
+The IDs of the representative organizations are provided by the OpenOrgs database, which creates a unique ``openorgs`` ID for each approved organization. If an organization is not approved by the curators, the ID is obtained by prefixing the MD5 of the first ID (in lexicographical order) with ``pending_org``.
\ No newline at end of file
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/research-products.md b/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/research-products.md
new file mode 100644
index 0000000..52e5d2f
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/deduplication/research-products.md
@@ -0,0 +1,232 @@
+---
+sidebar_position: 1
+---
+
+# Research products
+
+Duplicates among research products are identified among results of the same
+type (publications, datasets, software, other research products). If two
+duplicate research products are aggregated one as a dataset and the other as
+software, for example, they will never be compared and they will never be
+identified as duplicates.
+OpenAIRE supports different deduplication strategies based on the type of
+results.
+
+The next sections describe how each stage of the deduplication workflow is
+performed for research products.
+
+### Candidate identification (clustering)
+
+To limit the number of comparisons, OpenAIRE clustering for research products
+works with two different strategies based on entity types:
+
+#### Software
+
+* *Title extraction functions*:
+  two clustering functions are applied to the title (normalized, stemmed, etc.)
+  * *stats and suffix prefix of words*: the function generates a key that
+    depends on (i) the number of significant words in the title, (ii) modulo 10
+    of the number of characters of such words, and (iii) a string obtained as
+    an alternation of the function prefix(3) and suffix(3) (and vice versa) on
+    the first 3 words (2 words if the title only has 2). For
+    example, the title ``Search for the Standard Model Higgs Boson``
+    becomes the two keys ``5-3-seaardmod`` and ``5-3-rchstadel``
+  * *n-grams*: the function generates ngrams from the
+    title. For example, the
+    title ``Search for the Standard Model Higgs Boson``
+    becomes the keys ``tan``, ``sta``, ``ode``, ``mod``, ``ear``, ``hig``,
+    ``igg``, ``sea``
+* *DOI extraction function*: the function generates the DOI when this is
+  provided as part of the record properties
+* *URL extraction function*: the function generates the hostname part provided
+  by the URL of the software, if any
+
+#### Publication, Dataset and Other Research Product
+
+* *PID extraction function*: the function generates the PIDs when at least one
+  is provided as part of the ``pid`` record properties
+* *Author and Title extraction function*: the function generates a key that
+  depends on (i) the number of authors of the product, with a cap of 21
+  authors, (ii) the number of significant words in the title (normalized,
+  stemmed, etc.), divided by 10, and (iii) a string obtained as an alternation
+  of the function prefix(3) and suffix(3) (and vice versa) on the first 3 words
+  (2 words if the title only has 2).
+
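
The following sketch reproduces the example keys above for ``Search for the Standard Model Higgs Boson``. The stop-word list, the cap of 4 words and 2 trigrams per word in `ngram_keys`, the absence of stemming, and the `authors-words-string` layout of the author/title key are assumptions chosen to match the examples; the numeric prefix of the software stats key (`5-3`) is not reproduced here.

```python
import re
from typing import List, Set, Tuple

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}  # hypothetical list


def significant_words(title: str) -> List[str]:
    # normalization only: lower-casing, punctuation and stop-word removal (no stemming)
    return [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS]


def ngram_keys(title: str, n: int = 3, max_words: int = 4, per_word: int = 2) -> Set[str]:
    keys = set()
    for word in significant_words(title)[:max_words]:
        for i in range(min(per_word, len(word) - n + 1)):
            keys.add(word[i:i + n])
    return keys


def alternations(title: str, k: int = 3) -> Tuple[str, str]:
    """prefix/suffix alternation (and vice versa) on the first 3 significant words."""
    words = significant_words(title)[:3]
    pre_first = "".join(w[:k] if i % 2 == 0 else w[-k:] for i, w in enumerate(words))
    suf_first = "".join(w[-k:] if i % 2 == 0 else w[:k] for i, w in enumerate(words))
    return pre_first, suf_first


def author_title_keys(title: str, n_authors: int) -> Set[str]:
    stats = f"{min(n_authors, 21)}-{len(significant_words(title)) // 10}"
    return {f"{stats}-{alt}" for alt in alternations(title)}
```

With the example title, `alternations` returns `("seaardmod", "rchstadel")`, i.e. the string parts of the two software keys above.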
+
+[//]: # (Link to the image: https://docs.google.com/drawings/d/19SIilTp1vukw6STMZuPMdc0pv0ODYCiOxP7OU3iPWK8/edit?usp=sharing)
+
+#### Datasets and Other types of research products
+
+For each pair of datasets or other types of research products in a cluster, the
+strategy depicted in the figure below is applied.
+The decision tree is almost identical to the publication decision tree, with the
+only exception of the *instance type check* stage. Since these types of records
+do not have a relatable instance type, the check is not performed and the
+decision tree node is skipped.
+
+
+[//]: # (Link to the image: https://docs.google.com/drawings/d/1uBa7Bw2KwBRDUYIfyRr_Keol7UOeyvMNN7MPXYLg4qw/edit?usp=sharing)
+
+#### Software
+
+For each pair of software records in a cluster, the following strategy
+(depicted in the figure below) is applied.
+The comparison goes through different stages:
+
+1. *DOI pids and URLs check*: comparison of the pids of type DOI and URLs in the
+   records. If at least 1 DOI is equivalent or 1 URL is equivalent, then the
+   records match and the similarity relation is drawn
+2. *title check*: comparison of the record titles with the normalized
+   Levenshtein similarity, excluding versioning information.
+   If the similarity is below 0.95, the records do not match. Otherwise, the
+   comparison proceeds to the next stage
+3. *untrusted DOI check*: comparison of all the available DOIs (in the `pid` and
+   the `alternateid` fields of the record). If at least 1 DOI is equivalent, the
+   records match and the similarity relation is drawn
+4. *authors check*: "smart" comparison of the author lists to check if the two
+   products share all authors
+
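
The software decision tree can be sketched as below. The version-stripping regular expression, the normalized similarity `1 - distance / max(len)`, the record layout (`dois`, `alt_dois`, `urls` as sets) and the crude `same_authors` stand-in for the "smart" author comparison are all assumptions.

```python
import re
from typing import Callable, List

VERSION = re.compile(r"\b(?:v|version)\s*\d+(?:\.\d+)*\b|\b\d+(?:\.\d+)+\b", re.IGNORECASE)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def title_similarity(t1: str, t2: str) -> float:
    """Normalized Levenshtein similarity after stripping version strings."""
    a = " ".join(VERSION.sub(" ", t1.lower()).split())
    b = " ".join(VERSION.sub(" ", t2.lower()).split())
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def same_authors(a1: List[str], a2: List[str]) -> bool:
    """Crude stand-in for the "smart" author comparison: identical name sets."""
    norm = lambda names: {" ".join(sorted(n.lower().replace(",", " ").split())) for n in names}
    return bool(a1) and norm(a1) == norm(a2)


def software_match(s1: dict, s2: dict, authors_match: Callable = same_authors) -> bool:
    if s1["dois"] & s2["dois"] or s1["urls"] & s2["urls"]:
        return True  # 1. DOI pids and URLs check
    if title_similarity(s1["title"], s2["title"]) < 0.95:
        return False  # 2. title check
    if (s1["dois"] | s1["alt_dois"]) & (s2["dois"] | s2["alt_dois"]):
        return True  # 3. untrusted DOI check (pid + alternateid)
    return authors_match(s1["authors"], s2["authors"])  # 4. authors check
```

Stripping version strings before the title check is what lets `MyTool v1.0` and `mytool v2.3` reach the DOI and author stages instead of being rejected outright.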
+
+[//]: # (Link to the image: https://docs.google.com/drawings/d/19gd1-GTOEEo6awMObGRkYFhpAlO_38mfbDFFX0HAkuo/edit?usp=sharing)
+
+### Duplicates grouping
+
+The aim of the final stage is the creation of records that group all the
+equivalent entities discovered pairwise by the previous step. This is done in
+multiple phases.
+
+#### Transitive closure
+
+As the concluding step of duplicate identification, a transitive closure is
+performed on the similarity relations to identify complete groups of duplicated
+records (cliques). If a group exceeds 200 elements, only the first 200 elements
+are included in the group, while the remaining elements are kept ungrouped.
+
+#### Selection of the pivot record
+
+Each group of duplicate records needs to be identified in the final graph with
+an OpenAIRE identifier, derived from a record of the group known as the _pivot
+record_. It is determined after sorting the group of duplicate records by the
+following criteria:
+
+1. Records with identifiers from a [PID authority](/data-model/pids-and-identifiers#pid-authorities).
+2. Records chosen as pivots in the graph's previous generations.
+3. Publications from CrossRef or datasets from DataCite.
+4. Records with an earlier date of acceptance.
+5. Records with smaller IDs in lexicographical order.
+
+The second sorting criterion is possible because a state table, called "pivot
+history", is maintained across graph generations. It keeps track of which
+records were used as pivot records in which graph, and it is guaranteed to
+retain data for the last 12 months.
+
+#### Creation of representative records
+
+The representative record, also known as the "dedup record", replaces the group
+of deduplicated records in the graph.
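
The pivot ordering described under "Selection of the pivot record" can be expressed as a tuple sort key, where `False` sorts before `True`. The `Record` fields are hypothetical, and placing records without an acceptance date last is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass
class Record:  # hypothetical, minimal view of a record in a duplicate group
    id: str
    from_pid_authority: bool
    collected_from: str
    result_type: str
    date_of_acceptance: Optional[date]


def pivot_key(r: Record, pivot_history: Set[str]):
    preferred = ((r.result_type == "publication" and r.collected_from == "Crossref")
                 or (r.result_type == "dataset" and r.collected_from == "DataCite"))
    return (
        not r.from_pid_authority,          # 1. PID authority first
        r.id not in pivot_history,         # 2. previous pivots first
        not preferred,                     # 3. Crossref publications / DataCite datasets
        r.date_of_acceptance or date.max,  # 4. earlier acceptance date (missing: last)
        r.id,                              # 5. smaller id
    )


def select_pivot(group: List[Record], pivot_history: Set[str]) -> Record:
    return min(group, key=lambda r: pivot_key(r, pivot_history))
```

Encoding the criteria as one tuple keeps the precedence explicit: a later criterion is only consulted when all earlier ones tie.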
+
+##### OpenAIRE identifier of the representative record
+
+The OpenAIRE identifier of the representative record is generated based on the
+identifier of the record chosen as the pivot of the group:
+
+- if the pivot record comes from a "PID authority", the identifier of the
+  representative record is the same, but the "PID Type Prefix" part of the
+  identifier is modified to append ``_dedup``.
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/indexing.md b/versioned_docs/version-8.0.1/graph-production-workflow/indexing.md
new file mode 100644
index 0000000..759f1a2
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/indexing.md
@@ -0,0 +1,17 @@
+# Indexing
+
+The final version of the OpenAIRE Graph is indexed on a Solr server that is used by the OpenAIRE portals ([EXPLORE](https://explore.openaire.eu), [CONNECT](https://connect.openaire.eu), [PROVIDE](https://provide.openaire.eu)) and APIs, the latter adopted by several third-party applications and organizations, such as:
+
+* The OpenAIRE Graph APIs and Portals will offer the EOSC (European Open Science Cloud) an Open Science Resource Catalogue, keeping an up-to-date map of all research products (publications, datasets, software), services, organizations, projects and funders in Europe and beyond.
+
+* DSpace & EPrints repositories can install the OpenAIRE plugin to expose OpenAIRE-compliant metadata records via their OAI-PMH endpoint and offer researchers the possibility to link their depositions to the funding project, by selecting it from the list of projects provided by OpenAIRE.
+
+* The EC participant portal (Sygma - System for Grant Management) uses the OpenAIRE API in the “Continuous Reporting” section. Sygma automatically fetches from the OpenAIRE Search API the list of publications and datasets in the OpenAIRE Graph that are linked to the project. The user can select the research products from the list and easily compile the continuous reporting data of the project.
+
+* ScholExplorer is used by different players of the scholarly communication ecosystem. For example, [Elsevier](https://www.elsevier.com/authors/tools-and-resources/research-data/data-base-linking) uses its API to make the links between
+publications and datasets automatically appear on ScienceDirect.
+ScholExplorer indexes the links among the four major types of research products (API v3) available in the OpenAIRE Graph and makes them available through an HTTP API that allows
+searching them by the following criteria:
+  * Links whose source object has a given PID or PID type;
+  * Links whose source object has been published by a given data source ("data source as publisher");
+  * Links that were collected from a given data source ("data source as provider").
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/fos-classification.md b/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/fos-classification.md
new file mode 100644
index 0000000..8a3270a
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/fos-classification.md
@@ -0,0 +1,2 @@
+# Field of Science
+
diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/impact-indicators.md b/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/impact-indicators.md
new file mode 100644
index 0000000..23476e1
--- /dev/null
+++ b/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/impact-indicators.md
@@ -0,0 +1,170 @@
+# Citation-based impact indicators
+
+This page summarises all calculated citation-based impact indicators, provided by [BIP!](https://bip.imsi.athenarc.gr/), which are included in the [bipIndicators](../../data-model/entities/other#bipindicators) property (found under the [indicators](../../data-model/entities/research-product#indicators) property of the research product).
+
+It should be noted that the citation-based impact indicators are calculated at the level of the research output.
+Below we explain their main intuition, the way they are calculated, and their most important limitations, in an attempt to help avoid common pitfalls and misuses.
+
+
+## Citation Count (CC) • influence_alt
+
+***Short description:***
+This is the most widely used citation-based impact indicator, which sums all citations received by each article.
+Citation count can be viewed as a measure of a publication's overall (citation-based) impact, since it conveys the number of other works that directly
+drew on it.
+
+***Algorithmic details:***
+The citation count of a
+publication $i$ corresponds to the in-degree of the corresponding node in the underlying citation network: $s_i = \sum_{j} A_{i,j}$,
+where $A$ is the adjacency matrix of the network (i.e., $A_{i,j}=1$ when paper $j$ cites paper $i$, while $A_{i,j}=0$ otherwise).
+
+***Parameters:*** -
+
+***Limitations:***
+OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
+Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
+
+***Environment:*** PySpark
+
+***References:*** -
+
+***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
+
+
+## "Incubation" Citation Count (iCC) • impulse
+
+***Short description:***
+This measure is essentially a time-restricted version of the citation count, where the time window is distinct for each paper, i.e.,
+only citations received within $y$ years after its publication are counted.
+
+***Algorithmic details:***
+The "incubation" citation count of a paper $i$ is
+calculated as: $s_i = \sum_{j,t_j \leq t_i+y} A_{i,j}$, where $A$ is the adjacency matrix and $t_j, t_i$ are the citing and cited paper's
+publication years, respectively. iCC can be seen as an indicator of a paper's initial momentum
+(impulse) directly after its publication.
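
With the citation network given as `(citing, cited)` pairs, CC is the in-degree of each node and iCC restricts it to citing papers published at most $y$ years after the cited one. The sketch below is illustrative and assumes that papers without a known publication year are skipped when computing iCC.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Citation = Tuple[str, str]  # (citing paper j, cited paper i), i.e. A[i, j] = 1


def citation_count(citations: Iterable[Citation]) -> Counter:
    """CC: in-degree of every cited paper."""
    return Counter(cited for _, cited in set(citations))


def incubation_citation_count(citations: Iterable[Citation], year: Dict[str, int],
                              y: int = 3) -> Counter:
    """iCC: only citations with t_j <= t_i + y are counted."""
    return Counter(
        cited for citing, cited in set(citations)
        if citing in year and cited in year and year[citing] <= year[cited] + y
    )
```

Deduplicating the input with `set` keeps $A$ binary, as in the definition above, even if the same citation is reported by several sources.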
+
+***Parameters:***
+$y=3$
+
+***Limitations:***
+OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
+Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
+
+***Environment:*** PySpark
+
+***References:***
+* Vergoulis, T., Kanellos, I., Atzori, C., Mannocci, A., Chatzopoulos, S., Bruzzo, S. L., Manola, N., & Manghi, P. (2021, April). Bip! db: A dataset of impact measures for scientific publications. In Companion Proceedings of the Web Conference 2021 (pp. 456-460).
+
+***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
+
+
+## PageRank (PR) • influence
+
+***Short description:***
+Originally developed to rank Web pages, PageRank has also been widely used to rank publications in citation
+networks. In this latter context, a publication's PageRank
+score also serves as a measure of its influence.
+
+***Algorithmic details:***
+The PageRank score of a publication is calculated
+as its probability of being read by a researcher that either randomly selects publications to read or selects
+publications based on the references of her latest read. Formally, the score of a publication $i$ is given by:
+
+$$
+s_i = \alpha \cdot \sum_{j} P_{i,j} \cdot s_j + (1-\alpha) \cdot \frac{1}{N}
+$$
+
+where $P$ is the stochastic transition matrix, which corresponds to the column normalised version of adjacency
+matrix $A$, $\alpha \in [0,1]$, and $N$ is the number of publications in the citation network. The first addend
+of the equation corresponds to following a reference (with probability $\alpha$), while the
+second one corresponds to randomly choosing any publication in the network. It should be noted that the
+score of each publication relies on the scores of the publications citing it (the algorithm is executed iteratively
+until all scores converge). As a result, PageRank differentiates citations based on the importance of citing
+articles, thus alleviating the corresponding issue of the Citation Count.
+
+***Parameters:***
+$\alpha = 0.5, convergence\_error = 10^{-12}$
+
+***Limitations:***
+OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
+Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
+
+***Environment:*** PySpark
+
+***References:***
+* Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
+
+***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
+
+
+## RAM • popularity_alt
+
+***Short description:***
+RAM is essentially a modified Citation Count, where recent citations are considered of higher importance compared to older ones.
+Hence, it better captures the popularity of publications. This "time-awareness" of citations
+alleviates the bias of methods like Citation Count and PageRank against recently published articles, which have
+not had "enough" time to gather as many citations.
+
+***Algorithmic details:***
+The RAM score of each paper $i$ is calculated as follows:
+
+$$
+s_i = \sum_j{R_{i,j}}
+$$
+
+where $R$ is the so-called Retained Adjacency Matrix (RAM) and $R_{i,j}=\gamma^{t_c-t_j}$ when publication $j$ cites publication
+$i$, and $R_{i,j}=0$ otherwise. $\gamma \in (0,1)$ is a parameter, $t_c$ corresponds to the current year and $t_j$ corresponds to the
+publication year of citing article $j$.
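
PageRank and RAM can be sketched in plain Python on a small citation network given as `(citing, cited)` pairs. The treatment of papers without references is an assumption: their score mass is spread uniformly, which keeps the scores summing to 1, since the PageRank formula above does not address this case.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Citation = Tuple[str, str]  # (citing paper j, cited paper i)


def pagerank(nodes: List[str], citations: Iterable[Citation], alpha: float = 0.5,
             tol: float = 1e-12, max_iter: int = 1000) -> Dict[str, float]:
    refs: Dict[str, List[str]] = {v: [] for v in nodes}
    for j, i in set(citations):
        refs[j].append(i)  # P[i][j] = 1 / len(refs[j]): column-normalised A
    n = len(nodes)
    s = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(s[v] for v in nodes if not refs[v])  # assumption: spread uniformly
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for j, cited in refs.items():
            for i in cited:
                new[i] += alpha * s[j] / len(cited)
        converged = sum(abs(new[v] - s[v]) for v in nodes) < tol
        s = new
        if converged:
            break
    return s


def ram(citations: Iterable[Citation], year: Dict[str, int], current_year: int,
        gamma: float = 0.6) -> Dict[str, float]:
    """RAM: each citation from paper j weighs gamma ** (t_c - t_j)."""
    scores: Dict[str, float] = defaultdict(float)
    for j, i in set(citations):
        if j in year:
            scores[i] += gamma ** (current_year - year[j])
    return dict(scores)
```

With $\alpha = 0.5$ the difference between iterations roughly halves at every step, so the $10^{-12}$ convergence error is reached after a few dozen iterations on this toy scale.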
+ +***Parameters:*** +$\gamma = 0.6$ + +***Limitations:*** +OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator. +Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source. + +***Environment:*** PySpark + +***References:*** +* Ghosh, R., Kuo, T. T., Hsu, C. N., Lin, S. D., & Lerman, K. (2011, December). Time-aware ranking in dynamic citation networks. In 2011 IEEE 11th International Conference on Data Mining Workshops (pp. 373-380). IEEE. + +***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker) + + +## AttRank • popularity + +***Short description:*** +AttRank is a PageRank variant that alleviates PageRank's bias against recent publications (i.e., it is tailored to capture popularity). +AttRank achieves this by modifying PageRank's probability of randomly selecting a publication. Instead of using a uniform probability, +AttRank defines it based on a combination of the publication's age and the citations it received in recent years. + +***Algorithmic details:*** +The AttRank score +of each publication $i$ is calculated based on: + +$$ +s_i = \alpha \cdot \sum_{j} P_{i,j} \cdot s_j + + \beta \cdot Att(i) + \gamma \cdot c \cdot e^{-\rho \cdot (t_c-t_i)} +$$ + +where $\alpha + \beta + \gamma =1$ and $\alpha,\beta,\gamma \in [0,1]$. $Att(i)$ denotes a recent attention-based score for publication $i$, +which reflects its share of citations in the $y$ most recent years, $t_i$ is the publication year of article $i$, $t_c$ denotes the current +year, and $c$ is a normalisation constant. Finally, $P$ is the stochastic transition matrix.
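PageRank and AttRank share one iterative core: the score flowing along references, $\alpha \cdot \sum_j P_{i,j} \cdot s_j$, plus a "jump" term that is uniform, $(1-\alpha)/N$, in PageRank and mixes recent attention with publication age in AttRank. The sketch below is a minimal pure-Python illustration of both, not the BIP! Ranker code, and rests on several assumptions of ours: papers without references spread their score uniformly; $Att(i)$ is taken as the paper's share of all citations made in the $y$ most recent years; the age term is assumed to decay with age at a rate of 0.16 per year, with $c$ normalising it to sum to 1; iteration stops once the total absolute change drops below a tolerance. The names `power_iterate`, `pagerank` and `attrank` are hypothetical.

```python
import math


def power_iterate(nodes, citations, jump, alpha, tol=1e-12, max_iter=1000):
    """Iterate s_i = alpha * sum_j P[i][j] * s_j + jump[i] until the scores converge.

    P is the column-normalised citation matrix: a citing paper j passes an equal
    share of its score to each paper it references. Papers without references
    ("dangling" nodes) spread their score uniformly (assumption).
    """
    n = len(nodes)
    refs = {v: [] for v in nodes}
    for citing, cited in citations:
        refs[citing].append(cited)
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(scores[v] for v in nodes if not refs[v])
        new = {v: jump[v] + alpha * dangling / n for v in nodes}
        for j in nodes:
            for i in refs[j]:
                new[i] += alpha * scores[j] / len(refs[j])
        delta = sum(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if delta < tol:
            break
    return scores


def pagerank(nodes, citations, alpha=0.5):
    """PageRank: the jump term is the uniform (1 - alpha) / N."""
    n = len(nodes)
    return power_iterate(nodes, citations, {v: (1 - alpha) / n for v in nodes}, alpha)


def attrank(nodes, citations, pub_year, current_year,
            alpha=0.2, beta=0.5, gamma=0.3, decay=0.16, y=3):
    """AttRank sketch: the jump term mixes recent attention and publication age."""
    # Att(i): share of the citations made in the y most recent years (incl. current)
    recent = {v: 0 for v in nodes}
    for citing, cited in citations:
        if pub_year[citing] > current_year - y:
            recent[cited] += 1
    total = sum(recent.values())
    att = {v: recent[v] / total if total else 1.0 / len(nodes) for v in nodes}
    # Age weight, assumed to decay with age; c normalises the weights to sum to 1
    weight = {v: math.exp(-decay * (current_year - pub_year[v])) for v in nodes}
    c = 1.0 / sum(weight.values())
    jump = {v: beta * att[v] + gamma * c * weight[v] for v in nodes}
    return power_iterate(nodes, citations, jump, alpha)
```

Because the jump terms sum to $1-\alpha$ in both cases, the scores stay a probability distribution throughout the iteration; the default arguments mirror the parameter values listed for each indicator on this page.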
+ +***Parameters:*** +$\alpha = 0.2, \beta = 0.5, \gamma = 0.3, \rho = -0.16, convergence\_error = 10^{-12}$ + +Note that recent attention is based on the 3 most recent years (including the current one). + +***Limitations:*** +OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator. +Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source. + +***Environment:*** PySpark + +***References:*** +* Kanellos, I., Vergoulis, T., Sacharidis, D., Dalamagas, T., & Vassiliou, Y. (2021, April). Ranking papers by their short-term scientific impact. In 2021 IEEE 37th International Conference on Data Engineering (ICDE) (pp. 1997-2002). IEEE. + +***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker) + + \ No newline at end of file diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/indicators-ingestion.md b/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/indicators-ingestion.md new file mode 100644 index 0000000..285a1dc --- /dev/null +++ b/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/indicators-ingestion.md @@ -0,0 +1,8 @@ +import DocCardList from '@theme/DocCardList'; + +# Indicators ingestion + +In this step, research products are enriched with Impact and Usage Statistics indicators. +The former are provided by [BIP!](https://bip.imsi.athenarc.gr/) while the latter are computed by OpenAIRE's [UsageCounts service](https://usagecounts.openaire.eu/). + ++ +
+ +## The 17 Sustainable Development Goals + +1. [**No Poverty**](https://sdgs.un.org/goals/goal1): End poverty in all its forms everywhere. +2. [**Zero Hunger**](https://sdgs.un.org/goals/goal2): End hunger, achieve food security and improved nutrition, and + promote sustainable agriculture. +3. [**Good Health and Well-being**](https://sdgs.un.org/goals/goal3): Ensure healthy lives and promote well-being + for all at all ages. +4. [**Quality Education**](https://sdgs.un.org/goals/goal4): Ensure inclusive and equitable quality education and + promote lifelong learning opportunities for all. +5. [**Gender Equality**](https://sdgs.un.org/goals/goal5): Achieve gender equality and empower all women and girls. +6. [**Clean Water and Sanitation**](https://sdgs.un.org/goals/goal6): Ensure availability and sustainable + management of water and sanitation for all. +7. [**Affordable and Clean Energy**](https://sdgs.un.org/goals/goal7): Ensure access to affordable, reliable, + sustainable, and modern energy for all. +8. [**Decent Work and Economic Growth**](https://sdgs.un.org/goals/goal8): Promote sustained, inclusive, and + sustainable economic growth, full and productive employment, and decent work for all. +9. [**Industry, Innovation, and Infrastructure**](https://sdgs.un.org/goals/goal9): Build resilient infrastructure, + promote inclusive and sustainable industrialization, and foster innovation. +10. [**Reduced Inequalities**](https://sdgs.un.org/goals/goal10): Reduce inequality within and among countries. +11. [**Sustainable Cities and Communities**](https://sdgs.un.org/goals/goal11): Make cities and human settlements + inclusive, safe, resilient, and sustainable. +12. [**Responsible Consumption and Production**](https://sdgs.un.org/goals/goal12): Ensure sustainable consumption + and production patterns. +13. [**Climate Action**](https://sdgs.un.org/goals/goal13): Take urgent action to combat climate change and its impacts. +14. 
[**Life Below Water**](https://sdgs.un.org/goals/goal14): Conserve and sustainably use the oceans, seas, and + marine resources for sustainable development. +15. [**Life on Land**](https://sdgs.un.org/goals/goal15): Protect, restore, and promote sustainable use of + terrestrial ecosystems, manage forests sustainably, combat desertification, and halt and reverse land + degradation and halt biodiversity loss. +16. [**Peace, Justice, and Strong Institutions**](https://sdgs.un.org/goals/goal16): Promote peaceful and inclusive + societies for sustainable development, provide access to justice for all, and build effective, accountable, and + inclusive institutions at all levels. +17. [**Partnerships for the Goals**](https://sdgs.un.org/goals/goal17): Strengthen the means of implementation and + revitalize the global partnership for sustainable development. + +## Application in Classification of Research Products + +The SDG taxonomy is used to classify research products based on their relevance to the overarching goals. This +classification helps in identifying the impact of research on sustainable development and in aligning research efforts +with global priorities. Here’s how it can be applied: + +1. **Mapping Research Outputs**: Research outputs such as publications are mapped to specific SDGs based on their + objectives, methodologies, and outcomes. +2. **Evaluating Impact**: The classification allows for the evaluation of the impact of research on achieving the + SDGs, helping to highlight contributions to specific goals. +3. **Funding and Collaboration**: Aligning research with SDGs can attract funding from organizations focused on + sustainable development and foster collaborations with other researchers and institutions working towards + similar goals. +4. **Policy and Decision-Making**: Policymakers can use the classification to identify research that supports + sustainable development policies and make informed decisions based on evidence from relevant research.
+ +By integrating the SDG taxonomy into the classification of research products, we can help ensure that research efforts +are directed towards addressing the most pressing global challenges and contributing to a sustainable future. + +## Conclusion + +The Sustainable Development Goals provide a comprehensive framework for addressing global challenges. By applying +the SDG taxonomy to classify research products, we can better understand and enhance the impact of research on +sustainable development, ensuring that scientific advancements contribute to a more equitable and sustainable world. + +See an example of how the SDG classification appears in the OpenAIRE data in the +[data model](../../data-model/entities/research-product#subjects) section. \ No newline at end of file diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/usage-counts.md b/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/usage-counts.md new file mode 100644 index 0000000..b1a86bd --- /dev/null +++ b/versioned_docs/version-8.0.1/graph-production-workflow/indicators-ingestion/usage-counts.md @@ -0,0 +1,7 @@ +# Usage Statistics indicators + +Usage Statistics indicators for research products, like publications, datasets, etc., are an important complement to other (traditional and alternative) bibliometric indicators, providing a comprehensive and recent view of the impact of such resources as well as of their authors, institutions and the platforms themselves. They take into account different levels of information: the usage of data sources, the usage of individual items in the context of their resource type and the usage of individual web resources or files. + +Usage Statistics Indicators are built by OpenAIRE's UsageCounts service.
The service collects usage data and consolidated usage statistics reports from its distributed network of data providers (repositories, e-journals, CRIS) using open standards and protocols, and delivers reliable, consolidated and comparable usage metrics, such as counts of item downloads and metadata views, conformant to the COUNTER Code of Practice. + +You can find more information about the UsageCounts service [here](https://usagecounts.openaire.eu/). \ No newline at end of file diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/merge-by-id.md b/versioned_docs/version-8.0.1/graph-production-workflow/merge-by-id.md new file mode 100644 index 0000000..9e994c7 --- /dev/null +++ b/versioned_docs/version-8.0.1/graph-production-workflow/merge-by-id.md @@ -0,0 +1,28 @@ +# Merge by id + +In the metadata aggregation system it is common to find the same record provided by +different datasources and, sometimes, even inside the same datasource (especially in +the case of aggregators). As the harmonisation processes are performed on a per-datasource +basis, the resulting records are the output of different mapping implementations. +This approach has the advantage of being deeply customisable to capture datasource-specific +aspects, but it leaves room for inconsistencies when evaluating the different mappings +across the various datasources. + +This phase is therefore responsible for compensating for such inconsistencies and performs +a global grouping of every record available in the graph: + +- entities are grouped by [`id`](../data-model/entities/research-product#id) +- relations are grouped by [`source`, `target`, `reltype`](../data-model/relationships/relationship-object) + +This ensures that the same record, possibly assigned to different types by different +mappings, appears only once in the graph and under a single typing.
In case of clashing +identifiers, the properties are merged (including the provenance information), considering +the following precedence order for the research product typing: + +``` +publication > dataset > software > other +``` + +The same holds for relationships: since the same relation (e.g. a DOI-to-DOI citation) can +be aggregated from multiple sources, this grouping phase collapses all the +duplicates onto a single relation that still includes all the individual provenances. diff --git a/versioned_docs/version-8.0.1/graph-production-workflow/stats.md b/versioned_docs/version-8.0.1/graph-production-workflow/stats.md new file mode 100644 index 0000000..9d0de86 --- /dev/null +++ b/versioned_docs/version-8.0.1/graph-production-workflow/stats.md @@ -0,0 +1,12 @@ +# Stats analysis + +The OpenAIRE Graph is also processed by a pipeline for extracting the statistics +and producing the charts for funders, research initiatives, research infrastructures, +and policymakers available on [MONITOR](https://monitor.openaire.eu). + +Based on the information available on the graph, OpenAIRE provides a set of +indicators for monitoring the funding and research impact and the uptake of +Open Science publishing practices, such as Open Access publishing of publications +and datasets, availability of interlinks between research products, availability +of post-print versions in institutional or thematic Open Access repositories, etc. + diff --git a/versioned_docs/version-8.0.1/intro.md b/versioned_docs/version-8.0.1/intro.md new file mode 100644 index 0000000..5bbf407 --- /dev/null +++ b/versioned_docs/version-8.0.1/intro.md @@ -0,0 +1,34 @@ +--- +slug: / +id: intro +sidebar_position: 1 +--- + +# Overview + +The [OpenAIRE Graph](https://graph.openaire.eu/) (formerly known as the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide, key in fostering Open Science and establishing its practices in daily research activities.
+Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims to bring discovery, monitoring, and assessment of science back into the hands of the scientific community. + +Imagine a vast collection of research products all linked together, contextualised and openly available. Over the past years OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, and entities like organisations, funders, funding streams, projects, communities, and data sources. + +The OpenAIRE Graph aggregates millions of metadata records collected from trusted data sources, including: + +* Open Access journals registered in DOAJ +* Crossref +* Unpaywall +* ORCID +* Microsoft Academic Graph +* DataCite + +It also aggregates repositories registered in OpenDOAR, re3data.org, FAIRsharing.org, and the EOSC Service Catalogue, among which are prominent repositories such as: + +* UKPubMed +* arXiv +* HAL +* Zenodo +* Figshare +* Dryad +* RePEc + +After cleaning, deduplication, enrichment and full-text mining processes, the graph is analysed to produce statistics for the [OpenAIRE MONITOR](https://monitor.openaire.eu) and the [Open Science Observatory](https://osobservatory.openaire.eu); it is also made discoverable via [OpenAIRE EXPLORE](https://explore.openaire.eu) and programmatically accessible via the [OpenAIRE Public APIs](https://develop.openaire.eu). +Last but not least, the Graph data are openly available and can be used by third parties to create added-value services.
diff --git a/versioned_docs/version-8.0.1/license.md b/versioned_docs/version-8.0.1/license.md new file mode 100644 index 0000000..b55436d --- /dev/null +++ b/versioned_docs/version-8.0.1/license.md @@ -0,0 +1,10 @@ +--- +sidebar_position: 11 +--- + +# License + +The OpenAIRE Graph is available for download and re-use as CC-BY (due to some input sources whose license is CC-BY). Parts of the graph can be re-used as CC-0. + +If you are using data from the OpenAIRE Graph, please find the appropriate way to acknowledge this [here](downloads/full-graph#how-to-acknowledge-this-work). + diff --git a/versioned_docs/version-8.0.1/publications.md b/versioned_docs/version-8.0.1/publications.md new file mode 100644 index 0000000..250af64 --- /dev/null +++ b/versioned_docs/version-8.0.1/publications.md @@ -0,0 +1,80 @@ +--- +sidebar_position: 7 +--- + +# Relevant publications + +Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph Datasets](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dataset's Zenodo page or as provided below. + +:::note How to cite + +Manghi P., Atzori C., Bardi A., Baglioni M., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Mannocci A., Horst M., Czerniak A., Iatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Lempesis A., Ioannidis A., Manola N., Principe P., Vergoulis T., Chatzopoulos S., Pierrakos D. (2022). "OpenAIRE Research Graph Dataset", *Dataset*, Zenodo. [doi:10.5281/zenodo.3516917](https://doi.org/10.5281/zenodo.3516917) ([BibTeX](/bibtex/OpenAIRE_Research_Graph_dump.bib)) +::: + +## Other relevant research products + +Please also consider citing the related research products listed below.
+ +### Aggregation system + +Manghi P., Artini M., Atzori C., Bardi A., Mannocci A., La Bruzzo S., Candela L., Castelli D., Pagano P. (2014). "The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures", Program: electronic library and information systems, Vol. 48 No. 4, pp. 322-354. [doi:10.1108/prog-08-2013-0045](https://doi.org/10.1108/prog-08-2013-0045) + +Atzori C., Bardi A., Manghi P., Mannocci A. (2017). "The OpenAIRE workflows for data management", In Italian Research Conference on Digital Libraries (IRCDL), pp. 95-107, Springer, Cham. [doi:10.1007/978-3-319-68130-6_8](https://doi.org/10.1007/978-3-319-68130-6_8) + +Artini M., Atzori C., Bardi A., La Bruzzo S., Manghi P., Mannocci A. (2016). "The D-NET software toolkit: dnet-basic-aggregator (Version 1.3.0)", *Software*, Zenodo. [doi:10.5281/zenodo.168356](https://doi.org/10.5281/zenodo.168356) + +Mannocci A., Manghi P. (2016). "DataQ: a data flow quality monitoring system for aggregative data infrastructures", International Conference on Theory and Practice of Digital Libraries (TPDL), pp. 357-369, Springer, Cham. [doi:10.1007/978-3-319-43997-6_28](https://doi.org/10.1007/978-3-319-43997-6_28) + +### Deduplication + +Vichos K., De Bonis M., Kanellos I., Chatzopoulos S., Atzori C., Manola N., Manghi P., Vergoulis T. (2022). "A preliminary assessment of the article deduplication algorithm used for the OpenAIRE Research Graph", In Italian Research Conference on Digital Libraries (IRCDL), Padua, Italy, CEUR-WS Proceedings. [http://ceur-ws.org/Vol-3160](http://ceur-ws.org/Vol-3160/) + +De Bonis M., Manghi P., Atzori C. (2022). "FDup: a framework for general-purpose and efficient entity deduplication of record collections", PeerJ Computer Science, 8, e1058. [https://peerj.com/articles/cs-1058](https://peerj.com/articles/cs-1058) + +Manghi P., Atzori C., De Bonis M., Bardi A. (2020).
"Entity deduplication in big data graphs for scholarly communication", Data Technologies and Applications. [doi:10.1108/dta-09-2019-0163](https://doi.org/10.1108/dta-09-2019-0163) + +Atzori C., Manghi P., Bardi A. (2018). "GDup: de-duplication of scholarly communication big graphs", In 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT) (pp. 142-151). IEEE. [doi:10.1109/bdcat.2018.00025](https://doi.org/10.1109/bdcat.2018.00025) + +Atzori C., Manghi P. (2017). "GDup: a big graph entity deduplication system" (Version 4.0.5), *Software*, Zenodo. [doi:10.5281/zenodo.292980](https://doi.org/10.5281/zenodo.292980) + +Atzori C. (2016). "GDup: an Integrated, Scalable Big Graph Deduplication System". [doi:10.5281/zenodo.1454879](https://doi.org/10.5281/zenodo.1454879) + +Manghi P., Mikulicic M., Atzori C. (2012). "De-duplication of aggregation authority files", International Journal of Metadata, Semantics and Ontologies, 7(2), pp. 114-130. [doi:10.1504/ijmso.2012.050014](https://doi.org/10.1504/ijmso.2012.050014) + +Manghi P., Mikulicic M. (2011). "PACE: A general-purpose tool for authority control", In Research Conference on Metadata and Semantic Research, pp. 80-92, Springer, Berlin, Heidelberg. [doi:10.1007/978-3-642-24731-6_8](https://doi.org/10.1007/978-3-642-24731-6_8) + +### Mining + +Giannakopoulos T., Foufoulas Y., Dimitropoulos H., Manola N. (2019). "Interactive Text Analysis and Information Extraction", In Italian Research Conference on Digital Libraries (IRCDL), vol 988. Springer, Cham. [doi:10.1007/978-3-030-11226-4_27](https://doi.org/10.1007/978-3-030-11226-4_27) + +Foufoulas Y., Stamatogiannakis L., Dimitropoulos H., Ioannidis Y. (2017). "High-Pass Text Filtering for Citation Matching", In International Conference on Theory and Practice of Digital Libraries (TPDL). Springer, Cham.
[doi:10.1007/978-3-319-67008-9_28](https://doi.org/10.1007/978-3-319-67008-9_28) + +Chronis Y., Foufoulas Y., Nikolopoulos V., Papadopoulos A., Stamatogiannakis L., Svingos C., Ioannidis Y. E. (2016). "A Relational Approach to Complex Dataflows", In Workshop Proceedings of the EDBT/ICDT 2016 (MEDAL 2016) Joint Conference on CEUR-WS.org (ISSN 1613-0073) [http://ceur-ws.org/Vol-1558/paper45.pdf](http://ceur-ws.org/Vol-1558/paper45.pdf) + +Giannakopoulos T., Foufoulas I., Stamatogiannakis E., Dimitropoulos H., Manola N., Ioannidis Y. (2015). "Visual-Based Classification of Figures from Scientific Literature", In Proceedings of the 24th International Conference on World Wide Web (WWW), Association for Computing Machinery, New York, NY, USA, pp. 1059–1060. [doi:10.1145/2740908.2742024](https://doi.org/10.1145/2740908.2742024) + +Giannakopoulos T., Foufoulas I., Stamatogiannakis E., Dimitropoulos H., Manola N., Ioannidis Y. (2014). "Discovering and Visualizing Interdisciplinary Content Classes in Scientific Publications", D-Lib Magazine, Volume 20, Number 11/12. [doi:10.1045/november14-giannakopoulos](https://doi.org/10.1045/november14-giannakopoulos) + +Giannakopoulos T., Stamatogiannakis E., Foufoulas I., Dimitropoulos H., Manola N., Ioannidis Y. (2014). "Content Visualization of Scientific Corpora Using an Extensible Relational Database Implementation", International Conference on Theory and Practice of Digital Libraries (TPDL), Springer, Cham. [doi:10.1007/978-3-319-08425-1_10](https://doi.org/10.1007/978-3-319-08425-1_10) + +Giannakopoulos T., Dimitropoulos H., Metaxas O., Manola N., Ioannidis Y. (2013). "Supervised Content Visualization of Scientific Publications: A Case Study on the ArXiv Dataset", Intelligent Information Systems Symposium (IIS) vol 7912, Springer, Berlin, Heidelberg. [doi:10.1007/978-3-642-38634-3_23](https://doi.org/10.1007/978-3-642-38634-3_23) + +Tkaczyk D., Szostek P., Fedoryszak M., Dendek P. J., Bolikowski Ł. (2015).
"CERMINE: automatic extraction of structured metadata from scientific literature", International Journal on Document Analysis and Recognition (IJDAR), pp. 317–335. [doi:10.1007/s10032-015-0249-8](https://doi.org/10.1007/s10032-015-0249-8) + +Kobos M., Bolikowski Ł., Horst M., Manghi P., Manola N., Schirrwagen J. (2014). "Information inference in scholarly communication infrastructures: the OpenAIREplus project experience", Procedia Computer Science, 38, pp. 92-99. [doi:10.1016/j.procs.2014.10.016](https://doi.org/10.1016/j.procs.2014.10.016) + +### Portals + +Baglioni M., Bardi A., Kokogiannaki A., Manghi P., Iatropoulou K., Principe P., Vieira A., Nielsen L. H., Dimitropoulos H., Foufoulas I., Manola N., Atzori C., La Bruzzo S., Lazzeri E., Artini M., De Bonis M., Dell’Amico A. (2019). "The OpenAIRE Research Community Dashboard: On Blending Scientific Workflows and Scientific Publishing", +International Conference on Theory and Practice of Digital Libraries (TPDL). Lecture Notes in Computer Science, vol 11799. Springer, Cham. [doi:10.1007/978-3-030-30760-8_5](https://doi.org/10.1007/978-3-030-30760-8_5) + +### Broker Service + +Manghi P., Atzori C., Bardi A., La Bruzzo S., Artini M. (2016). "Realizing a Scalable and History-Aware Literature Broker Service for OpenAIRE", Italian Research Conference on Digital Libraries (IRCDL), pp. 92-103, Springer, Cham. [doi:10.1007/978-3-319-56300-8_9](https://doi.org/10.1007/978-3-319-56300-8_9) + +Artini M., Atzori C., Bardi A., La Bruzzo S., Manghi P., Mannocci A. (2015). "The OpenAIRE literature broker service for institutional repositories", D-Lib Magazine, 21(11/12), 1.
[doi:10.1045/november2015-artini](https://doi.org/10.1045/november2015-artini) + + + + diff --git a/versioned_docs/version-9.0.0/apis/_category_.json b/versioned_docs/version-9.0.0/apis/_category_.json new file mode 100644 index 0000000..36617e4 --- /dev/null +++ b/versioned_docs/version-9.0.0/apis/_category_.json @@ -0,0 +1,8 @@ +{ + "label": "Public API", + "position": 4, + "link": { + "type": "doc", + "id": "api" + } +} \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/apis/authentication.md b/versioned_docs/version-9.0.0/apis/authentication.md new file mode 100644 index 0000000..9051b5e --- /dev/null +++ b/versioned_docs/version-9.0.0/apis/authentication.md @@ -0,0 +1,308 @@ +# Guide for authenticated requests + +The OpenAIRE APIs can be accessed over HTTPS with both authenticated and non-authenticated requests. +You can use authenticated requests to increase the rate limit of your requests (please refer [here](./terms#authentication--limits) for the current API rate limits). +There are two main modes that you can use to authenticate API requests: + +* [Personal access tokens](#personal-access-token) +* [Registered services](#registered-services) + + +In the following, we elaborate on these modes. + +## Personal access token + +To access the OpenAIRE APIs with higher rate limits, you can use your personal access token. To access the following functionalities, you need to log in to OpenAIRE. If you are not already a member, you will need to register first and provide your [Personal information](https://develop.openaire.eu/personal-info). + +:::info New! +The registration process has been updated! To access the Personal Token and Registered Services functionalities, you need to fill in the Personal Information form available [here](https://develop.openaire.eu/personal-info). This update will not affect the operation of your existing services.
However, if you want to register a new service or access/modify an existing one, you will need to provide your personal information first. +::: + +### How to create your personal access token + +To create your personal access token, go to [your personal access token page](https://develop.openaire.eu/personal-token) and copy it! + +:::info +Your access token is valid for an hour. +::: + +:::caution +Do not share your personal access token. Send your personal access token over HTTPS. +::: + +### How to use your personal access token + +To access the OpenAIRE APIs, send your personal access token using the Authorization header. +```js +GET https://api.openaire.eu/{resourceServicePath} +Authorization: Bearer {ACCESS_TOKEN} +``` + +### An hour is not enough? What to do. + +To prolong your access to our APIs, you can use a **refresh token** that allows you to programmatically issue a new access token. + +To get your refresh token, go to [your personal access token page](https://develop.openaire.eu/personal-token) and click the **"Get a refresh token"** button. + +The OpenAIRE refresh token expires after one month. + +If you already have a refresh token, a new one will be issued and the old one will no longer be valid. + +Please copy your refresh token and store it confidentially. You will not be able to retrieve it again. Do not share your refresh token. Send your refresh token over HTTPS. + +Because the OpenAIRE refresh token remains valid for one month, a client that obtains one must store it securely to keep it from being used by potential attackers. If a refresh token is leaked, it may be used to obtain new access tokens and access protected resources until a new one is issued or it expires.
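For scripted access, the two personal-token requests on this page (calling an API with the access token, and exchanging a refresh token at the `getAccessToken` endpoint) can be built with Python's standard library alone. This is a minimal sketch, not an official client: the helper names `api_request` and `refresh_request` are hypothetical, the resource path stands in for `{resourceServicePath}`, and the functions only construct `urllib.request.Request` objects without sending anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.openaire.eu"
REFRESH_URL = "https://services.openaire.eu/uoa-user-management/api/users/getAccessToken"


def api_request(resource_path: str, access_token: str) -> Request:
    """Build (but do not send) a GET request carrying a personal access token."""
    url = f"{API_BASE}/{resource_path.lstrip('/')}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def refresh_request(refresh_token: str) -> Request:
    """Build (but do not send) the request exchanging a refresh token for an access token."""
    # urlencode percent-escapes any characters in the token that are not URL-safe
    query = urlencode({"refreshToken": refresh_token})
    return Request(f"{REFRESH_URL}?{query}")
```

Passing either object to `urllib.request.urlopen` sends it over HTTPS; for the refresh call, the new token is the `access_token` field of the JSON response.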
+ +To get a new personal access token using your refresh token, make the following request: +```js +GET https://services.openaire.eu/uoa-user-management/api/users/getAccessToken?refreshToken={your_refresh_token} +``` + +The response has the following format: +```json +{ + "access_token": "...", + "token_type": "Bearer", + "refresh_token": "...", + "expires_in": "...", + "scope": "...", + "id_token": "..." +} +``` + +## Registered services + +If you have a service (client) that you want to interact with the OpenAIRE APIs, you need to register it. + +:::info +You can register up to 5 services. +::: + +We offer two ways of authenticating your service: the Basic Authentication and the Advanced Authentication. + +### Which one is for me? + +| | How | Client Credential Issuer | Authentication Method | +| --- | --- | --- | --- | +| **Basic** | Client ID & Client Secret | OpenAIRE AAI server | Client Secret (Basic) | +| **Advanced** | Private Key signed JWT | Service owner | Private Key JWT Client Authentication | + +For the **Basic Authentication** method, the OpenAIRE AAI server generates a pair of _Client ID_ and _Client Secret_ credentials for your service upon its registration. The service sends the client id and client secret when authenticating to the OpenAIRE AAI server to obtain the access token for the OpenAIRE APIs. The OpenAIRE AAI server checks whether the client id and client secret sent are valid. [Continue reading for the Basic Authentication](#basic-service-authentication-and-registration). + +For the **Advanced Authentication** method, your service does not send a client secret; instead, it uses a _self-signed client assertion_ to authenticate to the OpenAIRE AAI server in order to obtain the access token for the OpenAIRE APIs. The client assertion is a JWT that must be signed with RSASSA using the SHA-256 hash algorithm. The OpenAIRE AAI server validates the client assertion using the public key that you have provided upon the service registration.
[Continue reading for the Advanced Authentication](#advanced-service-authentication-and-registration). + +:::info +The Advanced Authentication method allows the OpenAIRE AAI server to verify that the client authentication request at the token endpoint was signed by your service and not altered in any way. This is more computationally intensive than the Basic Authentication, but it ensures non-repudiation. On the other hand, the Basic Authentication is more lightweight and easier to deploy, but it does not provide signature verification, and there is always a possibility of the Client ID/secret credentials being stolen. Note that the Advanced Authentication method gives a higher level of security only as long as it is used correctly, i.e. when the signed JWT has a short duration. When the JWT is long-lived, the process is no more secure than the Basic one. +::: + +### Basic service authentication and registration + +To access the following functionalities, you need to log in to OpenAIRE. If you are not already a member, you will need to register first and provide your [Personal information](https://develop.openaire.eu/personal-info). + +:::info New! +The registration process has been updated! To access the Personal Token and Registered Services functionalities, you need to fill in the Personal Information form available [here](https://develop.openaire.eu/personal-info). This update will not affect the operation of your existing services. However, if you want to register a new service or access/modify an existing one, you will need to provide your personal information first. +::: + +For the **Basic Authentication** method, the OpenAIRE AAI server generates a pair of _Client ID_ and _Client Secret_ for your service upon its registration. The service uses the client id and client secret to obtain the access token for the OpenAIRE APIs. The OpenAIRE AAI server checks whether the client id and client secret sent are valid.
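In OAuth 2.0 terms, the Basic flow is a client-credentials grant: an HTTP POST to the token endpoint `https://aai.openaire.eu/oidc/token` with `grant_type=client_credentials` in the form body and the Client ID and Client Secret in an HTTP Basic `Authorization` header, which is what `curl -u` produces. The sketch below builds (but does not send) that request with Python's standard library; the helper name `client_credentials_request` is hypothetical and this is an illustration, not an official OpenAIRE client.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://aai.openaire.eu/oidc/token"


def client_credentials_request(client_id: str, client_secret: str) -> Request:
    """Build (but do not send) the POST exchanging Client ID/Secret for an access token.

    Mirrors: curl -u ID:SECRET -X POST TOKEN_URL -d 'grant_type=client_credentials'
    """
    # HTTP Basic credentials: base64("client_id:client_secret"), as curl -u sends them
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode("ascii")
    body = urlencode({"grant_type": "client_credentials"}).encode()
    return Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen` should return the JSON body with `access_token`, `token_type` and `expires_in`; the access token is then used as a Bearer token, exactly as with a personal access token.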
+ +#### How to register your service + +To register your service, you need to: + +1. Go to your [Registered Services](https://develop.openaire.eu/apis) page and click the **\+ New Service** button. +2. Provide the mandatory information for your service. +3. Select the **Basic** Security level. +4. Click the **Create** button. + +Once your service is created, the _Client ID_ and _Client Secret_ will appear on your screen. Click "OK" and your new service will appear in the list of your [Registered Services](https://develop.openaire.eu/apis) page. + +#### How to make a request + +##### Step 1. Request for an access token + +To request an access token, use the _Client ID_ and _Client Secret_ of your service. +```bash +curl -u {CLIENT_ID}:{CLIENT_SECRET} \ +-X POST 'https://aai.openaire.eu/oidc/token' \ +-d 'grant_type=client_credentials' +``` + +where **{CLIENT_ID}** and **{CLIENT_SECRET}** are the _Client ID_ and _Client Secret_ assigned to your service upon registration. + +The response is: +```json +{ + "access_token": ..., + "token_type": "Bearer", + "expires_in": ... +} +``` + +Store the access token confidentially on the service side. + +##### Step 2. Make a request + +To access the OpenAIRE APIs, send the access token returned in **Step 1**. +```js +GET https://api.openaire.eu/{resourceServicePath} +Authorization: Bearer {ACCESS_TOKEN} +``` + +### Advanced service authentication and registration + + +To access the following functionalities, you need to log in to OpenAIRE. If you are not already a member, you will need to register first and provide your [Personal information](https://develop.openaire.eu/personal-info). + +:::info New! +The registration process has been updated! To access the Personal Token and Registered Services functionalities, you need to fill in the Personal Information form available [here](https://develop.openaire.eu/personal-info). This update will not affect the operation of your existing services.
However, if you want to register a new service or access/modify an existing one, you will need to provide your personal information first.
+:::
+
+For the **Advanced Authentication** method, your service does not send a client secret; instead, it uses a _self-signed client assertion_ to obtain the access token for the OpenAIRE APIs. The client assertion is a JWT that must be signed with RSASSA using the SHA-256 hash algorithm. The OpenAIRE AAI server validates the client assertion using the public key that you provided upon the service registration.
+
+#### Prepare to register your service
+
+Before you register your service, you need to prepare a private/public key pair on your side.
+
+:::info
+We accept RSA keys for signing with RSASSA using the SHA-256 hash algorithm (RS256).
+:::
+
+To create the key pair you have the following options:
+
+* Use the OpenAIRE authorization server's built-in tool. You can access the service here: [https://aai.openaire.eu/oidc/generate-oidc-keystore](https://aai.openaire.eu/oidc/generate-oidc-keystore).
+  The response is your **Public and Private Keypair** and has the following format:
+  ```json
+  {
+    "p" : ...,
+    "kty" : "RSA",
+    "q" : ...,
+    "d" : ...,
+    "e" : "AQAB",
+    "kid" : ...,
+    "qi" : ...,
+    "dp" : ...,
+    "alg" : "RS256",
+    "dq" : ...,
+    "n" : ...
+  }
+  ```
+
+  Use the public key parameters (kty, e, kid, alg, n) to create your **Public Key** in the following format:
+  ```json
+  {
+    "kty": "RSA",
+    "e": "AQAB",
+    "kid": ...,
+    "alg": "RS256",
+    "n": ...
+  }
+  ```
+
+:::info
+Store both the **Public and Private keypair** and the **Public key**. You will need them to register your service.
+:::
+
+:::caution
+Store the **Public and Private keypair** confidentially on the service side.
+:::
+
+* Use OpenSSL and then convert the keys to JWK format using PEM-to-JWK scripts, such as [https://github.com/danedmunds/pem-to-jwk](https://github.com/danedmunds/pem-to-jwk).
Alternatively, the client application can read the key pair in PEM format and then convert it using JWK libraries. Use the public key parameters (kty, e, kid, alg, n) for the service registration.
+
+:::info
+You can also provide a public key in JWK format that can be accessed using a link.
+:::
+
+#### How to register your service
+
+To register your service you need to:
+
+1. Go to your [Registered Services](https://develop.openaire.eu/apis) page and click the **\+ New Service** button.
+2. Provide the mandatory information for your service.
+3. Select the **Advanced** Security level.
+4. Use the public key parameters (kty, e, kid, alg, n) you previously produced to declare your **"Public Key"** **"By value"** in the following format:
+   ```json
+   {
+     "kty": "RSA",
+     "e": "AQAB",
+     "kid": ...,
+     "alg": "RS256",
+     "n": ...
+   }
+   ```
+   **\- OR -**
+
+   If your service has a public key in JWK format that can be accessed using a link, you can set **“Public Key”** to **“By URL”**.
+
+5. Click the **Create** button.
+
+Once your service is created, it will appear in the list of your [Registered Services](https://develop.openaire.eu/apis) page, with the **Service Id** that was automatically assigned to it by the OpenAIRE AAI service.
+
+#### How to make a request
+
+##### Step 1. Create and sign a JWT
+
+Your service must create and sign a JWT and include it in the request to the token endpoint, as described in [OpenID Connect Core 1.0, 9. Client Authentication](https://openid.net/specs/openid-connect-core-1_0.html#ClientAuthentication).
+
+To create a JWT you can use [https://mkjose.org/](https://mkjose.org/). To do so, you need to create a **payload** that contains the following claims:
+
+```json
+{
+  "iss": "{SERVICE_ID}",
+  "sub": "{SERVICE_ID}",
+  "aud": "https://aai.openaire.eu/oidc/token",
+  "jti": "{RANDOM_STRING}",
+  "exp": {EXPIRATION_TIME_OF_SIGNED_JWT}
+}
+```
+
+* **iss**, _(required)_ the “issuer” claim identifies the principal that issued the JWT.
The value is the **Service Id** that was created when you registered your service.
+* **sub**, _(required)_ the “subject” claim identifies the principal that is the subject of the JWT. The value is the **Service Id** that was created when you registered your service.
+* **aud**, _(required)_ the “audience” claim identifies the recipients that the JWT is intended for. The value is **https://aai.openaire.eu/oidc/token**.
+* **jti**, _(required)_ the “JWT ID” claim provides a unique identifier for the JWT. The value is a random string.
+* **exp**, _(required)_ the “expiration time” claim identifies the expiration time on or after which the JWT **MUST NOT** be accepted for processing. The value is a timestamp in **epoch format** (seconds since the Unix epoch).
+
+Fill in the payload in the form available at [https://mkjose.org/](https://mkjose.org/), select **RS256 using SHA-256** as the Signing Algorithm, and paste the **Public and Private Keypair** you created previously.
+
+To check your JWT you can go to [https://jwt.io/](https://jwt.io/). The **header** should contain the following parameters:
+```json
+{
+  "alg": "RS256",
+  "kid": ...
+}
+```
+
+where **kid** is the key ID of the **Public and Private Keypair** you used to sign the JWT.
+
+:::caution
+Store the signed JWT confidentially on the service side. You will need it in Step 2.
+:::
+
+##### Step 2. Request for an access token
+
+To make an access token request, use the _signed JWT_ that you created in **Step 1**. The OpenAIRE AAI server will check whether the signed JWT is valid using the public key that you declared in the **"How to register your service"** process.
+```bash
+curl -X POST "https://aai.openaire.eu/oidc/token" \
+  -d "grant_type=client_credentials" \
+  -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
+  -d "client_assertion={signedJWT}"
+```
+where **{signedJWT}** is the signed JWT created in **Step 1**.
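The payload, header and request body from Steps 1 and 2 can also be assembled programmatically. The Python sketch below uses only the standard library to build the compact JWT (`base64url(header).base64url(payload).base64url(signature)`) and the form body of the token request. It deliberately leaves the RS256 signing step to a caller-supplied `sign` function, a hypothetical parameter, because producing an RSASSA/SHA-256 signature requires a cryptography library. All function names are hypothetical, and the 60-second default lifetime is an assumption that follows the advice to keep the signed JWT short-lived.

```python
import base64
import json
import time
import urllib.parse
import uuid
from typing import Callable, Optional

TOKEN_ENDPOINT = "https://aai.openaire.eu/oidc/token"
ASSERTION_TYPE = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used by JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _json_bytes(obj: dict) -> bytes:
    """Compact JSON encoding of a JWT header or payload."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def build_client_assertion(service_id: str, kid: str,
                           sign: Callable[[bytes], bytes],
                           lifetime_s: int = 60,
                           now: Optional[float] = None) -> str:
    """Assemble header.payload.signature; `sign` must return an RS256 signature."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "RS256", "kid": kid}
    payload = {
        "iss": service_id,              # Service Id assigned at registration
        "sub": service_id,              # same value as iss
        "aud": TOKEN_ENDPOINT,
        "jti": uuid.uuid4().hex,        # random, unique per assertion
        "exp": issued_at + lifetime_s,  # epoch seconds; keep it short-lived
    }
    signing_input = b64url(_json_bytes(header)) + "." + b64url(_json_bytes(payload))
    signature = sign(signing_input.encode("ascii"))
    return signing_input + "." + b64url(signature)


def token_request_body(assertion: str) -> bytes:
    """Form body matching the `-d` parameters of the Step 2 curl command."""
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_assertion_type": ASSERTION_TYPE,
        "client_assertion": assertion,
    }).encode("ascii")
```

In practice, `sign` would wrap the private key from your **Public and Private Keypair** using a JOSE/JWT library of your choice; the exact calls depend on that library. Generating a fresh assertion (with a new `jti`) for each token request avoids keeping a long-lived signed JWT around.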
+
+The response is:
+```json
+{
+  "access_token": {ACCESS_TOKEN},
+  "token_type": "Bearer",
+  "expires_in": ...,
+  "scope": "openid"
+}
+```
+
+Store the access token confidentially on the service side.
+
+##### Step 3. Make a request
+
+To access the OpenAIRE APIs, send the access token returned in **Step 2**.
+```http
+GET https://test.openaire.eu/{resourceServicePath}
+Authorization: Bearer {ACCESS_TOKEN}
+```
\ No newline at end of file
diff --git a/versioned_docs/version-9.0.0/apis/broker-api.md b/versioned_docs/version-9.0.0/apis/broker-api.md
new file mode 100644
index 0000000..20c9589
--- /dev/null
+++ b/versioned_docs/version-9.0.0/apis/broker-api.md
@@ -0,0 +1,50 @@
+# Broker API
+
+
+## Introduction
+
+The Broker Service is available to use via the OpenAIRE Content Provider Dashboard. Thanks to the Broker, repositories, publishers or aggregators can exchange metadata and enrich their local metadata collection by subscribing to notifications of different types. The Broker is able to notify providers when the OpenAIRE Graph contains information that is not available in the original collection of the data source. In particular, the data source manager can subscribe via the [Content Provider Dashboard](https://provide.openaire.eu) and be notified about:
+
+* Additional PIDs of its publications (e.g. DOIs)
+* Links to projects
+* ORCID iDs that can be associated with authors of the data source's publications
+* Links to Open Access versions
+* Additional classification subjects (e.g. subjects from standard schemes like ACM, JEL and DDC)
+* Abstracts identified in duplicate publications
+* Missing publication dates
+
+All repository managers accessing the Content Provider Dashboard will be offered the possibility to preview a set of enrichments relative to their repository that OpenAIRE can derive from the Graph. More specifically, enrichments will be organized into categories, called topics, that represent the different types of enrichments OpenAIRE can build.
For each topic, the preview consists of 100 “enrichment events”: a subset of all the possible enrichments pertinent to a given repository in the OpenAIRE Graph, which the user can explore by filtering on different criteria. The total number of events that could potentially be built is highlighted in the UI. Repository managers can create subscriptions for specific topics, including the filtering criteria they used to analyze the enrichment preview, or subscribe to all the available topics at once with no restrictions. Once the repository manager creates a subscription, the algorithm analyzing the OpenAIRE Graph will produce the full set of enrichments for the manager's repository, possibly far beyond the 100 enrichments available in the preview. The enrichments will be made available as notifications, both in a dedicated section of the Content Provider Dashboard UI for further checking and through the broker service API for programmatic access. Notifications will be sent to subscribers every time the OpenAIRE Graph is updated and analyzed to derive the enrichments.
+
+## Usage Example
+
+The following commands show how the broker API documented at [api.openaire.eu/broker](https://api.openaire.eu/broker/swagger-ui/index.html) can be used to access the set of enrichments:
+
+1. Get the list of subscriptions for a given subscriber, e.g.
+
+   ```bash
+   curl -X GET --header 'Accept: application/json' 'https://api.openaire.eu/broker/subscriptions?email=[subscriber_email]'
+   ```
+
+2. Extract the subscription ID and use it to access the first page of enrichment notification records:
+
+   ```bash
+   curl -X GET --header 'Accept: application/json' 'https://api.openaire.eu/broker/scroll/notifications/bySubscriptionId/[sub-1234]'
+   ```
+
+3.
Extract the scroll ID from the response to request subsequent pages:
+
+   ```bash
+   curl -X GET --header 'Accept: application/json' 'https://api.openaire.eu/broker/scroll/notifications/[scroll_id]'
+   ```
+
+To simplify access to the enrichment notification records, see the OpenAIRE broker command-line client available on [GitHub](https://github.com/openaire/broker-cmdline-client).
+
+## Terms of Use and SLA
+
+The APIs are free to use (no sign-up needed) by any third-party service.
+
+**Metadata license is CC-BY**: the metadata records returned by the service can be freely re-used by commercial and non-commercial partners under the CC-BY license, provided that OpenAIRE is acknowledged as the data source.
+
+**Quality of Service**: all API services run in production 24/7 within the OpenAIRE infrastructure premises deployed at the [data center](http://icm.edu.pl/en/centre-of-technology/) facilities of the [Interdisciplinary Centre for Mathematical and Computational Modelling](http://icm.edu.pl/en/) (ICM).
+
+**APIs rate limits**: please check [here](./authentication).
\ No newline at end of file
diff --git a/versioned_docs/version-9.0.0/apis/dspace-eprints-api.md b/versioned_docs/version-9.0.0/apis/dspace-eprints-api.md
new file mode 100644
index 0000000..93f15f8
--- /dev/null
+++ b/versioned_docs/version-9.0.0/apis/dspace-eprints-api.md
@@ -0,0 +1,61 @@
+# DSpace & EPrints API
+
+
+The APIs offer custom access to metadata about projects funded by a selection of international funders for the **DSpace** and **EPrints** platforms.
The currently supported funders and their codes are:
+
+* **FP7:** The 7th Framework Programme funded by the European Commission
+* **H2020:** Horizon2020 Programme funded by the European Commission
+* **HE:** Horizon Europe Programme funded by the European Commission
+* **AKA:** Academy of Finland
+* **ARC:** Australian Research Council
+* **FWF:** Austrian Science Foundation
+* **CHISTERA:** CHIST-ERA
+* **CIHR:** Canadian Institutes of Health Research
+* **HRZZ:** Croatian Science Foundation
+* **EEA:** European Environment Agency
+* **ANR:** French National Research Agency
+* **FCT:** The funding programme of Fundação para a Ciência e a Tecnologia, the national funding agency of Portugal
+* **MESTD:** The Ministry of Education, Science and Technological Development of Serbia
+* **MZOS:** Ministry of Science, Education and Sports of the Republic of Croatia
+* **NHMRC:** Australian National Health and Medical Research Council
+* **NIH:** US National Institutes of Health
+* **NSF:** US National Science Foundation
+* **NSERC:** Natural Sciences and Engineering Research Council of Canada
+* **NWO:** The Netherlands Organisation for Scientific Research
+* **SFI:** Science Foundation Ireland
+* **SSHRC:** Social Sciences and Humanities Research Council
+* **SNSF:** Swiss National Science Foundation
+* **TARA:** Tara Expeditions Foundation
+* **TUBITAK:** The national funder of Turkey
+* **UKRI:** United Kingdom Research and Innovation
+* **WT:** Wellcome Trust
+
+## DSpace/EPrints
+
+DSpace endpoint: http://api.openaire.eu/projects/dspace/$fundingStream/ALL/ALL
+
+EPrints endpoint: http://api.openaire.eu/projects/eprints/$fundingStream/ALL/ALL
+
+The URLs embed the parameters needed to collect projects funded by a specific funding stream, following the pattern FundingStream/FundingSubStream/FundingSubSubStream.
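The URL pattern above can be captured in a small helper. The sketch below is hypothetical (the function name `projects_url` and its validation rules are illustrative, not part of the API): it fills in the `FundingStream/FundingSubStream/FundingSubSubStream` path segments, defaulting the sub-streams to `ALL`, and optionally appends the date filters listed next (`startFrom`, `startUntil`, `endFrom`, `endUntil`), checking only that each value has the `YYYY-MM-DD` form.

```python
import re
from urllib.parse import urlencode

# Base URL as documented above (plain http, as in the original endpoints).
BASE_URL = "http://api.openaire.eu/projects"
PLATFORMS = ("dspace", "eprints")
DATE_FILTERS = ("startFrom", "startUntil", "endFrom", "endUntil")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def projects_url(platform: str, funding_stream: str,
                 sub_stream: str = "ALL", sub_sub_stream: str = "ALL",
                 **date_filters: str) -> str:
    """Build a DSpace/EPrints projects URL (hypothetical helper)."""
    if platform not in PLATFORMS:
        raise ValueError(f"platform must be one of {PLATFORMS}")
    for name, value in date_filters.items():
        if name not in DATE_FILTERS:
            raise ValueError(f"unknown date filter: {name}")
        # Format check only; it does not validate month/day ranges.
        if not DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{name} must use YYYY-MM-DD, got {value!r}")
    # Path pattern: FundingStream/FundingSubStream/FundingSubSubStream
    url = f"{BASE_URL}/{platform}/{funding_stream}/{sub_stream}/{sub_sub_stream}"
    return f"{url}?{urlencode(date_filters)}" if date_filters else url


if __name__ == "__main__":
    print(projects_url("eprints", "WT"))
    print(projects_url("dspace", "FP7", startFrom="2011-01-01"))
```

For example, `projects_url("dspace", "FP7", startFrom="2011-01-01")` reproduces the last URL listed under Examples.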
+Additional parameters can be appended to the URL to refine the results by date (dates must be in the form `YYYY-MM-DD`):
+
+* startFrom
+* startUntil
+* endFrom
+* endUntil
+
+## Examples
+
+* Get Wellcome Trust projects for EPrints: [http://api.openaire.eu/projects/eprints/WT/ALL/ALL](http://api.openaire.eu/projects/eprints/WT/ALL/ALL)
+* Get EC-FP7 projects of the specific programme “SP2-IDEAS” for EPrints: [http://api.openaire.eu/projects/eprints/FP7/SP2/ALL](http://api.openaire.eu/projects/eprints/FP7/SP2/ALL)
+* Get EC-FP7 projects for DSpace that started after the given date: [http://api.openaire.eu/projects/dspace/FP7/ALL/ALL?startFrom=2011-01-01](http://api.openaire.eu/projects/dspace/FP7/ALL/ALL?startFrom=2011-01-01).
+
+## Terms of Use and SLA
+
+The APIs are free to use (no sign-up needed) by any third-party service.
+
+**Metadata license is CC-BY**: the metadata records returned by the service can be freely re-used by commercial and non-commercial partners under the CC-BY license, provided that OpenAIRE is acknowledged as the data source.
+
+**Quality of Service**: all API services run in production 24/7 within the OpenAIRE infrastructure premises deployed at the [data center](http://icm.edu.pl/en/centre-of-technology/) facilities of the [Interdisciplinary Centre for Mathematical and Computational Modelling](http://icm.edu.pl/en/) (ICM).
+
+**APIs rate limits**: please check [here](./authentication).
\ No newline at end of file
diff --git a/versioned_docs/version-9.0.0/apis/graph-api/getting-a-single-entity.md b/versioned_docs/version-9.0.0/apis/graph-api/getting-a-single-entity.md
new file mode 100644
index 0000000..432bb68
--- /dev/null
+++ b/versioned_docs/version-9.0.0/apis/graph-api/getting-a-single-entity.md
@@ -0,0 +1,50 @@
+# Getting a single entity
+
+This is a guide on how to retrieve detailed information on a single entity using the OpenAIRE Graph API.
+
+## Endpoints
+Currently, the Graph API supports the following entity types:
+
+- Research products - endpoint: `GET /researchProducts/{id}`
+- Organizations - endpoint: `GET /organizations/{id}`
+- Data sources - endpoint: `GET /dataSources/{id}`
+- Projects - endpoint: `GET /projects/{id}`
+
+You can retrieve the data of a single entity by providing the entity's OpenAIRE identifier (id) in the corresponding endpoint.
+The OpenAIRE id is the primary key of an entity in the OpenAIRE Graph.
+
+:::note
+If you want to retrieve multiple entities based on their OpenAIRE ids, you can use the [search endpoints and filter](./searching-entities/filtering-search-results.md#or-operator) by the `id` field using `OR`.
+:::
+
+## Response
+The response of the Graph API is a [Research product](../../data-model/entities/research-product.md), [Organization](../../data-model/entities/organization.md), [Data Source](../../data-model/entities/data-source.md), or [Project](../../data-model/entities/project.md), depending on the endpoint used.
+
+## Example
+
+To retrieve the research product with OpenAIRE id `doi_dedup___::a55b42c0d32a4a24cf99e621623d110e`,
+you can perform the following API call:
+
+[https://api-beta.openaire.eu/graph/researchProducts/doi_dedup___::a55b42c0d32a4a24cf99e621623d110e](https://api-beta.openaire.eu/graph/researchProducts/doi_dedup___::a55b42c0d32a4a24cf99e621623d110e)
+
+This will return all the data of the research product with the provided identifier:
+
+```json
+{
+  id: "doi_dedup___::a55b42c0d32a4a24cf99e621623d110e",
+  mainTitle: "OpenAIRE Graph Dataset",
+  description: [
+    "The OpenAIRE Graph is exported as several dataseta, so you can download the parts you are interested into. publication_[part].tar: metadata records about research literature (includes types of publications listed here)+ +
+
+The figure above presents the graph's data model.
+Its main entities are briefly described below:
+
+* [Research products](./entities/research-product) represent the outcomes (or products) of research activities.
+* [Data sources](./entities/data-source) are the sources from which the metadata of graph objects are collected.
+* [Organizations](./entities/organization) correspond to companies or research institutions involved in projects,
+responsible for operating data sources or constituting the affiliations of product creators.
+* [Projects](./entities/project) are research project grants funded by a Funding Stream of a Funder.
+* [Communities](./entities/community) are groups of people with a common research intent (e.g. research infrastructures, university alliances).
+* Persons correspond to individual researchers who are involved in the design, creation or maintenance of research products. Currently, this is a non-materialized entity type in the Graph, which means that the respective metadata (and relationships) are encapsulated in the author field of the respective research products.
+
+:::note Further reading
+
+A detailed report on the OpenAIRE Graph Data Model can be found on [Zenodo](https://zenodo.org/record/2643199).
+:::
+
+
diff --git a/versioned_docs/version-9.0.0/data-model/entities/_category_.json b/versioned_docs/version-9.0.0/data-model/entities/_category_.json
new file mode 100644
index 0000000..8161451
--- /dev/null
+++ b/versioned_docs/version-9.0.0/data-model/entities/_category_.json
@@ -0,0 +1,8 @@
+{
+  "label": "Entities",
+  "position": 1,
+  "link": {
+    "type": "generated-index",
+    "description": "The main entities of the OpenAIRE Graph are listed below."
+  }
+}
\ No newline at end of file
diff --git a/versioned_docs/version-9.0.0/data-model/entities/community.md b/versioned_docs/version-9.0.0/data-model/entities/community.md
new file mode 100644
index 0000000..bf057cf
--- /dev/null
+++ b/versioned_docs/version-9.0.0/data-model/entities/community.md
@@ -0,0 +1,82 @@
+---
+sidebar_position: 6
+---
+
+# Communities
+
+Research communities and research initiatives are groups of people with a common research intent, and they come in two types:
+
+* Research initiatives capture a view of the information space that is “research impact”-oriented, i.e. all products generated due to my research initiative;
+* Research communities capture a view that is “research activity”-oriented, i.e. all products that may be of interest or related to my research initiative.
+
+For example, the organizations supporting a research infrastructure fall into the first category, while the researchers involved in a discipline fall into the second.
+
+## The `Community` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE id for the community/research infrastructure, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "context_____::5b7f9fa40bdc12072249204cedfa7808"
+```
+
+### acronym
+_Type: String • Cardinality: ONE_
+
+The acronym of the community.
+
+```json
+"acronym": "covid-19"
+```
+
+### description
+_Type: String • Cardinality: ONE_
+
+Description of the research community/research infrastructure.
+
+```json
+"description": "This portal provides access to publications, research data, projects and software that may be relevant to the Corona Virus Disease (COVID-19). The OpenAIRE COVID-19 Gateway aggregates COVID-19 related records, links them and provides a single access point for discovery and navigation. We tag content from the OpenAIRE Graph (10,000+ data sources) and additional sources.
All COVID-19 related research results are linked to people, organizations and projects, providing a contextualized navigation."
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+The long name of the community.
+
+```json
+"name": "Corona Virus Disease"
+```
+
+### subject
+_Type: String • Cardinality: MANY_
+
+The list of the subjects associated with the research community (only applies to research communities).
+
+```json
+"subject": [
+  "COVID19",
+  "SARS-CoV",
+  "HCoV-19",
+  ...
+]
+```
+
+### type
+_Type: String • Cardinality: ONE_
+
+The type of the community; one of `{ Research Community, Research infrastructure }`.
+
+```json
+"type": "Research Community"
+```
+
+### zenodoCommunity
+_Type: String • Cardinality: ONE_
+
+The URL of the Zenodo community associated with the Research community/Research infrastructure.
+
+```json
+"zenodoCommunity": "https://zenodo.org/communities/covid-19"
+```
diff --git a/versioned_docs/version-9.0.0/data-model/entities/data-source.md b/versioned_docs/version-9.0.0/data-model/entities/data-source.md
new file mode 100644
index 0000000..e01ea7c
--- /dev/null
+++ b/versioned_docs/version-9.0.0/data-model/entities/data-source.md
@@ -0,0 +1,294 @@
+---
+sidebar_position: 2
+---
+
+# Data sources
+
+OpenAIRE entity instances are created out of data collected from various data sources of different kinds, such as publication repositories, research data archives, CRIS systems, funder databases, etc. Data sources export information packages (e.g., XML records, HTTP responses, RDF data, JSON) that may contain information on one or more of such entities and possibly relationships between them.
+
+For example, a metadata record about a project carries information for the creation of a Project entity and its participants (as Organization entities).
Once each piece of information is extracted from such packages and inserted into the OpenAIRE information space as an entity, it is important for it to keep provenance information about the originating data source. This gives visibility to the data source, and also enables the reconstruction of the very same piece of information if problems arise.
+
+---
+
+## The `DataSource` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE id of the data source, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "issn___print::22c514d022b199c346e7f29ca06efc95"
+```
+
+### originalId
+_Type: String • Cardinality: MANY_
+
+The list of original identifiers associated with the datasource.
+
+```json
+"originalId": [
+  "issn___print::2451-8271",
+  ...
+]
+```
+
+### pid
+
+_Type: [ControlledField](other#controlledfield) • Cardinality: MANY_
+
+The persistent identifiers for the datasource.
+
+```json
+"pid": [
+  {
+    "scheme": "DOI",
+    "value": "10.5281/zenodo.4707307"
+  },
+  ...
+]
+```
+
+### type
+_Type: [ControlledField](other#controlledfield) • Cardinality: ONE_
+
+The datasource type; see the vocabulary [dnet:datasource_typologies](https://api.openaire.eu/vocabularies/dnet:datasource_typologies).
+
+```json
+"type": {
+  "scheme": "pubsrepository::journal",
+  "value": "Journal"
+}
+```
+
+### openaireCompatibility
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE compatibility of the ingested research products; it indicates which guidelines they comply with, according to the vocabulary [dnet:datasourceCompatibilityLevel](https://api.openaire.eu/vocabularies/dnet:datasourceCompatibilityLevel).
+
+```json
+"openaireCompatibility": "collected from a compatible aggregator"
+```
+
+### officialName
+_Type: String • Cardinality: ONE_
+
+The official name of the datasource.
+
+```json
+"officialName": "Recent Patents and Topics on Medical Imaging"
+```
+
+### englishName
+_Type: String • Cardinality: ONE_
+
+The English name of the datasource.
+
+```json
+"englishName": "Recent Patents and Topics on Medical Imaging"
+```
+
+### websiteUrl
+_Type: String • Cardinality: ONE_
+
+The URL of the website of the datasource.
+
+```json
+"websiteUrl": "http://dspace.unict.it/"
+```
+
+### logoUrl
+_Type: String • Cardinality: ONE_
+
+The URL of the logo for the datasource.
+
+```json
+"logoUrl": "https://impactum-journals.uc.pt/public/journals/26/pageHeaderLogoImage_en_US.png"
+```
+
+### dateOfValidation
+_Type: String • Cardinality: ONE_
+
+The date of validation against the OpenAIRE guidelines for the datasource records.
+
+```json
+"dateOfValidation": "2016-10-10"
+```
+
+### description
+_Type: String • Cardinality: ONE_
+
+The description for the datasource.
+
+```json
+"description": "Recent Patents on Medical Imaging publishes review and research articles, and guest edited single-topic issues on recent patents in the field of medical imaging. It provides an important and reliable source of current information on developments in the field. The journal is essential reading for all researchers involved in Medical Imaging."
+```
+
+### subjects
+_Type: String • Cardinality: MANY_
+
+List of subjects associated with the datasource.
+
+```json
+"subjects": [
+  "Medicine",
+  "Imaging",
+  ...
+]
+```
+
+### languages
+_Type: String • Cardinality: MANY_
+
+The languages present in the data source's content, as defined by OpenDOAR.
+
+```json
+"languages": [
+  "eng",
+  ...
+]
+```
+
+### contentTypes
+_Type: String • Cardinality: MANY_
+
+Types of content in the data source, as defined by OpenDOAR.
+
+```json
+"contentTypes": [
+  "Journal articles",
+  ...
+]
+```
+
+### releaseStartDate
+_Type: String • Cardinality: ONE_
+
+Release date of the data source, as defined by re3data.org.
+
+```json
+"releaseStartDate": "2010-07-24"
+```
+
+### releaseEndDate
+_Type: String • Cardinality: ONE_
+
+Date when the data source went offline or stopped ingesting new research data, as defined by re3data.org.
+
+```json
+"releaseEndDate": "2016-03-28"
+```
+
+### accessRights
+_Type: String • Cardinality: ONE_
+
+Type of access to the data source, as defined by re3data.org. Possible values: `{ open, restricted, closed }`.
+
+```json
+"accessRights": "open"
+```
+
+### uploadRights
+_Type: String • Cardinality: ONE_
+
+Type of data upload, as defined by re3data.org; one of `{ open, restricted, closed }`.
+
+```json
+"uploadRights": "closed"
+```
+
+### databaseAccessRestriction
+_Type: String • Cardinality: ONE_
+
+Access restrictions to the research data repository. Allowed values are: `{ feeRequired, registration, other }`.
+
+This field only applies to re3data data sources; see the [re3data schema specification](https://gfzpublic.gfz-potsdam.de/rest/items/item_758898_6/component/file_775891/content) for more details.
+
+```json
+"databaseAccessRestriction": "registration"
+```
+
+### dataUploadRestriction
+_Type: String • Cardinality: ONE_
+
+Upload restrictions applied by the datasource, as defined by re3data.org. One of `{ feeRequired, registration, other }`.
+
+This field only applies to re3data data sources; see the [re3data schema specification](https://gfzpublic.gfz-potsdam.de/rest/items/item_758898_6/component/file_775891/content) for more details.
+
+```json
+"dataUploadRestriction": "feeRequired registration"
+```
+
+### versioning
+_Type: Boolean • Cardinality: ONE_
+
+Whether the research data repository supports versioning:
+`true` if the data source supports versioning, `false` otherwise.
+
+This field only applies to re3data data sources; see the [re3data schema specification](https://gfzpublic.gfz-potsdam.de/rest/items/item_758898_6/component/file_775891/content) for more details.
+
+```json
+"versioning": true
+```
+
+### citationGuidelineUrl
+_Type: String • Cardinality: ONE_
+
+The URL of the data source page that provides information on how to cite its items. The DataCite citation format is recommended (http://www.datacite.org/whycitedata).
+
+This field only applies to re3data data sources; see the [re3data schema specification](https://gfzpublic.gfz-potsdam.de/rest/items/item_758898_6/component/file_775891/content) for more details.
+
+```json
+"citationGuidelineUrl": "https://physionet.org/about/#citation"
+```
+
+### pidSystems
+_Type: String • Cardinality: ONE_
+
+The persistent identifier system that is used by the data source, as defined by re3data.org.
+
+```json
+"pidSystems": "hdl"
+```
+
+### certificates
+_Type: String • Cardinality: ONE_
+
+The certificate, seal or standard the data source complies with, as defined by re3data.org.
+
+```json
+"certificates": "WDS"
+```
+
+### policies
+_Type: String • Cardinality: MANY_
+
+Policies of the data source, as defined in OpenDOAR.
+
+### journal
+_Type: [Container](other#container) • Cardinality: ONE_
+
+Information about the journal, if this data source is of type Journal.
+
+```json
+"journal": {
+  "edition": "",
+  "iss": "5",
+  "issnLinking": "",
+  "issnOnline": "1873-7625",
+  "issnPrinted": "2451-8271",
+  "name": "Recent Patents and Topics on Imaging",
+  "sp": "12",
+  "ep": "22",
+  "vol": "50"
+}
+```
+
+### missionStatementUrl
+_Type: String • Cardinality: ONE_
+
+The URL of a mission statement describing the designated community of the data source.
As defined by re3data.org.
+
+```json
+"missionStatementUrl": "https://www.sigma2.no/content/nird-research-data-archive"
+```
\ No newline at end of file
diff --git a/versioned_docs/version-9.0.0/data-model/entities/organization.md b/versioned_docs/version-9.0.0/data-model/entities/organization.md
new file mode 100644
index 0000000..c0c8f6a
--- /dev/null
+++ b/versioned_docs/version-9.0.0/data-model/entities/organization.md
@@ -0,0 +1,93 @@
+---
+sidebar_position: 3
+---
+
+# Organizations
+
+Organizations include companies, research centers or institutions involved as project partners or responsible for operating data sources. Information about organizations is collected from funder databases like CORDA, registries of data sources like OpenDOAR and re3data, and CRIS systems, as they relate to projects or data sources.
+
+
+---
+
+## The `Organization` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+The OpenAIRE id for the organization, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "openorgs____::b84450f9864182c67b8611b5593f4250"
+```
+
+### legalShortName
+_Type: String • Cardinality: ONE_
+
+The short form of the organization's legal name.
+
+```json
+"legalShortName": "ARC"
+```
+
+### legalName
+_Type: String • Cardinality: ONE_
+
+The legal name of the organization.
+
+```json
+"legalName": "Athena Research and Innovation Center In Information Communication & Knowledge Technologies"
+```
+
+### alternativeNames
+_Type: String • Cardinality: MANY_
+
+Alternative names that identify the organization.
+
+```json
+"alternativeNames": [
+  "Athena Research and Innovation Center In Information Communication & Knowledge Technologies",
+  "Athena RIC",
+  "ARC",
+  ...
+]
+```
+
+### websiteUrl
+_Type: String • Cardinality: ONE_
+
+The website URL of the organization.
+
+```json
+"websiteUrl": "https://www.athena-innovation.gr/el/announce/pressreleases.html"
+```
+
+### country
+_Type: [Country](other#country) • Cardinality: ONE_
+
+The country where the organization is located.
+
+```json
+"country": {
+  "code": "GR",
+  "label": "Greece"
+}
+```
+
+### pid
+_Type: [OrganizationPid](other#organizationpid) • Cardinality: MANY_
+
+The list of persistent identifiers for the organization.
+
+```json
+"pid": [
+  {
+    "scheme": "ISNI",
+    "value": "0000 0004 0393 5688"
+  },
+  {
+    "scheme": "GRID",
+    "value": "grid.19843.37"
+  },
+  ...
+]
+```
\ No newline at end of file
diff --git a/versioned_docs/version-9.0.0/data-model/entities/other.md b/versioned_docs/version-9.0.0/data-model/entities/other.md
new file mode 100644
index 0000000..f0ed18e
--- /dev/null
+++ b/versioned_docs/version-9.0.0/data-model/entities/other.md
@@ -0,0 +1,831 @@
+---
+sidebar_position: 7
+---
+
+# Other component objects
+
+Here, we describe other component objects that are used as part of the main graph entities.
+
+## AccessRight
+
+Subclass of [BestAccessRight](#bestaccessright); it indicates information about rights held in and over the resource and the Open Access route.
+
+### openAccessRoute
+_Type: One of `{ gold, green, hybrid, bronze }` • Cardinality: ONE_
+
+Indicates the Open Access status. Values are set according to the [Unpaywall methodology](https://support.unpaywall.org/support/solutions/articles/44001777288-what-do-the-types-of-oa-status-green-gold-hybrid-and-bronze-mean-).
+
+```json
+"openAccessRoute": "gold"
+```
+
+## AlternateIdentifier
+Type used to represent persistent identifiers associated with the research product that have not been minted by an authority for that PID type. For example, we collect metadata from an institutional repository that also provides the DOI as an identifier for the research product.
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+Vocabulary reference.
+
+```json
+"scheme": "doi"
+```
+
+### value
+_Type: String • Cardinality: ONE_
+
+Value from the given scheme/vocabulary.
+
+```json
+"value": "10.1016/j.respol.2021.104226"
+```
+
+## APC
+Indicates the money spent to make a book or article available in Open Access. Sources for this information include the OpenAPC initiative.
+
+### currency
+_Type: String • Cardinality: ONE_
+
+The currency in which the amount is expressed (e.g. EUR, USD).
+
+```json
+"currency": "EUR"
+```
+
+### amount
+_Type: String • Cardinality: ONE_
+
+The quantity of money.
+
+```json
+"amount": "1000"
+```
+
+## Author
+
+Represents an author of the research product.
+
+### fullName
+_Type: String • Cardinality: ONE_
+
+Author's full name.
+
+```json
+"fullName": "Turunen, Heidi"
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+Author's given name.
+
+```json
+"name": "Heidi"
+```
+
+### surname
+_Type: String • Cardinality: ONE_
+
+Author's family name.
+
+```json
+"surname": "Turunen"
+```
+
+### rank
+_Type: Integer • Cardinality: ONE_
+
+Author's position in the list of authors of the given research product.
+
+```json
+"rank": 1
+```
+
+### pid
+_Type: [AuthorPid](#authorpid) • Cardinality: ONE_
+
+Persistent identifier associated with this author.
+
+```json
+"pid": {
+  "id": {
+    "scheme": "orcid",
+    "value": "0000-0001-7169-1177"
+  },
+  "provenance": {
+    "provenance": "Harvested",
+    "trust": "0.9"
+  }
+}
+```
+
+## AuthorPid
+
+The author's persistent identifier.
+
+### id
+_Type: [AuthorPidSchemaValue](#authorpidschemavalue) • Cardinality: ONE_
+
+```json
+"id": {
+  "scheme": "orcid",
+  "value": "0000-0001-7169-1177"
+}
+```
+
+### provenance
+_Type: [Provenance](#provenance-2) • Cardinality: ONE_
+
+The reason why the PID was associated with the author.
+
+```json
+"provenance": {
+  "provenance": "Inferred by OpenAIRE",
+  "trust": "0.85"
+}
+```
+
+## AuthorPidSchemaValue
+Type used to represent the scheme and value of the author's PID.
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+The author's PID scheme. OpenAIRE currently supports ORCID.
+
+```json
+"scheme": "orcid"
+```
+
+### value
+_Type: String • Cardinality: ONE_
+
+The author's PID value in that scheme.
+
+```json
+"value": "0000-1111-2222-3333"
+```
+
+## BestAccessRight
+Indicates the most open access right\* available among the research product instances.
+
+\* Openness is defined by the following ordering of the access right terms:
+```
+OPEN SOURCE > OPEN > EMBARGO (6MONTHS) > EMBARGO (12MONTHS) > RESTRICTED > CLOSED > UNKNOWN
+```
+
+### code
+_Type: String • Cardinality: ONE_
+
+COAR access mode code: http://vocabularies.coar-repositories.org/documentation/access_rights/.
+
+```json
+"code": "c_16ec"
+```
+
+### label
+_Type: String • Cardinality: ONE_
+
+Label for the access mode.
+
+```json
+"label": "RESTRICTED"
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+Reference scheme for the access right code. Currently always set to the COAR access rights vocabulary: http://vocabularies.coar-repositories.org/documentation/access_rights/.
+
+```json
+"scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/"
+```
+
+## CitationImpact
+
+The citation-based impact indicators computed by [BIP!](https://bip.imsi.athenarc.gr/).
+
+
+### indicator
+_Type: String • Cardinality: ONE_
+
+The name of the indicator; it can be one of:
+* `influence`: it reflects the overall/total (citation-based) impact of an article in the research community at large, based on the underlying citation network (diachronically).
+* `citationCount`: it is an alternative to the "Influence" indicator, which also reflects the overall/total (citation-based) impact of an article in the research community at large, based on the underlying citation network (diachronically).
+* `popularity`: it reflects the "current" (citation-based) impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
+* `impulse`: it reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
+
+For more details on how these indicators are calculated, see the [impact indicators documentation](/graph-production-workflow/indicators-ingestion/impact-indicators).
+
+```json
+"citationImpact": {
+  "influence": 123,
+  "influenceClass": "C2",
+  "citationCount": 456,
+  "citationClass": "C3",
+  "popularity": 234,
+  "popularityClass": "C1",
+  "impulse": 987,
+  "impulseClass": "C3"
+}
+```
+
+### class
+_Type: String • Cardinality: ONE_
+
+The impact class assigned based on the indicator score.
+
+To ease interpretation, BIP! also assigns articles to impact classes that group together articles with similar impact. The following 5 classes are provided:
+* `C1`: Top 0.01%
+* `C2`: Top 0.1%
+* `C3`: Top 1%
+* `C4`: Top 10%
+* `C5`: Bottom 90%
+
+## Container
+Information about the conference or journal where the research product has been presented or published.
+
+```json
+"container": {
+  "name": "Research Policy",
+  "edition": "xyz",
+  "issnLinking": "0048-7333",
+  "issnOnline": "1873-7625",
+  "issnPrinted": "1377-9655",
+  "sp": "xyz",
+  "ep": "xyz",
+  "iss": "xyz",
+  "vol": "xyz"
+}
+```
+
+```json
+"container": {
+  "name": "Research Policy",
+  "conferenceDate": "2022-09-22",
+  "conferencePlace": "Padua, Italy"
+}
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+Name of the journal or conference.
+
+### issnPrinted
+_Type: String • Cardinality: ONE_
+
+The ISSN of the printed journal.
+
+### issnOnline
+_Type: String • Cardinality: ONE_
+
+The ISSN of the online journal.
+
+### issnLinking
+_Type: String • Cardinality: ONE_
+
+The linking ISSN of the journal.
+
+### iss
+_Type: String • Cardinality: ONE_
+
+The journal issue.
+
+### sp
+_Type: String • Cardinality: ONE_
+
+The start page.
+
+### ep
+_Type: String • Cardinality: ONE_
+
+The end page.
+
+### vol
+_Type: String • Cardinality: ONE_
+
+The journal volume.
+
+### edition
+_Type: String • Cardinality: ONE_
+
+The edition of the journal or conference.
+
+### conferencePlace
+_Type: String • Cardinality: ONE_
+
+The place of the conference.
+
+### conferenceDate
+_Type: String • Cardinality: ONE_
+
+The date of the conference.
+
+## ControlledField
+
+
+Generic type used to represent information described by a scheme and a value in that scheme (e.g. a PID).
+
+```json
+{
+  "scheme": "DOI",
+  "value": "10.5281/zenodo.4707307"
+}
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+Vocabulary reference.
+
+### value
+_Type: String • Cardinality: ONE_
+
+Value from the given scheme/vocabulary.
+
+## Country
+Represents a generic country code and label.
+
+```json
+{
+  "code": "IT",
+  "label": "Italy"
+}
+```
+
+### code
+_Type: String • Cardinality: ONE_
+
+ISO 3166-1 alpha-2 country code.
+
+### label
+_Type: String • Cardinality: ONE_
+
+The country label.
+
+## Funding
+Funding information for a project.
+
+### fundingStream
+_Type: [FundingStream](#fundingstream) • Cardinality: ONE_
+
+The funding stream through which the project is funded.
+
+```json
+"fundingStream": {
+  "description": "Horizon 2020 Framework Programme - Research and Innovation action",
+  "id": "EC::H2020::RIA"
+}
+```
+### jurisdiction
+_Type: String • Cardinality: ONE_
+
+Geographical jurisdiction of the funder (e.g. EU for the European Commission, HR for the Croatian Science Foundation).
+
+```json
+"jurisdiction": "EU"
+```
+
+### name
+_Type: String • Cardinality: ONE_
+
+The name of the funder.
+
+```json
+"name": "European Commission"
+```
+
+### shortName
+_Type: String • Cardinality: ONE_
+
+The short name of the funder.
+
+```json
+"shortName": "EC"
+```
+
+## FundingStream
+Description of a funding stream.
+
+### id
+_Type: String • Cardinality: ONE_
+
+The identifier of the funding stream.
+
+```json
+"id": "EC::H2020::RIA"
+```
+
+### description
+_Type: String • Cardinality: ONE_
+
+Short description of the funding stream.
+
+```json
+"description": "Horizon 2020 Framework Programme - Research and Innovation action"
+```
+
+## GeoLocation
+Represents geolocation information.
+
+### point
+_Type: String • Cardinality: ONE_
+
+A point defined by its latitude and longitude.
+
+```json
+"point": "7.72486 50.1084"
+```
+
+### box
+_Type: String • Cardinality: ONE_
+
+A bounding box defined by two longitudes (min and max) and two latitudes (min and max).
+
+
+```json
+"box": "18.569386 54.468973 18.066832 54.83707"
+```
+
+### place
+_Type: String • Cardinality: ONE_
+
+The name of a specific place.
+
+```json
+"place": "Tübingen, Baden-Württemberg, Southern Germany"
+```
+
+## Grant
+The money granted to a project.
+
+### currency
+_Type: String • Cardinality: ONE_
+
+The currency of the granted amount (e.g. EUR).
+
+```json
+"currency": "EUR"
+```
+
+### fundedAmount
+_Type: Number • Cardinality: ONE_
+
+The funded amount.
+
+```json
+"fundedAmount": 1.0E7
+```
+
+### totalCost
+_Type: Number • Cardinality: ONE_
+
+The total cost of the project.
+
+```json
+"totalCost": 1.0E7
+```
+
+## H2020Programme
+The H2020 programme funding a project.
+
+### code
+_Type: String • Cardinality: ONE_
+
+The code of the programme.
+
+```json
+"code": "H2020-EU.1.4.1.3."
+```
+
+### description
+_Type: String • Cardinality: ONE_
+
+The description of the programme.
+
+```json
+"description": "Development, deployment and operation of ICT-based e-infrastructures"
+```
+
+## Instance
+An instance is one specific materialization or version of the research product. For example, as a result of deduplication, a single research product can have three instances:
+
+* one is the pre-print
+* one is the post-print
+* one is the published version
+
+Each instance is characterized by the properties that follow.
+
+### accessRight
+_Type: [AccessRight](#accessright) • Cardinality: ONE_
+
+Maps [dc:rights](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/rights/) and describes the access rights of the web resources of this instance.
+
+```json
+"accessRight": {
+  "code": "c_abf2",
+  "label": "OPEN",
+  "openAccessRoute": "gold",
+  "scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/"
+}
+```
+
+### alternateIdentifier
+_Type: [AlternateIdentifier](#alternateidentifier) • Cardinality: MANY_
+
+All the identifiers associated with the research product other than the authoritative ones.
+
+```json
+"alternateIdentifier": [
+  {
+    "scheme": "doi",
+    "value": "10.1016/j.respol.2021.104226"
+  },
+  ...
+]
+```
+
+### articleProcessingCharge
+_Type: [APC](#apc) • Cardinality: ONE_
+
+The money spent to make this book or article available in Open Access. The source for this information is the OpenAPC initiative.
+
+```json
+"articleProcessingCharge": {
+  "currency": "EUR",
+  "amount": "1000"
+}
+```
+
+### license
+_Type: String • Cardinality: ONE_
+
+The license URL.
+
+```json
+"license": "http://creativecommons.org/licenses/by-nc/4.0"
+```
+
+### pid
+_Type: [ResultPid](#resultpid) • Cardinality: MANY_
+
+The set of persistent identifiers associated with this instance that have been collected from an authority for the PID type (e.g. Crossref/DataCite for DOIs). See the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers) for more information.
+
+```json
+"pid": [
+  {
+    "scheme": "pmc",
+    "value": "PMC8024784"
+  },
+  ...
+]
+```
+
+### publicationDate
+_Type: String • Cardinality: ONE_
+
+The publication date of the research product.
+
+```json
+"publicationDate": "2009-02-12"
+```
+
+### refereed
+_Type: String • Cardinality: ONE_
+
+Describes whether this instance has been peer-reviewed. Allowed values are peerReviewed, nonPeerReviewed, and UNKNOWN (as defined in https://api.openaire.eu/vocabularies/dnet:review_levels). For example:
+
+* peerReviewed: https://api.openaire.eu/vocabularies/dnet:review_levels/0001
+* nonPeerReviewed: https://api.openaire.eu/vocabularies/dnet:review_levels/0002
+
+The mapping to these values covers the vocabularies of the following guidelines:
+
+* [DRIVER guidelines 2.0 - info:eu-repo/semantic](https://wiki.surfnet.nl/download/attachments/10851536/DRIVER_Guidelines_v2_Final_2008-11-13.pdf) (OpenAIRE v1.0 to v3.0 - Literature)
+* [COAR Vocabulary v2.0 and v3.0](https://vocabularies.coar-repositories.org/resource_types/) (OpenAIRE v4 - Inst.+Them.)
+
+```json
+"refereed": "UNKNOWN"
+```
+
+### type
+_Type: String • Cardinality: ONE_
+
+The specific sub-type of this instance (see the terms linked from https://api.openaire.eu/vocabularies/dnet:result_typologies).
+
+```json
+"type": "Article"
+```
+
+### url
+_Type: String • Cardinality: MANY_
+
+URLs of the instance. They may point to the full text or to the landing page at the hosting source.
+
+```json
+"url": [
+  "https://periodicos2.uesb.br/index.php/folio/article/view/4296",
+  ...
+]
+```
+
+## Indicator
+
+The indicators computed for a specific OpenAIRE research product.
+
+Each Indicator object is composed of the following properties:
+
+### citationImpact
+_Type: [CitationImpact](#citationimpact) • Cardinality: MANY_
+
+These indicators, provided by [BIP!](https://bip.imsi.athenarc.gr/), estimate the citation-based impact of a research product.
+
+For details about their calculation, see the [impact indicators documentation](/graph-production-workflow/indicators-ingestion/impact-indicators).
+
+```json
+"citationImpact": {
+  "influence": 123,
+  "influenceClass": "C2",
+  "citationCount": 456,
+  "citationClass": "C3",
+  "popularity": 234,
+  "popularityClass": "C1",
+  "impulse": 987,
+  "impulseClass": "C3"
+}
+```
+
+### usageCounts
+_Type: [UsageCounts](#usagecounts-1) • Cardinality: ONE_
+
+These measures, computed by the [UsageCounts Service](https://usagecounts.openaire.eu/), are based on usage statistics.
+
+For more details, see the [usage counts documentation](/graph-production-workflow/indicators-ingestion/usage-counts).
+
+```json
+"usageCounts": {
+  "downloads": "10",
+  "views": "20"
+}
+```
+## Language
+Represents the language of the research product.
+
+```json
+"language": {
+  "code": "eng",
+  "label": "English"
+}
+```
+
+### code
+_Type: String • Cardinality: ONE_
+
+Alpha-3/ISO 639-2 code of the language. Values are controlled by the [dnet:languages vocabulary](https://api.openaire.eu/vocabularies/dnet:languages).
+
+### label
+_Type: String • Cardinality: ONE_
+
+Language label in English.
+
+## OrganizationPid
+
+The scheme and value of an organization identifier.
+
+```json
+{
+  "scheme": "GRID",
+  "value": "grid.7119.e"
+}
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+Vocabulary reference (e.g. ISNI).
+
+### value
+_Type: String • Cardinality: ONE_
+
+Value from the given scheme/vocabulary (e.g. 0000000090326370).
+
+## Provenance
+Indicates the process that produced (or provided) the information, and the trust associated with the information.
+
+```json
+{
+  "provenance": "Harvested",
+  "trust": "0.9"
+}
+```
+
+### provenance
+_Type: String • Cardinality: ONE_
+
+Provenance term from the vocabulary [dnet:provenanceActions](https://api.openaire.eu/vocabularies/dnet:provenanceActions).
+
+### trust
+_Type: String • Cardinality: ONE_
+
+The trust level, expressed as a number in the range [0, 1].
+
+## ResultCountry
+Indicates the country associated with the research product.
+It is a subclass of [Country](#country) and extends it with provenance information.
+
+### provenance
+_Type: [Provenance](#provenance-2) • Cardinality: ONE_
+
+Indicates the reason why this country is associated with this research product.
+
+```json
+{
+  "code": "IT",
+  "label": "Italy",
+  "provenance": {
+    "provenance": "inferred by OpenAIRE",
+    "trust": "0.85"
+  }
+}
+```
+
+## ResultPid
+Type used to represent persistent identifiers of the research product that have been minted by an authority for that PID type.
+
+
+
+```json
+{
+  "scheme": "doi",
+  "value": "10.21511/bbs.13(3).2018.13"
+}
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+The scheme of the persistent identifier for the research product (e.g. doi). A PID appears here only if its information has been collected from an authority for that PID type (e.g. Crossref/DataCite for DOIs). The set of authoritative PIDs is: `doi` when collected from Crossref or DataCite, `pmid` when collected from Europe PMC, `arxiv` when collected from arXiv, and `handle` when collected from the repositories.
+
+### value
+_Type: String • Cardinality: ONE_
+
+The value expressed in the scheme (e.g. 10.1000/182).
+
+## Subject
+Represents keywords associated with the research product.
+
+### subject
+_Type: [SubjectSchemeValue](#subjectschemevalue) • Cardinality: ONE_
+
+Contains the subject scheme (keyword, MeSH, etc.) and the subject term (medicine, chemistry, etc.).
+
+```json
+"subject": {
+  "scheme": "keyword",
+  "value": "SVOC",
+  "provenance": {
+    "provenance": "Harvested",
+    "trust": "0.9"
+  }
+}
+```
+
+### scheme
+_Type: String • Cardinality: ONE_
+
+OpenAIRE subject classification scheme (https://api.openaire.eu/vocabularies/dnet:subject_classification_typologies).
+
+```json
+"scheme": "keyword"
+```
+
+### value
+_Type: String • Cardinality: ONE_
+
+The value of the subject in the selected scheme. When the scheme is 'keyword', the subject is free text (i.e. not a term from a controlled vocabulary).
+
+### provenance
+_Type: [Provenance](#provenance-2) • Cardinality: ONE_
+
+Contains provenance information for the subject term.
+
+## UsageCounts
+
+The usage counts indicator computed for this research product.
+
+```json
+"usageCounts": {
+  "downloads": "10",
+  "views": "20"
+}
+```
+
+### views
+_Type: String • Cardinality: ONE_
+
+The number of views for this research product.
+
+### downloads
+_Type: String • Cardinality: ONE_
+
+The number of downloads for this research product.
diff --git a/versioned_docs/version-9.0.0/data-model/entities/project.md b/versioned_docs/version-9.0.0/data-model/entities/project.md
new file mode 100644
index 0000000..5476cce
--- /dev/null
+++ b/versioned_docs/version-9.0.0/data-model/entities/project.md
@@ -0,0 +1,171 @@
+---
+sidebar_position: 4
+---
+
+# Projects
+
+Of crucial interest to OpenAIRE is also the identification of the funders (e.g. European Commission, Wellcome Trust, FCT Portugal, NWO The Netherlands) that co-funded the projects that have led to a given research product. Projects are characterized by a list of funding streams (e.g. FP7, H2020 for the EC), which identify the strands of funding. Funding streams can be nested to form a tree of sub-funding streams.
+
+---
+
+## The `Project` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+Main entity identifier, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "corda__h2020::70ea22400fd890c5033cb31642c4ae68"
+```
+
+### code
+_Type: String • Cardinality: ONE_
+
+The grant agreement code of the project.
+
+```json
+"code": "777541"
+```
+
+### acronym
+_Type: String • Cardinality: ONE_
+
+Project's acronym.
+
+```json
+"acronym": "OpenAIRE-Advance"
+```
+
+### title
+_Type: String • Cardinality: ONE_
+
+Project's title.
+
+```json
+"title": "OpenAIRE Advancing Open Scholarship"
+```
+
+### callIdentifier
+_Type: String • Cardinality: ONE_
+
+The identifier of the research call.
+
+```json
+"callIdentifier": "H2020-EINFRA-2017"
+```
+
+### funding
+_Type: [Funding](other#funding) • Cardinality: MANY_
+
+Funding information for the project.
+
+```json
+"funding": [
+  {
+    "fundingStream": {
+      "description": "Horizon 2020 Framework Programme - Research and Innovation action",
+      "id": "EC::H2020::RIA"
+    },
+    "jurisdiction": "EU",
+    "name": "European Commission",
+    "shortName": "EC"
+  }
+]
+```
+### granted
+_Type: [Grant](other#grant) • Cardinality: ONE_
+
+The money granted to the project.
+
+```json
+"granted": {
+  "currency": "EUR",
+  "fundedAmount": 1.0E7,
+  "totalCost": 1.0E7
+}
+```
+
+### h2020programme
+_Type: [H2020Programme](other#h2020programme) • Cardinality: MANY_
+
+The H2020 programme funding the project.
+
+```json
+"h2020programme": [
+  {
+    "code": "H2020-EU.1.4.1.3.",
+    "description": "Development, deployment and operation of ICT-based e-infrastructures"
+  }
+]
+```
+### keywords
+_Type: String • Cardinality: ONE_
+
+```json
+"keywords": [
+  "Open Science",
+  ...
+]
+```
+
+### openAccessMandateForDataset
+_Type: Boolean • Cardinality: ONE_
+
+```json
+"openAccessMandateForDataset": true
+```
+
+### openAccessMandateForPublications
+_Type: Boolean • Cardinality: ONE_
+
+```json
+"openAccessMandateForPublications": true
+```
+
+### startDate
+_Type: String • Cardinality: ONE_
+
+The start date of the project.
+
+```json
+"startDate": "2018-01-01"
+```
+
+### endDate
+_Type: String • Cardinality: ONE_
+
+The end date of the project.
+
+```json
+"endDate": "2021-02-28"
+```
+
+### subject
+_Type: String • Cardinality: MANY_
+
+The subjects of the project.
+
+```json
+"subject": [
+  "Data and Distributed Computing e-infrastructures for Open Science",
+  ...
+]
+```
+### summary
+_Type: String • Cardinality: ONE_
+
+Short summary of the project.
+
+```json
+"summary": "OpenAIRE-Advance continues the mission of OpenAIRE to support the Open Access/Open Data mandates in Europe. By sustaining the current successful infrastructure, comprised of a human network and robust technical services, it consolidates its achievements while working to shift the momentum among its communities to Open Science, aiming to be a trusted e-Infrastructurewithin the realms of the European Open Science Cloud.In this next phase, OpenAIRE-Advance strives to empower its National Open Access Desks (NOADs) so they become a pivotal part within their own national data infrastructures, positioningOA and open science onto national agendas. The capacity building activities bring together experts ontopical task groups in thematic areas(open policies, RDM, legal issues, TDM), promoting a train the trainer approach, strengthening and expanding the pan-European Helpdesk with support and training toolkits, training resources and workshops.It examines key elements of scholarly communication, i.e., co-operative OA publishing and next generation repositories, to develop essential building blocks of the scholarly commons.On the technical level OpenAIRE-Advance focuses on the operation and maintenance of the OpenAIRE technical TRL8/9 services,and radically improvesthe OpenAIRE services on offer by: a) optimizing their performance and scalability, b) refining their functionality based on end-user feedback, c) repackagingthem into products, taking a professional marketing approach with well-defined KPIs, d)consolidating the range of services/products into a common e-Infra catalogue to enable a wider uptake.OpenAIRE-Advancesteps up its outreach activities with concrete pilots with three major RIs,citizen science initiatives, and innovators via a rigorous Open Innovation programme. Finally, viaits partnership with COAR, OpenAIRE-Advance consolidatesOpenAIRE’s global roleextending its collaborations with Latin America, US, Japan, Canada, and Africa."
+```
+
+### websiteUrl
+_Type: String • Cardinality: ONE_
+
+The website of the project.
+
+```json
+"websiteUrl": "https://www.openaire.eu/advance/"
+```
diff --git a/versioned_docs/version-9.0.0/data-model/entities/research-product.md b/versioned_docs/version-9.0.0/data-model/entities/research-product.md
new file mode 100644
index 0000000..28c3b27
--- /dev/null
+++ b/versioned_docs/version-9.0.0/data-model/entities/research-product.md
@@ -0,0 +1,527 @@
+---
+sidebar_position: 1
+---
+
+# Research products
+
+Research products are intended as digital objects, described by metadata, resulting from a scientific process.
+On this page, we describe the properties of the `ResearchProduct` object.
+
+In addition, the following sub-types of `ResearchProduct` inherit all its properties and further extend it:
+* [Publication](#publication)
+* [Data](#data)
+* [Software](#software)
+* [Other research product](#other-research-product)
+
+---
+
+## The `ResearchProduct` object
+
+### id
+_Type: String • Cardinality: ONE_
+
+Main entity identifier, created according to the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers).
+
+```json
+"id": "doi_dedup___::80f29c8c8ba18c46c88a285b7e739dc3"
+```
+
+### type
+_Type: String • Cardinality: ONE_
+
+Type of the research product. Possible types:
+
+* `publication`
+* `data`
+* `software`
+* `other`
+
+as declared in the terms from the [dnet:result_typologies vocabulary](https://api.openaire.eu/vocabularies/dnet:result_typologies).
+
+```json
+"type": "publication"
+```
+
+### originalId
+_Type: String • Cardinality: MANY_
+
+Identifiers of the record at the original sources.
+
+```json
+"originalId": [
+  "oai:pubmedcentral.nih.gov:8024784",
+  "S0048733321000305",
+  "10.1016/j.respol.2021.104226",
+  "3136742816"
+]
+```
+
+### mainTitle
+_Type: String • Cardinality: ONE_
+
+A name or title by which a research product is known. It may be the title of a publication or the name of a piece of software.
+
+```json
+"mainTitle": "The fall of the innovation empire and its possible rise through open science"
+```
+
+### subTitle
+
+_Type: String • Cardinality: ONE_
+
+Explanatory or alternative name by which a research product is known.
+
+```json
+"subTitle": "An analysis of cases from 1980 - 2020"
+```
+
+### author
+_Type: [Author](other#author) • Cardinality: MANY_
+
+The main researchers involved in producing the data, or the authors of the publication.
+
+```json
+"author": [
+  {
+    "fullName": "E. Richard Gold",
+    "rank": 1,
+    "name": "Richard",
+    "surname": "Gold",
+    "pid": {
+      "id": {
+        "scheme": "orcid",
+        "value": "0000-0002-3789-9238"
+      },
+      "provenance": {
+        "provenance": "Harvested",
+        "trust": "0.9"
+      }
+    }
+  },
+  ...
+]
+```
+### bestAccessRight
+_Type: [BestAccessRight](other#bestaccessright) • Cardinality: ONE_
+
+The most open access right associated with the manifestations of this research product.
+
+```json
+"bestAccessRight": {
+  "code": "c_abf2",
+  "label": "OPEN",
+  "scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/"
+}
+```
+
+### contributor
+_Type: String • Cardinality: MANY_
+
+The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource.
+
+```json
+"contributor": [
+  "University of Zurich",
+  "Wright, Aidan G C",
+  "Hallquist, Michael",
+  ...
+]
+```
+
+### country
+_Type: [ResultCountry](other#resultcountry) • Cardinality: MANY_
+
+Country associated with the research product, i.e. the country of the organisation that manages the institutional repository, national aggregator, or CRIS system from which the record was collected.
+The countries of the authors' affiliations can instead be found in the affiliation relation.
+
+```json
+"country": [
+  {
+    "code": "CH",
+    "label": "Switzerland",
+    "provenance": {
+      "provenance": "Inferred by OpenAIRE",
+      "trust": "0.85"
+    }
+  },
+  ...
+]
+```
+
+### coverage
+_Type: String • Cardinality: MANY_
+
+### dateOfCollection
+_Type: String • Cardinality: ONE_
+
+When OpenAIRE last collected the record.
+
+```json
+"dateOfCollection": "2021-06-09T11:37:56.248Z"
+```
+
+### description
+_Type: String • Cardinality: MANY_
+
+A brief description of the resource and the context in which the resource was created.
+
+```json
+"description": [
+  "Open science partnerships (OSPs) are one mechanism to reverse declining efficiency. OSPs are public-private partnerships that openly share publications, data and materials.",
+  "There is growing concern that the innovation system's ability to create wealth and attain social benefit is declining in effectiveness. This article explores the reasons for this decline and suggests a structure, the open science partnership, as one mechanism through which to slow down or reverse this decline.",
+  "The article examines the empirical literature of the last century to document the decline. This literature suggests that the cost of research and innovation is increasing exponentially, that researcher productivity is declining, and, third, that these two phenomena have led to an overall flat or declining level of innovation productivity.",
+  ...
+]
+```
+
+### embargoEndDate
+_Type: String • Cardinality: ONE_
+
+Date when the embargo ends and this research product becomes Open Access.
+
+```json
+"embargoEndDate": "2017-01-01"
+```
+
+### indicators
+_Type: [Indicator](other#indicator-1) • Cardinality: ONE_
+
+The indicators computed for this research product.
+Currently, the following types of indicators are supported:
+
+* [Citation-based impact indicators by BIP!](other#citationimpact)
+* [Usage Statistics indicators](other#usagecounts)
+
+```json
+"indicators": {
+  "citationImpact": {
+    "influence": 123,
+    "influenceClass": "C2",
+    "citationCount": 456,
+    "citationClass": "C3",
+    "popularity": 234,
+    "popularityClass": "C1",
+    "impulse": 987,
+    "impulseClass": "C3"
+  },
+  "usageCounts": {
+    "downloads": "10",
+    "views": "20"
+  }
+}
+```
+
+### instance
+_Type: [Instance](other#instance) • Cardinality: MANY_
+
+Specific materialization or version of the research product. For example, a research product can have three instances: the pre-print, the post-print, and the published version.
+
+```json
+"instance": [
+  {
+    "accessRight": {
+      "code": "c_abf2",
+      "label": "OPEN",
+      "openAccessRoute": "gold",
+      "scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/"
+    },
+    "alternateIdentifier": [
+      {
+        "scheme": "doi",
+        "value": "10.1016/j.respol.2021.104226"
+      },
+      ...
+    ],
+    "articleProcessingCharge": {
+      "amount": "4063.93",
+      "currency": "EUR"
+    },
+    "license": "http://creativecommons.org/licenses/by-nc/4.0",
+    "pid": [
+      {
+        "scheme": "pmc",
+        "value": "PMC8024784"
+      },
+      ...
+    ],
+
+    "publicationDate": "2021-01-01",
+    "refereed": "UNKNOWN",
+    "type": "Article",
+    "url": [
+      "http://europepmc.org/articles/PMC8024784"
+    ]
+  },
+  ...
+]
+```
+
+### language
+_Type: [Language](other#language) • Cardinality: ONE_
+
+The alpha-3/ISO 639-2 code of the language. Values are controlled by the [dnet:languages vocabulary](https://api.openaire.eu/vocabularies/dnet:languages).
+
+```json
+"language": {
+  "code": "eng",
+  "label": "English"
+}
+```
+### lastUpdateTimeStamp
+_Type: Long • Cardinality: ONE_
+
+Timestamp of the last update of the record in OpenAIRE.
+
+```json
+"lastUpdateTimeStamp": 1652722279987
+```
+
+### pid
+_Type: [ResultPid](other#resultpid) • Cardinality: MANY_
+
+Persistent identifiers of the research product. See also the [OpenAIRE entity identifier and PID mapping policy](../pids-and-identifiers) to learn more.
+
+```json
+"pid": [
+  {
+    "scheme": "pmc",
+    "value": "PMC8024784"
+  },
+  {
+    "scheme": "doi",
+    "value": "10.1016/j.respol.2021.104226"
+  },
+  ...
+]
+```
+
+### publicationDate
+_Type: String • Cardinality: ONE_
+
+Main date of the research product: typically the publication or issued date. If the research product has several versions with different dates, the date is selected as the most frequent well-formatted date; if there is no such date, the most recent complete date among the well-formatted ones is used. For statistics, the year is extracted and the research product is counted only among the research products of that year. Example: given a pre-print date of 2019-02-03, an article date of 2020-02 provided by a repository, and an article date of 2020 provided by Crossref, OpenAIRE sets the date to 2019-02-03, because it is the most recent among the complete and well-formatted dates. If the repository then updates the metadata and sets a complete date (e.g. 2020-02-12), this becomes the new date of the research product, because it is now the most recent complete date. However, if OpenAIRE then collects the pre-print from another repository with date 2019-02-03, that date becomes the “winning date”, because it is now the most frequent well-formatted date.
+
+```json
+"publicationDate": "2021-03-18"
+```
+
+### publisher
+_Type: String • Cardinality: ONE_
+
+The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource.
+ +```json +"publisher": "Elsevier, North-Holland Pub. Co" +``` + +### source +_Type: String • Cardinality: MANY_ + +A related resource from which the described resource is derived. See definition of Dublin Core field [dc:source](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/source). + +```json +"source": [ + "Research Policy", + "Crossref", + ... +] +``` + +### subjects +_Type: [Subject](other#subject) • Cardinality: MANY_ + +Subject, keyword, classification code, or key phrase describing the resource. + +OpenAIRE classifies research products according to the [Field of Science](../../graph-production-workflow/indicators-ingestion/fos-classification.md) +and [Sustainable Development Goals](../../graph-production-workflow/indicators-ingestion/sdg-classification.md) taxonomies. +See the respective sections to learn more. + +```json +"subjects": [ + { + "subject": { + "scheme": "FOS", + "value": "01 natural sciences" + }, + "provenance": { + "provenance": "inferred by OpenAIRE", + "trust": "0.85" + } + }, + { + "subject": { + "scheme": "SDG", + "value": "2. Zero hunger" + }, + "provenance": { + "provenance": "inferred by OpenAIRE", + "trust": "0.83" + } + }, + { + "subject": { + "scheme": "keyword", + "value": "Open science" + }, + "provenance": { + "provenance": "Harvested", + "trust": "0.9" + } + }, + ... +] +``` + +### isGreen +_Type: Boolean • Cardinality: ONE_ + +Indicates whether or not the scientific result was published following the green open access model. + +### openAccessColor +_Type: String • Cardinality: ONE_ + + +Indicates the specific open access model used for the publication; possible values are `bronze`, `gold` and `hybrid`. + +### isInDiamondJournal +_Type: Boolean • Cardinality: ONE_ + +Indicates whether or not the publication was published in a diamond journal. + +### publiclyFunded +_Type: String • Cardinality: ONE_ + +Discloses whether the publication acknowledges grants from public sources.
+ +--- + +## Sub-types + +There are the following sub-types of `Result`. Each inherits all its fields and extends them with the following. + +### Publication + +Metadata records about research literature (includes types of publications listed [here](http://api.openaire.eu/vocabularies/dnet:result_typologies/publication)). + +#### container +_Type: [Container](other#container) • Cardinality: ONE_ + +Container has information about the conference or journal where the research product has been presented or published. + +```json +"container": { + "edition": "", + "iss": "5", + "issnLinking": "", + "issnOnline": "1873-7625", + "issnPrinted": "0048-7333", + "name": "Research Policy", + "sp": "12", + "ep": "22", + "vol": "50" +} +``` +### Data + +Metadata records about research data (includes the subtypes listed [here](http://api.openaire.eu/vocabularies/dnet:result_typologies/dataset)). + +#### size +_Type: String • Cardinality: ONE_ + +The declared size of the research data. + +```json +"size": "10129818" +``` + +#### version +_Type: String • Cardinality: ONE_ + +The version of the research data. + +```json +"version": "v1.3" +``` + +#### geolocation +_Type: [GeoLocation](other#geolocation) • Cardinality: MANY_ + +The list of geolocations associated with the research data. + +```json +"geolocation": [ + { + "box": "18.569386 54.468973 18.066832 54.83707", + "place": "Tübingen, Baden-Württemberg, Southern Germany", + "point": "7.72486 50.1084" + }, + ... +] +``` + +### Software + +Metadata records about research software (includes the subtypes listed [here](http://api.openaire.eu/vocabularies/dnet:result_typologies/software)). + +#### documentationUrl +_Type: String • Cardinality: MANY_ + +The URLs to the software documentation. + +```json +"documentationUrl": [ + "https://github.com/openaire/iis/blob/master/README.markdown", + ... +] +``` + +#### codeRepositoryUrl +_Type: String • Cardinality: ONE_ + +The URL to the repository with the source code. 
+ +```json +"codeRepositoryUrl": "https://github.com/openaire/iis" +``` + +#### programmingLanguage +_Type: String • Cardinality: ONE_ + +The programming language. + +```json +"programmingLanguage": "Java" +``` + +### Other research product + +Metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed [here](http://api.openaire.eu/vocabularies/dnet:result_typologies/other)). + +#### contactPerson +_Type: String • Cardinality: MANY_ + +Information on the person responsible for providing further information regarding the resource. + +```json +"contactPerson": [ + "Noémie Dominguez", + ... +] +``` + +#### contactGroup +_Type: String • Cardinality: MANY_ + +Information on the group responsible for providing further information regarding the resource. + +```json +"contactGroup": [ + "Networked Multimedia Information Systems (NeMIS)", + ... +] +``` + +#### tool +_Type: String • Cardinality: MANY_ + +Information about tools useful for the interpretation and/or re-use of the research product. + diff --git a/versioned_docs/version-9.0.0/data-model/pids-and-identifiers.md b/versioned_docs/version-9.0.0/data-model/pids-and-identifiers.md new file mode 100644 index 0000000..05e33ab --- /dev/null +++ b/versioned_docs/version-9.0.0/data-model/pids-and-identifiers.md @@ -0,0 +1,80 @@ +# PIDs and identifiers + +A key challenge for the stability of the OpenAIRE Graph is keeping its identifiers and records stable over time. +The barriers are many, as the Graph keeps a map of data sources that is subject to constant change: records in repositories vary in content, +original IDs and PIDs, and may disappear or reappear; the same holds for the repository itself or the metadata collection it exposes. +Moreover, the mappings applied to the original contents may also change and improve over time to keep up with changes in the input records.
+ +## PID Authorities + +One aspect of this concerns how identities are assigned to the objects populating the graph. The basic idea is to build the identifiers of the objects in the graph from the PIDs available in some authoritative sources, while treating all other sources as “unstable” by definition. Examples of authoritative sources are Crossref and DataCite. Examples of non-authoritative ones are institutional repositories, aggregators, etc. PIDs from the authoritative sources form the stable OpenAIRE ID skeleton of the Graph, precisely because they are immutable by construction. + +This policy defines a list of data sources that are considered authoritative for the specific type of PID they provide. Its effect is twofold: +* OpenAIRE IDs depend on persistent IDs when these are provided by the authority responsible for minting them; +* PIDs are included in the graph according to a strict criterion: the PID types declared in the table below are mapped as PIDs only when they are collected from the respective PID authority data source. + +| PID Type | Authority | +|-----------|-----------------------------------------------------------------------------------------------------| +| doi | [Crossref](https://www.crossref.org), [Datacite](https://datacite.org) | +| pmc, pmid | [Europe PubMed Central](https://europepmc.org/), [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc) | +| arXiv | [arXiv.org e-Print Archive](https://arxiv.org/) | +| uniprot | [Protein Data Bank](http://www.pdb.org/) | +| ena | [Protein Data Bank](http://www.pdb.org/) | +| pdb | [Protein Data Bank](http://www.pdb.org/) | + + +There is one exception: Handles are minted by many repositories and listing them all is not viable, so, to avoid losing them as PIDs, Handles bypass the PID authority filtering rule. +In all other cases, PIDs collected from non-authoritative sources are included in the graph as alternate identifiers.
+ +## Delegated authorities + +When a record is aggregated from multiple sources considered authoritative for minting specific PIDs, different mappings may be applied to the copies and, depending on the case, +this can result in inconsistent field values. +To overcome the issue, such records are included only once in the graph. To do so, the concept of "delegated authorities" defines a list of data sources that +assign PIDs from a given PID minter to their scientific products. + +This "selection" can be performed when the entities in the graph sharing the same identifier are grouped together. The list of delegated authorities currently includes: + +| Datasource delegated | Datasource delegating | PID Type | +|--------------------------------------|----------------------------------|----------| +| [Zenodo](https://zenodo.org) | [Datacite](https://datacite.org) | doi | +| [RoHub](https://reliance.rohub.org/) | [W3ID](https://w3id.org/) | w3id | + + +## Identifiers in the Graph + +OpenAIRE assigns an internal identifier to each object it collects. +By default, the internal identifier is generated as `sourcePrefix::md5(localId)` where: + +* `sourcePrefix` is a namespace prefix of 12 chars assigned to the data source at registration time +* `localId` is the identifier assigned to the object by the data source + +After years of operation, we can say that: + +* `localId` values are generally unstable +* objects can disappear from sources +* PIDs provided by sources that are not PID agencies (authoritative sources for a specific type of PID) are often wrong (e.g.
pre-print with the DOI of the published version, DOIs with typos) + +Therefore, when the record is collected from an authoritative source: + +* the identity of the record is forged using the PID, e.g. `pidTypePrefix::md5(lowercase(doi))` +* the PID is added to the `pid` element of the data model + +When the record is collected from a source which is not authoritative for any type of PID: +* the identity of the record is forged as usual using the local identifier +* the PID, if available, is added as `alternateIdentifier` + +Currently, the following data sources are used as "PID authorities": + +| PID Type | Prefix (12 chars) | Authority | +|----------|-----------------------|-----------------------------------------| +| doi | `doi_________` | Crossref, Datacite, Zenodo | +| pmc | `pmc_________` | Europe PubMed Central, PubMed Central | +| pmid | `pmid________` | Europe PubMed Central, PubMed Central | +| arXiv | `arXiv_______` | arXiv.org e-Print Archive | +| ena | `ena_________` | EMBL-EBI | +| pdb | `pdb_________` | EMBL-EBI | +| uniprot | `uniprot_____` | EMBL-EBI | + +OpenAIRE also performs duplicate identification (see the [dedicated section for details](/graph-production-workflow/deduplication)). +All duplicates are **merged** together in a **representative record** which must be assigned a [dedicated OpenAIRE identifier](/graph-production-workflow/deduplication/research-products#openaire-identifier-of-the-representative-record) (i.e. it cannot have the identifier of one of the aggregated records).
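The identifier rules for PID-authority and non-authoritative records can be illustrated with a short Python sketch. It is not OpenAIRE's implementation: the function names are hypothetical, and padding a PID type name with underscores up to 12 characters is an assumption inferred from the prefixes listed in the table (e.g. `doi_________`, `arXiv_______`). Lowercasing is applied only to DOIs, following the `pidTypePrefix::md5(lowercase(doi))` formula.

```python
import hashlib

PREFIX_LENGTH = 12  # namespace prefixes are 12 characters long


def pad_prefix(name: str) -> str:
    """Right-pad a PID type name with underscores to 12 chars (assumed rule, e.g. "doi" -> "doi_________")."""
    if len(name) > PREFIX_LENGTH:
        raise ValueError(f"prefix {name!r} exceeds {PREFIX_LENGTH} characters")
    return name.ljust(PREFIX_LENGTH, "_")


def md5_hex(value: str) -> str:
    """Hex-encoded MD5 digest of a UTF-8 string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def id_from_doi(doi: str) -> str:
    """Identity for a record collected from a DOI authority: pidTypePrefix::md5(lowercase(doi))."""
    return f"{pad_prefix('doi')}::{md5_hex(doi.lower())}"


def id_from_local_id(source_prefix: str, local_id: str) -> str:
    """Default identity for records from non-authoritative sources: sourcePrefix::md5(localId)."""
    if len(source_prefix) != PREFIX_LENGTH:
        raise ValueError("the data source prefix must be exactly 12 characters")
    return f"{source_prefix}::{md5_hex(local_id)}"


if __name__ == "__main__":
    # The same DOI written in different letter cases yields the same identifier.
    upper = id_from_doi("10.1016/J.RESPOL.2021.104226")
    lower = id_from_doi("10.1016/j.respol.2021.104226")
    print(upper, upper == lower)
    # "example_repo" and the local id are hypothetical values for a non-authoritative source.
    print(id_from_local_id("example_repo", "oai:example.org:123"))
```

Because the identity is derived from the PID rather than from a repository's local identifier, it stays the same when the record is re-collected from the authority, whereas the local-id-based identity changes whenever the source changes its `localId`.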
+ diff --git a/versioned_docs/version-9.0.0/data-model/relationships/relationship-object.md b/versioned_docs/version-9.0.0/data-model/relationships/relationship-object.md new file mode 100644 index 0000000..1945717 --- /dev/null +++ b/versioned_docs/version-9.0.0/data-model/relationships/relationship-object.md @@ -0,0 +1,109 @@ +--- +title: The Relationship object +--- + +# The `Relationship` object + +A relationship in the Graph is represented with the data type presented in this page, which aims to model a directed edge between two nodes, providing information about its semantics, provenance and validation. + +### source +_Type: String • Cardinality: ONE_ + +OpenAIRE identifier of the source node in the graph. + +```json +"source": "openorgs____::1cb75a3ad756e4c83e455e3e7347643b" +``` + +### sourceType +_Type: String • Cardinality: ONE_ + +Type of the source node in the graph. + +```json +"sourceType": "organization" +``` + +### target +_Type: String • Cardinality: ONE_ + +OpenAIRE identifier of the target node in the graph. + +```json +"target": "doajarticles::022409068174087a003647ff46070f7f" +``` + +### targetType +_Type: String • Cardinality: ONE_ + +Type of the target node in the graph. + +```json +"targetType": "datasource" +``` + +### relType +_Type: [RelType](#the-reltype-object) • Cardinality: ONE_ + +Represents the semantics of the relationship between two nodes of the graph. + +```json +"relType": { + "name": "provides", + "type": "provision" +} +``` +### provenance +_Type: [Provenance](/data-model/entities/other#provenance-1) • Cardinality: ONE_ + +Indicates the process that produced (or provided) the information. + +```json +"provenance": { + "provenance": "Harvested", + "trust":"0.900" +} +``` + +### validated +_Type: Boolean • Cardinality: ONE_ + +Indicates whether or not the relationship was validated. + +```json +"validated": true +``` + +### validationDate +_Type: String • Cardinality: ONE_ + +Indicates the validation date of the relationship; applies only when the `validated` flag is set to true.
+ +```json +"validationDate": "2022-09-02" +``` + +--- + +## The `RelType` object + +The RelType data type models the semantics of the relationship between two nodes. + +### type +_Type: String • Cardinality: ONE_ + +The relationship category, e.g. affiliation, citation (see [relationship types](./relationship-types)). + +```json +"type": "provision" +``` + +### name +_Type: String • Cardinality: ONE_ + +Further specifies the relationship semantics, indicating the relationship direction, e.g. Cites, IsCitedBy. + +```json +"name": "provides" +``` +--- \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/data-model/relationships/relationship-types.md b/versioned_docs/version-9.0.0/data-model/relationships/relationship-types.md new file mode 100644 index 0000000..cc7e135 --- /dev/null +++ b/versioned_docs/version-9.0.0/data-model/relationships/relationship-types.md @@ -0,0 +1,37 @@ +# Relationship types + +The following table lists all the possible relation semantics found in the Graph Dataset. + +Note: the labels used to specify the semantics of the relationships are largely inherited from the [DataCite metadata kernel](https://schema.datacite.org/meta/kernel-4.4/doc/DataCite-MetadataKernel_v4.4.pdf), which provides a description for them.
+ +| # | Source entity type | Target entity type | Relation name / inverse | Provenance | +|:--:|:--------------------------------------:|:--------------------------------------:|:----------------------------------------------------------:|:-----------------------------------------------:| +| 1 | [Project](/data-model/entities/project) | [ResearchProduct](../../data-model/entities/research-product) | produces / isProducedBy | Harvested, Inferred by OpenAIRE, Linked by user | +| 2 | [Project](/data-model/entities/project) | [Organization](/data-model/entities/organization) | hasParticipant / isParticipant | Harvested | +| 3 | [Project](/data-model/entities/project) | [Community](/data-model/entities/community) | IsRelatedTo / IsRelatedTo | Linked by user | +| 4 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsAmongTopNSimilarDocuments / HasAmongTopNSimilarDocuments | Inferred by OpenAIRE | +| 5 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsSupplementTo / IsSupplementedBy | Harvested | +| 6 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsRelatedTo / IsRelatedTo | Harvested, Inferred by OpenAIRE, Linked by user | +| 7 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsPartOf / HasPart | Harvested | +| 8 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsDocumentedBy / Documents | Harvested | +| 9 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsObsoletedBy / Obsoletes | Harvested | +| 10 | [ResearchProduct](../../data-model/entities/research-product) | 
[ResearchProduct](../../data-model/entities/research-product) | IsSourceOf / IsDerivedFrom | Harvested | +| 11 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsCompiledBy / Compiles | Harvested | +| 12 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsRequiredBy / Requires | Harvested | +| 13 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsCitedBy / Cites | Harvested, Inferred by OpenAIRE | +| 14 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsReferencedBy / References | Harvested | +| 15 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsReviewedBy / Reviews | Harvested | +| 16 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsOriginalFormOf / IsVariantFormOf | Harvested | +| 17 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsVersionOf / HasVersion | Harvested | +| 18 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsIdenticalTo / IsIdenticalTo | Harvested | +| 19 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsPreviousVersionOf / IsNewVersionOf | Harvested | +| 20 | [ResearchProduct](../../data-model/entities/research-product) | [ResearchProduct](../../data-model/entities/research-product) | IsContinuedBy / Continues | Harvested | +| 21 | [ResearchProduct](../../data-model/entities/research-product) | 
[ResearchProduct](../../data-model/entities/research-product) | IsDescribedBy / Describes | Harvested | +| 22 | [ResearchProduct](../../data-model/entities/research-product) | [Organization](/data-model/entities/organization) | hasAuthorInstitution / isAuthorInstitutionOf | Harvested, Inferred by OpenAIRE | +| 23 | [ResearchProduct](../../data-model/entities/research-product) | [Data source](/data-model/entities/data-source) | isHostedBy / hosts | Harvested, Inferred by OpenAIRE | +| 24 | [ResearchProduct](../../data-model/entities/research-product) | [Data source](/data-model/entities/data-source) | isProvidedBy / provides | Harvested | +| 25 | [ResearchProduct](../../data-model/entities/research-product) | [Community](/data-model/entities/community) | IsRelatedTo / IsRelatedTo | Harvested, Inferred by OpenAIRE, Linked by user | +| 26 | [Organization](/data-model/entities/organization) | [Community](/data-model/entities/community) | IsRelatedTo / IsRelatedTo | Linked by user | +| 27 | [Organization](/data-model/entities/organization) | [Organization](/data-model/entities/organization) | IsChildOf / IsParentOf | Linked by user | +| 28 | [Data source](/data-model/entities/data-source) | [Community](/data-model/entities/community) | IsRelatedTo / IsRelatedTo | Linked by user | +| 29 | [Data source](/data-model/entities/data-source) | [Organization](/data-model/entities/organization) | isProvidedBy / provides | Harvested | diff --git a/versioned_docs/version-9.0.0/downloads/alternative-model/cfhb.md b/versioned_docs/version-9.0.0/downloads/alternative-model/cfhb.md new file mode 100644 index 0000000..4d9863d --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/alternative-model/cfhb.md @@ -0,0 +1,30 @@ +--- + +sidebar_position: 1 + +--- + +# CfHbKeyValue + +Information about the sources from which the record has been collected. 
+ + + +### key +_Type: String • Cardinality: ONE_ + +The OpenAIRE identifier of the data source. + +```json +"key":"openaire____::081b82f96300b6a6e3d282bad31cb6e2" +``` + +### value +_Type: String • Cardinality: ONE_ + +The name of the data source. + +```json +"value":"Crossref" +``` + diff --git a/versioned_docs/version-9.0.0/downloads/alternative-model/communityInstance.md b/versioned_docs/version-9.0.0/downloads/alternative-model/communityInstance.md new file mode 100644 index 0000000..e883626 --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/alternative-model/communityInstance.md @@ -0,0 +1,37 @@ +--- + +sidebar_position: 1 + +--- + +# CommunityInstance + +It is a subclass of [Instance](../../data-model/entities/research-product#instance) extended with information regarding the collection and hosting source for this materialization of the research product. + +### hostedby +_Type: [CfHbKeyValue](./cfhb) • Cardinality: ONE_ + +Information about the source from which the instance can be viewed or downloaded. + +```json + +"hostedby": { + "key": "issn___print::35ee75a5ad42581d604be113a8f56427", + "value": "New Phytologist" + }, + +``` + +### collectedfrom +_Type: [CfHbKeyValue](./cfhb) • Cardinality: ONE_ + +Information about the source from which the record has been collected. + + +```json + +"collectedfrom": { + "key": "openaire____::081b82f96300b6a6e3d282bad31cb6e2", + "value": "Crossref" + } +``` \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/downloads/alternative-model/context.md b/versioned_docs/version-9.0.0/downloads/alternative-model/context.md new file mode 100644 index 0000000..51cf14e --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/alternative-model/context.md @@ -0,0 +1,46 @@ +--- + +sidebar_position: 1 + +--- + +# Context + +Information about the research initiative/community (RI/RC) related to the research product.
+ +### code +_Type: String • Cardinality: ONE_ + +Code identifying the RI/RC. + +```json +"code":"sdsn-gr" + +``` + + +### label +_Type: String • Cardinality: ONE_ + +Label of the RI/RC. + +```json +"label":"SDSN - Greece" +``` + +### provenance +_Type: [Provenance](/data-model/entities/other#provenance-2) • Cardinality: MANY_ + +The reason why this research product is associated with the RI/RC. + +```json + +"provenance":[{ + "provenance":"Inferred by OpenAIRE", + "trust":"0.9" + }, + ... + ] + +``` + diff --git a/versioned_docs/version-9.0.0/downloads/alternative-model/extended-research-product.md b/versioned_docs/version-9.0.0/downloads/alternative-model/extended-research-product.md new file mode 100644 index 0000000..51edaec --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/alternative-model/extended-research-product.md @@ -0,0 +1,140 @@ +--- + +sidebar_position: 1 + +--- + + +# Extended Research Product + + +It is a subclass of [ResearchProduct](../../data-model/entities/research-product) extended with information regarding projects (and funders), research communities/infrastructures and related data sources. + + +### projects + +_Type: [Project](project.md) • Cardinality: MANY_ + + +List of projects (i.e. grants) that (co-)funded the production of the research product. + + +```json + + +"projects": [ + { + "id": "corda__h2020::94c4a066401e22002c4811a301bb4655", + "code": "727929", + "acronym": "TomRes", + "title": "A NOVEL AND INTEGRATED APPROACH TO INCREASE MULTIPLE AND COMBINED STRESS TOLERANCE IN PLANTS USING TOMATO AS A MODEL", + "funder": { + "shortName": "EC", + "name": "European Commission", + "jurisdiction": "EU", + "fundingStream": "H2020" + }, + "provenance": { + "provenance": "Harvested", + "trust": "0.900000000000000022" + }, + "validated": { + "validationDate": "2021-01-01", + "validatedByFunder": true + } + }, + ...
+ ] + +``` + +### context + +_Type: [Context](./context) • Cardinality: MANY_ + + +References to the relevant research infrastructures, initiatives or communities (RI/RC) among those collaborating with OpenAIRE. See https://connect.openaire.eu for the publicly visible ones. + + +```json + + +"context":[ + { + "code":"sdsn-gr", + "label":"SDSN - Greece", + "provenance":[ + { + "provenance":"Inferred by OpenAIRE", + "trust":"0.9" + } + ] + }, + ... + ] + +``` + + + +### collectedfrom + +_Type: [CfHbKeyValue](./cfhb) • Cardinality: MANY_ + + +Information about the sources from which the record has been collected. + + +```json + +"collectedfrom":[ + { + "key":"openaire____::081b82f96300b6a6e3d282bad31cb6e2", + "value":"Crossref" + }, + ... + ] + +``` + + +### instance + +_Type: [CommunityInstance](./communityInstance) • Cardinality: MANY_ + +Specific materializations of the research product, each with information about the source from which it can be viewed or downloaded and the source from which it was collected. + +```json + + +"instance": [ + { + "license": "http://doi.wiley.com/10.1002/tdm_license_1.1", + "accessright": { + "code": "c_16ec", + "label": "RESTRICTED", + "scheme": "http://vocabularies.coar-repositories.org/documentation/access_rights/", + "openAccessRoute": null + }, + "type": "Article", + "url": [ + "https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Fnph.15014", + "http://onlinelibrary.wiley.com/wol1/doi/10.1111/nph.15014/fullpdf", + "http://dx.doi.org/10.1111/nph.15014" + ], + "publicationdate": "2018-02-09", + "refereed": "UNKNOWN", + "hostedby": { + "key": "issn___print::35ee75a5ad42581d604be113a8f56427", + "value": "New Phytologist" + }, + "collectedfrom": { + "key": "openaire____::081b82f96300b6a6e3d282bad31cb6e2", + "value": "Crossref" + } + }, + ...
+ ] + + +``` diff --git a/versioned_docs/version-9.0.0/downloads/alternative-model/funder.md b/versioned_docs/version-9.0.0/downloads/alternative-model/funder.md new file mode 100644 index 0000000..1da93a9 --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/alternative-model/funder.md @@ -0,0 +1,72 @@ +--- + +sidebar_position: 1 + +--- + +# Funder + + +Information about the funder of the project. + + +### fundingStream + +_Type: String • Cardinality: ONE_ + + +The funding stream under which the project is funded. + + +```json + +"fundingStream": "H2020" + + +``` + +### jurisdiction + +_Type: String • Cardinality: ONE_ + + +Geographical jurisdiction of the funder (e.g. EU for the European Commission, HR for the Croatian Science Foundation). + + +```json + +"jurisdiction": "EU" + +``` + + +### name + +_Type: String • Cardinality: ONE_ + + +The name of the funder. + + +```json + +"name": "European Commission" + +``` + + +### shortName + +_Type: String • Cardinality: ONE_ + + +The short name of the funder. + + +```json + +"shortName": "EC" + +``` + + diff --git a/versioned_docs/version-9.0.0/downloads/alternative-model/project.md b/versioned_docs/version-9.0.0/downloads/alternative-model/project.md new file mode 100644 index 0000000..985eecd --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/alternative-model/project.md @@ -0,0 +1,134 @@ +--- + +sidebar_position: 1 + +--- + + + +# Project + + +Information about a project related to the research product. + + +### id + +_Type: String • Cardinality: ONE_ + + +Main entity identifier, created according to the [OpenAIRE entity identifier and PID mapping policy](../../data-model/pids-and-identifiers). + + +```json + +"id": "corda__h2020::70ea22400fd890c5033cb31642c4ae68" + +``` + + +### code + +_Type: String • Cardinality: ONE_ + + +The grant agreement code of the project. + + +```json + +"code": "777541" + +``` + + +### acronym + +_Type: String • Cardinality: ONE_ + + +Project's acronym.
+ + +```json + +"acronym": "OpenAIRE-Advance" + +``` + + +### title + +_Type: String • Cardinality: ONE_ + + +Project's title. + + +```json + +"title": "OpenAIRE Advancing Open Scholarship" + +``` + + +### funder + +_Type: [Funder](funder.md) • Cardinality: ONE_ + + +Information about the funder of the project. + + +```json + + +"funder": { + "shortName": "EC", + "name": "European Commission", + "jurisdiction": "EU", + "fundingStream": "H2020" + } + + +``` + +### provenance + + +_Type: [Provenance](../../data-model/entities/other#provenance-2) • Cardinality: ONE_ + + +The reason why the project is associated with the research product. + + +```json + + +"provenance": { + "provenance": "Harvested", + "trust": "0.900000000000000022" + } + +``` + + +### validated + + +_Type: [Validated](validated.md) • Cardinality: ONE_ + + +Specifies whether the association between the project and the research product was validated. + + +```json + + +"validated": { + "validationDate": "2021-01-01", + "validatedByFunder": true + } + +``` + diff --git a/versioned_docs/version-9.0.0/downloads/alternative-model/validated.md b/versioned_docs/version-9.0.0/downloads/alternative-model/validated.md new file mode 100644 index 0000000..3dcb572 --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/alternative-model/validated.md @@ -0,0 +1,41 @@ +--- + +sidebar_position: 1 + +--- + +# Validated + + +Information about the validation of the association between the research product and the funding information. + + +### validationDate + +_Type: String • Cardinality: ONE_ + + +The date on which OpenAIRE collected the association between the funding and the research product from an authoritative source (i.e. Sygma). + + +```json + +"validationDate": "2021-01-01" + +``` + + +### validatedByFunder + +_Type: Boolean • Cardinality: ONE_ + + +Specifies whether the validation comes from the funder.
+ + +```json + + +"validatedByFunder": true + +``` \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/downloads/beginners-kit.md b/versioned_docs/version-9.0.0/downloads/beginners-kit.md new file mode 100644 index 0000000..0dea0a1 --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/beginners-kit.md @@ -0,0 +1,16 @@ +--- +sidebar_position: 2 +--- + +# Beginner's kit + +The large size of the OpenAIRE Graph makes it hard for beginners to familiarise themselves with the underlying data model and explore its contents. +Working with the Graph in its full size typically requires access to a huge distributed computing infrastructure, which is not easily accessible to everyone. +[The OpenAIRE Beginner’s Kit](https://doi.org/10.5281/zenodo.7490191) aims to address this issue. It consists of two components: + + + +* A subset of the Graph composed of the research products published between 2022-06-29 and 2022-12-29, all the entities connected to them and the respective relationships. +* A Zeppelin notebook that demonstrates how you can use PySpark to analyse the Graph and get answers to some interesting research questions. A guide to Apache Zeppelin can be found [here](https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_zeppelin-component-guide/content/ch_overview.html). \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/downloads/full-graph.md b/versioned_docs/version-9.0.0/downloads/full-graph.md new file mode 100644 index 0000000..0a1d399 --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/full-graph.md @@ -0,0 +1,50 @@ +--- +sidebar_position: 1 +--- + +# Full graph dataset + +You can download the full OpenAIRE Graph Dataset as well as its schema from the following links: + + Dataset: https://doi.org/10.5281/zenodo.3516917 + + Schema: https://doi.org/10.5281/zenodo.4238938 + +The schema used to create this dataset mirrors the one described in the [Data Model](/data-model).
+This dataset is licensed under a Creative Commons Attribution 4.0 International License. +It is composed of several files so that you can download only the parts you are interested in. The files are named after the entity they store (e.g. publication, dataset). Each file is at most 10GB and is +a tar archive containing gz files, each with one JSON record per line. + +## How to acknowledge this work + +Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph datasets](https://doi.org/10.5281/zenodo.3516917) for your research, please provide a proper citation following the recommendation that you find on the dataset's Zenodo page or as provided below. + +:::note How to cite + +Manghi P., Atzori C., Bardi A., Baglioni M., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Mannocci A., Horst M., Czerniak A., Iatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Lempesis A., Ioannidis A., Manola N., Principe P., Vergoulis T., Chatzopoulos S., Pierrakos D. (2022). "OpenAIRE Research Graph Dataset", *Dataset*, Zenodo. [doi:10.5281/zenodo.3516917](https://doi.org/10.5281/zenodo.3516917) ([BibTex](/bibtex/OpenAIRE_Research_Graph_dataset.bib)) +::: + +Please also consider citing [other relevant research products](/publications#relevant-research-products) that can be of interest. + +Also consider adding one of the following badges to your service with the appropriate link to [our website](https://graph.openaire.eu); click on the badges below to download the respective badge image files.
+ + + diff --git a/versioned_docs/version-9.0.0/downloads/related-datasets.md b/versioned_docs/version-9.0.0/downloads/related-datasets.md new file mode 100644 index 0000000..461fd2a --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/related-datasets.md @@ -0,0 +1,34 @@ +--- +sidebar_position: 4 +--- + +# Other related datasets + +On this page, we list other related datasets; please refer to their respective schema definitions for the data model they follow. + +## The dataset of ScholeXplorer + + Dataset: https://zenodo.org/doi/10.5281/zenodo.1200252 + + Schema (Scholix version 3): https://doi.org/10.5281/zenodo.1120275 + + Schema (Scholix version 4): https://doi.org/10.5281/zenodo.6351557 + +This dataset is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. +It contains the GZ-compressed dump of the Scholix links exposed by the OpenAIRE ScholeXplorer service. + +## The OpenAIRE LOD dataset + +:::caution + The OpenAIRE LOD dataset has been discontinued. The SPARQL Endpoint is no longer supported, but old LOD datasets can be found at the link below. +::: + +Dataset (RDF): https://doi.org/10.5281/zenodo.609943 + + + + +The OpenAIRE Linked Open Data (LOD) Services and their integration with the OpenAIRE information space have been released as a beta version. The LOD exporting process started with a specification of the OpenAIRE data model as an RDF vocabulary, followed by the mapping of the OpenAIRE data to the graph-based RDF data model. To interlink the OpenAIRE data with related data on the Web, we have identified a list of potential datasets to interlink with, including the DBpedia dataset extracted from Wikipedia and the publication databases DBLP and CiteSeer.
+ \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/downloads/subgraphs.md b/versioned_docs/version-9.0.0/downloads/subgraphs.md new file mode 100644 index 0000000..07d276d --- /dev/null +++ b/versioned_docs/version-9.0.0/downloads/subgraphs.md @@ -0,0 +1,70 @@ +--- +sidebar_position: 3 +--- + +# Sub-graph datasets + +To make the Graph easier to use, several datasets are available under the Zenodo community called [OpenAIRE Graph](https://zenodo.org/communities/openaire-research-graph). +This page lists all alternative datasets currently available. + + +## The OpenAIRE COVID-19 dataset + +Dataset: https://doi.org/10.5281/zenodo.3980490 + +Schema: https://doi.org/10.5281/zenodo.3974225 + +This dataset is licensed under a Creative Commons Attribution 4.0 International License. +It contains metadata records of publications, research data, software and projects on the topic of coronavirus and COVID-19. +This dataset is part of the activities of OpenAIRE to support the fight against COVID-19 together with the OpenAIRE COVID-19 Gateway. +The dataset consists of a tar archive containing gzip files with one json record per line. Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dataset. + +## The dataset of funded products + +Dataset: https://doi.org/10.5281/zenodo.4559725 + +Schema: https://doi.org/10.5281/zenodo.3974225 + +This dataset is licensed under a Creative Commons Attribution 4.0 International License. +It contains metadata records of research products (research literature, data, software, other types of research products) with funding +information available in the OpenAIRE Graph. Records are grouped by funder, with one archive file per funder. Each tar archive contains +gzip files, each with one json record per line. The model of this dataset differs from that of the whole graph. +Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dataset.
+ +## The dataset of delta projects + +Dataset: https://doi.org/10.5281/zenodo.6419021 + +Schema: https://doi.org/10.5281/zenodo.4238938 + +This dataset is licensed under a Creative Commons Attribution 4.0 International License. +It contains the metadata records of projects collected by OpenAIRE in a given time frame. Usually one deposition of collected projects is done for each release of the OpenAIRE Graph. +The deposition is one tar archive containing gzip files, each with one json record per line. + +## The datasets about research communities, initiatives and infrastructures + +Dataset: https://doi.org/10.5281/zenodo.3974604 + +Schema: https://doi.org/10.5281/zenodo.3974225 + +This dataset is licensed under a Creative Commons Attribution 4.0 International License. +The dataset contains one file per community/initiative/infrastructure collaborating with OpenAIRE. Also check out their community gateways on +CONNECT. Each file is a tar archive containing gzip files with one json record per line. Only publicly visible communities/initiatives/infrastructures are included. +The model of this dataset differs from that of the whole graph. +Please refer [here](#alternative-sub-graph-data-model) for details on the data model of this dataset. + +--- + +## Alternative sub-graph data model + +It should be noted that the datasets for research communities, infrastructures, and products related to projects do not strictly follow the main data model of the OpenAIRE Graph. In particular, they differ in the following: + +* only research products are included (no relations or other entities) +* the research products are extended with information that can be inferred from the whole graph, namely: + * funding information if present + * associated research community/infrastructure + * associated data sources + +As a result, they have just one entity type, the [Extended Research Product](./alternative-model/extended-research-product.md).
diff --git a/versioned_docs/version-9.0.0/faq.md b/versioned_docs/version-9.0.0/faq.md new file mode 100644 index 0000000..ace8840 --- /dev/null +++ b/versioned_docs/version-9.0.0/faq.md @@ -0,0 +1,7 @@ +--- +sidebar_position: 10 +--- + +# FAQ + +https://support.openaire.eu/projects/docs/wiki/FAQ \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/_category_.json b/versioned_docs/version-9.0.0/graph-production-workflow/_category_.json new file mode 100644 index 0000000..8da8ce0 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/_category_.json @@ -0,0 +1,8 @@ +{ + "label": "Graph production workflow", + "position": 6, + "link": { + "type": "doc", + "id": "graph-production-workflow" + } +} \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/aggregation.md b/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/aggregation.md new file mode 100644 index 0000000..f64c397 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/aggregation.md @@ -0,0 +1,58 @@ +--- +sidebar_position: 1 +--- + +# Aggregation + +OpenAIRE materializes an open, participatory research graph (the OpenAIRE Graph) where products of the research life-cycle (e.g. scientific literature, research data, project, software) are semantically linked to each other and carry information about their access rights (i.e. if they are Open Access, Restricted, Embargoed, or Closed) and the sources from which they have been collected and where they are hosted. The OpenAIRE Graph is materialised via a set of autonomic, orchestrated workflows operating in a regimen of continuous data aggregation and integration. [1] + +## What does OpenAIRE collect? 
+ +OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant with the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/). + +The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as a field value, the value is updated with the corresponding term. +In addition, the OpenAIRE Graph is extended with other relevant scholarly communication sources that need special handling, either because they do not strictly follow the OpenAIRE Guidelines or because of the vast amount of data they offer; these include Crossref, ORCID, Microsoft Academic Graph and Unpaywall. + ++ +
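A small lookup table is enough to illustrate the synonym-based cleansing step. The sketch below is hypothetical. The `Term` fields mirror the code, label and synonyms described above. Case-insensitive matching and replacing a value with the term's code are assumptions, and the test vocabulary is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One controlled term: a code, a label and its known synonyms."""
    code: str
    label: str
    synonyms: set[str] = field(default_factory=set)


def build_lookup(terms: list[Term]) -> dict[str, Term]:
    """Index every code and synonym (case-insensitively) by its term."""
    lookup: dict[str, Term] = {}
    for term in terms:
        lookup[term.code.lower()] = term
        for synonym in term.synonyms:
            lookup[synonym.lower()] = term
    return lookup


def clean(value: str, lookup: dict[str, Term]) -> str:
    """Return the code of the matching term, or the value unchanged."""
    term = lookup.get(value.strip().lower())
    return term.code if term is not None else value
```

Building the index once per vocabulary keeps each cleaning call to a single dictionary lookup, which matters when the same vocabulary is applied to millions of records.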
+ +The OpenAIRE aggregation system collects information about objects of the research life-cycle compliant with the [OpenAIRE acquisition policy](https://www.openaire.eu/content-acquisition-policy) from [different types of data sources](https://explore.openaire.eu/search/find/dataproviders): + +1. Scientific literature metadata and full-texts from institutional and thematic repositories, CRIS (Current Research Information Systems), Open Access journals and publishers; +2. Dataset metadata from data repositories and data journals; +3. Scientific literature, data and software metadata from Zenodo; +4. Metadata about data sources, organizations, projects, and funding programs from entity registries, i.e. authoritative sources such as CORDA and other funder databases for projects, OpenDOAR for publication repositories, re3data for data repositories, DOAJ for Open Access journals; +5. Metadata of open source research software from software repositories and Software Heritage; +6. Metadata about other types of research products, like workflows, protocols, methods and research packages. + +Relationships between objects are collected from the data sources, but also automatically detected by [inference algorithms](https://www.openaire.eu/blogs/text-mining-services-in-openaire-1) and added by authenticated users, who can insert links between literature, datasets, software and projects via [the “Link” procedure available from the OpenAIRE explore portal](https://explore.openaire.eu). More information about the linking functionality can be found [here](https://www.openaire.eu/linking). + +## What kind of data sources are in OpenAIRE? + +Objects and relationships in the OpenAIRE Graph are extracted from information packages, i.e.
metadata records, collected from data sources of the following kinds: + +- *Literature, Institutional and thematic repositories*: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. arXiv, Europe PMC); +- *Open Access Publishers and journals*: Information systems of open access publishers or their journals, which offer bibliographic metadata and PDFs of their published articles; +- *Data archives*: Information systems where scientists deposit descriptive metadata and files about their research data (also known as scientific data, datasets, etc.); +- *Hybrid repositories/archives*: information systems where scientists deposit metadata and files of any kind of scientific products, including scientific literature, research data and research software (e.g. Zenodo); +- *Aggregator services*: Information systems that collect descriptive metadata about publications or datasets from multiple sources in order to enable cross-data source discovery of given research products. Examples are DataCite, BASE, DOAJ; +- *Entity Registries*: Information systems created with the intent of maintaining authoritative registries of given entities in scholarly communication, such as OpenDOAR for the institutional repositories, re3data for the data repositories, CORDA and other funder databases for projects and funding information; +- *CRIS*: Information systems adopted by research and academic organizations to keep track of their research administration records and related research products; examples of CRIS content are articles or datasets funded by projects, their principal investigators, facilities acquired thanks to funding, etc.; +- *Research Graphs*: services that maintain an information space of (possibly interlinked) scholarly communication objects. Examples are CrossRef, ScholeXplorer and OpenAIRE itself.
+ +## How does OpenAIRE collect metadata records? + +OpenAIRE collects metadata records describing objects of the research life-cycle from content providers compliant with the OpenAIRE guidelines and from entity registries (i.e. data sources offering authoritative lists of entities, like OpenDOAR, re3data, DOAJ, and funder databases). + +The OpenAIRE aggregator collects metadata records in the majority of cases via [OAI-PMH](https://www.openarchives.org/pmh/), but also supports other standard exchange protocols like FTP(S), SFTP, and some RESTful APIs. +The full list of available and used collectors can be found in the [RedMine Wiki - API Protocols](https://support.openaire.eu/projects/openaire/wiki/API_protocols). + +For additional details about the aggregation workflows, please refer to [2]. + + +## References + +[1] Manghi, P., Artini, M., Atzori, C., Bardi, A., Mannocci, A., La Bruzzo, S., Candela, L., Castelli, D. and Pagano, P. (2014), “The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures”, Program: electronic library and information systems, Vol. 48 No. 4, pp. 322-354. [doi:10.1108/prog-08-2013-0045](http://doi.org/10.1108/prog-08-2013-0045) + +[2] Atzori, C., Bardi, A., Manghi, P., & Mannocci, A. (2017, January). "The OpenAIRE workflows for data management". In Italian Research Conference on Digital Libraries (pp. 95-107). Springer, Cham.
[doi:10.1007/978-3-319-68130-6_8](https://doi.org/10.1007/978-3-319-68130-6_8) \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/compatible-sources.md b/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/compatible-sources.md new file mode 100644 index 0000000..48d831e --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/compatible-sources.md @@ -0,0 +1,11 @@ +--- +sidebar_position: 1 +--- + +# OpenAIRE compatible sources + +The OpenAIRE aggregator collects metadata records from content providers compliant with the OpenAIRE guidelines. + +The OpenAIRE Guidelines help repository managers expose publications, datasets and CRIS metadata via the OAI-PMH protocol in order to integrate with the OpenAIRE infrastructure. + +You can find more information at https://guidelines.openaire.eu/en/latest/ \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/non-compatible-sources/crossref_unpaywall.md b/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/non-compatible-sources/crossref_unpaywall.md new file mode 100644 index 0000000..54ab378 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/aggregation/non-compatible-sources/crossref_unpaywall.md @@ -0,0 +1,165 @@ +# Crossref & Unpaywall + +This section describes the procedure used to integrate the contents from [Crossref](https://www.crossref.org) and [Unpaywall](https://unpaywall.org) in the OpenAIRE Graph. + +## Data acquisition + +The dataset containing all the Crossref records is obtained via a complete data dump on a monthly basis. +The Unpaywall dataset is no longer updated, but its latest snapshot (Dec 2021) is used to enrich the Crossref contents. + +## Process + +The following describes the process applied to the Crossref and Unpaywall contents.
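The Python sketch below covers three things: the exclusion rules listed under *Crossref filtering* below, the dataset/publication split, and the `doi_________::md5(doi)` identifier from the mapping table. The input is assumed to be a work in the shape returned by the Crossref REST API, with `title` as a list of strings and `author` as objects with `given` and `family`. Two further points are assumptions: how author names are assembled before comparison, and whether the DOI is normalised before hashing.

```python
import hashlib

TEST_PUBLISHERS = {"Test accounts", "CrossRef Test Account"}
INVALID_AUTHOR_NAMES = {
    ",", "none none", "none, none", "none &na;", "(:null)",
    "test test test", "test test", "test", "&na; &na",
}
ALLOWED_TYPES = {
    "book-section", "book", "book-chapter", "book-part", "book-series",
    "book-set", "book-track", "edited-book", "reference-book", "monograph",
    "journal-article", "dissertation", "other", "peer-review", "proceedings",
    "proceedings-article", "reference-entry", "report", "report-series",
    "standard", "standard-series", "posted-content", "dataset",
}


def author_name(author: dict) -> str:
    # Assumption: names are compared as "given family", lower-cased.
    parts = (author.get("given", ""), author.get("family", ""))
    return " ".join(p.strip() for p in parts if p and p.strip()).lower()


def is_excluded(work: dict) -> bool:
    """Apply the exclusion rules to one Crossref work."""
    if not any(t and t.strip() for t in work.get("title", [])):
        return True  # blank title
    publisher = work.get("publisher", "")
    if publisher in TEST_PUBLISHERS:
        return True
    names = {author_name(a) for a in work.get("author", [])}
    if names & INVALID_AUTHOR_NAMES:
        return True
    if publisher == "Elsevier BV" and "addie jackson" in names:
        return True  # empirically identified test records
    return work.get("type") not in ALLOWED_TYPES


def result_type(work: dict) -> str:
    """Datasets stay datasets; every other accepted type becomes a publication."""
    return "dataset" if work.get("type") == "dataset" else "publication"


def openaire_id(doi: str) -> str:
    # Hashes the DOI as given; any normalisation step is left out on purpose.
    return "doi_________::" + hashlib.md5(doi.encode("utf-8")).hexdigest()
```

A record like `10.1371/journal.pone.0171434.g005`, whose Crossref type is not in the allowed list, is rejected by the last check even when its other fields look valid.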
+ +### Crossref filtering + +Records in Crossref are ruled out if they meet any of the following criteria: + +* have a blank title, for example: + * `10.1093/rheumatology/41.7.837` + * `10.1093/qjmed/95.7.430` + * `10.1371/journal.pone.0171434.g005` +* have one of the following publishers: `"Test accounts"`, `"CrossRef Test Account"` + * Examples from https://api.crossref.org/works?query.publisher-name=%22Test%20accounts%22 + * `10.1007/bf00344543` + * `10.1007/bf00186154` + * `10.1306/64ed947a-1724-11d7-8645000102c1865d` +* have authors matching the following invalid names: `",", "none none", "none, none", "none &na;", "(:null)", "test test test", "test test", "test", "&na; &na"` + * Examples for `"none"` author from https://api.crossref.org/works?query.author=%22none%22 + * `10.4007/annals.2016.184.3.11` + * `10.4007/annals.2012.176.1.6` + * `10.2172/6393585` + * Examples for `"test"` author from https://api.crossref.org/works?query.author=%22test%22 + * `10.5116/ijme.54ca.a5ae` + * `10.5755/j01.ss.71.2.544` + * `10.5755/j01.ee.22.2.319` +* have `"Addie Jackson"` as author and `"Elsevier BV"` as publisher (empirically identified as test records) + * Examples from https://api.crossref.org/works?query.author=Addie+Jackson&query.publisher-name=%22Elsevier%20BV%22 + * `10.2139/ssrn.2082156` + * `10.2139/ssrn.2202300` + * `10.2139/ssrn.2255657` +* do not have one of the following values in the field `type`: `"book-section"`, `"book"`, `"book-chapter"`, `"book-part"`, `"book-series"`, `"book-set"`, `"book-track"`, `"edited-book"`, `"reference-book"`, `"monograph"`, `"journal-article"`, `"dissertation"`, `"other"`, `"peer-review"`, `"proceedings"`, `"proceedings-article"`, `"reference-entry"`, `"report"`, `"report-series"`, `"standard"`, `"standard-series"`, `"posted-content"`, `"dataset"` + * Examples: + * `10.1371/journal.pone.0171434.g005` + * `10.7554/elife.21052.049` + * `10.1371/journal.pcbi.1005379.s006` + +Records with `type=dataset` are mapped into OpenAIRE research
products of type dataset. All others are mapped as OpenAIRE research products of type publication. + +### Mapping Crossref properties into the OpenAIRE Graph + +Properties in OpenAIRE research products are set based on the logic described in the following table: + +| OpenAIRE Research Product field path | Crossref path(s) | Notes | +|----------------------------------------|--------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `id` | `doi` | id in the form `doi_________::md5(doi)` | +| `dateofcollection` | `indexed.datetime` | | +| `lastupdatetimestamp` | `indexed.timestamp` | | +| `type` | `type` | Using the **_dnet:result_typologies_** vocabulary, we look up the `instance.type` synonym to generate one of the following main entities:+ +
+ + +* data sources: it is possible to list a set of data sources relevant for the RC/RI. All research products collected from these data sources will be linked to the RC/RI ++ +
+ +When only some research products collected from a datasource are relevant for the RC/RI, it is possible to specify a set of selection constraints (SC) that must be satisfied before linking the research product to the +community. The selection constraint has the form SC = S1 or S2 or ... or Sn. Each Si has the form Si = si1 and si2 and ... and sim, where each sij is a condition on a specific field of the research product. The set of fields that can be specified is F={title, author, contributor, description, orcid}, +while the condition can be any of V={contains, equals, not_contains, not_equals, contains_ignorecase, equals_ignorecase, not_contains_ignorecase, not_equal_ignorecase}, and the value is free text. +An example of a selection criterion is: “All the products whose contributor contains DARIAH” + ++ +
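The selection constraint is a disjunction of conjunctions, which maps directly onto nested lists. In this hypothetical sketch, each `Si` is a list of `(field, verb, value)` conditions and the constraint is a list of such clauses. The verb names are the ones listed above. Treating fields as multi-valued is an assumption: one matching value is enough for a positive verb, while every value must pass for a negated one.

```python
from typing import Callable

# (field, verb, value), e.g. ("contributor", "contains", "DARIAH")
Condition = tuple[str, str, str]

VERBS: dict[str, Callable[[str, str], bool]] = {
    "contains": lambda f, v: v in f,
    "equals": lambda f, v: f == v,
    "not_contains": lambda f, v: v not in f,
    "not_equals": lambda f, v: f != v,
    "contains_ignorecase": lambda f, v: v.lower() in f.lower(),
    "equals_ignorecase": lambda f, v: f.lower() == v.lower(),
    "not_contains_ignorecase": lambda f, v: v.lower() not in f.lower(),
    "not_equal_ignorecase": lambda f, v: f.lower() != v.lower(),
}


def holds(product: dict[str, list[str]], condition: Condition) -> bool:
    field_name, verb, value = condition
    values = product.get(field_name, [])
    check = VERBS[verb]
    # Assumption: negated verbs must hold for every value of the field,
    # positive verbs for at least one.
    if verb.startswith("not_"):
        return all(check(v, value) for v in values)
    return any(check(v, value) for v in values)


def satisfies(product: dict[str, list[str]], constraint: list[list[Condition]]) -> bool:
    """SC = S1 or ... or Sn, where each Si is a list of and-ed conditions."""
    return any(all(holds(product, c) for c in clause) for clause in constraint)
```

The DARIAH example above becomes `[[("contributor", "contains", "DARIAH")]]`, a single clause holding a single condition.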
+ +* Zenodo community: it is possible to list a set of Zenodo communities relevant for the RC/RI. All the products collected from the listed Zenodo communities are linked to the RC/RI + + ++ +
+ + +The list of subjects, Zenodo communities and data sources used to enrich the products is defined by the managers of the community gateway or infrastructure monitoring dashboard associated with the RC/RI. diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/deduction-and-propagation/propagation.md b/versioned_docs/version-9.0.0/graph-production-workflow/deduction-and-propagation/propagation.md new file mode 100644 index 0000000..93e71e9 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/deduction-and-propagation/propagation.md @@ -0,0 +1,55 @@ +# Propagation + +This process enriches the graph by adding new links and/or new properties. The new information is added by exploiting existing semantic +relationships and values between the involved entities. + +As of November 2022, the following procedures are in place: + +* Country propagation: updates the property “country” of a research product. This happens when the research product is collected from an institutional datasource or when the datasource hosting the research product is included in a whitelist. For all the research products that satisfy one of the conditions above, the country of the organization providing the datasource is added to the country of the research product: e.g. a publication collected from an institutional repository maintained by an Italian university will be enriched with the property “country = IT”. ++ +
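Here is a minimal sketch of the country propagation rule, assuming simplified in-memory records. The `Datasource` and `Product` classes and the `INSTITUTIONAL` type label are hypothetical stand-ins for the graph entities. Products are updated in place.

```python
from dataclasses import dataclass, field

# Hypothetical label for the "institutional repository" datasource type.
INSTITUTIONAL = "institutional repository"


@dataclass
class Datasource:
    id: str
    type: str
    provider_countries: set[str]  # countries of the organization providing it


@dataclass
class Product:
    id: str
    collected_from: set[str]
    hosted_by: set[str]
    countries: set[str] = field(default_factory=set)


def propagate_country(products: list[Product], datasources: dict[str, Datasource],
                      whitelist: set[str]) -> None:
    """Add the provider's country to each eligible product, in place."""
    for product in products:
        # Condition 1: collected from an institutional datasource.
        eligible = {d for d in product.collected_from
                    if d in datasources and datasources[d].type == INSTITUTIONAL}
        # Condition 2: hosted by a whitelisted datasource.
        eligible |= {d for d in product.hosted_by if d in whitelist and d in datasources}
        for ds_id in eligible:
            product.countries |= datasources[ds_id].provider_countries
```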
+ +* Project propagation: adds an "isProducedBy" relationship (and its inverse) between a project P and a research product R1, if R1 has a strong semantic relationship with another research product R2 and P produces R2: e.g. a publication linked to project P “is supplemented by” a dataset D. Dataset D will get the link to project P. The relationships considered for this procedure are “isSupplementedBy” and “isSupplementTo”. ++ +
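The project propagation rule can be sketched over `(source, relClass, target)` triples. In this hypothetical version, the target of a supplement relation inherits the projects of its source. Both `isSupplementedBy` and `isSupplementTo` are considered, so storing each relation in both directions makes the propagation symmetric. The inverse relation name `produces` is assumed.

```python
SEMANTIC_RELS = {"isSupplementedBy", "isSupplementTo"}


def propagate_projects(result_rels: list[tuple[str, str, str]],
                       produced_by: dict[str, set[str]]) -> set[tuple[str, str, str]]:
    """Return the new project links implied by supplement relations.

    result_rels: (source, relClass, target) triples between research products.
    produced_by: product id -> ids of the projects that produced it.
    """
    new: set[tuple[str, str, str]] = set()
    for source, rel, target in result_rels:
        if rel not in SEMANTIC_RELS:
            continue
        # Projects of the source that the target is not yet linked to.
        inherited = produced_by.get(source, set()) - produced_by.get(target, set())
        for project in inherited:
            new.add((target, "isProducedBy", project))
            new.add((project, "produces", target))
    return new
```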
+* Research product to RC/RI through organization propagation. The manager of the RC/RI can specify a set of organizations whose products are relevant for the +community. +Each research product with an affiliation relation to at least one organization relevant for the RC/RI will be linked to it. ++ +
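Linking products to an RC/RI through its organizations amounts to a join. On one side are the affiliation pairs; on the other are the organizations selected by each community manager. The following is a hypothetical sketch that uses plain dictionaries in place of the graph.

```python
def communities_via_organizations(affiliations: list[tuple[str, str]],
                                  community_orgs: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return (product, community) links.

    affiliations: (product id, organization id) pairs.
    community_orgs: community id -> organizations selected by its manager.
    """
    # Invert the configuration once: organization -> communities interested in it.
    by_org: dict[str, set[str]] = {}
    for community, orgs in community_orgs.items():
        for org in orgs:
            by_org.setdefault(org, set()).add(community)
    links: set[tuple[str, str]] = set()
    for product, org in affiliations:
        for community in by_org.get(org, set()):
            links.add((product, community))
    return links
```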
+ +* Research product to RC/RI through semantic relation: extends the set of products linked to an RC/RI by exploiting strong semantic relationships between the research products; +e.g. if a research product R1 is associated with the community C and is supplemented by a research product R2, then R2 will be linked to the community. The relationships considered for this procedure are “isSupplementedBy” and “supplements”. ++ +
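Community propagation follows a similar one-step pattern. This sketch assumes triples `(R1, relClass, R2)` in which `R2` inherits the communities of `R1` when the relation is one of the two listed above. It returns only the communities each product gains. The data structures are hypothetical.

```python
SEMANTIC_RELS = frozenset({"isSupplementedBy", "supplements"})


def propagate_communities(result_rels: list[tuple[str, str, str]],
                          communities: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return product id -> communities it gains from related products (one step)."""
    gained: dict[str, set[str]] = {}
    for source, rel, target in result_rels:
        if rel not in SEMANTIC_RELS:
            continue
        missing = communities.get(source, set()) - communities.get(target, set())
        if missing:
            gained.setdefault(target, set()).update(missing)
    return gained
```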
+* ORCID identifiers to research product through semantic relation. This propagation enriches the research products by adding ORCID identifiers to authors. The added ORCID will be marked as "potential" since they have been inserted through propagation. +The process considers the set of overlapping authors between research products (R1 and R2) linked with a strong semantic relationship (IsSupplementedBy, IsSupplementTo). +For each author A in the overlapping set, if R1 provides the ORCID value for A and R2 does not, then the author A in R2 will be enriched with the information of the ORCID found in R1. + ++ +
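The following sketches ORCID propagation between two related products. The text does not say how OpenAIRE decides that two authors overlap. The name-based `name_key` below is therefore a hypothetical stand-in. The `provenance` field is an illustrative way of marking ORCIDs as "potential".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:
    fullname: str
    orcid: Optional[str] = None
    provenance: Optional[str] = None


def name_key(name: str) -> str:
    # Hypothetical matching key: lower-cased tokens, sorted so "Doe, Jane" == "Jane Doe".
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def propagate_orcid(r1_authors: list[Author], r2_authors: list[Author]) -> None:
    """Copy ORCIDs from R1 authors to overlapping R2 authors lacking one (in place)."""
    known = {name_key(a.fullname): a.orcid for a in r1_authors if a.orcid}
    for author in r2_authors:
        if author.orcid:
            continue  # never overwrite an ORCID that R2 already provides
        orcid = known.get(name_key(author.fullname))
        if orcid:
            author.orcid = orcid
            author.provenance = "potential"  # marks propagated, unverified ORCIDs
```

The function works in one direction. For a pair linked by `IsSupplementedBy`/`IsSupplementTo`, it would be called once per direction.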
+ +* affiliation to organization through institutional repository. This propagation adds one "hasAuthorInstitution" relationship (and its inverse) +between a research product R and Organization O, +if R was collected from a datasource D with type institutional repository, and D was provided by O. ++ +
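The following is a hypothetical sketch of the institutional-repository rule. Each datasource is reduced to its type and the organization providing it. The type label and the inverse relation name `isAuthorInstitutionOf` are assumptions.

```python
def affiliations_from_repositories(
    collected_from: list[tuple[str, str]],
    datasources: dict[str, tuple[str, str]],
) -> set[tuple[str, str, str]]:
    """Return hasAuthorInstitution relations (and inverses).

    collected_from: (product id, datasource id) pairs.
    datasources: datasource id -> (datasource type, providing organization id).
    """
    rels: set[tuple[str, str, str]] = set()
    for product, ds_id in collected_from:
        ds_type, provider = datasources.get(ds_id, (None, None))
        if ds_type == "institutional repository" and provider:
            rels.add((product, "hasAuthorInstitution", provider))
            rels.add((provider, "isAuthorInstitutionOf", product))
    return rels
```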
+ +* affiliation to organization through semantic relation. This propagation adds one "hasAuthorInstitution" relationship (and its inverse) between a +research product R and an Organization O, +if R has an affiliation relation with an organization O1 that is in relation "isChildOf" with O. + ++ +
+ The algorithm exploits only the organization leaves that are in an "IsChildOf" relation with another organization. So far, propagation is limited to a single step. ++ +
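The parent propagation, restricted to leaf organizations and to a single step, can be sketched with sets of pairs. In this sketch, a leaf is an organization that has a parent but no children of its own. The data structures are hypothetical.

```python
def propagate_to_parents(affiliations: set[tuple[str, str]],
                         child_of: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """One-step affiliation propagation from leaf organizations to their parents.

    affiliations: (product id, organization id) pairs.
    child_of: (child organization, parent organization) pairs.
    """
    parents: dict[str, set[str]] = {}
    for child, parent in child_of:
        parents.setdefault(child, set()).add(parent)
    has_children = {parent for _, parent in child_of}
    new: set[tuple[str, str]] = set()
    for product, org in affiliations:
        # Only leaves (organizations with a parent but no children) are used.
        if org in parents and org not in has_children:
            for parent in parents[org]:
                if (product, parent) not in affiliations:
                    new.add((product, parent))
    return new
```

In the test, a product affiliated with a department gains the university. It does not gain the consortium above the university, because only one step is performed.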
\ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/_category_.json b/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/_category_.json new file mode 100644 index 0000000..c80249b --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/_category_.json @@ -0,0 +1,8 @@ +{ + "label": "Deduplication", + "position": 2, + "link": { + "type": "doc", + "id": "deduplication" + } +} \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/clustering-functions.md b/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/clustering-functions.md new file mode 100644 index 0000000..ded6c57 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/clustering-functions.md @@ -0,0 +1,93 @@ +--- +sidebar_position: 3 +--- +# Clustering functions + +## Ngrams + +It creates ngrams from the input field.+ +
+ + [//]: # (Link to the image: https://docs.google.com/drawings/d/1lLLSU3wsWighmxGQMNMZbgP3mg3BfDVAGVLwt4_OFA8/edit?usp=sharing) + +### Collection import + +The nodes in the graph represent entities of different types. This phase is responsible for identifying all the nodes with a given type and making them available to the subsequent phases, representing them in the deduplication record model. + +### Candidate identification (clustering) + +Clustering is a common heuristic used to overcome the N x N complexity required to match all pairs of objects to identify the equivalent ones. The challenge is to identify a [clustering function](./clustering-functions) that maximizes the chance of comparing only records that may lead to a match, while minimizing the number of equivalent records that are never compared. Since the equivalence function is to some degree tolerant of minor errors (e.g. switching of characters in the title, or minimal differences in letters), this function must be neither too precise (e.g. a hash of the title) nor too flexible (e.g. random ngrams of the title). On the other hand, in some cases the equality of two records can only be determined by their PIDs (e.g. DOI), as the metadata properties are very different across different versions and no [clustering function](./clustering-functions) will ever bring them into the same cluster. + +### Duplicates identification (pair-wise comparisons) + +Pair-wise comparisons are conducted over records in the same cluster following the strategy defined in the decision tree. A different decision tree is adopted depending on the type of the entity being processed. + +To further limit the number of comparisons, a sliding window mechanism is used: (i) records in the same cluster are lexicographically sorted by their title, (ii) a window of K records slides over the cluster, and (iii) records ending up in the same window are pair-wise compared.
Each comparison produces a similarity relation when the pair of records matches. These relations are then used as input for the duplicates grouping stage. + +### Duplicates grouping (transitive closure) + +Once the similarity relations between pairs of records are drawn, the groups of equivalent records are obtained (transitive closure, i.e. “mesh”). From such sets a new **representative record** is obtained, which inherits properties from the merged records and keeps track of their provenance. + +### Relation redistribution + +Relations involving nodes identified as duplicates are marked as virtually deleted and used as templates for creating new relations pointing to the new representative record. +Note that nodes and relationships marked as virtually deleted are not exported. + ++ +
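The sliding-window comparison and the transitive closure can be sketched together. In this hypothetical Python version, records are sorted by lower-cased title and each record is compared only with the next `window - 1` records. The resulting similarity relations are then grouped with a union-find structure. The `match` function stands in for the entity-specific decision tree.

```python
from typing import Callable, Iterable


def window_matches(cluster: list[dict], match: Callable[[dict, dict], bool],
                   window: int) -> list[tuple[str, str]]:
    """Compare each record only with the next window-1 records after sorting by title."""
    ordered = sorted(cluster, key=lambda r: r["title"].lower())
    edges = []
    for i, record in enumerate(ordered):
        for other in ordered[i + 1:i + window]:
            if match(record, other):
                edges.append((record["id"], other["id"]))
    return edges


def group_duplicates(ids: Iterable[str], edges: Iterable[tuple[str, str]]) -> list[set[str]]:
    """Transitive closure of the similarity relations via union-find."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups: dict[str, set[str]] = {}
    for i in parent:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())
```

In the test, with a window of 2, records `a` and `d` are never compared directly. They still end up in the same group through `b`, which is the point of the transitive closure.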
+ + [//]: # (Link to the image: https://docs.google.com/drawings/d/1cDEuVhWnSO8lUZs_Nd748vKfIPxg10jbwKSVZlv33Mg/edit?usp=sharing) \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/organizations.md b/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/organizations.md new file mode 100644 index 0000000..c2c57e1 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/organizations.md @@ -0,0 +1,70 @@ +--- +sidebar_position: 2 +--- + +# Organizations + +The organizations in OpenAIRE are aggregated from different registries (e.g. CORDA, OpenDOAR, Re3data, ROR). In some cases, a registry provides organizations as entities with their own persistent identifier. In other cases, those organizations are extracted from other main entities provided by the registry (e.g. datasources, projects, etc.). + +The deduplication of organizations is enhanced by [OpenOrgs](https://orgs.openaire.eu), a tool that combines an automated approach for identifying duplicated instances +of the same organization record with a "humans in the loop" approach, in which the equivalences produced by a duplicate identification algorithm are suggested to data curators, who are in charge of validating them. +The data curation activity is twofold: on one hand it pivots around the disambiguation task, on the other it aims to improve the metadata describing the organization records +(e.g. including the translated name, or a different PID) as well as to define the hierarchical structure of existing large organizations (e.g. universities with their departments, or large research centers with all their sub-units or sub-institutes).
+ +Duplicates among organizations are therefore managed through three different stages: + * *Creation of Suggestions*: executes an automatic workflow that performs the deduplication and prepares new suggestions to be processed by the curators; + * *Curation*: manual editing of the organization records performed by the data curators; + * *Creation of Representative Organizations*: executes an automatic workflow that creates curated organizations and exposes them on the OpenAIRE Graph by using the curators' feedback from the OpenOrgs underlying database. + +The next sections describe these stages. + +### Creation of Suggestions + +This stage executes an automatic workflow that addresses the *candidate identification* and the *duplicates identification* stages of the deduplication to provide suggestions for the curators in OpenOrgs. + +#### Candidate identification (clustering) + +To limit the number of comparisons, the OpenAIRE clustering for organizations aims at grouping records that are likely to be comparable. +It works with four functions: +* *URL-based function*: the function generates the URL domain when this is provided as part of the record properties from the organization's `websiteurl` field; +* *Title-based functions*: + * generate strings dependent on the keywords in the `legalname` field; + * generate strings obtained as an alternation of the function prefix(3) and suffix(3) (and vice versa) on the first 3 words of the `legalname` field; + * generate strings obtained as a concatenation of ngrams of the `legalname` field. + +#### Duplicates identification (pair-wise comparisons) + +For each pair of organizations in a cluster, the following strategy (depicted in the figure below) is applied. +The comparison goes through the following decision tree: +1. *grid id check*: comparison of the grid ids. If the grid ids are equal, then the similarity relation is drawn.
If the grid id is not available, the comparison proceeds to the next stage; +2. *early exits*: comparison of the numbers extracted from the `legalname`, the `country` and the `website` url. No similarity relation is drawn in this stage; the comparison proceeds only if the compared fields satisfy the equivalence conditions; +3. *city check*: comparison of the city names in the `legalname`. The comparison proceeds only if the legal names share at least 10% of their cities; +4. *keyword check*: comparison of the keywords in the `legalname`. The comparison proceeds only if the legal names share at least 70% of their keywords; +5. *legalname check*: comparison of the normalized `legalnames` with the `Jaro-Winkler` distance to determine whether it is higher than `0.9`. If so, a similarity relation is drawn. Otherwise, no similarity relation is drawn. + +
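The decision tree above can be sketched in Python as a single comparison function. This is a minimal sketch under stated assumptions, not the OpenOrgs implementation: the record layout (dicts with `grid_id`, `legalname`, `country`, `website`), the tiny `KEYWORDS` and `CITIES` vocabularies, the name normalisation, the `overlap` definition, and the behaviour when two grid ids are present but differ are all hypothetical. A compact Jaro-Winkler implementation is included so the example runs without third-party libraries.

```python
import re

# Hypothetical vocabularies: the real keyword and city lists are not documented here.
KEYWORDS = {"university", "institute", "hospital", "college", "school", "centre", "center"}
CITIES = {"rome", "pisa", "athens", "paris", "berlin"}


def jaro(s1, s2):
    """Jaro similarity between two strings (1.0 means identical)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Half the number of matched characters that appear in a different order.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(m1, m2)) / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro-Winkler: boosts the Jaro score for a common prefix of up to 4 characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def normalize(name):
    """Lowercase and keep alphanumeric tokens only (assumed normalisation)."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def overlap(a, b):
    """Share of common elements relative to the larger set (assumed definition)."""
    return len(a & b) / max(len(a), len(b))


def same_organization(r1, r2):
    """Walk the five stages of the decision tree for a pair of organization records."""
    # 1. grid id check: equal ids draw the relation; if both exist but differ,
    #    this sketch assumes that no relation is drawn.
    g1, g2 = r1.get("grid_id"), r2.get("grid_id")
    if g1 and g2:
        return g1 == g2
    n1, n2 = normalize(r1["legalname"]), normalize(r2["legalname"])
    w1, w2 = set(n1.split()), set(n2.split())
    # 2. early exits: numbers in the legal name, country and website must agree
    #    (a missing country or website is assumed not to block the comparison).
    if {w for w in w1 if w.isdigit()} != {w for w in w2 if w.isdigit()}:
        return False
    for field in ("country", "website"):
        v1, v2 = r1.get(field), r2.get(field)
        if v1 and v2 and v1.lower() != v2.lower():
            return False
    # 3. city check: at least 10% of the cities must be shared.
    c1, c2 = w1 & CITIES, w2 & CITIES
    if (c1 or c2) and overlap(c1, c2) < 0.1:
        return False
    # 4. keyword check: at least 70% of the keywords must be shared.
    k1, k2 = w1 & KEYWORDS, w2 & KEYWORDS
    if (k1 or k2) and overlap(k1, k2) < 0.7:
        return False
    # 5. legal name check: a Jaro-Winkler score above 0.9 draws the relation.
    return jaro_winkler(n1, n2) > 0.9
```

The stages are ordered from cheapest to most expensive, so most non-matching pairs exit before the string comparison runs.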
+ +[//]: # (Link to the image: https://docs.google.com/drawings/d/1YKInGGtHu09QG4pT2gRLEum4LxU82d4nKkvGNvRQmrg/edit?usp=sharing) + +### Data Curation + +All the similarity relations drawn by the decision-tree algorithm are exposed in OpenOrgs, where they are made available to the data curators, who can give feedback and improve the organizations' metadata. +A data curator can: + * *edit organization metadata*: legalname, pid, country, url, parent relations, etc.; + * *approve suggested duplicates*: establish that an equivalence relation is valid; + * *discard suggested duplicates*: establish that an equivalence relation is wrong; + * *create similarity relations*: add a new equivalence relation not drawn by the algorithm. + +Note that if a curator does not provide feedback on a similarity relation suggested by the algorithm, then the relation is considered valid. + +### Creation of Representative Organizations + +This stage executes an automatic workflow that addresses the *duplicates grouping* stage to create representative organizations and to update them on the OpenAIRE Graph. Such organizations are obtained via transitive closure, and the relations used come from the curators' feedback gathered in the underlying OpenOrgs database. + +#### Duplicates grouping (transitive closure) + +Once the similarity relations between pairs of organizations have been gathered, the groups of equivalent organizations are obtained (transitive closure, i.e. “mesh”). From each such group a new representative organization is obtained, which inherits all properties from the merged records and keeps track of their provenance. + +The IDs of the representative organizations are obtained from the OpenOrgs database, which creates a unique ``openorgs`` ID for each approved organization. If an organization is not approved by the curators, its ID is obtained by prefixing the MD5 of the first ID (in lexicographical order) with ``pending_org``.
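Grouping the curated similarity relations and naming the resulting organizations can be sketched with a union-find structure, which computes the transitive closure as connected components. The `openorgs_ids` mapping stands in for the OpenOrgs database and is hypothetical, as is the `::` separator between the `pending_org` prefix and the MD5 digest; only the use of the lexicographically first ID comes from the text above.

```python
import hashlib


def find(parent, node):
    """Find the root of `node`, halving the path along the way."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def group_duplicates(relations):
    """Transitive closure of pairwise similarity relations using union-find."""
    parent = {}
    for a, b in relations:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), set()).add(node)
    return list(groups.values())


def representative_id(group, openorgs_ids):
    """Use the curated OpenOrgs id if there is one, otherwise build a pending id."""
    for member in sorted(group):
        if member in openorgs_ids:
            return openorgs_ids[member]
    first = min(group)  # first ID in lexicographical order
    # Assumption: "::" separates the pending_org prefix from the MD5 digest.
    return "pending_org::" + hashlib.md5(first.encode("utf-8")).hexdigest()
```

Union-find keeps the closure close to linear in the number of relations, which matters when the relations come from millions of records.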
\ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/research-products.md b/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/research-products.md new file mode 100644 index 0000000..52e5d2f --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/deduplication/research-products.md @@ -0,0 +1,232 @@ +--- +sidebar_position: 1 +--- + +# Research products + +Duplicates among research products are identified among results of the same +type (publications, datasets, software, other research products). If two +duplicate research products are aggregated one as a dataset and the other as +software, for example, they will never be compared and they will never be +identified as duplicates. +OpenAIRE supports different deduplication strategies based on the type of +results. + +The next sections describe how each stage of the deduplication workflow is addressed +for research products. + +### Candidate identification (clustering) + +To meet the requirement of limiting the number of comparisons, OpenAIRE +clustering for research products works with two different strategies based on +entity types: + +#### Software + +* *Title extraction functions*: + two clustering functions are applied to the title (after normalization, stemming, etc.) + * *stats and suffix prefix of words*: the function generates a key that + depends on (i) the number of significant words in the title, (ii) the number of + characters of such words modulo 10, and (iii) a + string + obtained as an alternation of the function prefix(3) and suffix(3) (and + vice-versa) on the first 3 words (2 words if the title only has 2). For + example, the title ``Search for the Standard Model Higgs Boson`` + becomes the two keys ``5-3-seaardmod`` and ``5-3-rchstadel`` + * *n-grams*: the function generates ngrams from the + title.
For example, the + title ``Search for the Standard Model Higgs Boson`` + becomes the keys ``tan``, ``sta``, ``ode``, ``mod``, ``ear``, ``hig``, + ``igg``, ``sea`` +* *DOI extraction function*: the function generates the DOI when this is + provided as part of the record properties +* *URL extraction function*: the function generates the hostname part of + the URL of the software, if any + +#### Publication, Dataset and Other Research Product + +* *PID extraction function*: the function generates the PIDs when at least one + is provided as part of the ``pid`` record properties +* *Author and Title extraction function*: the function generates a key that + depends on (i) the number of authors of the product, with a cap of 21 + authors, (ii) the number of significant words in the title (after normalization, stemming, + etc.), divided by 10, and (iii) a string obtained as an alternation of the + function prefix(3) and suffix(3) (and vice versa) on the first 3 words (2 + words if the title only has 2). + +
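The clustering keys described above can be reproduced for the documented example title with the Python sketch below. The stop-word list, the tokenisation, and the exact key layout of `author_title_keys` are assumptions; the `ngram_keys` defaults (first two trigrams of the first four significant words) and the choice to count characters on the space-joined significant words were inferred from the example keys, not taken from the production code.

```python
import re

# Hypothetical stop-word list; the production normalisation (stemming, etc.) is richer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def significant_words(title):
    """Lowercased alphanumeric words of the title, without stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS]


def suffix_prefix_keys(title, max_words=3):
    """Alternate prefix(3)/suffix(3), and vice versa, over the first words."""
    words = significant_words(title)[:max_words]
    if len(words) < 2:
        return []  # assumption: no key for titles with fewer than 2 words
    key_a = "".join(w[:3] if i % 2 == 0 else w[-3:] for i, w in enumerate(words))
    key_b = "".join(w[-3:] if i % 2 == 0 else w[:3] for i, w in enumerate(words))
    return [key_a, key_b]


def stats_suffix_prefix_keys(title):
    """Software key: word count, character count modulo 10, suffix/prefix chain."""
    words = significant_words(title)
    # Assumption: characters are counted on the space-joined significant words,
    # which reproduces the documented "5-3-..." example.
    chars = len(" ".join(words))
    return [f"{len(words)}-{chars % 10}-{key}" for key in suffix_prefix_keys(title)]


def ngram_keys(title, max_words=4, per_word=2, n=3):
    """First `per_word` character n-grams of the first `max_words` words."""
    keys = set()
    for word in significant_words(title)[:max_words]:
        for i in range(min(per_word, len(word) - n + 1)):
            keys.add(word[i:i + n])
    return keys


def author_title_keys(title, n_authors, max_authors=21):
    """Publication/dataset key: capped author count, words // 10, suffix/prefix chain."""
    stats = f"{min(n_authors, max_authors)}-{len(significant_words(title)) // 10}"
    return [f"{stats}-{key}" for key in suffix_prefix_keys(title)]
```

For ``Search for the Standard Model Higgs Boson`` the first two functions yield ``5-3-seaardmod``/``5-3-rchstadel`` and the eight trigrams listed above; records sharing any key land in the same cluster.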
+ +[//]: # (Link to the image: https://docs.google.com/drawings/d/19SIilTp1vukw6STMZuPMdc0pv0ODYCiOxP7OU3iPWK8/edit?usp=sharing) + +#### Datasets and Other types of research products + +For each pair of datasets or other types of research products in a cluster, the +strategy depicted in the figure below is applied. +The decision tree is almost identical to the publication decision tree, with the +only exception of the *instance type check* stage. Since these types of records +do not have a comparable instance type, the check is not performed and the +decision tree node is skipped. + +
+ +[//]: # (Link to the image: https://docs.google.com/drawings/d/1uBa7Bw2KwBRDUYIfyRr_Keol7UOeyvMNN7MPXYLg4qw/edit?usp=sharing) + +#### Software + +For each pair of software records in a cluster, the following strategy (depicted in the +figure below) is applied. +The comparison goes through different stages: + +1. *DOI pids and URLs check*: comparison of the pids of type DOI and URLs in the + records. If at least 1 DOI is equivalent or 1 URL is equivalent, then the records + match and the similarity relation is drawn +2. *title check*: comparison of the record titles with the Levenshtein distance, + excluding versioning information. + If the resulting score is below 0.95, then the records do not match. Otherwise, the + comparison proceeds to the next stage +3. *untrusted DOI check*: comparison of all the available DOIs (in the `pid` and + the `alternateid` fields of the record). If at least 1 DOI is equivalent, the + records match and the similarity relation is drawn +4. *authors check*: "smart" comparison of the author lists to check if the two + products share all authors + +
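A minimal sketch of the four software stages follows, assuming records are dicts with hypothetical `title`, `doi`, `alternate_doi`, `url`, and `authors` fields. The version-stripping regular expression, the normalised Levenshtein score (1 minus the distance over the longer title length), and the order-insensitive author comparison are illustrative stand-ins, not the production logic.

```python
import re

VERSION = re.compile(r"\bv?\d+(?:\.\d+)*\b")  # assumed pattern for version strings


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def title_score(t1, t2):
    """Normalised Levenshtein similarity of titles with version strings removed."""
    a = " ".join(VERSION.sub(" ", t1.lower()).split())
    b = " ".join(VERSION.sub(" ", t2.lower()).split())
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def author_keys(authors):
    """Order-insensitive token sets, a crude stand-in for the "smart" comparison."""
    return {frozenset(re.findall(r"[a-z]+", name.lower())) for name in authors}


def software_match(r1, r2):
    # 1. DOI pids and URLs check: any shared DOI or URL draws the relation.
    if set(r1.get("doi", [])) & set(r2.get("doi", [])):
        return True
    if set(r1.get("url", [])) & set(r2.get("url", [])):
        return True
    # 2. title check: below 0.95 the records do not match.
    if title_score(r1["title"], r2["title"]) < 0.95:
        return False
    # 3. untrusted DOI check over both the pid and the alternateid DOIs.
    all1 = set(r1.get("doi", [])) | set(r1.get("alternate_doi", []))
    all2 = set(r2.get("doi", [])) | set(r2.get("alternate_doi", []))
    if all1 & all2:
        return True
    # 4. authors check: both lists must contain the same authors.
    a1, a2 = author_keys(r1.get("authors", [])), author_keys(r2.get("authors", []))
    return bool(a1) and a1 == a2
```

Stripping version strings before the title check is what lets ``MyTool v1.2`` and ``MyTool 2.0`` reach the authors check.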
+ +[//]: # (Link to the image: https://docs.google.com/drawings/d/19gd1-GTOEEo6awMObGRkYFhpAlO_38mfbDFFX0HAkuo/edit?usp=sharing) + +### Duplicates grouping + +The aim of the final stage is the creation of records that group all the +equivalent entities discovered pairwise by the previous step. This is done in +multiple phases. + +#### Transitive closure + +As the concluding step of duplicate identification, a transitive closure is +performed against similarity relations to identify complete groups of duplicated +records (cliques). If a group exceeds 200 elements, only the first 200 elements +are included in the group, while the remaining elements are kept ungrouped. + +#### Selection of the pivot record + +Each group of duplicate records needs to be identified in the final graph with +an OpenAIRE identifier, derived from a record of the group known as the _pivot +record_. The pivot is the first record after sorting the group of duplicate records by the +following criteria: + +1. Records with identifiers from a [PID authority](/data-model/pids-and-identifiers#pid-authorities). +2. Records chosen as pivots in the graph's previous generations. +3. Publications from CrossRef or datasets from DataCite. +4. Records with an earlier date of acceptance. +5. Records with smaller IDs in lexicographical order. + +The second sorting criterion is possible because a state table, called "pivot +history", is maintained across graph generations. It keeps track of which +records were used as pivot records in which graph, and is guaranteed to retain data for +the last 12 months. + +#### Creation of representative records + +The representative record, also known as the "dedup record", replaces the group +of deduplicated records in the graph.
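The pivot ordering maps naturally onto a tuple sort key, since Python compares tuples element by element and `False` sorts before `True`. In this sketch the record fields (`from_pid_authority`, `type`, `source`, `date_of_acceptance`) are hypothetical, records without an acceptance date are assumed to sort last, and the 200-element cap is applied after the pivot ordering, because the text above does not say which ordering defines the "first 200 elements".

```python
from datetime import date

MAX_GROUP_SIZE = 200


def pivot_sort_key(record, pivot_history):
    """Sort key for the five criteria; preferred records sort first."""
    crossref_or_datacite = (
        (record["type"] == "publication" and record["source"] == "Crossref")
        or (record["type"] == "dataset" and record["source"] == "DataCite")
    )
    return (
        not record["from_pid_authority"],               # 1. PID authority first
        record["id"] not in pivot_history,              # 2. previous pivots next
        not crossref_or_datacite,                       # 3. Crossref / DataCite
        record.get("date_of_acceptance") or date.max,   # 4. earlier acceptance
        record["id"],                                   # 5. lexicographic id
    )


def select_pivot(group, pivot_history):
    """Return (pivot, grouped records, records left ungrouped by the size cap)."""
    ordered = sorted(group, key=lambda r: pivot_sort_key(r, pivot_history))
    kept, ungrouped = ordered[:MAX_GROUP_SIZE], ordered[MAX_GROUP_SIZE:]
    return kept[0], kept, ungrouped
```

Because the pivot history is only the second criterion, a record from a PID authority always wins over a former pivot that lacks one.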
+ +##### OpenAIRE identifier of the representative record + +The OpenAIRE identifier of the representative record is generated based on the +identifier of the record chosen as the pivot of the group: + +- if the pivot record comes from a "PID authority", the identifier of the + representative record is the same, but the "PID Type Prefix" part of the + identifier is modified to append ``_dedup``.+ +
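Assuming OpenAIRE identifiers of the form `50|doi_________::<hash>`, where the PID Type Prefix is a 12-character field padded with underscores (an assumption, not stated in this section), deriving the representative identifier from the pivot could look like this hypothetical helper:

```python
PREFIX_LENGTH = 12  # assumed fixed width of the PID Type Prefix, padded with "_"


def dedup_identifier(pivot_id):
    """Append "_dedup" to the PID Type Prefix, keeping the rest of the id unchanged."""
    entity, _, rest = pivot_id.rpartition("|")  # optional entity type, e.g. "50"
    prefix, sep, suffix = rest.partition("::")
    if not sep:
        raise ValueError(f"unexpected identifier: {pivot_id}")
    new_prefix = (prefix.rstrip("_") + "_dedup").ljust(PREFIX_LENGTH, "_")
    return (entity + "|" if entity else "") + new_prefix + "::" + suffix
```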
+ diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/indexing.md b/versioned_docs/version-9.0.0/graph-production-workflow/indexing.md new file mode 100644 index 0000000..759f1a2 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/indexing.md @@ -0,0 +1,17 @@ +# Indexing + +The final version of the OpenAIRE Graph is indexed on a Solr server that is used by the OpenAIRE portals ([EXPLORE](https://explore.openaire.eu), [CONNECT](https://connect.openaire.eu), [PROVIDE](https://provide.openaire.eu)) and APIs, the latter adopted by several third-party applications and organizations, such as: + +* The OpenAIRE Graph APIs and Portals will offer the EOSC (European Open Science Cloud) an Open Science Resource Catalogue, keeping an up-to-date map of all research products (publications, datasets, software), services, organizations, projects, and funders in Europe and beyond. + +* DSpace & EPrints repositories can install the OpenAIRE plugin to expose OpenAIRE-compliant metadata records via their OAI-PMH endpoint and offer researchers the possibility to link their depositions to the funding project, by selecting it from the list of projects provided by OpenAIRE. + +* The EC participant portal (Sygma - System for Grant Management) uses the OpenAIRE API in the “Continuous Reporting” section. Sygma automatically fetches from the OpenAIRE Search API the list of publications and datasets in the OpenAIRE Graph that are linked to the project. The user can select the research products from the list and easily compile the continuous reporting data of the project. + +* ScholExplorer is used by different players of the scholarly communication ecosystem. For example, [Elsevier](https://www.elsevier.com/authors/tools-and-resources/research-data/data-base-linking) uses its API to make the links between
+ScholExplorer indexes the links among the four major types of research products (API v3) available in the OpenAIRE Graph and makes them available through an HTTP API that allows +to search them by the following criteria: + * Links whose source object has a given PID or PID type; + * Links whose source object has been published by a given data source ("data source as publisher"); + * Links that were collected from a given data source ("data source as provider"). diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/fos-classification.md b/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/fos-classification.md new file mode 100644 index 0000000..8a3270a --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/fos-classification.md @@ -0,0 +1,2 @@ +# Field of Science + diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/impact-indicators.md b/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/impact-indicators.md new file mode 100644 index 0000000..23476e1 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/impact-indicators.md @@ -0,0 +1,170 @@ +# Citation-based impact indicators + +This page summarises all calculated citation-based impact indicators, provided by [BIP!](https://bip.imsi.athenarc.gr/), which are included in the [bipIndicators](../../data-model/entities/other#bipindicators) property (found under the [indicators](../../data-model/entities/research-product#indicators) property of the reseach product). + +It should be noted that the citation-based impact indicators are being calculated on the level of the research output. +Below we explain their main intuition, the way they are calculated, and their most important limitations, in an attempt help avoiding common pitfalls and misuses. 
+ + +## Citation Count (CC) • influence_alt + +***Short description:*** +This is the most widely used citation-based impact indicator, which sums all citations received by each article. +Citation count can be viewed as a measure of a publication's overall (citation-based) impact, since it conveys the number of other works that directly +drew on it. + +***Algorithmic details:*** +The citation count of a +publication $i$ corresponds to the in-degree of the corresponding node in the underlying citation network: $s_i = \sum_{j} A_{i,j}$, +where $A$ is the adjacency matrix of the network (i.e., $A_{i,j}=1$ when paper $j$ cites paper $i$, while $A_{i,j}=0$ otherwise). + +***Parameters:*** - + +***Limitations:*** +OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator. +Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source. + +***Environment:*** PySpark + +***References:*** - + +***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker) + + +## "Incubation" Citation Count (iCC) • impulse + +***Short description:*** +This measure is essentially a time-restricted version of the citation count, where the time window is distinct for each paper, i.e., +only citations received within $y$ years of its publication are counted. + +***Algorithmic details:*** +The "incubation" citation count of a paper $i$ is +calculated as: $s_i = \sum_{j,t_j \leq t_i+y} A_{i,j}$, where $A$ is the adjacency matrix and $t_j, t_i$ are the citing and cited papers' +publication years, respectively. iCC can be seen as an indicator of a paper's initial momentum +(impulse) directly after its publication.
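Both indicators reduce to counting edges of the citation network. This plain-Python sketch (not the PySpark code of BIP! Ranker) assumes citations are given as `(citing, cited)` pairs and publication years as a dict:

```python
from collections import Counter


def citation_count(citations):
    """CC: in-degree of each cited paper, i.e. s_i = sum_j A_ij."""
    return Counter(cited for _, cited in citations)


def incubation_citation_count(citations, year, y=3):
    """iCC: only citations from papers published by t_i + y are counted."""
    return Counter(
        cited for citing, cited in citations
        if year[citing] <= year[cited] + y
    )
```

A late citation, such as one made 10 years after publication, raises CC but leaves iCC unchanged.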
+ +***Parameters:*** +$y=3$ + +***Limitations:*** +OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator. +Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source. + +***Environment:*** PySpark + +***References:*** +* Vergoulis, T., Kanellos, I., Atzori, C., Mannocci, A., Chatzopoulos, S., Bruzzo, S. L., Manola, N., & Manghi, P. (2021, April). BIP! DB: A dataset of impact measures for scientific publications. In Companion Proceedings of the Web Conference 2021 (pp. 456-460). + +***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker) + + + ## PageRank (PR) • influence + +***Short description:*** +Originally developed to rank Web pages, PageRank has also been widely used to rank publications in citation +networks. In this latter context, a publication's PageRank +score also serves as a measure of its influence. + +***Algorithmic details:*** +The PageRank score of a publication is calculated +as its probability of being read by a researcher who either randomly selects publications to read or selects +publications based on the references of her latest read. Formally, the score of a publication $i$ is given by: + +$$ +s_i = \alpha \cdot \sum_{j} P_{i,j} \cdot s_j + (1-\alpha) \cdot \frac{1}{N} +$$ + +where $P$ is the stochastic transition matrix, which corresponds to the column-normalised version of the adjacency +matrix $A$, $\alpha \in [0,1]$, and $N$ is the number of publications in the citation network. The first addend +of the equation corresponds to the selection (with probability $\alpha$) of following a reference, while the +second corresponds to randomly choosing any publication in the network.
It should be noted that the +score of each publication relies on the scores of the publications citing it (the algorithm is executed iteratively +until all scores converge). As a result, PageRank differentiates citations based on the importance of citing +articles, thus alleviating the corresponding limitation of the Citation Count. + +***Parameters:*** +$\alpha = 0.5, convergence\_error = 10^{-12}$ + +***Limitations:*** +OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator. +Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source. + +***Environment:*** PySpark + +***References:*** +* Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab. + +***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker) + + +## RAM • popularity_alt + +***Short description:*** +RAM is essentially a modified Citation Count, where recent citations are considered of higher importance compared to older ones. +Hence, it better captures the popularity of publications. This "time-awareness" of citations +alleviates the bias of methods like Citation Count and PageRank against recently published articles, which have +not had "enough" time to gather as many citations. + +***Algorithmic details:*** +The RAM score of each paper $i$ is calculated as follows: + +$$ +s_i = \sum_j{R_{i,j}} +$$ + +where $R$ is the so-called Retained Adjacency Matrix (RAM) and $R_{i,j}=\gamma^{t_c-t_j}$ when publication $j$ cites publication +$i$, and $R_{i,j}=0$ otherwise. The parameter $\gamma$ lies in $(0,1)$, $t_c$ corresponds to the current year and $t_j$ corresponds to the +publication year of citing article $j$.
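The following plain-Python sketch iterates the PageRank equation with the parameters listed above and computes RAM scores. It is an illustration, not BIP! Ranker: the input shapes are assumptions, and so is the uniform redistribution of the score of papers that cite nothing (their column in the column-normalised matrix would otherwise be all zeros).

```python
def pagerank(citations, nodes, alpha=0.5, tol=1e-12, max_iter=1000):
    """Iterate s = alpha * P s + (1 - alpha) / N until the L1 change drops below tol."""
    refs = {v: [] for v in nodes}
    for citing, cited in citations:
        refs[citing].append(cited)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: the score of papers without references is spread uniformly.
        dangling = sum(score[v] for v in nodes if not refs[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for citing, cited_list in refs.items():
            for cited in cited_list:
                new[cited] += alpha * score[citing] / len(cited_list)
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


def ram(citations, year, current_year, gamma=0.6):
    """RAM: each citation from paper j counts gamma ** (current_year - t_j)."""
    scores = {}
    for citing, cited in citations:
        scores[cited] = scores.get(cited, 0.0) + gamma ** (current_year - year[citing])
    return scores
```

With this dangling-node handling the PageRank scores always sum to 1, so they can be read directly as the probabilities described above.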
+ +***Parameters:*** +$\gamma = 0.6$ + +***Limitations:*** +OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator. +Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source. + +***Environment:*** PySpark + +***References:*** +* Ghosh, R., Kuo, T. T., Hsu, C. N., Lin, S. D., & Lerman, K. (2011, December). Time-aware ranking in dynamic citation networks. In 2011 IEEE 11th International Conference on Data Mining Workshops (pp. 373-380). IEEE. + +***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker) + + +## AttRank • popularity + +***Short description:*** +AttRank is a PageRank variant that alleviates PageRank's bias against recent publications (i.e., it is tailored to capture popularity). +AttRank achieves this by modifying PageRank's probability of randomly selecting a publication. Instead of using a uniform probability, +AttRank defines it based on a combination of the publication's age and the citations it received in recent years. + +***Algorithmic details:*** +The AttRank score +of each publication $i$ is calculated based on: + +$$ +s_i = \alpha \cdot \sum_{j} P_{i,j} \cdot s_j + + \beta \cdot Att(i)+ \gamma \cdot c \cdot e^{-\rho \cdot (t_c-t_i)} +$$ + +where $\alpha + \beta + \gamma =1$ and $\alpha,\beta,\gamma \in [0,1]$. $Att(i)$ denotes a recent attention-based score for publication $i$, +which reflects its share of citations in the $y$ most recent years, $t_i$ is the publication year of article $i$, $t_c$ denotes the current +year, and $c$ is a normalisation constant. Finally, $P$ is the stochastic transition matrix.
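A sketch of the AttRank iteration, with input shapes matching the ones assumed above, could look like the code below. The defaults for α, β, γ and $y$ follow the parameter list of this section; ρ has no default and must be passed explicitly, because the exponent is applied exactly as written in the formula, where a positive ρ makes the age term decay. Reading $Att(i)$ as each paper's share of all citations made in the last $y$ years, and spreading the score of papers without references uniformly, are both assumptions.

```python
import math


def attrank(citations, year, current_year, *, rho, alpha=0.2, beta=0.5, gamma=0.3,
            y=3, tol=1e-12, max_iter=1000):
    """AttRank sketch: PageRank-style propagation plus attention and age terms."""
    nodes = list(year)
    n = len(nodes)
    refs = {v: [] for v in nodes}
    for citing, cited in citations:
        refs[citing].append(cited)
    # Att(i): share of the citations made in the y most recent years (current included).
    recent = [cited for citing, cited in citations if year[citing] > current_year - y]
    att = {v: 0.0 for v in nodes}
    for cited in recent:
        att[cited] += 1 / len(recent)
    # Age term exp(-rho * (t_c - t_i)), scaled by c so that it sums to 1.
    weight = {v: math.exp(-rho * (current_year - year[v])) for v in nodes}
    c = 1 / sum(weight.values())
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: the score of papers without references is spread uniformly.
        dangling = sum(score[v] for v in nodes if not refs[v])
        new = {v: alpha * dangling / n + beta * att[v] + gamma * c * weight[v] for v in nodes}
        for citing, cited_list in refs.items():
            for cited in cited_list:
                new[cited] += alpha * score[citing] / len(cited_list)
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score
```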
+ +***Parameters:*** +$\alpha = 0.2, \beta = 0.5, \gamma = 0.3, \rho = -0.16, convergence\_error = 10^{-12}$ + +Note that recent attention is based on the 3 most recent years (including the current one). + +***Limitations:*** +OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator. +Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source. + +***Environment:*** PySpark + +***References:*** +* Kanellos, I., Vergoulis, T., Sacharidis, D., Dalamagas, T., & Vassiliou, Y. (2021, April). Ranking papers by their short-term scientific impact. In 2021 IEEE 37th International Conference on Data Engineering (ICDE) (pp. 1997-2002). IEEE. + +***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker) + + \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/indicators-ingestion.md b/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/indicators-ingestion.md new file mode 100644 index 0000000..285a1dc --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/indicators-ingestion.md @@ -0,0 +1,8 @@ +import DocCardList from '@theme/DocCardList'; + +# Indicators ingestion + +In this step, research products are enriched with Impact and Usage Statistics indicators. +The former are provided by [BIP!](https://bip.imsi.athenarc.gr/), while the latter are computed by OpenAIRE's [UsageCounts service](https://usagecounts.openaire.eu/). + +
+ +## The 17 Sustainable Development Goals + +1. [**No Poverty**](https://sdgs.un.org/goals/goal1): End poverty in all its forms everywhere. +2. [**Zero Hunger**](https://sdgs.un.org/goals/goal2): End hunger, achieve food security and improved nutrition, and + promote sustainable agriculture. +3. [**Good Health and Well-being**](https://sdgs.un.org/goals/goal3): Ensure healthy lives and promote well-being + for all at all ages. +4. [**Quality Education**](https://sdgs.un.org/goals/goal4): Ensure inclusive and equitable quality education and + promote lifelong learning opportunities for all. +5. [**Gender Equality**](https://sdgs.un.org/goals/goal5): Achieve gender equality and empower all women and girls. +6. [**Clean Water and Sanitation**](https://sdgs.un.org/goals/goal6): Ensure availability and sustainable + management of water and sanitation for all. +7. [**Affordable and Clean Energy**](https://sdgs.un.org/goals/goal7): Ensure access to affordable, reliable, + sustainable, and modern energy for all. +8. [**Decent Work and Economic Growth**](https://sdgs.un.org/goals/goal8): Promote sustained, inclusive, and + sustainable economic growth, full and productive employment, and decent work for all. +9. [**Industry, Innovation, and Infrastructure**](https://sdgs.un.org/goals/goal9): Build resilient infrastructure, + promote inclusive and sustainable industrialization, and foster innovation. +10. [**Reduced Inequalities**](https://sdgs.un.org/goals/goal10): Reduce inequality within and among countries. +11. [**Sustainable Cities and Communities**](https://sdgs.un.org/goals/goal11): Make cities and human settlements + inclusive, safe, resilient, and sustainable. +12. [**Responsible Consumption and Production**](https://sdgs.un.org/goals/goal12): Ensure sustainable consumption + and production patterns. +13. [**Climate Action**](https://sdgs.un.org/goals/goal13): Take urgent action to combat climate change and its impacts. +14. 
[**Life Below Water**](https://sdgs.un.org/goals/goal14): Conserve and sustainably use the oceans, seas, and + marine resources for sustainable development. +15. [**Life on Land**](https://sdgs.un.org/goals/goal15): Protect, restore, and promote sustainable use of + terrestrial ecosystems, manage forests sustainably, combat desertification, and halt and reverse land + degradation and halt biodiversity loss. +16. [**Peace, Justice, and Strong Institutions**](https://sdgs.un.org/goals/goal16): Promote peaceful and inclusive + societies for sustainable development, provide access to justice for all, and build effective, accountable, and + inclusive institutions at all levels. +17. [**Partnerships for the Goals**](https://sdgs.un.org/goals/goal17): Strengthen the means of implementation and + revitalize the global partnership for sustainable development. + +## Application in Classification of Research Products + +The SDG taxonomy is used to classify research products based on their relevance to the overarching goals. This +classification helps identify the impact of research on sustainable development and align research efforts +with global priorities. Here’s how it can be applied: + +1. **Mapping Research Outputs**: Research outputs such as publications are mapped to specific SDGs based on their + objectives, methodologies, and outcomes. +2. **Evaluating Impact**: The classification allows for the evaluation of the impact of research on achieving the + SDGs, helping to highlight contributions to specific goals. +3. **Funding and Collaboration**: Aligning research with SDGs can attract funding from organizations focused on + sustainable development and foster collaborations with other researchers and institutions working towards + similar goals. +4. **Policy and Decision-Making**: Policymakers can use the classification to identify research that supports + sustainable development policies and make informed decisions based on evidence from relevant research.
+ +By integrating the SDG taxonomy into the classification of research products, we can ensure that research efforts +are directed towards addressing the most pressing global challenges and contributing to a sustainable future. + +## Conclusion + +The Sustainable Development Goals provide a comprehensive framework for addressing global challenges. By applying +the SDG taxonomy to classify research products, we can better understand and enhance the impact of research on +sustainable development, ensuring that scientific advancements contribute to a more equitable and sustainable world. + +See an example of how the SDG classification appears in the OpenAIRE data in the +[data model](../../data-model/entities/research-product#subjects) section. \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/usage-counts.md b/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/usage-counts.md new file mode 100644 index 0000000..b1a86bd --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/indicators-ingestion/usage-counts.md @@ -0,0 +1,7 @@ +# Usage Statistics indicators + +Usage Statistics indicators for research products, like publications, datasets, etc., are an important complement to other (traditional and alternative) bibliometric indicators, providing a comprehensive and up-to-date view of the impact not only of such resources but also of their authors, institutions and the platforms themselves. They take into account different levels of information: the usage of data sources, the usage of individual items in the context of their resource type and the usage of individual web resources or files. + +Usage Statistics Indicators are built by OpenAIRE's UsageCounts service.
The service collects usage data and consolidated usage statistics reports from its distributed network of data providers (repositories, e-journals, CRIS) using open standards and protocols, and delivers reliable, consolidated and comparable usage metrics, like counts of item downloads and metadata views, conformant to the COUNTER Code of Practice. + +You can find more information about the UsageCounts service [here](https://usagecounts.openaire.eu/). \ No newline at end of file diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/merge-by-id.md b/versioned_docs/version-9.0.0/graph-production-workflow/merge-by-id.md new file mode 100644 index 0000000..9e994c7 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/merge-by-id.md @@ -0,0 +1,28 @@ +# Merge by id + +In the metadata aggregation system it is common to find the same record provided by +different datasources and, sometimes, even inside the same datasource (especially in +the case of aggregators). As the harmonisation processes are performed on a per-datasource +basis, the resulting records are the output of different mapping implementations. +This approach has the advantage of being deeply customisable to capture datasource-specific +aspects, but it leaves room for inconsistencies when evaluating the different mappings +across the various datasources. + +This phase is therefore responsible for compensating for such inconsistencies and performs +a global grouping of every record available in the graph: + +- entities are grouped by [`id`](../data-model/entities/research-product#id) +- relations are grouped by [`source`, `target`, `reltype`](../data-model/relationships/relationship-object) + +This ensures that the same record, possibly assigned to different types by different +mappings, appears only once in the graph and under a single typing.
In the case of clashing +identifiers, the properties are merged (including the provenance information), considering +the following precedence order for the research product typing: + +``` +publication > dataset > software > other +``` + +The same holds for relationships: since the same relation (e.g. a DOI-to-DOI citation) could +be aggregated from multiple sources, this grouping phase collapses all the different +duplicates onto a single relation that nonetheless includes all the individual provenances. diff --git a/versioned_docs/version-9.0.0/graph-production-workflow/stats.md b/versioned_docs/version-9.0.0/graph-production-workflow/stats.md new file mode 100644 index 0000000..9d0de86 --- /dev/null +++ b/versioned_docs/version-9.0.0/graph-production-workflow/stats.md @@ -0,0 +1,12 @@ +# Stats analysis + +The OpenAIRE Graph is also processed by a pipeline for extracting the statistics +and producing the charts for funders, research initiatives, research infrastructures, +and policymakers available on [MONITOR](https://monitor.openaire.eu). + +Based on the information available on the graph, OpenAIRE provides a set of +indicators for monitoring the funding and research impact and the uptake of +Open Science publishing practices, such as Open Access publishing of publications +and datasets, availability of interlinks between research products, availability +of post-print versions in institutional or thematic Open Access repositories, etc. + diff --git a/versioned_docs/version-9.0.0/intro.md b/versioned_docs/version-9.0.0/intro.md new file mode 100644 index 0000000..5bbf407 --- /dev/null +++ b/versioned_docs/version-9.0.0/intro.md @@ -0,0 +1,34 @@ +--- +slug: / +id: intro +sidebar_position: 1 +--- + +# Overview + +The [OpenAIRE Graph](https://graph.openaire.eu/) (formerly known as the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide, key in fostering Open Science and establishing its practices in daily research activities.
+Conceived as a public and transparent good, populated from data sources trusted by scientists, the Graph aims to bring discovery, monitoring, and assessment of science back into the hands of the scientific community.
+
+Imagine a vast collection of research products all linked together, contextualised and openly available. Over the past years, OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products, such as articles, datasets, software, and other research products, and entities such as organisations, funders, funding streams, projects, communities, and data sources.
+
+The OpenAIRE Graph aggregates millions of metadata records collected from trusted data sources, including:
+
+* Open Access journals registered in DOAJ
+* Crossref
+* Unpaywall
+* ORCID
+* Microsoft Academic Graph
+* DataCite
+
+It also aggregates repositories registered in OpenDOAR, re3data.org, FAIRsharing.org, and the EOSC Service Catalogue, including prominent repositories such as:
+
+* UKPubMed
+* arXiv
+* HAL
+* Zenodo
+* Figshare
+* Dryad
+* RePEc
+
+After cleaning, deduplication, enrichment, and full-text mining, the graph is analysed to produce statistics for the [OpenAIRE MONITOR](https://monitor.openaire.eu) and the [Open Science Observatory](https://osobservatory.openaire.eu), is made discoverable via [OpenAIRE EXPLORE](https://explore.openaire.eu), and is programmatically accessible via the [OpenAIRE Public APIs](https://develop.openaire.eu).
+Last but not least, the Graph data are openly available and can be used by third parties to create added-value services.
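The merge-by-id phase described in `merge-by-id.md` above groups entities by `id` and relations by (`source`, `target`, `reltype`). It resolves typing clashes with the precedence `publication > dataset > software > other` and keeps every provenance. The Python sketch below illustrates that grouping on in-memory records. It is a minimal illustration under stated assumptions, not the OpenAIRE implementation. The `Record` and `Relation` classes, their fields, and the choice to merge properties by set union are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

# Typing precedence from merge-by-id.md: publication > dataset > software > other.
# A lower rank means a higher precedence.
TYPE_RANK = {"publication": 0, "dataset": 1, "software": 2, "other": 3}


@dataclass
class Record:
    """Hypothetical, simplified research product record."""
    id: str
    type: str
    properties: Dict[str, Set[str]] = field(default_factory=dict)
    provenance: Set[str] = field(default_factory=set)


@dataclass
class Relation:
    """Hypothetical, simplified relation record."""
    source: str
    target: str
    reltype: str
    provenance: Set[str] = field(default_factory=set)


def merge_records(records: Iterable[Record]) -> Dict[str, Record]:
    """Group records by id, keep the highest-precedence type, union the rest."""
    merged: Dict[str, Record] = {}
    for rec in records:
        current = merged.get(rec.id)
        if current is None:
            # Copy so that the input records are never mutated.
            merged[rec.id] = Record(
                rec.id,
                rec.type,
                {key: set(values) for key, values in rec.properties.items()},
                set(rec.provenance),
            )
            continue
        if TYPE_RANK[rec.type] < TYPE_RANK[current.type]:
            current.type = rec.type
        # Assumption: multi-valued properties are merged by set union.
        for key, values in rec.properties.items():
            current.properties.setdefault(key, set()).update(values)
        current.provenance.update(rec.provenance)
    return merged


def merge_relations(
    relations: Iterable[Relation],
) -> Dict[Tuple[str, str, str], Relation]:
    """Collapse duplicate relations onto one per (source, target, reltype)."""
    merged: Dict[Tuple[str, str, str], Relation] = {}
    for rel in relations:
        key = (rel.source, rel.target, rel.reltype)
        if key in merged:
            # Same relation seen again: keep one copy, accumulate provenance.
            merged[key].provenance.update(rel.provenance)
        else:
            merged[key] = Relation(rel.source, rel.target, rel.reltype, set(rel.provenance))
    return merged


if __name__ == "__main__":
    # Hypothetical input: the same id mapped as "dataset" by one source
    # and as "publication" by another.
    products = merge_records([
        Record("r1", "dataset", {"title": {"A title"}}, {"source-A"}),
        Record("r1", "publication", {"subject": {"x"}}, {"source-B"}),
    ])
    print(products["r1"].type)                # publication
    print(sorted(products["r1"].provenance))  # ['source-A', 'source-B']
```

Set union is only one plausible merge policy. The production workflow may apply field-specific rules that `merge-by-id.md` does not detail, and unknown types would need a rank before this sketch could handle them.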
diff --git a/versioned_docs/version-9.0.0/license.md b/versioned_docs/version-9.0.0/license.md
new file mode 100644
index 0000000..b55436d
--- /dev/null
+++ b/versioned_docs/version-9.0.0/license.md
@@ -0,0 +1,10 @@
+---
+sidebar_position: 11
+---
+
+# License
+
+The OpenAIRE Graph is available for download and re-use under CC-BY (because some of its input sources are licensed under CC-BY). Parts of the graph can be re-used under CC-0.
+
+If you are using data from the OpenAIRE Graph, you can find the appropriate way to acknowledge it [here](downloads/full-graph#how-to-acknowledge-this-work).
+
diff --git a/versioned_docs/version-9.0.0/publications.md b/versioned_docs/version-9.0.0/publications.md
new file mode 100644
index 0000000..250af64
--- /dev/null
+++ b/versioned_docs/version-9.0.0/publications.md
@@ -0,0 +1,80 @@
+---
+sidebar_position: 7
+---
+
+# Relevant publications
+
+Open Science services are open and transparent and survive thanks to your active support and to the visibility and reward they gather. If you use one of the [OpenAIRE Graph Datasets](https://doi.org/10.5281/zenodo.3516917) for your research, please cite it following the recommendation on the dataset's Zenodo page or the one provided below.
+
+:::note How to cite
+
+Manghi P., Atzori C., Bardi A., Baglioni M., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Mannocci A., Horst M., Czerniak A., Iatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Lempesis A., Ioannidis A., Manola N., Principe P., Vergoulis T., Chatzopoulos S., Pierrakos D. (2022). "OpenAIRE Research Graph Dataset", *Dataset*, Zenodo. [doi:10.5281/zenodo.3516917](https://doi.org/10.5281/zenodo.3516917) ([BibTeX](/bibtex/OpenAIRE_Research_Graph_dump.bib))
+:::
+
+## Other relevant research products
+
+Please also consider citing the related research products listed below.
+
+### Aggregation system
+
+Manghi P., Artini M., Atzori C., Bardi A., Mannocci A., La Bruzzo S., Candela L., Castelli D., Pagano P. (2014). "The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures", Program: electronic library and information systems, Vol. 48 No. 4, pp. 322-354. [doi:10.1108/prog-08-2013-0045](https://doi.org/10.1108/prog-08-2013-0045)
+
+Atzori C., Bardi A., Manghi P., Mannocci A. (2017). "The OpenAIRE workflows for data management", In Italian Research Conference on Digital Libraries (IRCDL), pp. 95-107, Springer, Cham. [doi:10.1007/978-3-319-68130-6_8](https://doi.org/10.1007/978-3-319-68130-6_8)
+
+Artini M., Atzori C., Bardi A., La Bruzzo S., Manghi P., Mannocci A. (2016). "The D-NET software toolkit: dnet-basic-aggregator (Version 1.3.0)", *Software*, Zenodo. [doi:10.5281/zenodo.168356](https://doi.org/10.5281/zenodo.168356)
+
+Mannocci A., Manghi P. (2016). "DataQ: a data flow quality monitoring system for aggregative data infrastructures", International Conference on Theory and Practice of Digital Libraries (TPDL), pp. 357-369, Springer, Cham. [doi:10.1007/978-3-319-43997-6_28](https://doi.org/10.1007/978-3-319-43997-6_28)
+
+### Deduplication
+
+Vichos K., De Bonis M., Kanellos I., Chatzopoulos S., Atzori C., Manola N., Manghi P., Vergoulis T. (2022). "A preliminary assessment of the article deduplication algorithm used for the OpenAIRE Research Graph", In Italian Research Conference on Digital Libraries (IRCDL), Padua, Italy, CEUR-WS Proceedings. [http://ceur-ws.org/Vol-3160](http://ceur-ws.org/Vol-3160/)
+
+De Bonis M., Manghi P., Atzori C. (2022). "FDup: a framework for general-purpose and efficient entity deduplication of record collections", PeerJ Computer Science, 8, e1058. [https://peerj.com/articles/cs-1058](https://peerj.com/articles/cs-1058)
+
+Manghi P., Atzori C., De Bonis M., Bardi A. (2020). "Entity deduplication in big data graphs for scholarly communication", Data Technologies and Applications. [doi:10.1108/dta-09-2019-0163](https://doi.org/10.1108/dta-09-2019-0163)
+
+
+Atzori C., Manghi P., Bardi A. (2018). "GDup: de-duplication of scholarly communication big graphs", In 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT), pp. 142-151, IEEE. [doi:10.1109/bdcat.2018.00025](https://doi.org/10.1109/bdcat.2018.00025)
+
+Atzori C., Manghi P. (2017). "GDup: a big graph entity deduplication system" (Version 4.0.5), *Software*, Zenodo. [doi:10.5281/zenodo.292980](https://doi.org/10.5281/zenodo.292980)
+
+Atzori C. (2016). "GDup: an Integrated, Scalable Big Graph Deduplication System", Zenodo. [doi:10.5281/zenodo.1454879](https://doi.org/10.5281/zenodo.1454879)
+
+Manghi P., Mikulicic M., Atzori C. (2012). "De-duplication of aggregation authority files", International Journal of Metadata, Semantics and Ontologies, 7(2), pp. 114-130. [doi:10.1504/ijmso.2012.050014](https://doi.org/10.1504/ijmso.2012.050014)
+
+Manghi P., Mikulicic M. (2011). "PACE: A general-purpose tool for authority control", In Research Conference on Metadata and Semantic Research, pp. 80-92, Springer, Berlin, Heidelberg. [doi:10.1007/978-3-642-24731-6_8](https://doi.org/10.1007/978-3-642-24731-6_8)
+
+### Mining
+
+Giannakopoulos T., Foufoulas Y., Dimitropoulos H., Manola N. (2019). "Interactive Text Analysis and Information Extraction", In Italian Research Conference on Digital Libraries (IRCDL), vol. 988, Springer, Cham. [doi:10.1007/978-3-030-11226-4_27](https://doi.org/10.1007/978-3-030-11226-4_27)
+
+Foufoulas Y., Stamatogiannakis L., Dimitropoulos H., Ioannidis Y. (2017). "High-Pass Text Filtering for Citation Matching", In International Conference on Theory and Practice of Digital Libraries (TPDL), Springer, Cham. [doi:10.1007/978-3-319-67008-9_28](https://doi.org/10.1007/978-3-319-67008-9_28)
+
+Chronis Y., Foufoulas Y., Nikolopoulos V., Papadopoulos A., Stamatogiannakis L., Svingos C., Ioannidis Y. E. (2016). "A Relational Approach to Complex Dataflows", In Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (MEDAL 2016), CEUR-WS.org (ISSN 1613-0073). [http://ceur-ws.org/Vol-1558/paper45.pdf](http://ceur-ws.org/Vol-1558/paper45.pdf)
+
+Giannakopoulos T., Foufoulas I., Stamatogiannakis E., Dimitropoulos H., Manola N., Ioannidis Y. (2015). "Visual-Based Classification of Figures from Scientific Literature", In Proceedings of the 24th International Conference on World Wide Web (WWW), Association for Computing Machinery, New York, NY, USA, pp. 1059–1060. [doi:10.1145/2740908.2742024](https://doi.org/10.1145/2740908.2742024)
+
+Giannakopoulos T., Foufoulas I., Stamatogiannakis E., Dimitropoulos H., Manola N., Ioannidis Y. (2014). "Discovering and Visualizing Interdisciplinary Content Classes in Scientific Publications", D-Lib Magazine, Volume 20, Number 11/12. [doi:10.1045/november14-giannakopoulos](https://doi.org/10.1045/november14-giannakopoulos)
+
+Giannakopoulos T., Stamatogiannakis E., Foufoulas I., Dimitropoulos H., Manola N., Ioannidis Y. (2014). "Content Visualization of Scientific Corpora Using an Extensible Relational Database Implementation", International Conference on Theory and Practice of Digital Libraries (TPDL), Springer, Cham. [doi:10.1007/978-3-319-08425-1_10](https://doi.org/10.1007/978-3-319-08425-1_10)
+
+Giannakopoulos T., Dimitropoulos H., Metaxas O., Manola N., Ioannidis Y. (2013). "Supervised Content Visualization of Scientific Publications: A Case Study on the ArXiv Dataset", Intelligent Information Systems Symposium (IIS), vol. 7912, Springer, Berlin, Heidelberg. [doi:10.1007/978-3-642-38634-3_23](https://doi.org/10.1007/978-3-642-38634-3_23)
+
+Tkaczyk D., Szostek P., Fedoryszak M., Dendek P. J., Bolikowski Ł. (2015). "CERMINE: automatic extraction of structured metadata from scientific literature", International Journal on Document Analysis and Recognition (IJDAR), pp. 317–335. [doi:10.1007/s10032-015-0249-8](https://doi.org/10.1007/s10032-015-0249-8)
+
+Kobos M., Bolikowski Ł., Horst M., Manghi P., Manola N., Schirrwagen J. (2014). "Information inference in scholarly communication infrastructures: the OpenAIREplus project experience", Procedia Computer Science, 38, pp. 92-99. [doi:10.1016/j.procs.2014.10.016](https://doi.org/10.1016/j.procs.2014.10.016)
+
+### Portals
+
+Baglioni M., Bardi A., Kokogiannaki A., Manghi P., Iatropoulou K., Principe P., Vieira A., Nielsen L. H., Dimitropoulos H., Foufoulas I., Manola N., Atzori C., La Bruzzo S., Lazzeri E., Artini M., De Bonis M., Dell’Amico A. (2019). "The OpenAIRE Research Community Dashboard: On Blending Scientific Workflows and Scientific Publishing",
+International Conference on Theory and Practice of Digital Libraries (TPDL), Lecture Notes in Computer Science, vol. 11799, Springer, Cham. [doi:10.1007/978-3-030-30760-8_5](https://doi.org/10.1007/978-3-030-30760-8_5)
+
+### Broker Service
+
+Manghi P., Atzori C., Bardi A., La Bruzzo S., Artini M. (2016). "Realizing a Scalable and History-Aware Literature Broker Service for OpenAIRE", Italian Research Conference on Digital Libraries (IRCDL), pp. 92-103, Springer, Cham. [doi:10.1007/978-3-319-56300-8_9](https://doi.org/10.1007/978-3-319-56300-8_9)
+
+Artini M., Atzori C., Bardi A., La Bruzzo S., Manghi P., Mannocci A. (2015). "The OpenAIRE literature broker service for institutional repositories", D-Lib Magazine, 21(11/12), 1.
[doi:10.1045/november2015-artini](https://doi.org/10.1045/november2015-artini) + + + + diff --git a/versioned_sidebars/version-8.0.1-sidebars.json b/versioned_sidebars/version-8.0.1-sidebars.json new file mode 100644 index 0000000..bc7fbcf --- /dev/null +++ b/versioned_sidebars/version-8.0.1-sidebars.json @@ -0,0 +1,413 @@ +{ + "mySidebar": [ + { + "type": "doc", + "id": "intro" + }, + { + "type": "category", + "label": "Data model", + "link": { + "type": "doc", + "id": "data-model/data-model" + }, + "items": [ + { + "type": "category", + "label": "Entities", + "link": { + "type": "generated-index", + "description": "The main entities of the OpenAIRE Graph are listed below." + }, + "items": [ + { + "type": "doc", + "id": "data-model/entities/research-product" + }, + { + "type": "doc", + "id": "data-model/entities/data-source" + }, + { + "type": "doc", + "id": "data-model/entities/organization" + }, + { + "type": "doc", + "id": "data-model/entities/project" + }, + { + "type": "doc", + "id": "data-model/entities/community" + } + ] + }, + { + "type": "category", + "label": "Relationships", + "link": { + "type": "generated-index", + "description": "This section describes the relationships between entities in the OpenAIRE Graph: the way they are modelled as well as the different relationship types currently supported."
+ }, + "items": [ + { + "type": "doc", + "id": "data-model/relationships/relationship-object" + }, + { + "type": "doc", + "id": "data-model/relationships/relationship-types" + } + ] + }, + { + "type": "doc", + "id": "data-model/pids-and-identifiers" + } + ] + }, + { + "type": "category", + "label": "Public APIs", + "link": { + "type": "doc", + "id": "apis/home" + }, + "items": [ + { + "type": "category", + "label": "Graph API", + "link": { + "type": "doc", + "id": "apis/graph-api/graph-api" + }, + "items": [ + { + "type": "doc", + "id": "apis/graph-api/getting-a-single-entity" + }, + { + "type": "category", + "label": "Searching entities", + "link": { + "type": "doc", + "id": "apis/graph-api/searching-entities/searching-entities" + }, + "items": [ + { + "type": "doc", + "id": "apis/graph-api/searching-entities/filtering-search-results" + }, + { + "type": "doc", + "id": "apis/graph-api/searching-entities/sorting-and-paging" + } + ] + }, + { + "type": "doc", + "id": "apis/graph-api/making-requests" + } + ] + }, + { + "type": "category", + "label": "Search API", + "link": { + "type": "doc", + "id": "apis/search-api/search-api" + }, + "items": [ + { + "type": "doc", + "id": "apis/search-api/research-products" + }, + { + "type": "doc", + "id": "apis/search-api/projects" + }, + { + "type": "doc", + "id": "apis/search-api/response-metadata-format" + } + ] + }, + { + "type": "link", + "label": "ScholeXplorer API", + "href": "https://api.scholexplorer.openaire.eu/swagger-ui/index.html?urls.primaryName=Scholexplorer%20API%20V2.0" + }, + { + "type": "doc", + "id": "apis/dspace-eprints-api" + }, + { + "type": "doc", + "id": "apis/broker-api" + }, + { + "type": "doc", + "id": "apis/terms" + }, + { + "type": "doc", + "id": "apis/authentication" + }, + { + "type": "doc", + "id": "apis/specification-changelog" + } + ] + }, + { + "type": "category", + "label": "Downloads", + "link": { + "type": "generated-index", + "description": "All resources, available for download, are listed 
below. For the versions available in Zenodo, please refer to the Changelog section." + }, + "items": [ + { + "type": "doc", + "id": "downloads/full-graph" + }, + { + "type": "doc", + "id": "downloads/beginners-kit" + }, + { + "type": "doc", + "id": "downloads/subgraphs" + }, + { + "type": "doc", + "id": "downloads/related-datasets" + } + ] + }, + { + "type": "category", + "label": "Graph production workflow", + "link": { + "type": "doc", + "id": "graph-production-workflow/graph-production-workflow" + }, + "items": [ + { + "type": "category", + "label": "Aggregation", + "link": { + "type": "doc", + "id": "graph-production-workflow/aggregation/aggregation" + }, + "items": [ + { + "type": "doc", + "label": "OpenAIRE compatible sources", + "id": "graph-production-workflow/aggregation/compatible-sources" + }, + { + "type": "category", + "label": "Non-compatible sources", + "link": { + "type": "generated-index" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/crossref_unpaywall", + "label": "Crossref & Unpaywall" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/mag", + "label": "Microsoft Academic Graph" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/orcid", + "label": "ORCID" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/pubmed" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/datacite" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/ebi", + "label": "EMBL-EBI" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/uniprot", + "label": "UniProtKB/Swiss-Prot" + } + ] + } + ] + }, + { + "type": "doc", + "id": "graph-production-workflow/merge-by-id" + }, + { + "type": "category", + "label": "Enrichment by PID", + "link": { + "type": 
"doc", + "id": "graph-production-workflow/enrichment-by-pid/enrichment-by-pid" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-pid/orcid-enrichment" + } + ] + }, + { + "type": "category", + "label": "Enrichment by mining", + "link": { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/enrichment-by-mining" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/affiliation_matching" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/citation_matching" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/classifies" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/documents_similarity" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/acks" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/cites" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/metadata_extraction" + } + ] + }, + { + "type": "doc", + "id": "graph-production-workflow/cleaning" + }, + { + "type": "category", + "label": "Deduplication", + "link": { + "type": "doc", + "id": "graph-production-workflow/deduplication/deduplication" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/deduplication/research-products" + }, + { + "type": "doc", + "id": "graph-production-workflow/deduplication/organizations" + } + ] + }, + { + "type": "category", + "label": "Deduction & propagation", + "link": { + "type": "generated-index", + "description": "The OpenAIRE Graph is further enriched by the deduction and propagation processes described in this section."
+ }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/deduction-and-propagation/bulk-tagging" + }, + { + "type": "doc", + "id": "graph-production-workflow/deduction-and-propagation/propagation" + } + ] + }, + { + "type": "category", + "label": "Indicators ingestion", + "link": { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/indicators-ingestion" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/impact-indicators" + }, + { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/usage-counts" + }, + { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/fos-classification" + }, + { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/sdg-classification" + } + ] + }, + { + "type": "doc", + "id": "graph-production-workflow/finalisation" + }, + { + "type": "doc", + "id": "graph-production-workflow/indexing" + }, + { + "type": "doc", + "id": "graph-production-workflow/stats" + } + ] + }, + { + "type": "doc", + "id": "publications", + "label": "Relevant publications" + }, + { + "type": "doc", + "id": "license" + }, + { + "type": "doc", + "id": "changelog" + }, + { + "type": "link", + "label": "Helpdesk", + "href": "https://graph.openaire.eu/support" + }, + { + "type": "link", + "label": "User forum", + "href": "https://openaire.flarum.cloud/" + } + ] +} diff --git a/versioned_sidebars/version-9.0.0-sidebars.json b/versioned_sidebars/version-9.0.0-sidebars.json new file mode 100644 index 0000000..bc7fbcf --- /dev/null +++ b/versioned_sidebars/version-9.0.0-sidebars.json @@ -0,0 +1,413 @@ +{ + "mySidebar": [ + { + "type": "doc", + "id": "intro" + }, + { + "type": "category", + "label": "Data model", + "link": { + "type": "doc", + "id": "data-model/data-model" + }, + "items": [ + { + "type": "category", + "label": "Entities", + "link": { + "type": "generated-index", + "description": "The main entities of the OpenAIRE Graph are 
listed below." + }, + "items": [ + { + "type": "doc", + "id": "data-model/entities/research-product" + }, + { + "type": "doc", + "id": "data-model/entities/data-source" + }, + { + "type": "doc", + "id": "data-model/entities/organization" + }, + { + "type": "doc", + "id": "data-model/entities/project" + }, + { + "type": "doc", + "id": "data-model/entities/community" + } + ] + }, + { + "type": "category", + "label": "Relationships", + "link": { + "type": "generated-index", + "description": "This section describes the relationships between entities in the OpenAIRE Graph: the way they are modelled as well as the different relationship types currently supported." + }, + "items": [ + { + "type": "doc", + "id": "data-model/relationships/relationship-object" + }, + { + "type": "doc", + "id": "data-model/relationships/relationship-types" + } + ] + }, + { + "type": "doc", + "id": "data-model/pids-and-identifiers" + } + ] + }, + { + "type": "category", + "label": "Public APIs", + "link": { + "type": "doc", + "id": "apis/home" + }, + "items": [ + { + "type": "category", + "label": "Graph API", + "link": { + "type": "doc", + "id": "apis/graph-api/graph-api" + }, + "items": [ + { + "type": "doc", + "id": "apis/graph-api/getting-a-single-entity" + }, + { + "type": "category", + "label": "Searching entities", + "link": { + "type": "doc", + "id": "apis/graph-api/searching-entities/searching-entities" + }, + "items": [ + { + "type": "doc", + "id": "apis/graph-api/searching-entities/filtering-search-results" + }, + { + "type": "doc", + "id": "apis/graph-api/searching-entities/sorting-and-paging" + } + ] + }, + { + "type": "doc", + "id": "apis/graph-api/making-requests" + } + ] + }, + { + "type": "category", + "label": "Search API", + "link": { + "type": "doc", + "id": "apis/search-api/search-api" + }, + "items": [ + { + "type": "doc", + "id": "apis/search-api/research-products" + }, + { + "type": "doc", + "id": "apis/search-api/projects" + }, + { + "type": "doc", + "id":
"apis/search-api/response-metadata-format" + } + ] + }, + { + "type": "link", + "label": "ScholeXplorer API", + "href": "https://api.scholexplorer.openaire.eu/swagger-ui/index.html?urls.primaryName=Scholexplorer%20API%20V2.0" + }, + { + "type": "doc", + "id": "apis/dspace-eprints-api" + }, + { + "type": "doc", + "id": "apis/broker-api" + }, + { + "type": "doc", + "id": "apis/terms" + }, + { + "type": "doc", + "id": "apis/authentication" + }, + { + "type": "doc", + "id": "apis/specification-changelog" + } + ] + }, + { + "type": "category", + "label": "Downloads", + "link": { + "type": "generated-index", + "description": "All resources, available for download, are listed below. For the versions available in Zenodo, please refer to the Changelog section." + }, + "items": [ + { + "type": "doc", + "id": "downloads/full-graph" + }, + { + "type": "doc", + "id": "downloads/beginners-kit" + }, + { + "type": "doc", + "id": "downloads/subgraphs" + }, + { + "type": "doc", + "id": "downloads/related-datasets" + } + ] + }, + { + "type": "category", + "label": "Graph production workflow", + "link": { + "type": "doc", + "id": "graph-production-workflow/graph-production-workflow" + }, + "items": [ + { + "type": "category", + "label": "Aggregation", + "link": { + "type": "doc", + "id": "graph-production-workflow/aggregation/aggregation" + }, + "items": [ + { + "type": "doc", + "label": "OpenAIRE compatible sources", + "id": "graph-production-workflow/aggregation/compatible-sources" + }, + { + "type": "category", + "label": "Non-compatible sources", + "link": { + "type": "generated-index" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/crossref_unpaywall", + "label": "Crossref & Unpaywall" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/mag", + "label": "Microsoft Academic Graph" + }, + { + "type": "doc", + "id": 
"graph-production-workflow/aggregation/non-compatible-sources/orcid", + "label": "ORCID" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/pubmed" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/datacite" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/ebi", + "label": "EMBL-EBI" + }, + { + "type": "doc", + "id": "graph-production-workflow/aggregation/non-compatible-sources/uniprot", + "label": "UniProtKB/Swiss-Prot" + } + ] + } + ] + }, + { + "type": "doc", + "id": "graph-production-workflow/merge-by-id" + }, + { + "type": "category", + "label": "Enrichment by PID", + "link": { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-pid/enrichment-by-pid" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-pid/orcid-enrichment" + } + ] + }, + { + "type": "category", + "label": "Enrichment by mining", + "link": { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/enrichment-by-mining" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/affiliation_matching" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/citation_matching" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/classifies" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/documents_similarity" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/acks" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/cites" + }, + { + "type": "doc", + "id": "graph-production-workflow/enrichment-by-mining/metadata_extraction" + } + ] + }, + { + "type": "doc", + "id": "graph-production-workflow/cleaning" + }, + { + "type": "category", + "label": "Deduplication", + "link": { + "type": "doc", + "id": 
"graph-production-workflow/deduplication/deduplication" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/deduplication/research-products" + }, + { + "type": "doc", + "id": "graph-production-workflow/deduplication/organizations" + } + ] + }, + { + "type": "category", + "label": "Deduction & propagation", + "link": { + "type": "generated-index", + "description": "The OpenAIRE Graph is further enriched by the deduction and propagation processes described in this section." + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/deduction-and-propagation/bulk-tagging" + }, + { + "type": "doc", + "id": "graph-production-workflow/deduction-and-propagation/propagation" + } + ] + }, + { + "type": "category", + "label": "Indicators ingestion", + "link": { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/indicators-ingestion" + }, + "items": [ + { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/impact-indicators" + }, + { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/usage-counts" + }, + { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/fos-classification" + }, + { + "type": "doc", + "id": "graph-production-workflow/indicators-ingestion/sdg-classification" + } + ] + }, + { + "type": "doc", + "id": "graph-production-workflow/finalisation" + }, + { + "type": "doc", + "id": "graph-production-workflow/indexing" + }, + { + "type": "doc", + "id": "graph-production-workflow/stats" + } + ] + }, + { + "type": "doc", + "id": "publications", + "label": "Relevant publications" + }, + { + "type": "doc", + "id": "license" + }, + { + "type": "doc", + "id": "changelog" + }, + { + "type": "link", + "label": "Helpdesk", + "href": "https://graph.openaire.eu/support" + }, + { + "type": "link", + "label": "User forum", + "href": "https://openaire.flarum.cloud/" + } + ] +} diff --git a/versions.json b/versions.json index 424639f..297ae21 100644 --- 
a/versions.json +++ b/versions.json @@ -1,4 +1,6 @@ [ + "9.0.0", + "8.0.1", "8.0.0", "7.2.0", "7.1.3",