Restructure data provision section

Serafeim Chatzopoulos 2022-12-20 17:55:04 +02:00
parent 4e3806e05e
commit 484d6cb82b
28 changed files with 80 additions and 57 deletions

.env

@@ -1,2 +1,2 @@
URL="http://snf-23385.ok-kno.grnetcloud.net" URL="http://snf-23385.ok-kno.grnetcloud.net"
BASE_URL="/docs" BASE_URL="/"

@@ -647,11 +647,11 @@ A measure computed for this instance (e.g. those provided by [BIP! Finder](https
_Type: String • Cardinality: ONE_ _Type: String • Cardinality: ONE_
The specified measure. Currently supported one of: The specified measure. Currently supported one of:
* `influence` (see [PageRank](/data-provision/enrichment/impact-scores#pagerank-pr)) * `influence` (see [PageRank](/data-provision/indicators-ingestion/impact-scores#pagerank-pr))
* `influence_alt` (see [Citation Count](/data-provision/enrichment/impact-scores#citation-count-cc)) * `influence_alt` (see [Citation Count](/data-provision/indicators-ingestion/impact-scores#citation-count-cc))
* `popularity` (see [AttRank](/data-provision/enrichment/impact-scores#attrank)) * `popularity` (see [AttRank](/data-provision/indicators-ingestion/impact-scores#attrank))
* `popularity_alt` (see [RAM](/data-provision/enrichment/impact-scores#ram)) * `popularity_alt` (see [RAM](/data-provision/indicators-ingestion/impact-scores#ram))
* `impulse` (see ["Incubation" Citation Count](/data-provision/enrichment/impact-scores#incubation-citation-count-icc)) * `impulse` (see ["Incubation" Citation Count](/data-provision/indicators-ingestion/impact-scores#incubation-citation-count-icc))
```json ```json
"key": "influence" "key": "influence"

@@ -0,0 +1,5 @@
---
sidebar_position: 1
---
# OpenAIRE compatible sources

@@ -0,0 +1 @@
# Cleaning

@@ -1,7 +1,7 @@
# Data provision # Graph production workflow
OpenAIRE collects metadata records from more than 70K scholarly communication sources from all over the world, including Open Access institutional repositories, data archives, journals. All the metadata records (i.e. descriptions of research products) are put together in a data lake, together with records from Crossref, Unpaywall, ORCID, Grid.ac, and information about projects provided by national and international funders. Dedicated inference algorithms applied to metadata and to the full-texts of Open Access publications enrich the content of the data lake with links between research results and projects, author affiliations, subject classification, links to entries from domain-specific databases. Duplicated organisations and results are identified and merged together to obtain an open, trusted, public resource enabling explorations of the scholarly communication landscape like never before. OpenAIRE collects metadata records from more than 70K scholarly communication sources from all over the world, including Open Access institutional repositories, data archives, journals. All the metadata records (i.e. descriptions of research products) are put together in a data lake, together with records from Crossref, Unpaywall, ORCID, Grid.ac, and information about projects provided by national and international funders. Dedicated inference algorithms applied to metadata and to the full-texts of Open Access publications enrich the content of the data lake with links between research results and projects, author affiliations, subject classification, links to entries from domain-specific databases. Duplicated organisations and results are identified and merged together to obtain an open, trusted, public resource enabling explorations of the scholarly communication landscape like never before.
<p align="center"> <p align="center">
<img loading="lazy" alt="Data provision" src="/img/docs/architecture.png" width="80%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/> <img loading="lazy" alt="Data provision" src="/img/docs/architecture.png" width="120%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p> </p>

@@ -1,5 +1,4 @@
# Deduction
# Bulk Tagging/Deduction
The Deduction process (also known as “bulk tagging”) enriches each record with new information that can be derived from the existing property values. The Deduction process (also known as “bulk tagging”) enriches each record with new information that can be derived from the existing property values.

Image file changed (size before: 37 KiB; size after: 37 KiB).

@@ -1,8 +1,4 @@
--- # Finalisation
sidebar_position: 4
---
# Post cleaning
At the very end of the processing pipeline, a step is dedicated to perform cleaning operations aimed at improving the overall quality of the data. At the very end of the processing pipeline, a step is dedicated to perform cleaning operations aimed at improving the overall quality of the data.
The output of this final cleansing step is the final version of the OpenAIRE Graph. The output of this final cleansing step is the final version of the OpenAIRE Graph.
@@ -47,7 +43,7 @@ Bibliographic records that do not meet minimal requirements for being part of th
Currently, the only criteria applied horizontally to the entire graph aims at excluding scientific results whose title is not meaningful for citation purposes. Currently, the only criteria applied horizontally to the entire graph aims at excluding scientific results whose title is not meaningful for citation purposes.
Then, different criteria are applied in the pre-processing of specific sub-collections: Then, different criteria are applied in the pre-processing of specific sub-collections:
* [Crossref filtering](/data-provision/aggregation/doiboost#crossref-filtering) * [Crossref filtering](/data-provision/aggregation/non-compatible-sources/doiboost#crossref-filtering)
## Country cleaning ## Country cleaning

@@ -1,7 +1,3 @@
---
sidebar_position: 5
---
# Indexing # Indexing
The final version of the OpenAIRE Graph is indexed on a Solr server that is used by the OpenAIRE portals (EXPLORE, CONNECT, PROVIDE) and APIs, the latter adopted by several third-party applications and organizations, such as: The final version of the OpenAIRE Graph is indexed on a Solr server that is used by the OpenAIRE portals (EXPLORE, CONNECT, PROVIDE) and APIs, the latter adopted by several third-party applications and organizations, such as:

@@ -1,7 +1,3 @@
---
sidebar_position: 2
---
# Impact indicators # Impact indicators
This page summarises all calculated impact indicators, which are included into the [measure](/data-model/entities/other#measure) property. This page summarises all calculated impact indicators, which are included into the [measure](/data-model/entities/other#measure) property.

@@ -0,0 +1 @@
# Usage counts

@@ -0,0 +1 @@
# Merge by id

@@ -66,7 +66,7 @@ const sidebars = {
}, },
{ {
type: 'category', type: 'category',
label: "Data provision", label: "Graph production workflow",
link: {type: 'doc', id: 'data-provision/data-provision'}, link: {type: 'doc', id: 'data-provision/data-provision'},
items: [ items: [
{ {
@@ -74,12 +74,46 @@ const sidebars = {
label: "Aggregation", label: "Aggregation",
link: {type: 'doc', id: 'data-provision/aggregation/aggregation'}, link: {type: 'doc', id: 'data-provision/aggregation/aggregation'},
items: [ items: [
{ type: 'doc', id: 'data-provision/aggregation/doiboost', label: 'DOIBoost' }, {
{ type: 'doc', id: 'data-provision/aggregation/pubmed' }, type: 'doc',
{ type: 'doc', id: 'data-provision/aggregation/datacite' }, label: "OpenAIRE compatible sources",
{ type: 'doc', id: 'data-provision/aggregation/ebi', label: 'EMBL-EBI' }, id: 'data-provision/aggregation/compatible-sources',
},
{
type: 'category',
label: "Non-compatible sources",
link: { type: 'generated-index' },
items: [
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/doiboost', label: 'DOIBoost' },
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/pubmed' },
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/datacite' },
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/ebi', label: 'EMBL-EBI' },
]
}
] ]
}, },
{
type: 'doc',
id: 'data-provision/merge-by-id'
},
{
type: 'category',
label: "Enrichment by mining",
link: {
type: 'generated-index',
description: 'The OpenAIRE Graph is enriched using the different Text and Data Mining (TDM) algorithms that are grouped in the following categories.'
},
items: [
{ type: 'doc', id: 'data-provision/enrichment-by-mining/affiliation_matching' },
{ type: 'doc', id: 'data-provision/enrichment-by-mining/citation_matching' },
{ type: 'doc', id: 'data-provision/enrichment-by-mining/classifies' },
{ type: 'doc', id: 'data-provision/enrichment-by-mining/documents_similarity' },
{ type: 'doc', id: 'data-provision/enrichment-by-mining/acks' },
{ type: 'doc', id: 'data-provision/enrichment-by-mining/cites' },
{ type: 'doc', id: 'data-provision/enrichment-by-mining/metadata_extraction' },
]
},
{ type: 'doc', id: 'data-provision/cleaning' },
{ {
type: 'category', type: 'category',
label: "Deduplication", label: "Deduplication",
@@ -91,37 +125,31 @@ const sidebars = {
}, },
{ {
type: 'category', type: 'category',
label: "Enrichment", label: "Enrichment by deduplication & propagation",
link: { link: {
type: 'generated-index', type: 'generated-index' ,
description: 'The OpenAIRE Graph is enriched using the different processes that we describe in this section.' description: 'The OpenAIRE Graph is further enriched by the deduction and propagation processes described in this section.'
}, },
items: [ items: [
{ type: 'doc', id: 'data-provision/enrichment-by-deduction-and-propagation/bulk-tagging' },
{ type: 'doc', id: 'data-provision/enrichment-by-deduction-and-propagation/propagation' },
]
},
{ {
type: 'category', type: 'category',
label: "Mining", label: "Indicators ingestion",
link: { link: {
type: 'generated-index', type: 'generated-index' ,
description: 'The Text and Data Mining (TDM) algorithms used for enriching the OpenAIRE Graph are grouped in the following main categories:' description: 'In this step, the following types of indicators are ingested in the OpenAIRE Graph.'
}, },
items: [ items: [
{ type: 'doc', id: 'data-provision/enrichment/affiliation_matching' }, { type: 'doc', id: 'data-provision/indicators-ingestion/impact-scores' },
{ type: 'doc', id: 'data-provision/enrichment/citation_matching' }, { type: 'doc', id: 'data-provision/indicators-ingestion/usage-counts' },
{ type: 'doc', id: 'data-provision/enrichment/classifies' },
{ type: 'doc', id: 'data-provision/enrichment/documents_similarity' },
{ type: 'doc', id: 'data-provision/enrichment/acks' },
{ type: 'doc', id: 'data-provision/enrichment/cites' },
{ type: 'doc', id: 'data-provision/enrichment/metadata_extraction' },
] ]
}, },
{ type: 'doc', id: 'data-provision/enrichment/bulk-tagging' }, { type: 'doc', id: 'data-provision/finalisation' },
{ type: 'doc', id: 'data-provision/enrichment/propagation' },
{ type: 'doc', id: 'data-provision/enrichment/impact-scores' },
]
},
{ type: 'doc', id: 'data-provision/post-cleaning' },
{ type: 'doc', id: 'data-provision/indexing' }, { type: 'doc', id: 'data-provision/indexing' },
] ]
}, },
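
The doc id moves visible in the sidebar hunks above can be summarised as a lookup table, which may help when updating internal links that still point at the old paths. This is a minimal sketch and not part of the commit: `movedDocIds`, `addMoves`, `migrateDocId`, and `migrateLink` are hypothetical names, and the table only covers ids that appear on both sides of this diff. It also assumes links are absolute site paths with an optional `#anchor`.

```javascript
// Hypothetical helper, not part of the commit: old doc id -> new doc id,
// read off the left (old) and right (new) columns of the sidebars diff.
const movedDocIds = new Map();

// Register a batch of docs that moved from one directory to another.
function addMoves(oldDir, newDir, names) {
  for (const name of names) {
    movedDocIds.set(`${oldDir}/${name}`, `${newDir}/${name}`);
  }
}

addMoves(
  'data-provision/aggregation',
  'data-provision/aggregation/non-compatible-sources',
  ['doiboost', 'pubmed', 'datacite', 'ebi']
);
addMoves(
  'data-provision/enrichment',
  'data-provision/enrichment-by-mining',
  ['affiliation_matching', 'citation_matching', 'classifies',
   'documents_similarity', 'acks', 'cites', 'metadata_extraction']
);
addMoves(
  'data-provision/enrichment',
  'data-provision/enrichment-by-deduction-and-propagation',
  ['bulk-tagging', 'propagation']
);
addMoves(
  'data-provision/enrichment',
  'data-provision/indicators-ingestion',
  ['impact-scores']
);
// Renamed in place rather than moved to another directory.
movedDocIds.set('data-provision/post-cleaning', 'data-provision/finalisation');

// Return the post-commit id, or the id unchanged if it did not move.
function migrateDocId(id) {
  return movedDocIds.has(id) ? movedDocIds.get(id) : id;
}

// Rewrite an absolute internal link, keeping any "#anchor" suffix intact.
function migrateLink(link) {
  const hashAt = link.indexOf('#');
  const path = hashAt === -1 ? link : link.slice(0, hashAt);
  const anchor = hashAt === -1 ? '' : link.slice(hashAt);
  return '/' + migrateDocId(path.replace(/^\//, '')) + anchor;
}
```

For example, `migrateLink('/data-provision/aggregation/doiboost#crossref-filtering')` yields the `non-compatible-sources` path used on the new side of the Finalisation hunk, while ids that did not move, such as `data-provision/indexing`, pass through unchanged.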

Binary file not shown. Size after: 278 KiB.

Binary file not shown. Size before: 278 KiB; size after: 83 KiB.