forked from D-Net/openaire-graph-docs
Restructure data provision section
parent 4e3806e05e
commit 484d6cb82b
2
.env
|
@ -1,2 +1,2 @@
|
||||||
URL="http://snf-23385.ok-kno.grnetcloud.net"
|
URL="http://snf-23385.ok-kno.grnetcloud.net"
|
||||||
BASE_URL="/docs"
|
BASE_URL="/"
|
||||||
|
|
|
@ -647,11 +647,11 @@ A measure computed for this instance (e.g. those provided by [BIP! Finder](https
|
||||||
_Type: String • Cardinality: ONE_
|
_Type: String • Cardinality: ONE_
|
||||||
|
|
||||||
The specified measure. Currently supported one of:
|
The specified measure. Currently supported one of:
|
||||||
* `influence` (see [PageRank](/data-provision/enrichment/impact-scores#pagerank-pr))
|
* `influence` (see [PageRank](/data-provision/indicators-ingestion/impact-scores#pagerank-pr))
|
||||||
* `influence_alt` (see [Citation Count](/data-provision/enrichment/impact-scores#citation-count-cc))
|
* `influence_alt` (see [Citation Count](/data-provision/indicators-ingestion/impact-scores#citation-count-cc))
|
||||||
* `popularity` (see [AttRank](/data-provision/enrichment/impact-scores#attrank))
|
* `popularity` (see [AttRank](/data-provision/indicators-ingestion/impact-scores#attrank))
|
||||||
* `popularity_alt` (see [RAM](/data-provision/enrichment/impact-scores#ram))
|
* `popularity_alt` (see [RAM](/data-provision/indicators-ingestion/impact-scores#ram))
|
||||||
* `impulse` (see ["Incubation" Citation Count](/data-provision/enrichment/impact-scores#incubation-citation-count-icc))
|
* `impulse` (see ["Incubation" Citation Count](/data-provision/indicators-ingestion/impact-scores#incubation-citation-count-icc))
|
||||||
|
|
||||||
```json
|
```json
|
||||||
"key": "influence"
|
"key": "influence"
|
||||||
|
|
|
@ -0,0 +1,5 @@
|
||||||
|
---
|
||||||
|
sidebar_position: 1
|
||||||
|
---
|
||||||
|
|
||||||
|
# OpenAIRE compatible sources
|
|
@ -0,0 +1 @@
|
||||||
|
# Cleaning
|
|
@ -1,7 +1,7 @@
|
||||||
# Data provision
|
# Graph production workflow
|
||||||
|
|
||||||
OpenAIRE collects metadata records from more than 70K scholarly communication sources from all over the world, including Open Access institutional repositories, data archives, journals. All the metadata records (i.e. descriptions of research products) are put together in a data lake, together with records from Crossref, Unpaywall, ORCID, Grid.ac, and information about projects provided by national and international funders. Dedicated inference algorithms applied to metadata and to the full-texts of Open Access publications enrich the content of the data lake with links between research results and projects, author affiliations, subject classification, links to entries from domain-specific databases. Duplicated organisations and results are identified and merged together to obtain an open, trusted, public resource enabling explorations of the scholarly communication landscape like never before.
|
OpenAIRE collects metadata records from more than 70K scholarly communication sources from all over the world, including Open Access institutional repositories, data archives, journals. All the metadata records (i.e. descriptions of research products) are put together in a data lake, together with records from Crossref, Unpaywall, ORCID, Grid.ac, and information about projects provided by national and international funders. Dedicated inference algorithms applied to metadata and to the full-texts of Open Access publications enrich the content of the data lake with links between research results and projects, author affiliations, subject classification, links to entries from domain-specific databases. Duplicated organisations and results are identified and merged together to obtain an open, trusted, public resource enabling explorations of the scholarly communication landscape like never before.
|
||||||
|
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<img loading="lazy" alt="Data provision" src="/img/docs/architecture.png" width="80%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
|
<img loading="lazy" alt="Data provision" src="/img/docs/architecture.png" width="120%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
|
||||||
</p>
|
</p>
|
||||||
|
|
|
@ -1,5 +1,4 @@
|
||||||
|
# Deduction
|
||||||
# Bulk Tagging/Deduction
|
|
||||||
|
|
||||||
The Deduction process (also known as “bulk tagging”) enriches each record with new information that can be derived from the existing property values.
|
The Deduction process (also known as “bulk tagging”) enriches each record with new information that can be derived from the existing property values.
|
||||||
|
|
Before Width: | Height: | Size: 37 KiB After Width: | Height: | Size: 37 KiB |
|
@ -1,8 +1,4 @@
|
||||||
---
|
# Finalisation
|
||||||
sidebar_position: 4
|
|
||||||
---
|
|
||||||
|
|
||||||
# Post cleaning
|
|
||||||
|
|
||||||
At the very end of the processing pipeline, a step is dedicated to perform cleaning operations aimed at improving the overall quality of the data.
|
At the very end of the processing pipeline, a step is dedicated to perform cleaning operations aimed at improving the overall quality of the data.
|
||||||
The output of this final cleansing step is the final version of the OpenAIRE Graph.
|
The output of this final cleansing step is the final version of the OpenAIRE Graph.
|
||||||
|
@ -47,7 +43,7 @@ Bibliographic records that do not meet minimal requirements for being part of th
|
||||||
Currently, the only criterion applied horizontally to the entire graph aims at excluding scientific results whose title is not meaningful for citation purposes.
|
Currently, the only criterion applied horizontally to the entire graph aims at excluding scientific results whose title is not meaningful for citation purposes.
|
||||||
Then, different criteria are applied in the pre-processing of specific sub-collections:
|
Then, different criteria are applied in the pre-processing of specific sub-collections:
|
||||||
|
|
||||||
* [Crossref filtering](/data-provision/aggregation/doiboost#crossref-filtering)
|
* [Crossref filtering](/data-provision/aggregation/non-compatible-sources/doiboost#crossref-filtering)
|
||||||
|
|
||||||
## Country cleaning
|
## Country cleaning
|
||||||
|
|
|
@ -1,7 +1,3 @@
|
||||||
---
|
|
||||||
sidebar_position: 5
|
|
||||||
---
|
|
||||||
|
|
||||||
# Indexing
|
# Indexing
|
||||||
|
|
||||||
The final version of the OpenAIRE Graph is indexed on a Solr server that is used by the OpenAIRE portals (EXPLORE, CONNECT, PROVIDE) and APIs, the latter adopted by several third-party applications and organizations, such as:
|
The final version of the OpenAIRE Graph is indexed on a Solr server that is used by the OpenAIRE portals (EXPLORE, CONNECT, PROVIDE) and APIs, the latter adopted by several third-party applications and organizations, such as:
|
||||||
|
|
|
@ -1,7 +1,3 @@
|
||||||
---
|
|
||||||
sidebar_position: 2
|
|
||||||
---
|
|
||||||
|
|
||||||
# Impact indicators
|
# Impact indicators
|
||||||
|
|
||||||
This page summarises all calculated impact indicators, which are included into the [measure](/data-model/entities/other#measure) property.
|
This page summarises all calculated impact indicators, which are included into the [measure](/data-model/entities/other#measure) property.
|
|
@ -0,0 +1 @@
|
||||||
|
# Usage counts
|
|
@ -0,0 +1 @@
|
||||||
|
# Merge by id
|
94
sidebars.js
|
@ -66,7 +66,7 @@ const sidebars = {
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
type: 'category',
|
type: 'category',
|
||||||
label: "Data provision",
|
label: "Graph production workflow",
|
||||||
link: {type: 'doc', id: 'data-provision/data-provision'},
|
link: {type: 'doc', id: 'data-provision/data-provision'},
|
||||||
items: [
|
items: [
|
||||||
{
|
{
|
||||||
|
@ -74,12 +74,46 @@ const sidebars = {
|
||||||
label: "Aggregation",
|
label: "Aggregation",
|
||||||
link: {type: 'doc', id: 'data-provision/aggregation/aggregation'},
|
link: {type: 'doc', id: 'data-provision/aggregation/aggregation'},
|
||||||
items: [
|
items: [
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/doiboost', label: 'DOIBoost' },
|
{
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/pubmed' },
|
type: 'doc',
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/datacite' },
|
label: "OpenAIRE compatible sources",
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/ebi', label: 'EMBL-EBI' },
|
id: 'data-provision/aggregation/compatible-sources',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'category',
|
||||||
|
label: "Non-compatible sources",
|
||||||
|
link: { type: 'generated-index' },
|
||||||
|
items: [
|
||||||
|
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/doiboost', label: 'DOIBoost' },
|
||||||
|
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/pubmed' },
|
||||||
|
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/datacite' },
|
||||||
|
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/ebi', label: 'EMBL-EBI' },
|
||||||
|
]
|
||||||
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
id: 'data-provision/merge-by-id'
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'category',
|
||||||
|
label: "Enrichment by mining",
|
||||||
|
link: {
|
||||||
|
type: 'generated-index',
|
||||||
|
description: 'The OpenAIRE Graph is enriched using the different Text and Data Mining (TDM) algorithms that are grouped in the following categories.'
|
||||||
|
},
|
||||||
|
items: [
|
||||||
|
{ type: 'doc', id: 'data-provision/enrichment-by-mining/affiliation_matching' },
|
||||||
|
{ type: 'doc', id: 'data-provision/enrichment-by-mining/citation_matching' },
|
||||||
|
{ type: 'doc', id: 'data-provision/enrichment-by-mining/classifies' },
|
||||||
|
{ type: 'doc', id: 'data-provision/enrichment-by-mining/documents_similarity' },
|
||||||
|
{ type: 'doc', id: 'data-provision/enrichment-by-mining/acks' },
|
||||||
|
{ type: 'doc', id: 'data-provision/enrichment-by-mining/cites' },
|
||||||
|
{ type: 'doc', id: 'data-provision/enrichment-by-mining/metadata_extraction' },
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{ type: 'doc', id: 'data-provision/cleaning' },
|
||||||
{
|
{
|
||||||
type: 'category',
|
type: 'category',
|
||||||
label: "Deduplication",
|
label: "Deduplication",
|
||||||
|
@ -90,38 +124,32 @@ const sidebars = {
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
type: 'category',
|
type: 'category',
|
||||||
label: "Enrichment",
|
label: "Enrichment by deduplication & propagation",
|
||||||
link: {
|
link: {
|
||||||
type: 'generated-index',
|
type: 'generated-index' ,
|
||||||
description: 'The OpenAIRE Graph is enriched using the different processes that we describe in this section.'
|
description: 'The OpenAIRE Graph is further enriched by the deduction and propagation processes described in this section.'
|
||||||
|
|
||||||
},
|
},
|
||||||
items: [
|
items: [
|
||||||
{
|
{ type: 'doc', id: 'data-provision/enrichment-by-deduction-and-propagation/bulk-tagging' },
|
||||||
type: 'category',
|
{ type: 'doc', id: 'data-provision/enrichment-by-deduction-and-propagation/propagation' },
|
||||||
label: "Mining",
|
|
||||||
link: {
|
|
||||||
type: 'generated-index',
|
|
||||||
description: 'The Text and Data Mining (TDM) algorithms used for enriching the OpenAIRE Graph are grouped in the following main categories:'
|
|
||||||
},
|
|
||||||
items: [
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/affiliation_matching' },
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/citation_matching' },
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/classifies' },
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/documents_similarity' },
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/acks' },
|
|
||||||
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/cites' },
|
|
||||||
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/metadata_extraction' },
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/bulk-tagging' },
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/propagation' },
|
|
||||||
{ type: 'doc', id: 'data-provision/enrichment/impact-scores' },
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{ type: 'doc', id: 'data-provision/post-cleaning' },
|
{
|
||||||
|
type: 'category',
|
||||||
|
label: "Indicators ingestion",
|
||||||
|
link: {
|
||||||
|
type: 'generated-index' ,
|
||||||
|
description: 'In this step, the following types of indicators are ingested in the OpenAIRE Graph.'
|
||||||
|
|
||||||
|
},
|
||||||
|
items: [
|
||||||
|
{ type: 'doc', id: 'data-provision/indicators-ingestion/impact-scores' },
|
||||||
|
{ type: 'doc', id: 'data-provision/indicators-ingestion/usage-counts' },
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{ type: 'doc', id: 'data-provision/finalisation' },
|
||||||
{ type: 'doc', id: 'data-provision/indexing' },
|
{ type: 'doc', id: 'data-provision/indexing' },
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
Binary file not shown.
After Width: | Height: | Size: 278 KiB |
Binary file not shown.
Before Width: | Height: | Size: 278 KiB After Width: | Height: | Size: 83 KiB |