2021-04-16 11:01:42 +02:00
|
|
|
|
<schema2jsonld [URL]="properties.domain + '/about'"
|
2020-11-24 15:51:29 +01:00
|
|
|
|
[logoURL]="properties.domain + '/assets/common-assets/logo-small-graph.png'"
|
|
|
|
|
[description]="description"
|
|
|
|
|
type="other"
|
|
|
|
|
[name]="title">
|
|
|
|
|
</schema2jsonld>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
<div class="about">
|
2020-10-09 18:37:06 +02:00
|
|
|
|
<div class="uk-section">
|
2020-10-09 11:35:22 +02:00
|
|
|
|
<div class="uk-margin-large-left uk-margin-medium-bottom">
|
|
|
|
|
<breadcrumbs [breadcrumbs]="breadcrumbs"></breadcrumbs>
|
|
|
|
|
</div>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
<div class="firstBackground">
|
|
|
|
|
<div class="uk-container">
|
|
|
|
|
<h2 class="uk-text-center">About</h2>
|
|
|
|
|
<div class="uk-flex uk-flex-center">
|
|
|
|
|
<div class="uk-padding-small uk-width-4-5@m">
|
|
|
|
|
<p>
|
|
|
|
|
Open Science is gradually becoming the modus operandi in research practices, affecting the way researchers
|
|
|
|
|
collaborate and publish, discover, and access scientific knowledge.
|
|
|
|
|
Scientists are increasingly publishing research results beyond the article, to share all scientific
|
|
|
|
|
products (metadata and files) generated during an experiment, such as datasets, software, experiments.
|
|
|
|
|
They publish in scholarly communication data sources (e.g. institutional repositories, data archives,
|
|
|
|
|
software repositories), rely where possible on persistent identifiers (e.g. DOI, ORCID, Grid.ac, PDBs),
|
|
|
|
|
specify semantic links to other research products (e.g. supplementedBy, citedBy, versionOf), and possibly
|
|
|
|
|
to projects and/or relative funders.
|
|
|
|
|
By following such practices, scientists are implicitly constructing the Global Open Science Graph, where
|
|
|
|
|
by "graph" we mean a collection of objects interlinked by semantic relationships.
|
|
|
|
|
<br><br>
|
|
|
|
|
The OpenAIRE Research Graph includes metadata and links between scientific products (e.g. literature,
|
|
|
|
|
datasets, software, and "other research products"), organizations, funders, funding streams, projects,
|
|
|
|
|
communities, and (provenance) data sources - the details of the <a
|
|
|
|
|
href="https://zenodo.org/record/2643199#.XOqdstMzZ24" target="_blank">graph data model</a> can be found
|
|
|
|
|
in Zenodo.org.
|
|
|
|
|
<br><br>
|
|
|
|
|
The Graph is available and obtained as an aggregation of the metadata and links collected from ~70.000
|
|
|
|
|
trusted sources, further enriched with metadata and links provided by:</p>
|
|
|
|
|
<ul class="portal-circle">
|
|
|
|
|
<li class="uk-margin-bottom">OpenAIRE end-users, e.g. researchers, project administrators, data curators
|
|
|
|
|
providing links from scientific products to projects, funders, communities, or other products;
|
|
|
|
|
</li>
|
|
|
|
|
<li class="uk-margin-bottom">OpenAIRE Full-text mining algorithms over around ~10Mi Open Access article
|
|
|
|
|
full-texts;
|
|
|
|
|
</li>
|
|
|
|
|
<li>Research infrastructure scholarly services, bridged to the graph via OpenAIRE, exposing metadata of
|
|
|
|
|
products such as research workflows, experiments, research objects, software, etc..
|
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
</div>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
|
|
|
|
</div>
|
|
|
|
|
</div>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
<div id="architecture" class="uk-container uk-section">
|
|
|
|
|
<div class="uk-padding-small">
|
|
|
|
|
<h2 class="uk-text-center">Architecture</h2>
|
|
|
|
|
<div class="uk-flex uk-flex-center">
|
|
|
|
|
<div class="uk-width-4-5@m">
|
|
|
|
|
<h3 class="uk-margin-medium-top portal-color">How we build it</h3>
|
|
|
|
|
<div>
|
|
|
|
|
<p>
|
|
|
|
|
OpenAIRE collects metadata records from more than 70K scholarly communication sources from all over the
|
|
|
|
|
world, including Open Access institutional repositories, data archives, journals.
|
|
|
|
|
All the metadata records (i.e. descriptions of research products) are put together in a data lake,
|
|
|
|
|
together
|
|
|
|
|
with records from Crossref, Unpaywall, ORCID, Grid.ac, and information about projects provided by national
|
|
|
|
|
and international funders.
|
|
|
|
|
Dedicated inference algorithms applied to metadata and to the full-texts of Open Access publications
|
|
|
|
|
enrich
|
|
|
|
|
the content of the data lake with links between research results and projects, author affiliations,
|
|
|
|
|
subject
|
|
|
|
|
classification, links to entries from domain-specific databases.
|
|
|
|
|
Duplicated organisations and results are identified and merged together to obtain an open, trusted, public
|
|
|
|
|
resource enabling explorations of the scholarly communication landscape like never before.
|
|
|
|
|
</p>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</div>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
|
|
|
|
</div>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
<div class="uk-flex uk-flex-center uk-inline uk-margin-medium-top">
|
|
|
|
|
<img [src]="'assets/graph-assets/about/architecture/'+architectureImage"
|
|
|
|
|
class="uk-width-4-5 architecture-image">
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<a class="uk-position-absolute uk-transform-center uk-padding" style="left: 27%; top: 51.5%"
|
2021-04-13 00:21:12 +02:00
|
|
|
|
(click)="changeTab(0)" routerLink="/about" fragment="tabs_card"
|
2020-10-09 18:37:06 +02:00
|
|
|
|
(mouseenter)="architectureImage = 'aggregation_hover.png'" (mouseleave)="architectureImage = 'gray.png'">
|
2020-10-11 16:12:13 +02:00
|
|
|
|
<action-point [class.uk-invisible]="architectureImage == 'aggregation_hover.png'"></action-point>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</a>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<a class="uk-position-absolute uk-transform-center uk-padding" style="left: 47%; top: 51.5%"
|
2021-04-13 00:21:12 +02:00
|
|
|
|
(click)="changeTab(1)" routerLink="/about" fragment="tabs_card"
|
2020-10-09 18:37:06 +02:00
|
|
|
|
(mouseenter)="architectureImage = 'deduplication_hover.png'" (mouseleave)="architectureImage = 'gray.png'">
|
2020-10-11 16:12:13 +02:00
|
|
|
|
<action-point [class.uk-invisible]="architectureImage == 'deduplication_hover.png'"></action-point>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</a>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<a class="uk-position-absolute uk-transform-center uk-padding" style="left: 58%; top: 51.5%"
|
2021-04-13 00:21:12 +02:00
|
|
|
|
(click)="changeTab(2)" routerLink="/about" fragment="tabs_card"
|
2020-10-09 18:37:06 +02:00
|
|
|
|
(mouseenter)="architectureImage = 'enrichment_hover.png'" (mouseleave)="architectureImage = 'gray.png'">
|
2020-10-11 16:12:13 +02:00
|
|
|
|
<action-point [class.uk-invisible]="architectureImage == 'enrichment_hover.png'"></action-point>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</a>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<a class="uk-position-absolute uk-transform-center uk-padding" style="left: 70%; top: 51.5%"
|
2021-04-13 00:21:12 +02:00
|
|
|
|
(click)="changeTab(3)" routerLink="/about" fragment="tabs_card"
|
2020-10-09 18:37:06 +02:00
|
|
|
|
(mouseenter)="architectureImage = 'post_cleaning_hover.png'" (mouseleave)="architectureImage = 'gray.png'">
|
2020-10-11 16:12:13 +02:00
|
|
|
|
<action-point [class.uk-invisible]="architectureImage == 'post_cleaning_hover.png'"></action-point>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</a>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<a class="uk-position-absolute uk-transform-center uk-padding" style="left: 75%; top: 35%;"
|
2021-04-13 00:21:12 +02:00
|
|
|
|
(click)="changeTab(4)" routerLink="/about" fragment="tabs_card"
|
2020-10-09 18:37:06 +02:00
|
|
|
|
(mouseenter)="architectureImage = 'indexing_hover.png'" (mouseleave)="architectureImage = 'gray.png'">
|
2020-10-11 16:12:13 +02:00
|
|
|
|
<action-point [class.uk-invisible]="architectureImage == 'indexing_hover.png'"></action-point>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</a>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<a class="uk-position-absolute uk-transform-center uk-padding" style="left: 75%; top: 72%"
|
2021-04-13 00:21:12 +02:00
|
|
|
|
(click)="changeTab(5)" routerLink="/about" fragment="tabs_card"
|
2020-10-09 18:37:06 +02:00
|
|
|
|
(mouseenter)="architectureImage = 'stats_analysis_hover.png'" (mouseleave)="architectureImage = 'gray.png'">
|
2020-10-11 16:12:13 +02:00
|
|
|
|
<action-point [class.uk-invisible]="architectureImage == 'stats_analysis_hover.png'"></action-point>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</a>
|
|
|
|
|
</div>
|
|
|
|
|
<div id="tabs_card"
|
2020-10-09 18:44:07 +02:00
|
|
|
|
class="uk-margin-xlarge-top uk-padding-small">
|
|
|
|
|
<div class="uk-card uk-card-default uk-card-body architecture-card">
|
|
|
|
|
<ul #tabs uk-tab class="uk-tab">
|
|
|
|
|
<li><a>Aggregation</a></li>
|
|
|
|
|
<li><a>Deduplication</a></li>
|
|
|
|
|
<li><a>Enrichment</a></li>
|
|
|
|
|
<li><a>Post-Cleaning</a></li>
|
|
|
|
|
<li><a>Indexing</a></li>
|
|
|
|
|
<li><a>Stats Analysis</a></li>
|
|
|
|
|
</ul>
|
|
|
|
|
<ul class="uk-switcher uk-margin">
|
|
|
|
|
<li>
|
|
|
|
|
<div class=" uk-margin-large-top uk-text-small">
|
|
|
|
|
<img class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image"
|
|
|
|
|
src="assets/graph-assets/about/architecture/aggregation.png" alt="Aggregation">
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right uk-text-small lines-18"
|
|
|
|
|
[class.multi-line-ellipsis]="!aggregationReadMore">
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<div>
|
|
|
|
|
OpenAIRE collects metadata records from a variety of content providers as described in
|
|
|
|
|
<a href="https://www.openaire.eu/aggregation-and-content-provision-workflows" target="_blank">https://www.openaire.eu/aggregation-and-content-provision-workflows</a>.
|
|
|
|
|
<br><br>
|
|
|
|
|
OpenAIRE aggregates metadata records describing objects of the research life-cycle from content
|
|
|
|
|
providers compliant to the
|
|
|
|
|
<a href="https://guidelines.openaire.eu" target="_blank">OpenAIRE guidelines</a>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
and from entity registries (i.e. data sources offering authoritative lists of entities, like
|
|
|
|
|
OpenDOAR,
|
2020-10-09 18:44:07 +02:00
|
|
|
|
re3data, DOAJ, and funder databases).
|
2021-04-15 16:39:34 +02:00
|
|
|
|
After collection, metadata are transformed according to the OpenAIRE internal metadata model, which
|
|
|
|
|
is
|
2020-10-09 18:44:07 +02:00
|
|
|
|
used to generate the final OpenAIRE Research Graph that you can access from the OpenAIRE portal and
|
|
|
|
|
the
|
|
|
|
|
APIs.
|
|
|
|
|
<br><br>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The transformation process includes the application of cleaning functions whose goal is to ensure
|
|
|
|
|
that
|
2020-10-09 18:44:07 +02:00
|
|
|
|
values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever
|
|
|
|
|
applicable,
|
|
|
|
|
to a common controlled vocabulary.
|
|
|
|
|
The controlled vocabularies used for cleansing are accessible at
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<a href="http://api.openaire.eu/vocabularies"
|
|
|
|
|
target="_blank">http://api.openaire.eu/vocabularies</a>.
|
2020-10-09 18:44:07 +02:00
|
|
|
|
Each vocabulary features a set of controlled terms, each with one code, one label, and a set of
|
|
|
|
|
synonyms.
|
|
|
|
|
If a synonym is found as field value, the value is updated with the corresponding term.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
Also, the OpenAIRE Research Graph is extended with other relevant scholarly communication sources
|
|
|
|
|
that
|
|
|
|
|
are too big to be integrated via the “normal” aggregation mechanism: DOIBoost (which merges
|
|
|
|
|
Crossref,
|
|
|
|
|
ORCID, Microsoft Academic Graph, and Unpaywall), and ScholeXplorer, one of the Scholix hubs offering
|
|
|
|
|
a
|
2020-10-09 18:44:07 +02:00
|
|
|
|
large set of links between research literature and data.
|
|
|
|
|
</div>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="!aggregationReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="aggregationReadMore = true" class="custom-explore-toggle">Read more<span uk-icon="chevron-down"></span></a>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="aggregationReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="aggregationReadMore = false" routerLink="./" fragment="tabs_card" class="custom-explore-toggle">Read less<span uk-icon="chevron-up"></span></a>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</li>
|
|
|
|
|
<li>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div class="uk-margin-bottom uk-text-small">
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<ul class="uk-subnav button-tab" uk-switcher>
|
|
|
|
|
<li><a>Clustering</a></li>
|
|
|
|
|
<li><a>Matching & Election</a></li>
|
|
|
|
|
</ul>
|
|
|
|
|
<ul class="uk-switcher uk-margin align-list">
|
|
|
|
|
<li>
|
|
|
|
|
<img class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image"
|
|
|
|
|
src="assets/graph-assets/about/architecture/deduplication.svg" alt="Deduplication">
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right uk-text-small lines-18"
|
|
|
|
|
[class.multi-line-ellipsis]="!dedupClusteringReadMore">
|
2020-10-09 01:27:04 +02:00
|
|
|
|
<div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
Clustering is a common heuristics used to overcome the N x N complexity required to match
|
|
|
|
|
all
|
2020-10-09 18:44:07 +02:00
|
|
|
|
pairs of objects to identify the equivalent ones.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The challenge is to identify a clustering function that maximizes the chance of comparing
|
|
|
|
|
only
|
2020-10-09 18:44:07 +02:00
|
|
|
|
records that may lead to a match, while minimizing the number of records that will not be
|
|
|
|
|
matched while being equivalent.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
Since the equivalence function is to some level tolerant to minimal errors (e.g. switching
|
|
|
|
|
of
|
2020-10-09 18:44:07 +02:00
|
|
|
|
characters in the title, or minimal difference in letters), we need this function to be not
|
|
|
|
|
too
|
|
|
|
|
precise (e.g. a hash of the title), but also not too flexible (e.g. random ngrams of the
|
|
|
|
|
title).
|
|
|
|
|
On the other hand, reality tells us that in some cases equality of two records can only be
|
|
|
|
|
determined by their PIDs (e.g. DOI) as the metadata properties are very different across
|
|
|
|
|
different versions and no clustering function will ever bring them into the same cluster.
|
|
|
|
|
To match these requirements OpenAIRE clustering for products works with two functions:
|
|
|
|
|
</div>
|
|
|
|
|
<ul class="portal-circle">
|
|
|
|
|
<li>
|
|
|
|
|
<div>DOI: the function generates the DOI when this is provided as part of the record
|
|
|
|
|
properties;
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<div>
|
|
|
|
|
Title-based function: the function generates a key that depends on
|
|
|
|
|
(i) number of significant words in the title (normalized, stemming, etc.),
|
|
|
|
|
(ii) module 10 of the number of characters of such words, and
|
|
|
|
|
(iii) a string obtained as an alternation of the function prefix(3) and suffix(3) (and
|
|
|
|
|
vice
|
|
|
|
|
versa) o the first 3 words (2 words if the title only has 2). For example, the title
|
|
|
|
|
“Entity
|
|
|
|
|
deduplication in big data graphs for scholarly communication” becomes “entity
|
|
|
|
|
deduplication
|
|
|
|
|
big data graphs scholarly communication” with two keys key “7.1entionbig” and
|
|
|
|
|
“7.1itydedbig”
|
|
|
|
|
(where 1 is module 10 of 54 characters of the normalized title.
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
<div>
|
|
|
|
|
To give an idea, this configuration generates around 77Mi blocks, which we limited to 200
|
2021-04-15 16:39:34 +02:00
|
|
|
|
records each (only 15K blocks are affected by the cut), and entails 260Bi matches. Matches
|
|
|
|
|
in
|
2020-10-09 18:44:07 +02:00
|
|
|
|
a
|
|
|
|
|
block are performed using a “sliding window” set to 80 records. The records are sorted
|
|
|
|
|
lexicographically on a normalized version of their titles. The 1st record is matched against
|
|
|
|
|
all
|
|
|
|
|
the 80 following ones, then the second, etc. for an NlogN complexity.
|
|
|
|
|
</div>
|
|
|
|
|
</div>
|
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="!dedupClusteringReadMore" class="uk-text-center clickable">
|
|
|
|
|
|
|
|
|
|
<a (click)="dedupClusteringReadMore = true" class="custom-explore-toggle">Read more<span uk-icon="chevron-down"></span></a>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="dedupClusteringReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="dedupClusteringReadMore = false;" routerLink="./" fragment="tabs_card" class="custom-explore-toggle">Read less<span uk-icon="chevron-up"></span></a>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<img class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image"
|
|
|
|
|
src="assets/graph-assets/about/architecture/deduplication.svg" alt="Deduplication">
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right uk-text-small lines-18"
|
|
|
|
|
[class.multi-line-ellipsis]="!dedupMatchingAndElectionReadMore">
|
2020-10-09 11:35:22 +02:00
|
|
|
|
<div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<div>
|
|
|
|
|
Once the clusters have been built, the algorithm proceeds with the comparisons.
|
|
|
|
|
Comparisons are driven by a decisional tree that:
|
|
|
|
|
</div>
|
|
|
|
|
<ul class="uk-list">
|
|
|
|
|
<li class="uk-margin-small-bottom">
|
|
|
|
|
<div>
|
|
|
|
|
<span class="portal-color">1.</span> Tries to capture equivalence via PIDs: if records
|
|
|
|
|
share
|
|
|
|
|
a PID then they are equivalent
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li class="uk-margin-small-bottom">
|
|
|
|
|
<div>
|
|
|
|
|
<span class="portal-color">2.</span> Tries to capture difference:
|
|
|
|
|
</div>
|
|
|
|
|
<ul class="uk-list">
|
|
|
|
|
<li class="uk-margin-small-bottom">
|
|
|
|
|
<div>
|
|
|
|
|
<span class="portal-color">a.</span>
|
|
|
|
|
If record titles contain different “numbers” then they are different (this rule is
|
|
|
|
|
subject to different feelings, and should be fine-tuned);
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li class="uk-margin-small-bottom">
|
|
|
|
|
<div>
|
|
|
|
|
<span class="portal-color">b.</span>
|
|
|
|
|
If record contain different number of authors then they are different;
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li class="uk-margin-small-bottom">
|
|
|
|
|
<div>
|
|
|
|
|
<span class="portal-color">c.</span>
|
|
|
|
|
Note that different PIDs do not imply different records, as different versions may
|
|
|
|
|
have
|
|
|
|
|
different PIDs.
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<div><span class="portal-color">3.</span> Measures equivalence:</div>
|
|
|
|
|
<ul class="uk-list portal-circle">
|
|
|
|
|
<li>
|
|
|
|
|
<div>
|
|
|
|
|
The titles of the two records are normalised and compared for similarity by applying
|
|
|
|
|
the
|
|
|
|
|
Levenstein distance algorithm.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The algorithm returns a number in the range [0,1], where 0 means “very different”
|
|
|
|
|
and
|
2020-10-09 18:44:07 +02:00
|
|
|
|
1
|
|
|
|
|
means “equal”.
|
|
|
|
|
If the distance is greater than or equal 0,99 the two records are identified as
|
|
|
|
|
duplicates.
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<div>Dates are not regarded for equivalence matching because different versions of the
|
2021-04-15 16:39:34 +02:00
|
|
|
|
same records should be merged and may be published on different dates, e.g.
|
|
|
|
|
pre-print
|
2020-10-09 18:44:07 +02:00
|
|
|
|
and published version of an article.
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
<div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
Once the equivalence relationships between pairs of records are set, the groups of
|
|
|
|
|
equivalent
|
2020-10-09 18:44:07 +02:00
|
|
|
|
records are obtained (transitive closure, i.e. “mesh”).
|
2021-04-15 16:39:34 +02:00
|
|
|
|
From such sets a new representative object is obtained, which inherits all properties from
|
|
|
|
|
the
|
2020-10-09 18:44:07 +02:00
|
|
|
|
merged records and keeps track of their provenance.
|
|
|
|
|
The ID of the record is obtained by appending the prefix “dedup_” to the MD5 of the first ID
|
|
|
|
|
(given their lexicographical ordering).
|
|
|
|
|
A new, more stable function to generate the ID is under development, which exploits the DOI
|
|
|
|
|
when
|
|
|
|
|
one of the records to be merged includes a Crossref or a DataCite record.
|
|
|
|
|
</div>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="!dedupMatchingAndElectionReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="dedupMatchingAndElectionReadMore = true" class="custom-explore-toggle">Read more<span uk-icon="chevron-down"></span></a>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="dedupMatchingAndElectionReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="dedupMatchingAndElectionReadMore = false" routerLink="./" fragment="tabs_card" class="custom-explore-toggle">Read less<span uk-icon="chevron-up"></span></a>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div class="uk-margin-bottom uk-text-small">
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<ul class="uk-subnav button-tab uk-grid uk-grid-small" uk-switcher>
|
|
|
|
|
<li><a>Mining</a></li>
|
|
|
|
|
<li><a>Bulk tagging/ Deduction</a></li>
|
|
|
|
|
<li><a>Propagation</a></li>
|
|
|
|
|
</ul>
|
|
|
|
|
<ul class="uk-switcher uk-margin">
|
|
|
|
|
<li>
|
|
|
|
|
<img class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image"
|
|
|
|
|
src="assets/graph-assets/about/architecture/enrichment.svg" alt="Enrichment">
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right uk-text-small lines-18"
|
|
|
|
|
[class.multi-line-ellipsis]="!enrichmentMiningReadMore">
|
2020-10-09 01:27:04 +02:00
|
|
|
|
<div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The OpenAIRE Research Graph is enriched by links mined by OpenAIRE’s full-text mining
|
|
|
|
|
algorithms
|
|
|
|
|
that scan the plaintexts of publications for funding information, references to datasets,
|
|
|
|
|
software URIs, accession numbers of bioetities, and EPO patent mentions.
|
|
|
|
|
Custom mining modules also link research objects to specific research communities, initiatives
|
|
|
|
|
and infrastructures.
|
|
|
|
|
In addition, other inference modules provide content-based document classification, document
|
|
|
|
|
similarity, citation matching, and author affiliation matching.
|
|
|
|
|
<br><br>
|
|
|
|
|
<span class="portal-color">Project mining</span>
|
|
|
|
|
in OpenAIRE text mines the full-texts of publications in order to extract matches to funding
|
|
|
|
|
project codes/IDs.
|
|
|
|
|
The mining algorithm works by utilising
|
|
|
|
|
(i) the grant identifier, and
|
|
|
|
|
(ii) the project acronym (if available) of each project.
|
|
|
|
|
The mining algorithm:
|
|
|
|
|
(1) Preprocesses/normalizes the full-texts using several functions, which depend on the
|
|
|
|
|
characteristics of each funder (i.e., the format of the grant identifiers), such as stopword
|
|
|
|
|
and/or punctuation removal, tokenization, stemming, converting to lowercase; then
|
|
|
|
|
(2) String matching of grant identifiers against the normalized text is done using database
|
|
|
|
|
techniques; and
|
|
|
|
|
(3) The results are validated and cleaned using the context near the match by looking at the
|
|
|
|
|
context around the matched ID for relevant metadata and positive or negative words/phrases, in
|
|
|
|
|
order to calculate a confidence value for each publication-->project link.
|
|
|
|
|
A confidence threshold is set to optimise high accuracy while minimising false positives, such
|
|
|
|
|
as matches with page or report numbers, post/zip codes, parts of telephone numbers, DOIs or
|
|
|
|
|
URLs, accession numbers.
|
|
|
|
|
The algorithm also applies rules for disambiguating results, as different funders can share
|
|
|
|
|
identical project IDs; for example, grant number 633172 could refer to H2020 project EuroMix
|
|
|
|
|
but
|
|
|
|
|
also to Australian-funded NHMRC project “Brain activity (EEG) analysis and brain imaging
|
|
|
|
|
techniques to measure the neurobiological effects of sleep apnea”.
|
|
|
|
|
Project mining works very well and was the first Text & Data Mining (TDM) service of OpenAIRE.
|
|
|
|
|
Performance results vary from funder to funder but precision is higher than 98% for all
|
|
|
|
|
funders
|
|
|
|
|
and 99.5% for EC projects.
|
|
|
|
|
Recall is higher than 95% (99% for EC projects), when projects are properly acknowledged using
|
|
|
|
|
project/grant IDs.
|
|
|
|
|
<br><br>
|
|
|
|
|
<span class="portal-color">Dataset extraction</span>
|
|
|
|
|
runs on publications full-texts as described in “High pass text-filtering for Citation
|
|
|
|
|
matching”, TPDL 2017[1].
|
|
|
|
|
In particular, we search for citations to datasets using their DOIs, titles and other metadata
|
|
|
|
|
(i.e., dates, creator names, publishers, etc.).
|
|
|
|
|
We extract parts of the text which look like citations and search for datasets using database
|
|
|
|
|
join and pattern matching techniques.
|
|
|
|
|
Based on the experiments described in the paper, precision of the dataset extraction module is
|
|
|
|
|
98.5% and recall is 97.4% but it is also probably overestimated since it does not take into
|
|
|
|
|
account corruptions that may take place during pdf to text extraction.
|
|
|
|
|
It is calculated on the extracted full-texts of small samples from PubMed and arXiv.
|
|
|
|
|
<br><br>
|
|
|
|
|
<span class="portal-color">Software extraction</span>
|
|
|
|
|
runs also on parts of the text which look like citations.
|
|
|
|
|
We search the citations for links to software in open software repositories, specifically
|
|
|
|
|
github, sourceforge, bitbucket and the google code archive.
|
|
|
|
|
After that, we search for links that are included in Software Heritage (SH,
|
|
|
|
|
https://www.softwareheritage.org) and return the permanent URL that SH provides for each
|
|
|
|
|
software project.
|
|
|
|
|
We also enrich this content with user names, titles and descriptions of the software projects
|
|
|
|
|
using web mining techniques.
|
|
|
|
|
Since software mining is based on URL matching, our precision is 100% (we return a software
|
|
|
|
|
link
|
|
|
|
|
only if we find it in the text and there is no need to disambiguate).
|
|
|
|
|
As for recall rate, this is not calculable for this mining task.
|
|
|
|
|
Although we apply all the necessary normalizations to the URLs in order to overcome usual
|
|
|
|
|
issues
|
|
|
|
|
(e.g., http or https, existence of www or not, lower/upper case), we do not calculate cases
|
|
|
|
|
where a software is mentioned using its name and not by a link from the supported software
|
|
|
|
|
repositories.
|
|
|
|
|
<br><br>
|
|
|
|
|
<span class="portal-color">For the extraction of bio-entities</span>, we focus on Protein Data
|
|
|
|
|
Bank (PDB) entries.
|
|
|
|
|
We have downloaded the database with PDB codes and we update it regularly.
|
|
|
|
|
We search through the whole publication’s full-text for references to PDB codes.
|
|
|
|
|
We apply disambiguation rules (e.g., there are PDB codes that are the same as antibody codes
|
|
|
|
|
or
|
|
|
|
|
other issues) so that we return valid results.
|
|
|
|
|
Current precision is 98%.
|
|
|
|
|
Although it's risky to mention recall rates since these are usually overestimated, we have
|
|
|
|
|
calculated a recall rate of 98% using small samples from pubmed publications.
|
|
|
|
|
Moreover, our technique is able to identify about 30% more links to proteins than the ones
|
|
|
|
|
that
|
|
|
|
|
are tagged in Pubmed xmls.
|
|
|
|
|
<br><br>
|
|
|
|
|
<span class="portal-color">Other text-mining modules</span> include mining for links to EPO
|
|
|
|
|
patents, or custom mining modules for linking research objects to specific research
|
|
|
|
|
communities,
|
|
|
|
|
initiatives and infrastructures, e.g. COVID-19 mining module.
|
|
|
|
|
Apart from text-mining modules, OpenAIRE also provides a document classification service that
|
|
|
|
|
employs analysis of free text stemming from the abstracts of the publications.
|
|
|
|
|
The purpose of applying a document classification module is to assign a scientific text one or
|
|
|
|
|
more predefined content classes.
|
|
|
|
|
In OpenAIRE, the currently used taxonomies are arXiv, MeSH (Medical Subject Headings), ACM and
|
|
|
|
|
DDC (Dewey Decimal Classification, or Dewey Decimal System).
|
|
|
|
|
<br><br>
|
|
|
|
|
<hr>
|
|
|
|
|
[1] Foufoulas, Y., Stamatogiannakis, L., Dimitropoulos, H., & Ioannidis, Y. (2017, September).
|
|
|
|
|
High-Pass Text Filtering for Citation Matching.
|
|
|
|
|
In International Conference on Theory and Practice of Digital Libraries (pp. 355-366).
|
|
|
|
|
Springer,
|
|
|
|
|
Cham.
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="!enrichmentMiningReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="enrichmentMiningReadMore = true" class="custom-explore-toggle">Read more<span uk-icon="chevron-down"></span></a>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="enrichmentMiningReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="enrichmentMiningReadMore = false" routerLink="./" fragment="tabs_card" class="custom-explore-toggle">Read less<span uk-icon="chevron-up"></span></a>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<img class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image"
|
|
|
|
|
src="assets/graph-assets/about/architecture/enrichment.svg" alt="Enrichment">
|
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right uk-text-small">
|
|
|
|
|
The Deduction process (also known as “bulk tagging”) enriches each record with new information
|
|
|
|
|
that
|
|
|
|
|
can be derived from the existing property values.
|
|
|
|
|
<br><br>
|
|
|
|
|
As of September 2020, three procedures are in place to relate a research product to a research
|
|
|
|
|
initiative, infrastructure (RI) or community (RC) based on:
|
|
|
|
|
<ul class="portal-circle">
|
|
|
|
|
<li>subjects (2.7M results tagged)</li>
|
|
|
|
|
<li>Zenodo community (16K results tagged)</li>
|
|
|
|
|
<li>the data source it comes from (250K results tagged)</li>
|
|
|
|
|
</ul>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The list of subjects, Zenodo communities and data sources used to enrich the products are
|
|
|
|
|
defined
|
2020-10-09 18:44:07 +02:00
|
|
|
|
by
|
|
|
|
|
the managers of the community gateway or infrastructure monitoring dashboard associated with the
|
|
|
|
|
RC/RI.
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<img class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image"
|
|
|
|
|
src="assets/graph-assets/about/architecture/enrichment.svg" alt="Enrichment">
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right uk-text-small lines-18"
|
|
|
|
|
[class.multi-line-ellipsis]="!enrichmentPropagationReadMore">
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
This process “propagates” properties and links from one product to another if between the
|
|
|
|
|
two
|
2020-10-09 18:44:07 +02:00
|
|
|
|
there is a “strong” semantic relationship.
|
|
|
|
|
<br><br>
|
|
|
|
|
As of September 2020, the following procedures are in place:
|
|
|
|
|
<ul class="portal-circle">
|
|
|
|
|
<li>
|
|
|
|
|
Propagation of the property “country” to results from institutional repositories:
|
|
|
|
|
e.g. publication collected from an institutional repository maintained by an italian
|
|
|
|
|
university will be enriched with the property “country = IT”.
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</li>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<li>
|
|
|
|
|
Propagation of links to projects: e.g. publication linked to project P “is supplemented
|
|
|
|
|
by”
|
|
|
|
|
a dataset D.
|
|
|
|
|
Dataset D will get the link to project P.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The relationships considered for this procedure are “isSupplementedBy” and
|
|
|
|
|
“supplements”.
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</li>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<li>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
Propagation of related community/infrastructure/initiative from organizations to
|
|
|
|
|
products
|
2020-10-09 18:44:07 +02:00
|
|
|
|
via affiliation relationships: e.g. a publication with an author affiliated with
|
|
|
|
|
organization O.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The manager of the community gateway C declared that the outputs of O are all relevant
|
|
|
|
|
for
|
2020-10-09 18:44:07 +02:00
|
|
|
|
his/her community C.
|
|
|
|
|
The publication is tagged as relevant for C.
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</li>
|
|
|
|
|
<li>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
Propagation of related community/infrastructure/initiative to related products: e.g.
|
|
|
|
|
publication associated to community C is supplemented by a dataset D.
|
|
|
|
|
Dataset D will get the association to C.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The relationships considered for this procedure are “isSupplementedBy” and
|
|
|
|
|
“supplements”.
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</li>
|
|
|
|
|
<li>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
Propagation of ORCID identifiers to related products, if the products have the same
|
|
|
|
|
authors:
|
|
|
|
|
e.g. publication has ORCID for its authors and is supplemented by a dataset D. Dataset D
|
|
|
|
|
has
|
|
|
|
|
the same authors as the publication. Authors of D are enriched with the ORCIDs available
|
|
|
|
|
in
|
|
|
|
|
the publication.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The relationships considered for this procedure are “isSupplementedBy” and
|
|
|
|
|
“supplements”.
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</div>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="!enrichmentPropagationReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="enrichmentPropagationReadMore = true" class="custom-explore-toggle">Read more<span uk-icon="chevron-down"></span></a>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div *ngIf="enrichmentPropagationReadMore" class="uk-text-center clickable">
|
|
|
|
|
<a (click)="enrichmentPropagationReadMore = false" routerLink="./" fragment="tabs_card" class="custom-explore-toggle">Read less<span uk-icon="chevron-up"></span></a>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
</div>
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<div class="uk-text-small uk-margin-large-top">
|
|
|
|
|
<img class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image"
|
|
|
|
|
src="assets/graph-assets/about/architecture/post_cleaning.svg" alt="Post Cleaning">
|
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right">
|
|
|
|
|
<p>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The aggregation processes are continuously running and apply vocabularies as they are in a given
|
|
|
|
|
moment of time.
|
|
|
|
|
It could be the case that a vocabulary changes after the aggregation of one data source has
|
|
|
|
|
finished, thus the aggregated content does not reflect the current status of the controlled
|
|
|
|
|
vocabularies.
|
2020-10-11 03:22:06 +02:00
|
|
|
|
<br><br>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
In addition, the integration of ScholeXplorer and DOIBoost and some enrichment processes applied on
|
|
|
|
|
the raw and on the de-duplicated graph may introduce values that do not comply with the current
|
|
|
|
|
status of the OpenAIRE controlled vocabularies.
|
2020-10-11 03:22:06 +02:00
|
|
|
|
For these reasons, we included a final step of cleansing at the end of the workflow materialisation.
|
|
|
|
|
The output of the final cleansing step is the final version of the OpenAIRE Research Graph.
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</p>
|
|
|
|
|
</div>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<div class="uk-text-small uk-margin-large-top">
|
|
|
|
|
<img class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image"
|
|
|
|
|
src="assets/graph-assets/about/architecture/indexing.svg" alt="Indexing">
|
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right">
|
|
|
|
|
<p>
|
|
|
|
|
The final version of the OpenAIRE Research Graph is indexed on a Solr server that is used by the
|
|
|
|
|
OpenAIRE portals (EXPLORE, CONNECT, PROVIDE) and APIs, the latter adopted by several third-party
|
|
|
|
|
applications and organizations, such as:
|
|
|
|
|
</p>
|
|
|
|
|
<ul class="portal-circle">
|
|
|
|
|
<li class="uk-margin-small-bottom">
|
|
|
|
|
<span class="portal-color">EOSC</span>
|
|
|
|
|
--The OpenAIRE Research Graph APIs and Portals will offer to the EOSC an Open Science Resource
|
|
|
|
|
Catalogue, keeping an up to date map of all research results (publications, datasets, software),
|
|
|
|
|
services, organizations, projects, funders in Europe and beyond.
|
|
|
|
|
</li>
|
|
|
|
|
<li class="uk-margin-small-bottom">
|
|
|
|
|
<span class="portal-color">DSpace & EPrints</span>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
repositories can install the OpenAIRE plugin to expose OpenAIRE compliant metadata records via
|
|
|
|
|
their
|
2020-10-09 18:44:07 +02:00
|
|
|
|
OAI-PMH endpoint and offer to researchers the possibility to link their depositions to the funding
|
|
|
|
|
project, by selecting it from the list of project provided by OpenAIRE
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<span class="portal-color">EC participant portal (Sygma - System for Grant Management)</span>
|
|
|
|
|
uses the OpenAIRE API in the “Continuous Reporting” section.
|
|
|
|
|
Sygma automatically fetches from the OpenAIRE Search API the list of publications and datasets in
|
|
|
|
|
the
|
|
|
|
|
OpenAIRE Research Graph that are linked to the project.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
The user can select the research products from the list and easily compile the continuous
|
|
|
|
|
reporting
|
2020-10-09 18:44:07 +02:00
|
|
|
|
data of the project.
|
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
</div>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<div class="uk-text-small uk-margin-large-top">
|
|
|
|
|
<img
|
|
|
|
|
class="uk-width-2-5@m uk-align-right@m uk-margin-remove-adjacent tab-image uk-padding-large uk-padding-remove-top uk-padding-remove-horizontal"
|
|
|
|
|
src="assets/graph-assets/about/architecture/stats_analysis.svg" alt="Stats Analysis">
|
|
|
|
|
<div class="uk-margin-bottom uk-margin-medium-right">
|
|
|
|
|
<p>
|
|
|
|
|
The OpenAIRE Research Graph is also processed by a pipeline for extracting the statistics and
|
|
|
|
|
producing
|
|
|
|
|
the charts for funders, research initiative, infrastructures, and policy makers that you can see on
|
|
|
|
|
MONITOR.
|
2021-04-15 16:39:34 +02:00
|
|
|
|
Based on the information available on the graph, OpenAIRE provides a set of indicators for
|
|
|
|
|
monitoring
|
2020-10-09 18:44:07 +02:00
|
|
|
|
the funding and research impact and the uptake of Open Science publishing practices,
|
|
|
|
|
such as Open Access publishing of publications and datasets, availability of interlinks between
|
|
|
|
|
research
|
|
|
|
|
products, availability of post-print versions in institutional or thematic Open Access repositories,
|
|
|
|
|
etc.
|
|
|
|
|
</p>
|
|
|
|
|
</div>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
|
|
|
|
</div>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</div>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<div class="uk-padding-small uk-margin-top">
|
2020-10-09 18:37:06 +02:00
|
|
|
|
<h6>References</h6>
|
|
|
|
|
<ul class="uk-text-small portal-circle">
|
|
|
|
|
<li>
|
|
|
|
|
<a href="https://aka.ms/msracad" target="_blank">Microsoft Academic Graph</a>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
which is made available under the ODC Attribution License.<br>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
For more information on Microsoft Academic Graph please also read
|
|
|
|
|
<a href="https://docs.microsoft.com/en-us/academic-services/graph/resources-faq" target="_blank">here</a>.
|
|
|
|
|
</li>
|
|
|
|
|
<li>
|
|
|
|
|
<a href="https://www.openaire.eu/aggregation-and-content-provision-workflows" target="_blank">https://www.openaire.eu/aggregation-and-content-provision-workflows</a>
|
|
|
|
|
</li>
|
|
|
|
|
</ul>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<a class="portal-link uk-icon-link uk-text-small uk-text-bold uk-text-uppercase" routerLink="/resources/references">
|
|
|
|
|
See all references <icon name="arrow_right" class="uk-margin-small-left"></icon>
|
|
|
|
|
</a>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</div>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
|
|
|
|
<div id="metrics" class="uk-container uk-container-large uk-section">
|
2020-10-09 18:37:06 +02:00
|
|
|
|
<div class="uk-padding-small">
|
2021-04-16 15:14:42 +02:00
|
|
|
|
<h2 class="uk-text-center uk-margin-medium-bottom">Data & Metrics</h2>
|
|
|
|
|
<numbers colorClass="uk-text-secondary"></numbers>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
2020-10-11 03:22:06 +02:00
|
|
|
|
<div id="infrastructure" class="uk-container uk-section">
|
2020-10-09 18:37:06 +02:00
|
|
|
|
<div class="uk-padding-small">
|
|
|
|
|
<h2 class="uk-text-center">Infrastructure</h2>
|
2020-10-09 18:44:07 +02:00
|
|
|
|
<div>
|
2020-10-11 03:22:06 +02:00
|
|
|
|
<div class="uk-flex uk-flex-center uk-grid uk-grid-stack">
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<p class="uk-width-4-5@m uk-padding-small">
|
|
|
|
|
The OpenAIRE Research Graph is operated and maintained at the <a
|
2021-02-16 18:55:35 +01:00
|
|
|
|
href="https://icm.edu.pl/en/centre-of-technology/" target="_blank">ICM cutting-edge Technology centre</a>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
with the facilities and staff guaranteeing robust operation of the whole system.
|
|
|
|
|
Okeanos SuperComputer hosting the graph consists of 26016 cores in total providing 1082 Tflops/s.
|
|
|
|
|
Whole setup is energy efficient with 1.554 Gflops/Watts Power Efficiency resulting in 160th place on the
|
|
|
|
|
"Top500 by energy-eficiency" list (as of 2019).
|
|
|
|
|
</p>
|
|
|
|
|
<img class="infrastructure-image uk-margin-top uk-margin-bottom"
|
|
|
|
|
src="assets/graph-assets/about/infrastructure.png">
|
|
|
|
|
<p class="uk-width-4-5@m uk-padding-small">
|
|
|
|
|
ICM supports the continuous operation of the infrastructure including data aggregation, deduplication,
|
|
|
|
|
inference and provision ensuring seamless 24/7 system uptime and availability.
|
|
|
|
|
System administration activities cover hardware maintenance and provisioning of the new computational
|
|
|
|
|
resources, providing High Availability solutions to address resilience to failures by service-level
|
|
|
|
|
redundancy and Load Balancing to distribute workloads uniformly across servers.
|
|
|
|
|
The most crucial parts of the persisted graph are covered with backups along with well defined restore
|
|
|
|
|
procedures.
|
|
|
|
|
All the monitoring activities rely on an aggregated system-level monitoring accessible via various
|
|
|
|
|
dashboards giving the better overview of system stability and potential requirements for system elements
|
|
|
|
|
extension.
|
|
|
|
|
System level monitoring is supplemented with monitoring availability of all the publicly accessible
|
|
|
|
|
endpoints.
|
|
|
|
|
Hence, the offer of the public API of OpenAIRE to third parties, is of high-standards.
|
|
|
|
|
</p>
|
|
|
|
|
<p class="uk-width-4-5@m uk-padding-small">
|
|
|
|
|
All the maintenance operations undertaken by experienced system administrators are founded on well
|
|
|
|
|
established routines and emergency maintenance procedures.
|
|
|
|
|
</p>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</div>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
|
|
|
|
</div>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
|
|
|
|
<div id="team" class="uk-container uk-container-large uk-section">
|
2020-10-09 18:37:06 +02:00
|
|
|
|
<div class="uk-padding-small">
|
|
|
|
|
<h2 class="uk-text-center">Team</h2>
|
|
|
|
|
<div>
|
|
|
|
|
<div class="uk-margin-bottom">
|
2020-10-11 03:22:06 +02:00
|
|
|
|
<img class="uk-align-center uk-align-left@m uk-margin-remove-adjacent"
|
|
|
|
|
src="assets/graph-assets/about/team.svg" alt="Team">
|
|
|
|
|
|
|
|
|
|
<div class="uk-text-center uk-width-1-2@m uk-align-center uk-margin-remove-adjacent">
|
|
|
|
|
<div class="uk-margin-medium-bottom">
|
|
|
|
|
Key team members contributing to the Research Graph
|
|
|
|
|
</div>
|
2021-04-15 16:39:34 +02:00
|
|
|
|
<div><a class="uk-button portal-button" routerLink="./team">
|
2020-10-11 03:22:06 +02:00
|
|
|
|
Meet the team
|
2020-10-13 15:25:18 +02:00
|
|
|
|
<icon name="arrow_right" ratio="0.8" class="space"></icon>
|
2020-10-11 03:22:06 +02:00
|
|
|
|
</a>
|
2020-10-09 18:37:06 +02:00
|
|
|
|
</div>
|
2020-10-09 01:27:04 +02:00
|
|
|
|
</div>
|
|
|
|
|
</div>
|
|
|
|
|
</div>
|
|
|
|
|
</div>
|
2020-10-09 11:35:22 +02:00
|
|
|
|
</div>
|
2020-10-09 12:43:48 +02:00
|
|
|
|
</div>
|