Open Science is gradually becoming the modus operandi in research practices, changing the way researchers
collaborate, publish, discover, and access scientific knowledge.
Scientists increasingly publish research results beyond the article, sharing all the scientific
products (metadata and files) generated during an experiment, such as datasets, software, and experiments.
They publish in scholarly communication data sources (e.g. institutional repositories, data archives,
software repositories), rely where possible on persistent identifiers (e.g. DOI, ORCID, Grid.ac, PDBs),
specify semantic links to other research products (e.g. supplementedBy, citedBy, versionOf), and possibly
link to projects and/or the related funders.
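Persistent identifiers often carry their own integrity checks. For example, an ORCID iD ends with a check character computed with the ISO 7064 MOD 11-2 algorithm, so a harvester could reject malformed iDs before linking them to research products. The sketch below assumes that algorithm; the function names are hypothetical and are not part of any OpenAIRE component.

```python
import re

# Hypothetical helpers: validate an ORCID iD such as "0000-0002-1825-0097"
# by recomputing its ISO 7064 MOD 11-2 check character.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_char(base_digits: str) -> str:
    """Compute the check character from the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

Checking the format with a regular expression first keeps the checksum function simple: it only ever sees 15 decimal digits.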
By following such practices, scientists are implicitly constructing the Global Open Science Graph, where
by "graph" we mean a collection of objects interlinked by semantic relationships.
The OpenAIRE Research Graph includes metadata and links between scientific products (e.g. literature,
datasets, software, and "other research products"), organizations, funders, funding streams, projects,
communities, and (provenance) data sources; the details of the graph data model can be found
on Zenodo.org.
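A minimal sketch of such a graph, assuming a simple in-memory representation: each entity has a type drawn from the kinds listed above, and each semantic relationship is a typed, directed edge. The class name, the type strings, and the relation name producedBy are hypothetical; the authoritative data model is the one documented on Zenodo.org.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative entity types, following the kinds listed in the text.
ENTITY_TYPES = {
    "publication", "dataset", "software", "other",
    "organization", "funder", "fundingstream", "project",
    "community", "datasource",
}


@dataclass
class ResearchGraph:
    """Hypothetical in-memory graph: typed nodes and typed, directed edges."""

    nodes: dict = field(default_factory=dict)  # entity id -> entity type
    edges: dict = field(default_factory=lambda: defaultdict(list))  # id -> [(relation, id)]

    def add_entity(self, entity_id: str, entity_type: str) -> None:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.nodes[entity_id] = entity_type

    def add_relation(self, source: str, relation: str, target: str) -> None:
        # Both endpoints must already be present in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("add both entities before linking them")
        self.edges[source].append((relation, target))

    def related(self, entity_id: str, relation: str) -> list:
        """Return the targets reached from entity_id through the given relation."""
        return [t for rel, t in self.edges.get(entity_id, []) if rel == relation]


g = ResearchGraph()
g.add_entity("pub:1", "publication")
g.add_entity("data:1", "dataset")
g.add_entity("proj:1", "project")
g.add_relation("pub:1", "supplementedBy", "data:1")
g.add_relation("pub:1", "producedBy", "proj:1")  # hypothetical relation name
print(g.related("pub:1", "supplementedBy"))  # ['data:1']
```

Storing edges as an adjacency list keyed by source keeps "what is linked to this product?" lookups cheap, which is the typical access pattern when exploring a graph of this kind.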
The Graph is available and is obtained by aggregating the metadata and links collected from ~70,000
trusted sources, which are then further enriched with additional metadata and links.
OpenAIRE collects metadata records from more than 70,000 scholarly communication sources from all over the world, including Open Access institutional repositories, data archives, and journals. All the metadata records (i.e. descriptions of research products) are gathered in a data lake, together with records from Crossref, Unpaywall, ORCID, Grid.ac, and information about projects provided by national and international funders. Dedicated inference algorithms, applied to the metadata and to the full texts of Open Access publications, enrich the content of the data lake with links between research results and projects, author affiliations, subject classifications, and links to entries in domain-specific databases. Duplicate organisations and results are identified and merged to obtain an open, trusted, public resource that enables exploration of the scholarly communication landscape like never before.
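The merging of duplicate results can be illustrated with a deliberately simplified sketch: records that share a normalised DOI are grouped and merged into one record that keeps the first non-empty title and the union of contributing sources. This is an assumption for illustration only; a production deduplication step would typically also compare titles, authors, and other fields, and the record keys used here (doi, title, sources) are hypothetical.

```python
def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_by_doi(records: list) -> list:
    """Merge records sharing a normalised DOI; records without a DOI are kept as-is.

    Each record is a dict with hypothetical keys 'doi', 'title', 'sources'.
    """
    groups = {}  # normalised DOI -> merged record (insertion order preserved)
    without_doi = []
    for rec in records:
        if not rec.get("doi"):
            without_doi.append(rec)
            continue
        key = normalise_doi(rec["doi"])
        merged = groups.setdefault(key, {"doi": key, "title": None, "sources": []})
        # Keep the first non-empty title and the union of sources, in order.
        merged["title"] = merged["title"] or rec.get("title")
        for src in rec.get("sources", []):
            if src not in merged["sources"]:
                merged["sources"].append(src)
    return list(groups.values()) + without_doi


records = [
    {"doi": "10.1234/ABC", "title": "A study", "sources": ["Repo A"]},
    {"doi": "https://doi.org/10.1234/abc", "title": "", "sources": ["Crossref"]},
    {"doi": None, "title": "Untitled", "sources": ["Repo B"]},
]
for rec in merge_by_doi(records):
    print(rec)
```

With the sample input, the first two records collapse into one result carrying both "Repo A" and "Crossref" as sources, while the record without a DOI passes through untouched.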
The aggregation processes run continuously and apply the controlled vocabularies as they stand at a given moment in time.
A vocabulary may therefore change after the aggregation of a data source has finished, in which case the aggregated content no longer reflects the current state of the controlled vocabularies.
In addition, the integration of ScholeXplorer and DOIBoost, as well as some enrichment processes applied to the raw and to the de-duplicated graph, may introduce values that do not comply with the current state of the OpenAIRE controlled vocabularies.
For these reasons, we included a final cleansing step at the end of the graph materialisation workflow.
The output of this cleansing step is the final version of the OpenAIRE Research Graph.
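A minimal sketch of such a cleansing pass, assuming each controlled vocabulary can be represented as a mapping from accepted terms and known synonyms to a canonical term. Every value is re-resolved against the vocabulary as it stands now; values that no longer resolve are replaced by a placeholder and reported rather than silently dropped. The vocabulary contents, the field name accessright, and the flagging policy are hypothetical, not OpenAIRE's actual cleansing rules.

```python
# Hypothetical "access rights" vocabulary: lower-case synonym -> canonical term.
ACCESS_RIGHTS_VOCAB = {
    "open": "OPEN",
    "open access": "OPEN",
    "restricted": "RESTRICTED",
    "closed": "CLOSED",
    "closed access": "CLOSED",
    "embargo": "EMBARGO",
}
UNKNOWN = "UNKNOWN"


def cleanse_value(value: str, vocab: dict) -> tuple:
    """Map a raw value onto the current vocabulary.

    Returns (canonical_term, resolved); resolved is False when the value
    cannot be matched against the vocabulary in its current state.
    """
    key = value.strip().lower()
    if key in vocab:
        return vocab[key], True
    return UNKNOWN, False


def cleanse_records(records: list, field_name: str, vocab: dict) -> tuple:
    """Rewrite one field of every record in place; collect unresolved values."""
    unresolved = []
    for rec in records:
        term, resolved = cleanse_value(rec[field_name], vocab)
        if not resolved:
            unresolved.append(rec[field_name])
        rec[field_name] = term
    return records, unresolved


records = [
    {"id": "r1", "accessright": "Open Access"},
    {"id": "r2", "accessright": "closed"},
    {"id": "r3", "accessright": "free to read"},
]
cleaned, unresolved = cleanse_records(records, "accessright", ACCESS_RIGHTS_VOCAB)
print([r["accessright"] for r in cleaned])  # ['OPEN', 'CLOSED', 'UNKNOWN']
print(unresolved)                           # ['free to read']
```

Because the pass only consults the vocabulary at run time, running it once at the end of the workflow is enough to bring values harvested under older vocabulary versions in line with the current one.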
The final version of the OpenAIRE Research Graph is indexed on a Solr server that is used by the OpenAIRE portals (EXPLORE, CONNECT, PROVIDE) and APIs; the APIs are in turn used by several third-party applications and organizations.
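Clients of a Solr index commonly send requests to its standard /select request handler. The sketch below only builds such a request URL using the common Solr query parameters q, fq, rows, start, and wt; the host, the core name research_graph, and the field names title and resulttype are hypothetical placeholders, not the actual layout of the OpenAIRE index.

```python
from typing import Optional
from urllib.parse import urlencode


def build_select_url(base_url: str, core: str, query: str,
                     filters: Optional[list] = None,
                     rows: int = 10, start: int = 0) -> str:
    """Build a Solr /select URL for a query, with optional filter queries."""
    params = [("q", query), ("rows", rows), ("start", start), ("wt", "json")]
    for fq in filters or []:
        params.append(("fq", fq))  # each filter query is a separate fq parameter
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983", "research_graph",
                       'title:"open science"', ["resulttype:publication"], rows=5)
print(url)
```

Passing the parameters as a list of pairs rather than a dict lets the same key (fq) appear several times, which is how Solr expects multiple filter queries.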
The OpenAIRE Research Graph is also processed by a pipeline that extracts statistics and produces the charts for funders, research initiatives, infrastructures, and policy makers available on MONITOR. Based on the information available in the graph, OpenAIRE provides a set of indicators for monitoring funding and research impact and the uptake of Open Science publishing practices, such as Open Access publishing of publications and datasets, the availability of interlinks between research products, and the availability of post-print versions in institutional or thematic Open Access repositories, among others.
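As a rough illustration of how an indicator of this kind can be derived from graph data, the sketch below computes, per funder, the share of linked publications that are Open Access. The input format (one (funder, is_open_access) pair per publication-funder link) and the funder names are hypothetical simplifications; the actual MONITOR pipeline and its indicator definitions may differ.

```python
from collections import defaultdict


def open_access_share(links) -> dict:
    """Return {funder: fraction of that funder's publications that are Open Access}.

    `links` is an iterable of (funder, is_open_access) pairs, one per
    publication-funder link (hypothetical input format).
    """
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for funder, is_oa in links:
        totals[funder] += 1
        if is_oa:
            open_counts[funder] += 1
    return {funder: open_counts[funder] / totals[funder] for funder in totals}


links = [
    ("Funder A", True),
    ("Funder A", False),
    ("Funder A", True),
    ("Funder B", False),
]
print(open_access_share(links))
```

Counting totals and Open Access links in a single pass keeps the computation linear in the number of links, which matters when the input is every publication-funder link in the graph.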
The OpenAIRE Research Graph is operated and maintained at the ICM Technology Centre, whose facilities and staff guarantee robust operation of the whole system. The Okeanos supercomputer hosting the graph has 26,016 cores in total, delivering 1,082 Tflop/s. The whole setup is energy efficient, with a power efficiency of 1.554 Gflop/s per watt, which placed it 160th on the "Top500 by energy-efficiency" list (as of 2019).
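The figures above also allow a quick back-of-the-envelope check. Assuming the 1,082 Tflop/s figure is the same performance measure on which the 1.554 Gflop/s-per-watt ratio is based (the text does not say whether it is peak or measured performance), dividing one by the other gives an implied power draw of roughly 696 kW, and dividing the performance by the core count gives about 41.6 Gflop/s per core. Both values are derived estimates.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the text.
# Assumption: the 1,082 Tflop/s figure is the same performance measure that
# the 1.554 Gflop/s-per-watt efficiency ratio is based on.
CORES = 26_016
PERFORMANCE_FLOPS = 1_082e12          # 1,082 Tflop/s
EFFICIENCY_FLOPS_PER_WATT = 1.554e9   # 1.554 Gflop/s per watt

implied_power_kw = PERFORMANCE_FLOPS / EFFICIENCY_FLOPS_PER_WATT / 1e3
per_core_gflops = PERFORMANCE_FLOPS / CORES / 1e9

print(f"implied power: {implied_power_kw:.0f} kW")      # implied power: 696 kW
print(f"per-core rate: {per_core_gflops:.1f} Gflop/s")  # per-core rate: 41.6 Gflop/s
```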
ICM supports the continuous operation of the infrastructure, including data aggregation, deduplication, inference, and provision, ensuring seamless 24/7 system uptime and availability. System administration covers hardware maintenance, the provisioning of new computational resources, High Availability solutions that provide resilience to failures through service-level redundancy, and load balancing to distribute workloads evenly across servers. The most crucial parts of the persisted graph are backed up, with well-defined restore procedures. All monitoring relies on aggregated system-level monitoring, accessible through various dashboards that give an overview of system stability and of where system components may need to be extended. System-level monitoring is supplemented by availability monitoring of all publicly accessible endpoints. As a result, the public OpenAIRE API offered to third parties meets high standards of reliability.
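Load balancing combined with service-level redundancy can be illustrated by a round-robin dispatcher that skips replicas whose health check has failed. This is a generic sketch, not ICM's actual configuration; the class, the replica names, and the way replicas are marked up or down are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Distribute requests evenly across replicas, skipping unhealthy ones."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.healthy = set(self.replicas)
        self._ring = cycle(self.replicas)

    def mark_down(self, replica):
        # Called, for example, when an availability probe fails.
        self.healthy.discard(replica)

    def mark_up(self, replica):
        if replica in self.replicas:
            self.healthy.add(replica)

    def next_replica(self):
        # Try each replica at most once per call so the loop always terminates.
        for _ in range(len(self.replicas)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy replicas available")


lb = RoundRobinBalancer(["api-1", "api-2", "api-3"])
lb.mark_down("api-2")
print([lb.next_replica() for _ in range(4)])  # ['api-1', 'api-3', 'api-1', 'api-3']
```

Redundancy comes from having several interchangeable replicas; the health set lets traffic flow around a failed one until it is marked up again.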
All maintenance operations are carried out by experienced system administrators following well-established routines and emergency maintenance procedures.