About

Open Science is gradually becoming the modus operandi of research, changing the way researchers collaborate, publish, discover, and access scientific knowledge. Scientists increasingly publish results beyond the article, sharing all the scientific products (metadata and files) generated during their work, such as research data, research software, and experiments. They publish in scholarly communication data sources (e.g. institutional repositories, data archives, research software repositories), rely where possible on persistent identifiers (e.g. DOI, ORCID, Grid.ac, PDB identifiers), specify semantic links to other research products (e.g. supplementedBy, citedBy, versionOf), and possibly link to projects and/or the related funders. By following such practices, scientists are implicitly constructing the Global Open Science Graph, where by "graph" we mean a collection of objects interlinked by semantic relationships.

The OpenAIRE Graph includes metadata and links between scientific products (e.g. literature, research data, research software, and "other research products"), organizations, funders, funding streams, projects, communities, and (provenance) data sources. The details of the graph data model are documented on Zenodo.org.
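To make the notion of "objects interlinked by semantic relationships" concrete, the sketch below models a tiny graph in Python. It is an illustration only, not the OpenAIRE data model: the classes `Node` and `ResearchGraph`, the method names, and the example identifiers are all hypothetical. Only the relation names (supplementedBy, versionOf) come from the text above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Hypothetical graph object, e.g. a publication, dataset or project."""
    pid: str   # persistent identifier (illustrative values, not real records)
    kind: str  # entity type, e.g. "publication", "dataset"


@dataclass
class ResearchGraph:
    """Hypothetical graph: objects plus typed (semantic) relationships."""
    nodes: dict[str, Node] = field(default_factory=dict)
    # Each edge is a (source pid, relation name, target pid) triple.
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, node: Node) -> None:
        self.nodes[node.pid] = node

    def link(self, source: str, relation: str, target: str) -> None:
        # Refuse dangling links: both endpoints must already be in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before linking")
        self.edges.add((source, relation, target))

    def neighbours(self, pid: str, relation: str) -> list[Node]:
        # Follow outgoing edges of one relation type, sorted for stable output.
        targets = (self.nodes[t] for s, r, t in self.edges
                   if s == pid and r == relation)
        return sorted(targets, key=lambda n: n.pid)


graph = ResearchGraph()
graph.add(Node("doi:10.1234/article", "publication"))
graph.add(Node("doi:10.1234/dataset", "dataset"))
graph.add(Node("doi:10.1234/dataset-v2", "dataset"))
graph.link("doi:10.1234/article", "supplementedBy", "doi:10.1234/dataset")
graph.link("doi:10.1234/dataset-v2", "versionOf", "doi:10.1234/dataset")
print([n.kind for n in graph.neighbours("doi:10.1234/article", "supplementedBy")])
```

Storing edges as (source, relation, target) triples keeps the relation type explicit, which is what distinguishes a semantic graph from a plain list of links.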

The Graph is built by aggregating the metadata and links collected from around 70,000 trusted sources, and is further enriched with metadata and links provided by:

  • OpenAIRE end-users, e.g. researchers, project administrators, data curators providing links from scientific products to projects, funders, communities, or other products;
  • OpenAIRE full-text mining algorithms, applied to around 10 million Open Access article full texts;
  • Research infrastructure scholarly services, bridged to the graph via OpenAIRE, exposing metadata of products such as research workflows, experiments, research objects, research software, etc.

Data & Metrics

Infrastructure

The OpenAIRE Graph is operated and maintained at the ICM cutting-edge Technology Centre, whose facilities and staff guarantee robust operation of the whole system. The Okeanos supercomputer hosting the graph has 26,016 cores in total and delivers 1,082 Tflop/s. The whole setup is energy efficient, with a power efficiency of 1.554 Gflops/W, which placed it 160th on the "Top500 by energy-efficiency" list (as of 2019).

ICM supports the continuous operation of the infrastructure, including data aggregation, deduplication, inference and provision, ensuring seamless 24/7 system uptime and availability. System administration activities cover hardware maintenance and the provisioning of new computational resources, providing High Availability solutions that make the system resilient to failures through service-level redundancy, and Load Balancing that distributes workloads uniformly across servers. The most crucial parts of the persisted graph are backed up, with well-defined restore procedures. All monitoring relies on aggregated system-level monitoring, accessible via various dashboards that give a better overview of system stability and of where system elements may need to be extended. System-level monitoring is supplemented by availability monitoring of all publicly accessible endpoints. As a result, the public OpenAIRE API can be offered to third parties to a high standard of availability.

All maintenance operations are carried out by experienced system administrators and follow well-established routines and emergency maintenance procedures.

Team

Key team members contributing to the OpenAIRE Graph