

<!DOCTYPE html>
<html lang="en-gb" dir="ltr" vocab="">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta name="description" content="OpenAIRE API documentation, OAI-PMH, open access, research, scientific publication, European Commission, EC, FP7, ERC, Horizon 2020, H2020, search, projects "/>
<link href="assets/common-assets/logo-small-graph.png">
<link rel="apple-touch-icon" sizes="57x57" href="assets/common-assets/logo/apple-icon-57x57.png">
<link rel="apple-touch-icon" sizes="60x60" href="assets/common-assets/logo/apple-icon-60x60.png">
<link rel="apple-touch-icon" sizes="72x72" href="assets/common-assets/logo/apple-icon-72x72.png">
<link rel="apple-touch-icon" sizes="76x76" href="assets/common-assets/logo/apple-icon-76x76.png">
<link rel="apple-touch-icon" sizes="114x114" href="assets/common-assets/logo/apple-icon-114x114.png">
<link rel="apple-touch-icon" sizes="120x120" href="assets/common-assets/logo/apple-icon-120x120.png">
<link rel="apple-touch-icon" sizes="144x144" href="assets/common-assets/logo/apple-icon-144x144.png">
<link rel="apple-touch-icon" sizes="152x152" href="assets/common-assets/logo/apple-icon-152x152.png">
<link rel="apple-touch-icon" sizes="180x180" href="assets/common-assets/logo/apple-icon-180x180.png">
<link rel="icon" type="image/png" sizes="192x192" href="assets/common-assets/logo/android-icon-192x192.png">
<link rel="icon" type="image/png" sizes="32x32" href="assets/common-assets/logo/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="96x96" href="assets/common-assets/logo/favicon-96x96.png">
<link rel="icon" type="image/png" sizes="16x16" href="assets/common-assets/logo/favicon-16x16.png">
<link href="assets/common-assets/logo/favicon.ico" rel="shortcut icon" type="image/"/>
<title>OpenAIRE API documentation - Dumps of the OpenAIRE Research Graph</title>
<script src="./assets/common-assets/jquery/jquery.js"></script>
<script src="./assets/uikit.js"></script>
<script src="./assets/uikit-icon-max.js"></script>
<!-- Matomo -->
<script type="text/javascript">
var _paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
(function() {
var u="//";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '470']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
<link rel="stylesheet" type="text/css" href="./assets/common-assets/common/theme.css">
<link rel="stylesheet" type="text/css" href="./assets/common-assets/common/custom.css">
<link rel="stylesheet" type="text/css" href="./assets/common-assets/library.css">
<link rel="stylesheet" type="text/css" href="./assets/develop-custom.css">
<body class="graphApp">
<div class="uk-offcanvas-content uk-height-viewport">
<!-- Header menu STARTS here-->
<div id="headerMobile" class="tm-header-mobile uk-hidden@m"></div>
<div id="header" class="tm-header uk-visible@m tm-header-transparent" uk-header=""></div>
<!-- Header menu ENDS here-->
<div class="first_page_section uk-section-default uk-section uk-padding-remove-vertical">
<div class="first_page_banner_headline uk-grid-collapse uk-flex-middle uk-margin-remove-vertical uk-grid" uk-grid=""></div>
<!-- Page content STARTS here -->
<div class=" uk-section tm-middle custom-main-content" id="tm-main">
<div class="uk-margin-large-left uk-margin-medium-bottom">
<ul class="uk-breadcrumb">
<a href="/">Home</a>
<a href="overview.html">API & Resources</a>
<a href="graph-dumps.html">Bulk Access</a>
<span>OpenAIRE Research Graph Dumps (old documentation)</span>
<div class="uk-container">
<div class="uk-grid">
<!-- Side menu & content -->
<div class="uk-width-1-3@s uk-width-1-4@m uk-width-1-4@l uk-visible@m" >
<ul class="uk-nav-default uk-nav-parent-icon" uk-nav="multiple: false" uk-sticky="offset: 100; media:@s" >
<li class="uk-parent uk-open">
<a href="#">Bulk Access</a>
<ul class="uk-nav-sub">
<li><a href="./graph-dumps.html">OpenAIRE Research Graph Dumps</a></li>
<li><a href="./oai-pmh.html">OAI-PMH (discontinued)</a></li>
<li><a href="./bulk-projects.html">Bulk access to projects</a></li>
<div class="tm-main uk-width-expand uk-row-first uk-first-column">
<!-- Content GOES HERE-->
<div class="uk-alert-danger" uk-alert>
<h3>Contribute to improve the OpenAIRE Research Graph</h3>
<p>You can explore and test the beta release of the OpenAIRE Research Graph via the <a href="">OpenAIRE BETA Explore Portal</a> or via data dumps made available in <a href="">Zenodo</a>. </p>
<p>Help us make the graph ready for its first production release by providing your feedback.<br/>
Go to the <a href="">OpenAIRE Research Graph Trello Board</a> to report content quality issues, including missing metadata records, wrong values, mistakes in the detection of duplicates, and anything else that looks "weird" or wrong.
<p>You can find complete information about the OpenAIRE Research Graph, how to test it, and how to help improve it on <a href="">our blog</a>.</p>
<h2 class="uk-text-center">OpenAIRE Research Graph Dumps</h2>
<p>The OpenAIRE Research Graph is one of the largest open scholarly record collections worldwide, and is key to fostering Open Science and establishing its practices in daily research activities.
Conceived as a public and transparent good, populated from data sources trusted by scientists, the Graph aims to bring discovery, monitoring, and assessment of science back into the hands of the scientific community.
<p>Imagine a vast collection of research products all linked together, contextualised and openly available.
For the past ten years OpenAIRE has been working to gather this valuable record. OpenAIRE is pleased to announce the beta release of its Research Graph, a massive collection of metadata and links between
scientific products (articles, datasets, software, and other research products) and entities such as organisations, funders, funding streams, projects, communities, and data sources.
<p>As of today, the OpenAIRE Research Graph aggregates around 450 million metadata records, with links, collected from 10,000 data sources trusted by scientists, including repositories registered in <a href="">OpenDOAR</a>, Open Access journals registered in <a href="">DOAJ</a>, <a href="">Crossref</a>, <a href="">Unpaywall</a>, <a href="">ORCID</a> and <a href="">Microsoft Academic Graph</a>.
After cleaning, deduplication, and fine-grained classification, these narrow down to ~100 million publications, ~8 million datasets, ~200K software research products, and 8 million other products, linked together with semantic relations.
More than 10 million full texts of Open Access publications are mined by algorithms to enrich metadata records with additional properties and links among research products, funders, projects, communities, and organizations.
Thanks to the mining algorithms, the graph is completed with 480 million semantic relations.
<p>The OpenAIRE Research Graph is available via our <a href="">BETA Explore Portal</a> and can be downloaded from <a href="">Zenodo</a>.
<h3 class="portal-color">Get the dumps</h3>
<p>The OpenAIRE Research Graph is exported as several dump files available on Zenodo (go to <a href=""><img src="" alt="DOI"></a>), so you can download only the parts you are interested in. </p>
<ul class="portal-circle">
<li> <strong>publications</strong>: metadata records about research literature (includes types of publications listed <a href="">here</a>)</li>
<li> <strong>datasets</strong>: metadata records about research data (includes the subtypes listed <a href="">here</a>)</li>
<li> <strong>software</strong>: metadata records about research software (includes the subtypes listed <a href="">here</a>)</li>
<li> <strong>orps</strong>: metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed <a href="">here</a>)</li>
<li> <strong>organizations</strong>: metadata records about organizations involved in the research life-cycle, such as universities, research organizations, funders.</li>
<li> <strong>content_providers</strong>: metadata records about providers whose content is available in the OpenAIRE Research Graph. They include institutional and thematic repositories, journals, aggregators, and funders' databases.</li>
<li> <strong>results_by_funder</strong>: metadata records about research results funded by a given funder. Each result includes information about its type (publications, datasets, software or other) and its specific sub-type (check the list of sub-types for <a href="">publications</a>, <a href="">datasets</a>, <a href="">software</a>, and <a href="">other research products</a>). </li>
<p>The up-to-date list of funders available on OpenAIRE BETA can be found <a href="">here on the BETA Explore portal</a>.</p>
<p> In the same <a href="">Zenodo community</a> you can also find the dumps of ScholeXplorer and DOIBoost.</p>
<p>The dumps contain XML records compliant with the <b>OpenAIRE data model</b> and with the <b>oaf metadata format</b> (the same format as the records exported via <a href="./oai-pmh.html">OAI-PMH</a>):</p>
<ul class="portal-circle">
<li><a href="" target="_blank">See the description of the OpenAIRE data model</a></li>
<li><a href="" target="_blank">See the oaf XML schema</a></li>
<li><a href="" target="_blank">See the oaf XML schema documentation (generated via Oxygen XML Editor)</a></li>
<p>Keep reading for instructions on how to consume the dumps.</p>
<h3 class="portal-color">Consume the dumps</h3>
Each dump is a gzipped JSON file with one JSON object per line.
In each object, the <code>body</code> field holds a <code>$binary</code> value that contains the base64 encoding of the compressed XML record. <br/>
To get the XML records you have to:
<li>Decompress (gunzip) the file</li>
<li>Read each line and keep only the value of the <code>$binary</code> field</li>
<li>Base64-decode each value</li>
<li>Decompress the decoded bytes to obtain the XML</li>
For example, to print the XMLs on the standard output you can run this command on MacOS/Unix/Linux based systems:
<code>gunzip -c file.json.gz | jq '.body."$binary"' -r | while IFS= read -r line; do echo "$line" | base64 --decode | bsdtar -x -O ; done </code><br/>
<ul class="portal-circle">
<li><code>file.json.gz</code> is the name you gave to the downloaded dump file;</li>
<li><code>jq</code> is a command-line tool for parsing JSON files. It is not installed by default, but you can easily find it in official repositories. <a href="">Click here for installation instructions</a>.
<li><code>base64</code> and <code>bsdtar</code> are two command-line tools that are typically pre-installed.</li>
Note that you should decide what to do with the output (keep parsing the XML records inline or store them somewhere).
We suggest starting with a few records to test and decide what to do, by adding a <code>head</code> command after the <code>gunzip</code>, like:
<code>gunzip -c file.json.gz | head -n 10 | jq '.body."$binary"' -r | while IFS= read -r line; do echo "$line" | base64 --decode | bsdtar -x -O ; done</code>
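<p>If you prefer a script to a shell pipeline, the same steps can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names <code>decode_record</code> and <code>iter_records</code> are hypothetical, the field names are taken from the <code>jq</code> filter above, and the code assumes that the decoded payload is a zip archive (<code>bsdtar</code> detects the archive format by itself, so this should be checked against a real dump).</p>

```python
import base64
import gzip
import io
import itertools
import json
import zipfile


def decode_record(line):
    """Return the XML documents packed in one line of a dump file."""
    doc = json.loads(line)
    # Field names follow the jq filter used on this page: .body."$binary"
    payload = base64.b64decode(doc["body"]["$binary"])
    # Assumption: the decoded payload is a zip archive holding the XML record.
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return [archive.read(name).decode("utf-8") for name in archive.namelist()]


def iter_records(path, limit=None):
    """Yield XML strings from a gzipped dump, reading at most `limit` lines."""
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        # islice with limit=None reads the whole file, like omitting `head`.
        for line in itertools.islice(dump, limit):
            if line.strip():
                yield from decode_record(line)
```

<p>For example, <code>for xml in iter_records("file.json.gz", limit=10): print(xml)</code> mirrors the <code>head -n 10</code> command above. Because the records are yielded one at a time, you can parse or store each XML record without loading the whole dump into memory.</p>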
<h3 class="portal-color">Cite us</h3>
<p>If you use the OpenAIRE Research Graph for research purposes, please cite it as:<br/>
<i>Manghi, Paolo, Atzori, Claudio, Bardi, Alessia, Shirrwagen, Jochen, Dimitropoulos, Harry, La Bruzzo, Sandro, … Summan, Friedrich. (2019). OpenAIRE Research Graph Dump [Data set]. Zenodo.</i><br/>
If you want to cite a specific version, please follow the suggestion on Zenodo. For the current version (1.0.0-beta), please use: <br/>
<i>Manghi, Paolo, Atzori, Claudio, Bardi, Alessia, Shirrwagen, Jochen, Dimitropoulos, Harry, La Bruzzo, Sandro, … Summan, Friedrich. (2019). OpenAIRE Research Graph Dump (Version 1.0.0-beta) [Data set]. Zenodo.</i><br/>
The OpenAIRE Research Graph includes data from <a href="">Microsoft Academic Graph</a> (MAG): please also acknowledge MAG following <a href="">this guideline</a>.
<h3 class="portal-color">License</h3>
<p>The OpenAIRE Research Graph is released under a CC-BY license.</p>
<p>OpenAIRE is working to produce dumps that contain only metadata records that can be redistributed under the CC0 license: stay tuned!</p>
<!-- Page content ENDS here -->
<!-- Footer STARTS here -->
<bottom class="footer">
<div class="footer-light-background uk-padding-remove-bottom uk-section uk-section-small">
<div class="uk-container uk-container-small">
<div id="footer#3" class="uk-first-column uk-flex uk-flex-middle uk-grid uk-margin-remove-right">
<div class="uk-text-center uk-width-1-1 uk-width-1-4@m">
<img width="126px" height="30px" alt="OpenAIRE" loading="lazy" class="el-image" src="assets/common-assets/common/Logo_Horizontal_dark_small.png">
<div id="footer#5" class="uk-margin uk-text-left uk-width-expand">
<div class="uk-flex uk-flex-middle">
<img alt="flag black white low" width="50px" height="33px" loading="lazy" style="margin-right: 8px; float: left;" src="assets/common-assets/common/commission.jpg">
<div class="uk-margin-left">
<span style="font-size: 8pt; line-height: 0.7!important;">OpenAIRE has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No. 777541 and 101017452</span>
<div class="uk-width-expand@m">
<div id="footer#7" class="uk-text-right@m uk-text-center">
<ul uk-margin="" class="uk-flex-center uk-margin-remove-bottom uk-subnav uk-subnav-divider"></ul>
<div class="uk-margin uk-text-center uk-width-1-1">
<div uk-grid="" class="uk-child-width-auto uk-flex-center uk-grid uk-grid-small">
<div class="uk-first-column">
<a href="" target="_blank" class="el-link uk-icon-button uk-icon">
<svg width="20" height="20" viewBox="0 0 20 20" xmlns="">
<path d="M11,10h2.6l0.4-3H11V5.3c0-0.9,0.2-1.5,1.5-1.5H14V1.1c-0.3,0-1-0.1-2.1-0.1C9.6,1,8,2.4,8,5v2H5.5v3H8v8h3V10z"></path>
<a href="" target="_blank" class="el-link uk-icon-button uk-icon">
<svg width="20" height="20" viewBox="0 0 20 20" xmlns="">
<path d="M19,4.74 C18.339,5.029 17.626,5.229 16.881,5.32 C17.644,4.86 18.227,4.139 18.503,3.28 C17.79,3.7 17.001,4.009 16.159,4.17 C15.485,3.45 14.526,3 13.464,3 C11.423,3 9.771,4.66 9.771,6.7 C9.771,6.99 9.804,7.269 9.868,7.539 C6.795,7.38 4.076,5.919 2.254,3.679 C1.936,4.219 1.754,4.86 1.754,5.539 C1.754,6.82 2.405,7.95 3.397,8.61 C2.79,8.589 2.22,8.429 1.723,8.149 L1.723,8.189 C1.723,9.978 2.997,11.478 4.686,11.82 C4.376,11.899 4.049,11.939 3.713,11.939 C3.475,11.939 3.245,11.919 3.018,11.88 C3.49,13.349 4.852,14.419 6.469,14.449 C5.205,15.429 3.612,16.019 1.882,16.019 C1.583,16.019 1.29,16.009 1,15.969 C2.635,17.019 4.576,17.629 6.662,17.629 C13.454,17.629 17.17,12 17.17,7.129 C17.17,6.969 17.166,6.809 17.157,6.649 C17.879,6.129 18.504,5.478 19,4.74"></path>
<a href="" target="_blank" class="el-link uk-icon-button uk-icon">
<svg width="20" height="20" viewBox="0 0 20 20" xmlns="">
<path d="M5.77,17.89 L5.77,7.17 L2.21,7.17 L2.21,17.89 L5.77,17.89 L5.77,17.89 Z M3.99,5.71 C5.23,5.71 6.01,4.89 6.01,3.86 C5.99,2.8 5.24,2 4.02,2 C2.8,2 2,2.8 2,3.85 C2,4.88 2.77,5.7 3.97,5.7 L3.99,5.7 L3.99,5.71 L3.99,5.71 Z"></path>
<path d="M7.75,17.89 L11.31,17.89 L11.31,11.9 C11.31,11.58 11.33,11.26 11.43,11.03 C11.69,10.39 12.27,9.73 13.26,9.73 C14.55,9.73 15.06,10.71 15.06,12.15 L15.06,17.89 L18.62,17.89 L18.62,11.74 C18.62,8.45 16.86,6.92 14.52,6.92 C12.6,6.92 11.75,7.99 11.28,8.73 L11.3,8.73 L11.3,7.17 L7.75,7.17 C7.79,8.17 7.75,17.89 7.75,17.89 L7.75,17.89 L7.75,17.89 Z"></path>
<a href="" target="_blank" class="el-link uk-icon-button uk-icon">
<svg width="20" height="20" viewBox="0 0 20 20" xmlns="">
<line fill="none" stroke="#000" stroke-width="1.1" x1="13.4" y1="14" x2="6.3" y2="10.7"></line>
<line fill="none" stroke="#000" stroke-width="1.1" x1="13.5" y1="5.5" x2="6.5" y2="8.8"></line>
<circle fill="none" stroke="#000" stroke-width="1.1" cx="15.5" cy="4.6" r="2.3"></circle>
<circle fill="none" stroke="#000" stroke-width="1.1" cx="15.5" cy="14.8" r="2.3"></circle>
<circle fill="none" stroke="#000" stroke-width="1.1" cx="4.5" cy="9.8" r="2.3"></circle>
<a href="" target="_blank" class="el-link uk-icon-button uk-icon">
<svg width="20" height="20" viewBox="0 0 20 20" xmlns="">
<path d="M15,4.1c1,0.1,2.3,0,3,0.8c0.8,0.8,0.9,2.1,0.9,3.1C19,9.2,19,10.9,19,12c-0.1,1.1,0,2.4-0.5,3.4c-0.5,1.1-1.4,1.5-2.5,1.6 c-1.2,0.1-8.6,0.1-11,0c-1.1-0.1-2.4-0.1-3.2-1c-0.7-0.8-0.7-2-0.8-3C1,11.8,1,10.1,1,8.9c0-1.1,0-2.4,0.5-3.4C2,4.5,3,4.3,4.1,4.2 C5.3,4.1,12.6,4,15,4.1z M8,7.5v6l5.5-3L8,7.5z"></path>
<a target="_blank" href="" class="el-link newsletter">
<span class="el-title uk-margin uk-text-large"> Newsletter
<span class="el-image uk-icon">
<svg width="20" height="20" viewBox="0 0 20 20" xmlns="">
<circle cx="3.12" cy="16.8" r="1.85"></circle>
<path fill="none" stroke="#000" stroke-width="1.1" d="M1.5,8.2 C1.78,8.18 2.06,8.16 2.35,8.16 C7.57,8.16 11.81,12.37 11.81,17.57 C11.81,17.89 11.79,18.19 11.76,18.5"></path><path fill="none" stroke="#000" stroke-width="1.1" d="M1.5,2.52 C1.78,2.51 2.06,2.5 2.35,2.5 C10.72,2.5 17.5,9.24 17.5,17.57 C17.5,17.89 17.49,18.19 17.47,18.5"></path>
<div class="footer-light-background uk-section uk-section-xsmall">
<div class="uk-container uk-container-expand">
<div uk-grid="" class="uk-grid-margin uk-grid">
<div class="uk-width-small@m uk-first-column"></div>
<div class="uk-width-expand@m">
<div id="footer#22" class="uk-text-small uk-margin uk-margin-remove-bottom uk-text-center@m uk-text-center">
<a href="" rel="license" class="license"> &nbsp;
<svg xmlns="" width="24" height="24" viewBox="0 0 24 24">
<path id="creative-commons" d="M9.7,14.675a1.311,1.311,0,0,1-1.15-.557,2.511,2.511,0,0,1-.391-1.477q0-2.032,1.541-2.034a1.36,1.36,0,0,1,.666.205,1.569,1.569,0,0,1,.605.718l1.541-.8A3.222,3.222,0,0,0,9.457,9.067a3.249,3.249,0,0,0-2.412.964,3.548,3.548,0,0,0-.957,2.61,3.562,3.562,0,0,0,.945,2.63,3.362,3.362,0,0,0,2.485.942,3.367,3.367,0,0,0,1.766-.481,3.408,3.408,0,0,0,1.254-1.326l-1.419-.718a1.44,1.44,0,0,1-1.416.987Zm6.634,0a1.312,1.312,0,0,1-1.15-.557,2.511,2.511,0,0,1-.391-1.477q0-2.032,1.541-2.034a1.389,1.389,0,0,1,.686.205,1.577,1.577,0,0,1,.608.718l1.519-.8a3.181,3.181,0,0,0-3.04-1.663,3.253,3.253,0,0,0-2.412.964,3.546,3.546,0,0,0-.955,2.61,3.576,3.576,0,0,0,.934,2.63,3.349,3.349,0,0,0,2.5.942,3.328,3.328,0,0,0,1.745-.481,3.54,3.54,0,0,0,1.274-1.326l-1.438-.718a1.441,1.441,0,0,1-1.416.987ZM21.156,4.12A11.61,11.61,0,0,0,12.624.64a11.436,11.436,0,0,0-8.44,3.48A11.738,11.738,0,0,0,.641,12.64,11.537,11.537,0,0,0,4.185,21.1a11.532,11.532,0,0,0,8.44,3.541,11.856,11.856,0,0,0,8.592-3.57,11.389,11.389,0,0,0,3.424-8.431,11.583,11.583,0,0,0-3.484-8.52Zm-1.5,15.391a9.631,9.631,0,0,1-7,2.94,9.479,9.479,0,0,1-6.938-2.911A9.422,9.422,0,0,1,2.8,12.64,9.57,9.57,0,0,1,5.747,5.68,9.3,9.3,0,0,1,12.655,2.8a9.4,9.4,0,0,1,6.94,2.88,9.411,9.411,0,0,1,2.884,6.96,9.157,9.157,0,0,1-2.823,6.87Z" transform="translate(-0.641 -0.64)"></path>
<svg xmlns="" width="24" height="24" viewBox="0 0 24 24">
<g id="Group_756" data-name="Group 756" transform="translate(0)">
<path id="Path_2324" data-name="Path 2324" d="M18.325,11.98a.775.775,0,0,0-.775-.775H12.641a.775.775,0,0,0-.775.775v4.909h1.369V22.7h3.719V16.889h1.37V11.98Z" transform="translate(-3.095 -2.951)"></path>
<path id="Path_2325" data-name="Path 2325" d="M17.209,7.759A1.679,1.679,0,1,1,15.53,6.08,1.679,1.679,0,0,1,17.209,7.759Z" transform="translate(-3.529 -1.83)"></path>
<path id="Path_2326" data-name="Path 2326" d="M12.624.64A11.439,11.439,0,0,0,4.183,4.12,11.736,11.736,0,0,0,.639,12.64,11.537,11.537,0,0,0,4.183,21.1a11.531,11.531,0,0,0,8.441,3.54,11.851,11.851,0,0,0,8.591-3.57,11.383,11.383,0,0,0,3.424-8.43,11.582,11.582,0,0,0-3.484-8.52,11.612,11.612,0,0,0-8.53-3.48Zm.03,2.159a9.4,9.4,0,0,1,6.939,2.88,9.414,9.414,0,0,1,2.883,6.96,9.156,9.156,0,0,1-2.823,6.87,9.63,9.63,0,0,1-7,2.94,9.48,9.48,0,0,1-6.939-2.91A9.425,9.425,0,0,1,2.8,12.64,9.573,9.573,0,0,1,5.746,5.68,9.3,9.3,0,0,1,12.654,2.8Z" transform="translate(-0.639 -0.64)"></path>
</a> &nbsp;Unless otherwise indicated, all materials created by OpenAIRE are licensed under
<a href="" rel="license">
<div class="uk-width-small@m">
<div class="uk-margin uk-margin-remove-top uk-margin-remove-bottom uk-text-right@m uk-text-center">
<a href="#" uk-scroll="" class="uk-totop uk-icon">
<svg width="18" height="10" viewBox="0 0 18 10" xmlns="" data-svg="totop">
<polyline fill="none" stroke="#000" stroke-width="1.2" points="1 9 9 1 17 9 "></polyline>
<!-- Footer ENDS here -->