From 0adb200894cfbd4d0c9597396c9548e1d7f6bfe7 Mon Sep 17 00:00:00 2001
From: Claudio Atzori
Date: Mon, 2 Mar 2020 18:07:29 +0100
Subject: [PATCH] Update page 'Data provision workflow'

---
 Data-provision-workflow.md | 7 +++++++
 1 file changed, 7 insertions(+)
 create mode 100644 Data-provision-workflow.md

diff --git a/Data-provision-workflow.md b/Data-provision-workflow.md
new file mode 100644
index 0000000..3e5e24f
--- /dev/null
+++ b/Data-provision-workflow.md
@@ -0,0 +1,7 @@
+The data provision workflow is a sequence of processing steps that updates the content of the backends serving the OpenAIRE public services. Currently it covers:
+* the Apache Solr full-text index serving content to [explore.openaire.eu](https://explore.openaire.eu/) and to the [HTTP search API](http://api.openaire.eu/api.html)
+* the databases accessed through Apache Impala for the calculation of statistics over the Graph
+* the MongoDB NoSQL database serving content to the [OpenAIRE OAI-PMH endpoint](http://api.openaire.eu/oai_pmh)
+
+This document provides a coarse-grained description of the data provision workflow, which is composed of several data movement and manipulation steps:
+* Load data from the aggregator and map it onto the OCEAN cluster according to the dhp.Oaf model. The procedure freezes the content stored in the aggregation system backends to produce the so-called raw graph **G_r**