diff --git a/Data-provision-workflow.md b/Data-provision-workflow.md
new file mode 100644
index 0000000..3e5e24f
--- /dev/null
+++ b/Data-provision-workflow.md
@@ -0,0 +1,7 @@
+The data provision workflow is a sequence of processing steps that updates the content of the backends serving the OpenAIRE public services. It currently covers:
+* the Apache Solr full-text index serving content to [explore.openaire.eu](https://explore.openaire.eu/) and to the [HTTP search API](http://api.openaire.eu/api.html)
+* the databases accessed through Apache Impala to calculate statistics over the Graph
+* the MongoDB NoSQL database serving content to the [OpenAIRE OAI-PMH endpoint](http://api.openaire.eu/oai_pmh)
+
+This document provides a coarse-grained description of the data provision workflow, which is composed of several data movement and manipulation steps:
+* Load the data from the aggregator into the OCEAN cluster, mapping it according to the dhp.Oaf model. This procedure freezes the content stored in the aggregation system backends to produce the so-called raw graph **G_r**.