diff --git a/dhp-workflows/dhp-aggregation/README.md b/dhp-workflows/dhp-aggregation/README.md
index e46fdeb16..5ed6a82d7 100644
--- a/dhp-workflows/dhp-aggregation/README.md
+++ b/dhp-workflows/dhp-aggregation/README.md
@@ -1,16 +1,27 @@
 Description of the Module
 --------------------------
-This module defines a **collector worker application** that runs on Hadoop.
+This module defines a set of Oozie workflows for the **collection** and **transformation** of metadata records.
+Both workflows interact with the Metadata Store Manager (MdSM) to handle the logical transactions required to ensure
+the consistency of read/write operations on the data, since the MdSM keeps track of the logical-physical mapping
+of each MDStore.
 
-It is responsible for harvesting metadata using different collector plugins and transformation into the common metadata model.
+## Metadata collection
 
-# Collector Plugins
-* OAI Plugin
+The **metadata collection workflow** is responsible for harvesting metadata records exposed through different protocols and
+in different formats, and for storing them on HDFS so that they can be further processed.
+
+### Collector Plugins
+
+Different protocols are handled by dedicated collector plugins, i.e. Java classes implementing a common interface:
+
+`eu.dnetlib.dhp.collection.plugin.CollectorPlugin`
+
+The supported plugins are:
+
+* OAI Plugin: collects from OAI-PMH compatible endpoints
+* MDStore plugin: collects from a given D-Net MetadataStore (identified by MongoDB URI, dbName, MDStoreID)
+* MDStore dump plugin: collects from an MDStore dump stored on the HDFS location indicated by the `path` parameter
 
 # Transformation Plugins
 TODO
-
-# Usage
-TODO
-
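The plugin design described in the README (one dedicated Java implementation per protocol, selected at run time) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `RecordCollector`, `LocalDumpCollector` and `CollectorRegistry` are hypothetical names, the method signature is simplified, and the dump collector reads a local file instead of HDFS. It does not reproduce the actual `eu.dnetlib.dhp.collection.plugin.CollectorPlugin` API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical plugin contract, simplified for illustration; it is NOT the
// actual signature of eu.dnetlib.dhp.collection.plugin.CollectorPlugin.
@FunctionalInterface
interface RecordCollector {
    // Returns the harvested records lazily, one serialized record per element.
    Stream<String> collect(Map<String, String> params);
}

// Hypothetical stand-in for the MDStore dump plugin: the real plugin reads the
// dump from the HDFS location given by `path`; this sketch reads a local text
// file with one record per line instead.
class LocalDumpCollector implements RecordCollector {
    @Override
    public Stream<String> collect(Map<String, String> params) {
        String path = params.get("path");
        if (path == null) {
            throw new IllegalArgumentException("missing required parameter: path");
        }
        try {
            // The caller must close the returned stream to release the file handle.
            return Files.lines(Paths.get(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical registry that picks a plugin by protocol name, mirroring the
// "one dedicated plugin per protocol" design described in the README.
class CollectorRegistry {
    private final Map<String, Supplier<RecordCollector>> factories = new HashMap<>();

    void register(String protocol, Supplier<RecordCollector> factory) {
        factories.put(protocol, factory);
    }

    RecordCollector forProtocol(String protocol) {
        Supplier<RecordCollector> factory = factories.get(protocol);
        if (factory == null) {
            throw new IllegalArgumentException("no collector plugin for protocol: " + protocol);
        }
        return factory.get();
    }
}
```

In use, a workflow step would register one factory per supported protocol (for example `registry.register("mdstore_dump", LocalDumpCollector::new)`, where the key is again a hypothetical name) and then call `registry.forProtocol(protocol).collect(params)`. Keeping the protocol-specific logic behind a single interface means that adding a new source only requires a new implementation and a registration, without touching the workflow code that consumes the record stream.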