Description of the Module
--------------------------
This module defines a set of Oozie workflows for the **collection** and **transformation** of metadata records.
Both workflows interact with the Metadata Store Manager (MdSM), which keeps track of the logical-physical mapping
of each MDStore, to handle the logical transactions that keep read and write operations on the data consistent.
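The idea of a logical-physical mapping guarded by transactions can be illustrated with a toy model. The sketch below is not the MdSM API: the class name, method names, and path layout are all hypothetical, and it only assumes that a writer fills a new physical location which becomes visible to readers once it is committed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, NOT the real Metadata Store Manager.
public class MdStoreTransactionSketch {

    // Logical MDStore id -> physical path currently visible to readers.
    private final Map<String, String> committedPath = new HashMap<>();
    private int versionCounter = 0;

    // Start a write: reserve a fresh physical location (invented path layout).
    public String newVersion(String mdStoreId) {
        versionCounter++;
        return "/hypothetical/mdstores/" + mdStoreId + "/v" + versionCounter;
    }

    // End a successful write: readers now resolve to the new location.
    public void commit(String mdStoreId, String physicalPath) {
        committedPath.put(mdStoreId, physicalPath);
    }

    // Readers only ever see the last committed location (null if none).
    public String resolve(String mdStoreId) {
        return committedPath.get(mdStoreId);
    }

    public static void main(String[] args) {
        MdStoreTransactionSketch manager = new MdStoreTransactionSketch();
        String v1 = manager.newVersion("mdstore-A");
        manager.commit("mdstore-A", v1);
        // A second write that is never committed stays invisible to readers.
        manager.newVersion("mdstore-A");
        System.out.println(manager.resolve("mdstore-A"));
    }
}
```

In this model an abandoned write needs no explicit rollback, because the mapping changes only on commit; a real manager would presumably also clean up the unused physical location.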
## Metadata collection
The **metadata collection workflow** harvests metadata records exposed through different protocols and in
different formats, and stores them on HDFS so that they can be further processed.
### Collector Plugins
Each protocol is managed by a dedicated Collector plugin, i.e. a Java program implementing a defined interface:
```eu.dnetlib.dhp.collection.plugin.CollectorPlugin```
The supported plugins are:
* OAI Plugin: collects from OAI-PMH compatible endpoints
* MDStore plugin: collects from a given D-Net MetadataStore (identified by MongoDB URI, dbName, MDStoreID)
* MDStore dump plugin: collects from an MDStore dump stored on the HDFS location indicated by the `path` parameter
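The plugin-per-protocol design can be sketched as follows. This is a minimal illustration, not the module's code: `SimpleCollectorPlugin` is a hypothetical, simplified stand-in for `eu.dnetlib.dhp.collection.plugin.CollectorPlugin` (whose real signature is not shown here), the protocol keys `oai` and `mdstore_dump` and the `baseUrl` parameter are assumptions, and the plugins emit placeholder records instead of contacting any endpoint or reading HDFS.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectorPluginSketch {

    // Hypothetical simplified interface; the real CollectorPlugin may differ.
    public interface SimpleCollectorPlugin {
        Stream<String> collect(Map<String, String> params);
    }

    // Placeholder for an OAI-PMH collector: no network call is made.
    public static class FakeOaiPlugin implements SimpleCollectorPlugin {
        @Override
        public Stream<String> collect(Map<String, String> params) {
            String baseUrl = params.getOrDefault("baseUrl", "unknown");
            return Stream.of(
                "<record id='1' source='" + baseUrl + "'/>",
                "<record id='2' source='" + baseUrl + "'/>");
        }
    }

    // Placeholder for the MDStore dump collector: only echoes the `path` parameter.
    public static class FakeMdStoreDumpPlugin implements SimpleCollectorPlugin {
        @Override
        public Stream<String> collect(Map<String, String> params) {
            String path = params.getOrDefault("path", "unknown");
            return Stream.of("<record id='1' dump='" + path + "'/>");
        }
    }

    // Protocol name -> plugin; the keys are assumptions made for this sketch.
    private static final Map<String, SimpleCollectorPlugin> REGISTRY = new LinkedHashMap<>();
    static {
        REGISTRY.put("oai", new FakeOaiPlugin());
        REGISTRY.put("mdstore_dump", new FakeMdStoreDumpPlugin());
    }

    // Select the plugin for a protocol, failing loudly on unsupported ones.
    public static SimpleCollectorPlugin pluginFor(String protocol) {
        SimpleCollectorPlugin plugin = REGISTRY.get(protocol);
        if (plugin == null) {
            throw new IllegalArgumentException("Unsupported protocol: " + protocol);
        }
        return plugin;
    }

    public static void main(String[] args) {
        List<String> records = pluginFor("oai")
            .collect(Map.of("baseUrl", "http://example.org/oai"))
            .collect(Collectors.toList());
        records.forEach(System.out::println);
    }
}
```

Keeping the workflow dependent only on the interface means a new protocol can be supported by registering another implementation, without changing the code that stores the collected records.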
## Transformation Plugins
TODO