dnet-hadoop/dhp-workflows/dhp-aggregation
Claudio Atzori c267d958d5 [maven-release-plugin] prepare release dhp-1.2.0 2020-05-11 10:17:10 +02:00
src improved unit tests in dhp-aggregation 2020-05-05 12:39:04 +02:00
README.md dhp-collection-worker integrated in dhp-workflows 2019-10-24 11:36:59 +02:00
pom.xml [maven-release-plugin] prepare release dhp-1.2.0 2020-05-11 10:17:10 +02:00

README.md

Description of the Module

This module defines a collector worker application that runs on Hadoop.

It is responsible for harvesting metadata using different plugins.

The collector worker reports on the harvesting action through message queues. While the job runs, it sends ONGOING messages describing its progress. At the end of the job, it sends REPORT messages with information about the status of the collection, such as the number of records collected.
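
The ONGOING/REPORT pattern can be illustrated with a minimal, stdlib-only Java sketch. It assumes a great deal: the two RabbitMQ queues are replaced by in-memory BlockingQueues, and the WorkerMessage type, its fields and the reporting interval are hypothetical. The README does not describe the real message format.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message shape; the README does not specify the real message format.
final class WorkerMessage {
    final String workflowId;
    final String type; // "ONGOING" or "REPORT"
    final Map<String, String> body;

    WorkerMessage(String workflowId, String type, Map<String, String> body) {
        this.workflowId = workflowId;
        this.type = type;
        this.body = body;
    }
}

final class ProgressReporter {
    // In-memory stand-ins for the queues named by rabbitOngoingQueue and rabbitReportQueue.
    final BlockingQueue<WorkerMessage> ongoingQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<WorkerMessage> reportQueue = new LinkedBlockingQueue<>();

    private final String workflowId;
    private final int interval; // hypothetical: one ONGOING message every `interval` records

    ProgressReporter(String workflowId, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.workflowId = workflowId;
        this.interval = interval;
    }

    // Counts the harvested records, sending periodic ONGOING messages and a single final REPORT.
    long harvest(Iterable<String> records) {
        long collected = 0;
        for (String item : records) {
            collected++; // a real worker would also append `item` to the HDFS sequence file
            if (collected % interval == 0) {
                ongoingQueue.add(new WorkerMessage(workflowId, "ONGOING",
                        Map.of("progress", Long.toString(collected))));
            }
        }
        reportQueue.add(new WorkerMessage(workflowId, "REPORT",
                Map.of("collected", Long.toString(collected))));
        return collected;
    }
}
```

For example, harvesting five records with an interval of 2 yields two ONGOING messages (after records 2 and 4) and one REPORT message carrying the total of 5.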

To run, the collector worker needs parameters such as the following:

  • hdfsPath: the HDFS path where the sequence file is stored
  • apidescriptor: the JSON encoding of the API Descriptor
  • namenode: the NameNode URI
  • userHDFS: the HDFS user that creates the sequence file
  • rabbitUser: the user to connect to RabbitMQ for messaging
  • rabbitPassWord: the password to connect to RabbitMQ for messaging
  • rabbitHost: the host of the RabbitMQ server
  • rabbitOngoingQueue: the name of the ongoing queue
  • rabbitReportQueue: the name of the report queue
  • workflowId: the identifier of the dnet workflow
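
The parameters above could be collected and validated up front, before any harvesting starts. The sketch below is hypothetical. It assumes a "--name value" command-line convention, which the actual worker may not use, and reuses only the parameter names listed above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical parser: assumes arguments are passed as "--name value" pairs.
final class WorkerParameters {
    // Parameter names taken from the list in this README.
    static final List<String> REQUIRED = List.of(
            "hdfsPath", "apidescriptor", "namenode", "userHDFS",
            "rabbitUser", "rabbitPassWord", "rabbitHost",
            "rabbitOngoingQueue", "rabbitReportQueue", "workflowId");

    private WorkerParameters() {}

    static Map<String, String> parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected --name value pairs");
        }
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("not a parameter name: " + args[i]);
            }
            params.put(args[i].substring(2), args[i + 1]);
        }
        // Fail fast if any required parameter is missing, listing all of them at once.
        List<String> missing = REQUIRED.stream()
                .filter(name -> !params.containsKey(name))
                .collect(Collectors.toList());
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("missing parameters: " + missing);
        }
        return params;
    }
}
```

Reporting every missing name in one exception is a design choice: it avoids a fix-one-rerun loop when several parameters are absent from a workflow configuration.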

Plugins

  • OAI Plugin
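
One way to make the worker pluggable is to hide each harvesting protocol behind a common interface and select the implementation from the API descriptor. The sketch below is hypothetical. CollectorPlugin, ApiDescriptor, PluginRegistry and the "oai" protocol key are illustrative names, and the stub plugin returns fixed records instead of issuing OAI-PMH requests.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical, simplified view of the API descriptor: a protocol name plus a base URL.
final class ApiDescriptor {
    final String protocol;
    final String baseUrl;

    ApiDescriptor(String protocol, String baseUrl) {
        this.protocol = protocol;
        this.baseUrl = baseUrl;
    }
}

// Hypothetical plugin contract: each plugin turns an API descriptor into a stream of metadata records.
interface CollectorPlugin {
    Stream<String> collect(ApiDescriptor api);
}

// Stub standing in for the OAI plugin; a real one would page through OAI-PMH ListRecords responses.
final class StubOaiPlugin implements CollectorPlugin {
    @Override
    public Stream<String> collect(ApiDescriptor api) {
        return Stream.of("<record id=\"1\"/>", "<record id=\"2\"/>");
    }
}

// Maps protocol names (case-insensitively) to plugin instances.
final class PluginRegistry {
    private final Map<String, CollectorPlugin> plugins = new HashMap<>();

    void register(String protocol, CollectorPlugin plugin) {
        plugins.put(protocol.toLowerCase(Locale.ROOT), plugin);
    }

    // Looks up the plugin for the descriptor's protocol, failing if none is registered.
    CollectorPlugin forApi(ApiDescriptor api) {
        CollectorPlugin plugin = plugins.get(api.protocol.toLowerCase(Locale.ROOT));
        if (plugin == null) {
            throw new IllegalArgumentException("no plugin for protocol: " + api.protocol);
        }
        return plugin;
    }
}
```

Returning a Stream lets a plugin yield records lazily, so the worker can write them to the sequence file as they arrive rather than buffering a whole harvest in memory.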

Usage

TODO