## Description of the Module

This module defines a collector worker application that runs on Hadoop.

It is responsible for harvesting metadata using different plugins.

The collector worker uses a message queue to report the progress of the harvesting action (sending ONGOING messages) and, at the end of the job, to provide some information about the status of the collection, e.g. the number of records collected (sending REPORT messages).
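As an illustration only, the following minimal sketch shows how such an ONGOING message could be published with the RabbitMQ Java client (com.rabbitmq:amqp-client); the host, credentials, queue name and JSON payload are assumptions, not the worker's actual message format.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class OngoingMessageExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbit.example.org");   // rabbitHost (assumption)
        factory.setUsername("guest");            // rabbitUser (assumption)
        factory.setPassword("guest");            // rabbitPassWord (assumption)

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            String queue = "ongoing_queue";      // rabbitOngoingQueue (assumption)
            channel.queueDeclare(queue, true, false, false, null);
            // Hypothetical progress payload: workflowId plus a running counter
            String body = "{\"workflowId\":\"wf-123\",\"status\":\"ONGOING\",\"collected\":500}";
            channel.basicPublish("", queue, null, body.getBytes("UTF-8"));
        }
    }
}
```

A REPORT message at the end of the job would be published the same way, only on the report queue and with the final collection statistics in the payload.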

To run, the collection worker needs the following parameters (a short code sketch follows the list):

  • hdfsPath: the path where the sequence file is stored
  • apidescriptor: the JSON encoding of the API descriptor
  • namenode: the Name Node URI
  • userHDFS: the HDFS user that creates the sequence file
  • rabbitUser: the user for connecting to RabbitMQ for messaging
  • rabbitPassWord: the password for connecting to RabbitMQ for messaging
  • rabbitHost: the host of the RabbitMQ server
  • rabbitOngoingQueue: the name of the ongoing queue
  • rabbitReportQueue: the name of the report queue
  • workflowId: the identifier of the dnet workflow
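
As a rough illustration of how the HDFS-related parameters fit together, the sketch below writes harvested records into a Hadoop sequence file using the standard Hadoop client API; the namenode URI, path, user and key/value types are assumptions, not the worker's actual implementation.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteExample {
    public static void main(String[] args) throws Exception {
        String namenode = "hdfs://namenode.example.org:8020"; // namenode (assumption)
        String hdfsPath = "/tmp/collection/records.seq";      // hdfsPath (assumption)
        String user = "dnet";                                  // userHDFS (assumption)

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", namenode);
        // Obtain the file system handle as the configured HDFS user
        FileSystem fs = FileSystem.get(new URI(namenode), conf, user);

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.stream(fs.create(new Path(hdfsPath))),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            // Each harvested record is appended as a (position, XML payload) pair
            writer.append(new IntWritable(1), new Text("<record>...</record>"));
        }
    }
}
```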

## Plugins

  • OAI Plugin
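
For reference, an OAI-PMH harvest boils down to issuing ListRecords requests and following resumption tokens. The sketch below shows a plain HTTP version of such a request; the endpoint URL and metadata prefix are assumptions, and this is not the plugin's actual code.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OaiListRecordsExample {
    public static void main(String[] args) throws Exception {
        String baseUrl = "https://example.org/oai";  // repository endpoint (assumption)
        String request = baseUrl + "?verb=ListRecords&metadataPrefix=oai_dc";

        HttpURLConnection conn = (HttpURLConnection) new URL(request).openConnection();
        conn.setRequestMethod("GET");

        // Read the XML response; a real harvester would parse it and follow
        // the resumptionToken to fetch the next page of records.
        StringBuilder xml = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                xml.append(line).append('\n');
            }
        }
        System.out.println(xml);
    }
}
```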

## Usage

TODO