dnet-hadoop/dhp-workflows/dhp-aggregation
sandro.labruzzo a1297082e2 Crossref Enhancements:
-Accurate Review Type Assignment: Resolved an issue identified in ticket https://support.openaire.eu/issues/9525#note-13. When a relationship of "is-review-of" is detected, the publication type is now correctly set to "Review."
-Enhanced Author Affiliation Data: Implemented Miriam's suggestion by including a new field, "RawAffiliationString," in each author entry. This additional data provides a more granular level of detail regarding author affiliations, potentially improving discoverability and research analysis.
2024-11-19 14:57:18 +01:00

README.md

Description of the Module

This module defines a set of Oozie workflows for the collection and transformation of metadata records. Both workflows interact with the Metadata Store Manager (MdSM) to handle the logical transactions required to keep read/write operations on the data consistent, since the MdSM keeps track of the logical-to-physical mapping of each MDStore.

Metadata collection

The metadata collection workflow is responsible for harvesting metadata records exposed through different protocols and in different formats, and for storing them on HDFS so that they can be further processed.

Collector Plugins

Different protocols are managed by dedicated Collector plugins, i.e. Java programs implementing a defined interface:

eu.dnetlib.dhp.collection.plugin.CollectorPlugin

The supported plugins are:

  • OAI Plugin: collects from OAI-PMH compatible endpoints
  • MDStore plugin: collects from a given D-Net MetadataStore (identified by mongodb URI, dbName, MDStoreID)
  • MDStore dump plugin: collects from an MDStore dump stored on the HDFS location indicated by the path parameter
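
As a rough illustration of the plugin-per-protocol idea described above, the following minimal sketch selects a collector by protocol name and gathers the records it emits. Everything in it is hypothetical: `SketchCollectorPlugin`, the `toy-static` and `toy-repeat` protocol keys, and the `count` parameter are made up for the example and do not reproduce the actual signature of `eu.dnetlib.dhp.collection.plugin.CollectorPlugin` or the real plugin names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CollectorSketch {

    // Hypothetical stand-in for the plugin contract. It is NOT the real
    // CollectorPlugin interface: the parameter and return types are assumptions.
    interface SketchCollectorPlugin {
        Stream<String> collect(Map<String, String> params);
    }

    // Hypothetical registry mapping a protocol name to the plugin that handles it.
    static final Map<String, SketchCollectorPlugin> REGISTRY = new LinkedHashMap<>();

    static {
        // Toy plugin that always emits two fixed records.
        REGISTRY.put("toy-static",
            params -> Stream.of("<record id=\"1\"/>", "<record id=\"2\"/>"));
        // Toy plugin that emits as many records as the "count" parameter asks for.
        REGISTRY.put("toy-repeat",
            params -> Stream.iterate(1, i -> i + 1)
                .limit(Long.parseLong(params.getOrDefault("count", "0")))
                .map(i -> "<record id=\"" + i + "\"/>"));
    }

    // Looks up the plugin for the given protocol and materializes its records.
    static List<String> collect(String protocol, Map<String, String> params) {
        SketchCollectorPlugin plugin = REGISTRY.get(protocol);
        if (plugin == null) {
            throw new IllegalArgumentException("No collector plugin for protocol: " + protocol);
        }
        return plugin.collect(params).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Print each record produced by the toy "repeat" plugin.
        collect("toy-repeat", Map.of("count", "3")).forEach(System.out::println);
    }
}
```

Keeping the lookup in one place means a new protocol only needs a new registry entry. In the actual workflow the collected records would be written to HDFS rather than returned as an in-memory list.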

Transformation Plugins

TODO