dnet-applications/apps/dhp-mdstore-manager
Michele Artini df91e8c38e Merge branch 'master' into new-is-app 2023-05-03 09:51:52 +02:00
src setup for mdstore inspector ui 2023-02-06 16:21:57 +01:00
README.md MDsM README.md 2021-03-03 16:32:08 +01:00
pom.xml [maven-release-plugin] prepare for next development iteration 2023-05-02 13:40:41 +02:00


dhp-mdstore-manager

A key component in the OpenAIRE aggregation workflows is the Metadata Store Manager (MdSM). It manages the collections of metadata records resulting from the aggregation processes (MDStores), keeping track of the logical-to-physical mapping of each MDStore, i.e. the HDFS location where its records are stored.

Moreover, since the aggregation workflows are intrinsically asynchronous processes, the MdSM must ensure the consistency of the records stored within each MDStore. Because HDFS is an append-only filesystem (i.e. existing files cannot be updated in place), the MdSM introduces the concept of MDStore version: each MDStore consists of a series of versions, each holding a set of metadata records and associated with the timestamp of its creation, so that versions can be ordered over time. One of the versions is designated as the "current" one, available for clients to read. Writing a new batch of records into a given MDStore does not alter the content available to clients reading from the same MDStore until the write operations are concluded and committed; this implements a transaction, similar to transactions in an RDBMS.

Only a predefined number of versions is kept. Periodically, a garbage collection mechanism disposes of older MDStore versions, preserving the last N, where N is a configuration parameter defined in the information system (or in the MdSM itself, yet to be decided).
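The versioning scheme described above can be illustrated with a minimal in-memory sketch. It is not the MdSM implementation: the `MDStore` and `Version` classes, their method names and the version-id format are hypothetical, records are plain Python lists instead of HDFS files, and the garbage collection rule used here (keep the newest N committed versions by creation time, never remove the current one, leave preliminary versions alone) is an assumption drawn from the description.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count
from typing import Optional


@dataclass
class Version:
    """One MDStore version: a set of records tied to its creation time."""
    id: str
    created: datetime
    committed: bool = False
    records: list = field(default_factory=list)


class MDStore:
    """Hypothetical in-memory model of MDStore versioning (not the MdSM code)."""

    def __init__(self, md_id: str) -> None:
        self.md_id = md_id
        self.versions: dict[str, Version] = {}
        self.current_id: Optional[str] = None
        self._seq = count(1)

    def new_version(self) -> Version:
        # Preliminary version: writers fill it, readers never see it.
        version_id = f"{self.md_id}-v{next(self._seq)}"
        version = Version(version_id, datetime.now(timezone.utc))
        self.versions[version_id] = version
        return version

    def commit(self, version_id: str) -> None:
        # Promote a preliminary version to current in a single step.
        version = self.versions[version_id]
        if version.committed:
            raise ValueError(f"{version_id} is already committed")
        version.committed = True
        self.current_id = version_id

    def abort(self, version_id: str) -> None:
        # Drop a preliminary version together with the records written to it.
        if self.versions[version_id].committed:
            raise ValueError(f"{version_id} is committed and cannot be aborted")
        del self.versions[version_id]

    def read(self) -> list:
        # Readers only ever see a copy of the current version's records.
        if self.current_id is None:
            return []
        return list(self.versions[self.current_id].records)

    def garbage_collect(self, keep_last: int) -> list[str]:
        # Sort committed versions by creation time (ties keep insertion order),
        # drop all but the newest `keep_last`, and never drop the current one.
        committed = sorted(
            (v for v in self.versions.values() if v.committed),
            key=lambda v: v.created,
        )
        expired = committed[: max(len(committed) - keep_last, 0)]
        removed = [v.id for v in expired if v.id != self.current_id]
        for version_id in removed:
            del self.versions[version_id]
        return removed
```

Because `read` copies the current version's records and writers only touch their own preliminary `Version`, a concurrent writer stays invisible to readers until `commit` switches `current_id`, which is the transactional behaviour the paragraph describes.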

The MdSM implements a lock mechanism that acts as a semaphore: the lock holds a counter that is incremented by one every time a given MDStore is read and decremented when the read operation is concluded. Locked MDStores (those whose counter is non-zero) cannot be deleted or garbage collected.
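The read counter can be sketched as follows. The names (`VersionRegistry`, `LockedError`, the method names) are hypothetical and not taken from the MdSM code. Counts are kept per version, mirroring `endReading`, which addresses a version id, and the check-and-delete runs under the same mutex as the counter updates, so no reader can register between the check and the removal.

```python
from __future__ import annotations

import threading
from contextlib import contextmanager
from typing import Iterable, Iterator


class LockedError(RuntimeError):
    """Raised when a version with active readers is deleted (hypothetical)."""


class VersionRegistry:
    """Per-version read counters used as counting semaphores (sketch)."""

    def __init__(self, version_ids: Iterable[str]) -> None:
        self._readers = {vid: 0 for vid in version_ids}
        self._mutex = threading.Lock()

    def __contains__(self, version_id: str) -> bool:
        with self._mutex:
            return version_id in self._readers

    def read_count(self, version_id: str) -> int:
        with self._mutex:
            return self._readers[version_id]

    def start_reading(self, version_id: str) -> None:
        with self._mutex:
            if version_id not in self._readers:
                raise KeyError(version_id)
            self._readers[version_id] += 1

    def end_reading(self, version_id: str) -> None:
        with self._mutex:
            if self._readers.get(version_id, 0) == 0:
                raise ValueError(f"{version_id} has no active readers")
            self._readers[version_id] -= 1

    def delete(self, version_id: str) -> None:
        # Check and removal happen under one lock, so no reader can
        # register between them.
        with self._mutex:
            if self._readers[version_id] > 0:
                raise LockedError(f"{version_id} is locked by active readers")
            del self._readers[version_id]

    @contextmanager
    def reading(self, version_id: str) -> Iterator[None]:
        # Pair start/end so the count is decremented even if reading fails.
        self.start_reading(version_id)
        try:
            yield
        finally:
            self.end_reading(version_id)
```

In this sketch, a reader that never calls `end_reading` keeps its version locked indefinitely, which is why the `reading` context manager pairs the two calls in `try`/`finally`.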

The MdSM implements the following operations:

  • GET /mdstores/mdstore/{mdId}/newVersion : Create a new preliminary version of an MDStore, used to begin writing new records.
  • GET /mdstores/version/{versionId}/commit/{size} : Promote a preliminary version to current, used when a process writing new records has finished.
  • GET /mdstores/version/{versionId}/abort : Abort a preliminary version, used to discard the records temporarily stored in a new MDStore version.
  • GET /mdstores/mdstore/{mdId}/startReading : Increase the read count of the current version of an MDStore
  • GET /mdstores/version/{versionId}/endReading : Decrease the read count of an MDStore version
  • GET /mdstores/mdstore/{mdId}/versions : Return all the versions of an MDStore
  • DELETE /mdstoremanager/mdstores/mdstore/{mdId} : Delete an MDStore by id
  • DELETE /mdstores/versions/expired : Delete expired versions
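A minimal client sketch for the endpoints listed above, using only the Python standard library. The routes are copied from the list (including the `/mdstoremanager` prefix on the MDStore deletion route). Several details are assumptions, not documented behaviour: the base URL, the idea that `newVersion` and `startReading` return a JSON object with an `id` field, and the interpretation of `{size}` as the number of records written. The transport is injected as a `send(method, path)` callable so the write and read protocols can be exercised without a server.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Callable
from urllib.parse import quote

# Hypothetical base URL: host, port and context path depend on the deployment.
BASE_URL = "http://localhost:8080"

# Routes as listed in the README; {placeholders} are filled by endpoint().
ENDPOINTS = {
    "new_version": ("GET", "/mdstores/mdstore/{md_id}/newVersion"),
    "commit": ("GET", "/mdstores/version/{version_id}/commit/{size}"),
    "abort": ("GET", "/mdstores/version/{version_id}/abort"),
    "start_reading": ("GET", "/mdstores/mdstore/{md_id}/startReading"),
    "end_reading": ("GET", "/mdstores/version/{version_id}/endReading"),
    "versions": ("GET", "/mdstores/mdstore/{md_id}/versions"),
    "delete_mdstore": ("DELETE", "/mdstoremanager/mdstores/mdstore/{md_id}"),
    "delete_expired": ("DELETE", "/mdstores/versions/expired"),
}

Send = Callable[[str, str], Any]


def endpoint(operation: str, **params: Any) -> tuple[str, str]:
    """Return (HTTP method, path) with each path parameter URL-encoded."""
    method, template = ENDPOINTS[operation]
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return method, template.format(**encoded)


def http_send(method: str, path: str, base_url: str = BASE_URL) -> Any:
    """Real transport: issue the request, decode a JSON body if there is one."""
    request = urllib.request.Request(base_url + path, method=method)
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None


def write_transaction(
    send: Send, md_id: str, write_records: Callable[[Any], int]
) -> str:
    """Open a preliminary version, write into it, then commit or abort."""
    version = send(*endpoint("new_version", md_id=md_id))
    version_id = version["id"]  # ASSUMPTION: JSON object with an "id" field
    try:
        size = write_records(version)  # caller writes records, returns count
    except Exception:
        send(*endpoint("abort", version_id=version_id))
        raise
    send(*endpoint("commit", version_id=version_id, size=size))
    return version_id


def read_current(
    send: Send, md_id: str, read_records: Callable[[Any], Any]
) -> Any:
    """Lock the current version while reading it, always releasing the lock."""
    version = send(*endpoint("start_reading", md_id=md_id))
    version_id = version["id"]  # ASSUMPTION: same response shape as above
    try:
        return read_records(version)
    finally:
        send(*endpoint("end_reading", version_id=version_id))
```

Against a live server, `http_send` (or a `functools.partial` of it with a different `base_url`) can be passed as `send`. Pairing `abort` with failures and `endReading` with `finally` keeps preliminary versions from lingering and read counters from staying above zero.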