# dhp-mdstore-manager
A key component in the OpenAIRE aggregation workflows is the Metadata Store Manager (MdSM).
It manages the set of metadata record collections resulting from the aggregation processes (MDStores), keeping track of the
logical-to-physical mapping of each MDStore, i.e. the HDFS location where each set of records is stored.
Moreover, as the aggregation workflows are intrinsically asynchronous processes, it must ensure the consistency of the
records stored within each MDStore. Since HDFS is an append-only filesystem (i.e. it does not support updates on existing
files as batch operations), the MdSM introduces the concept of MDStore version: each MDStore holds one or more versions,
each being a set of metadata records associated with the timestamp of its creation (so that versions can be ordered over time).
One of these versions is defined as the "current" one, available for clients to read.
Writing a new batch of records in a given MDStore doesn't alter the content available for clients reading from the same
MDStore until the write operations are concluded and committed, therefore implementing a transaction (similarly to
transactions in RDBMS).
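
The versioning behaviour described above can be illustrated with a minimal in-memory sketch. It is not the MdSM implementation: the `MDStore` class, its method names and the `v1`, `v2` identifiers are hypothetical, and records are kept in Python lists rather than on HDFS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MDStore:
    """In-memory stand-in for one MDStore (hypothetical model)."""
    versions: Dict[str, List[str]] = field(default_factory=dict)  # versionId -> records
    current: Optional[str] = None  # the version that readers see
    _counter: int = 0

    def new_version(self) -> str:
        # A preliminary version stays invisible to readers until it is committed.
        self._counter += 1
        version_id = f"v{self._counter}"
        self.versions[version_id] = []
        return version_id

    def write(self, version_id: str, record: str) -> None:
        self.versions[version_id].append(record)

    def commit(self, version_id: str) -> None:
        # Promotion is a single pointer switch, so readers never see a partial batch.
        self.current = version_id

    def abort(self, version_id: str) -> None:
        # Discard a preliminary version; the current one is left untouched.
        if version_id == self.current:
            raise ValueError("cannot abort the current version")
        del self.versions[version_id]

    def read(self) -> List[str]:
        return list(self.versions[self.current]) if self.current else []


store = MDStore()
v1 = store.new_version()
store.write(v1, "record-a")
store.commit(v1)

v2 = store.new_version()
store.write(v2, "record-b")
visible_during_write = store.read()  # readers still get the content of v1
store.commit(v2)
visible_after_commit = store.read()  # readers now get the content of v2
```
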
Only a predefined number of versions is kept. From time to time a garbage collection mechanism disposes of older MDStore
versions, preserving the last N, where N is a configuration parameter defined in the information system (or in the MdSM
itself, yet to be decided).
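
As a rough illustration of the "preserve the last N" rule, the following sketch selects the versions that a garbage collection pass could dispose of. The function name and its parameters are hypothetical. It also assumes that the current version is always retained, which the text above does not state explicitly, and it ignores read locks.

```python
from typing import Dict, List


def expired_versions(created_at: Dict[str, int], current: str, keep_last: int) -> List[str]:
    """Return the version ids to dispose of, newest first.

    created_at maps each version id to its creation timestamp.
    The keep_last newest versions are preserved; the current version is
    preserved too (an assumption of this sketch).
    """
    newest_first = sorted(created_at, key=created_at.get, reverse=True)
    keep = set(newest_first[:keep_last]) | {current}
    return [v for v in newest_first if v not in keep]


timestamps = {"v1": 100, "v2": 200, "v3": 300, "v4": 400}
to_delete = expired_versions(timestamps, current="v4", keep_last=2)
```
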
The MdSM implements a lock mechanism that acts as a semaphore: the lock includes a counter that is incremented by one
every time a given MDStore is read, and decremented by one when the read operation is concluded. Locked MDStores
(associated with non-zero semaphores) cannot be deleted or garbage collected.
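
The read counter can be pictured with the sketch below. The `ReadLocks` class is a hypothetical helper, not the MdSM code, and it assumes that counters are kept per version id and live in memory.

```python
import threading
from typing import Dict


class ReadLocks:
    """Read counters acting as semaphores (hypothetical helper)."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}
        self._guard = threading.Lock()  # protects the counters themselves

    def start_reading(self, version_id: str) -> None:
        with self._guard:
            self._counts[version_id] = self._counts.get(version_id, 0) + 1

    def end_reading(self, version_id: str) -> None:
        with self._guard:
            if self._counts.get(version_id, 0) == 0:
                raise ValueError(f"{version_id} is not being read")
            self._counts[version_id] -= 1

    def can_delete(self, version_id: str) -> bool:
        # Only versions with a zero counter may be deleted or garbage collected.
        with self._guard:
            return self._counts.get(version_id, 0) == 0


locks = ReadLocks()
locks.start_reading("v1")
locks.start_reading("v1")
locks.end_reading("v1")
deletable_with_one_reader = locks.can_delete("v1")
locks.end_reading("v1")
deletable_with_no_readers = locks.can_delete("v1")
```
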
The MdSM implements the following operations:
* GET /mdstores/mdstore/{mdId}/newVersion : Create a new preliminary version of an MDStore, used to begin writing new records.
* GET /mdstores/version/{versionId}/commit/{size} : Promote a preliminary version to current, used when a process writing new records has finished.
* GET /mdstores/version/{versionId}/abort : Abort a preliminary version, used to discard records temporarily stored in a new MDStore version.
* GET /mdstores/mdstore/{mdId}/startReading : Increase the read count of the current version of an MDStore
* GET /mdstores/version/{versionId}/endReading : Decrease the read count of an MDStore version
* GET /mdstores/mdstore/{mdId}/versions : Return all the versions of an MDStore
* DELETE /mdstoremanager/mdstores/mdstore/{mdId} : Delete an MDStore by id
* DELETE /mdstores/versions/expired : Delete expired versions
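
A client could combine the `newVersion`, `commit` and `abort` operations as in the following sketch. Only the paths come from the list above; everything else is an assumption. The transport is passed in as a `call(method, path)` function, the response format of `newVersion` is not documented here (so the caller supplies `version_id_of` to extract the version id), and `{size}` is assumed to be the number of records written.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# (HTTP method, path) -> parsed response; supplied by the caller.
Call = Callable[[str, str], object]


@contextmanager
def write_transaction(call: Call, md_id: str,
                      version_id_of: Callable[[object], str]) -> Iterator[dict]:
    """Open a preliminary version, then commit it on success or abort it on error."""
    response = call("GET", f"/mdstores/mdstore/{md_id}/newVersion")
    tx = {"version_id": version_id_of(response), "size": 0}
    try:
        yield tx  # the caller writes records and updates tx["size"]
    except Exception:
        call("GET", f"/mdstores/version/{tx['version_id']}/abort")
        raise
    else:
        call("GET", f"/mdstores/version/{tx['version_id']}/commit/{tx['size']}")


# Demo with a fake transport that only records the requests it receives.
calls = []


def fake_call(method: str, path: str) -> object:
    calls.append((method, path))
    return {"id": "md-1-v7"}  # hypothetical response shape


with write_transaction(fake_call, "md-1", lambda r: r["id"]) as tx:
    tx["size"] = 42  # pretend that 42 records were written
```

Passing the transport as a function keeps the sketch independent of any HTTP library; in practice `call` would wrap a real HTTP client pointed at the MdSM base URL.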