Software Artifact Processor

This library was initially designed to deposit software artifacts in Zenodo programmatically.

It is in fact a general-purpose library that can analyse a list of software artifacts (each represented by its metadata) and process them.

The library currently offers two processors:

  • ZenodoExporter: It deposits the software artifact in Zenodo, obtaining a DOI;
  • BibLaTeXExporter: It exports the software artifact to a bib file using the BibLaTeX format.

Other processors can be easily added in the future by extending the SoftwareArtifactProcessor class.

The Core of the Library

The core class of the library is Analyser, which must be initialized with:

  • a configuration (JSON Object);
  • a list of software artifacts described by their metadata (JSON Array).

The configuration must contain:

  • The list of processors to be used and their configuration parameters (required);
  • An optional set of metadata that will be used as default metadata for all software artifacts defined in the list.

The configuration of each exporter requires at least the elaboration property, which can take one of the following values (see the ElaborationType enumeration):

  • ALL: The exporter analyses all the software artifacts;
  • UPDATE_ONLY: The exporter analyses only the software artifacts to be updated in the target;
  • NEW: The exporter analyses only the software artifacts that do not yet exist in the target;
  • NONE: The exporter elaborates each software artifact without actually exporting it to the target. It is a dry run.
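The four values can be read as a small decision table. The following Python sketch is only an illustration of that reading, not the library's code: the helper functions `is_analysed` and `is_exported` are hypothetical, and it assumes that "to be updated" means the artifact already exists in the target.

```python
from enum import Enum


class ElaborationType(Enum):
    ALL = "ALL"
    UPDATE_ONLY = "UPDATE_ONLY"
    NEW = "NEW"
    NONE = "NONE"


def is_analysed(elaboration: ElaborationType, exists_in_target: bool) -> bool:
    """Return True if an exporter with this setting analyses the artifact."""
    if elaboration is ElaborationType.UPDATE_ONLY:
        # Assumption: "to be updated" means "already present in the target".
        return exists_in_target
    if elaboration is ElaborationType.NEW:
        return not exists_in_target
    # ALL and NONE both go through every artifact.
    return True


def is_exported(elaboration: ElaborationType, exists_in_target: bool) -> bool:
    """Return True if the artifact is actually written to the target."""
    if elaboration is ElaborationType.NONE:
        return False  # dry run: elaborated but never exported
    return is_analysed(elaboration, exists_in_target)
```

With NONE, `is_analysed` is true for every artifact while `is_exported` is always false, which is what makes it a dry run.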

The processors are executed in the order they are defined. A processor can produce metadata itself (e.g., the DOI obtained from Zenodo). The input metadata, together with the metadata generated by a processor, is made available to the subsequent processor.
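The chaining described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the library's API: a processor is modelled as a plain function that returns only the metadata it generated, and `fake_zenodo`, `fake_biblatex` and the `bib_entry` key are hypothetical stand-ins (no deposit is made and the DOI URL is a placeholder).

```python
from typing import Callable, Dict, List

Metadata = Dict[str, str]
# Hypothetical processor shape: it receives the metadata seen so far and
# returns only the metadata it generated.
Processor = Callable[[Metadata], Metadata]


def fake_zenodo(metadata: Metadata) -> Metadata:
    # Stand-in for a deposit: no network call, the DOI URL is a placeholder.
    return {"version_doi_url": "https://doi.example/" + metadata["version"]}


def fake_biblatex(metadata: Metadata) -> Metadata:
    # Reads a value produced by the previous processor.
    return {"bib_entry": "url = {" + metadata["version_doi_url"] + "}"}


def run_in_order(processors: List[Processor], metadata: Metadata) -> Metadata:
    current = dict(metadata)  # copy, so the input is not mutated
    for process in processors:
        generated = process(current)
        # Input metadata plus generated metadata go to the next processor.
        current = {**current, **generated}
    return current
```

In this model the order matters: running `fake_biblatex` before `fake_zenodo` fails with a `KeyError`, because `version_doi_url` has not been generated yet.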

Each software artifact in the list can carry an arbitrary set of metadata. Which of these metadata are actually used depends on the processor.

While analysing each software artifact, the Analyser links it with the previously elaborated software artifact. Relating an artifact to the previous one is helpful if a processor needs to link them somehow. It is up to the processor's logic to connect them, using specific metadata values as conditions.
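As a rough illustration of this linking, the sketch below gives every artifact a reference to the one elaborated just before it. The `Artifact` class and both functions are hypothetical, and the condition shown (same `concept_doi_url` as the previous artifact) is only one example of a rule a processor might apply.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Artifact:
    metadata: Dict[str, str]
    previous: Optional["Artifact"] = None


def link_in_sequence(metadata_list: List[Dict[str, str]]) -> List[Artifact]:
    """Wrap each metadata dict and point it at the artifact elaborated before it."""
    artifacts: List[Artifact] = []
    previous: Optional[Artifact] = None
    for metadata in metadata_list:
        artifact = Artifact(metadata=metadata, previous=previous)
        artifacts.append(artifact)
        previous = artifact
    return artifacts


def is_new_version_of_previous(artifact: Artifact) -> bool:
    """Example condition: the artifact shares its concept DOI with the previous one."""
    if artifact.previous is None:
        return False
    key = "concept_doi_url"
    current = artifact.metadata.get(key)
    return current is not None and current == artifact.previous.metadata.get(key)
```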

The library dynamically calculates the value of the metadata of a software artifact using the following features:

  • A property can contain the value of another property indicated as a variable using the referred property name;
  • The library merges the metadata of the software artifact with the metadata defined in the configuration;
  • The library calculates the final metadata values, replacing the variables only after merging the properties.

The following is an example of configuration.

./examples/gcat-doc.json

This JSON contains two properties at the top level:

  • configuration: a JSON Object that contains the list of processor configurations mentioned above and a set of properties to be used as defaults for each artifact;
  • artifacts: a JSON Array containing the list of artifacts to be processed.
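For readers who want to inspect such a file before handing it to the library, the sketch below splits it into its two top-level parts. The function `split_input` is hypothetical; only the two property names, configuration and artifacts, come from the description above.

```python
import json
from typing import Any, Dict, List, Tuple


def split_input(text: str) -> Tuple[Dict[str, Any], List[Any]]:
    """Parse the input JSON and return (configuration, artifacts)."""
    document = json.loads(text)
    configuration = document["configuration"]
    artifacts = document["artifacts"]
    # Check the JSON types stated in the documentation.
    if not isinstance(configuration, dict):
        raise ValueError("configuration must be a JSON Object")
    if not isinstance(artifacts, list):
        raise ValueError("artifacts must be a JSON Array")
    return configuration, artifacts
```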

In this example, the artifacts property comprises different versions of the same software, but different artifacts can be processed together.

As the example shows, a defined property can be used as a variable inside the value of another property. For example:

{
    "title": "gCube Catalogue (gCat) Service {{version}}"
}

The title value will be evaluated while analyzing the artifact.

Please note that there is no single right place to define a property, because the metadata contained in the global configuration is merged with the artifact's metadata. If the same property is defined in both, the value from the artifact specification is used, since it is more specific.

After the merge, the values containing references to other values are replaced.

For this reason, the title for gcat 1.0.0 will be gCube Catalogue (gCat) Service 1.0.0.
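The merge-then-replace order can be reproduced with a short Python sketch. It is a minimal model, not the library's implementation: the functions `merge` and `resolve` are hypothetical, the `{{name}}` pattern is inferred from the title example, variables are resolved in a single pass, and unknown variable names are left untouched as a choice of this sketch.

```python
import re
from typing import Dict

# Matches {{name}} and captures the referred property name.
VARIABLE = re.compile(r"\{\{(\w+)\}\}")


def merge(defaults: Dict[str, str], artifact: Dict[str, str]) -> Dict[str, str]:
    """Artifact values win over the defaults from the global configuration."""
    return {**defaults, **artifact}


def resolve(metadata: Dict[str, str]) -> Dict[str, str]:
    """Replace every {{name}} with the value of the property called name."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Unknown names are left as they are (a choice of this sketch).
        return metadata.get(name, match.group(0))

    return {key: VARIABLE.sub(substitute, value) for key, value in metadata.items()}


defaults = {
    "title": "gCube Catalogue (gCat) Service {{version}}",
    "group": "data-catalogue",
}
artifact = {"version": "1.0.0", "group": "data-publishing"}

final = resolve(merge(defaults, artifact))
print(final["title"])  # gCube Catalogue (gCat) Service 1.0.0
print(final["group"])  # data-publishing
```

Resolving before merging would not work here: the default title refers to version, which is defined only in the artifact.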

As you can see from the example, the concept DOI of gcat 1.X.X differs from the concept DOI of gcat 2.X.X. This means that Zenodo will have two different concepts, each with different versions.

Moreover, the group for gcat 1.X.X is data-publishing, which differs from the default value coming from the global configuration, which is data-catalogue.

ZenodoExporter

At the end of the processing phase, the library produces a file containing:

  • the configuration (JSON Object);
  • the list of software artifacts described by their metadata (JSON Array) with actualized output.

The output of the elaboration is the following.

./examples/gcat.json

The output produced by this processor is almost the same as the input JSON, except for the properties concept_doi_url and version_doi_url.

When the library deposits a concept on Zenodo, it creates the concept_doi_url and version_doi_url. The output file can be used in a future run to update the deposit.

BibLaTeXExporter

This processor produces an output file using the computed metadata together with the metadata obtained from ZenodoExporter (i.e., version_doi_url).

The format of the output is defined in this template:

../src/main/resources/biblatex.template

Please note that each entry uses three braces, for example author = {{{author}}}.

The first two are printed as they are because of the bib format, and the third is used as a variable container, as in the values of the JSON properties.

The output generated by BibLaTeXExporter is

./examples/gcat.bib