Software Artifact Processor
This library was initially designed to deposit software artifacts in Zenodo programmatically.
However, it is a general-purpose library capable of analyzing a list of software artifacts (each described by its metadata) and processing them.
The library currently offers two processors:
- ZenodoExporter: It deposits the software artifact in Zenodo, obtaining a DOI;
- BibLaTeXExporter: It exports the software artifact in a bib file using the BibLaTeX format.
Other processors can easily be added in the future by extending the SoftwareArtifactProcessor class.
The Core of the Library
The core class of the library is Analyser, which must be initialized with:
- a configuration (JSON Object);
- a list of software artifacts described by their metadata (JSON Array).
The configuration must contain:
- The list of processors to be used and their configuration parameters (required);
- An optional set of metadata that will be used as default metadata for all software artifacts defined in the list.
Each exporter configuration requires at least the elaboration property, which can assume the following values (see the ElaborationType enum):
- ALL: The exporter analyzes all the software artifacts;
- UPDATE_ONLY: The exporter analyzes only the software artifacts to be updated in the target;
- NEW: The exporter analyzes only the software artifacts that do not yet exist in the target;
- NONE: The exporter does not export anything to the target; each software artifact is elaborated without actually being exported. It is a dry run.
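The four values can be read as a filter applied to each artifact. The sketch below models that filter under stated assumptions: only the constant names come from the ElaborationType enum described above, while the shouldExport method and its existsInTarget flag are hypothetical (how the real exporter decides whether an artifact already exists in the target is not described here).

```java
// Hypothetical sketch: only the constant names mirror the library's ElaborationType enum.
public class ElaborationDemo {

    enum ElaborationType { ALL, UPDATE_ONLY, NEW, NONE }

    /**
     * Returns true if an artifact should actually be exported to the target.
     * existsInTarget is an assumed input (e.g., "already deposited on Zenodo").
     */
    static boolean shouldExport(ElaborationType type, boolean existsInTarget) {
        switch (type) {
            case ALL:
                return true;              // every artifact is exported
            case UPDATE_ONLY:
                return existsInTarget;    // only artifacts already present in the target
            case NEW:
                return !existsInTarget;   // only artifacts not yet present in the target
            case NONE:
            default:
                return false;             // dry run: artifacts are elaborated, never exported
        }
    }

    public static void main(String[] args) {
        for (ElaborationType t : ElaborationType.values()) {
            System.out.println(t + ": new artifact=" + shouldExport(t, false)
                    + ", existing artifact=" + shouldExport(t, true));
        }
    }
}
```

With NONE every artifact still goes through the elaboration, so the metadata computation can be checked without touching the target.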
The processors are executed in the order in which they are defined. A processor can also produce metadata itself (e.g., the DOI obtained from Zenodo). The input metadata, together with the metadata generated by a processor, are made available to the subsequent processors.
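The chaining behaviour can be pictured as a fold over the processor list, where each step sees the input metadata plus everything produced before it. This is a minimal sketch, assuming a hypothetical Processor interface that returns the metadata it produces; it is not the library's SoftwareArtifactProcessor API, and the two lambdas only imitate a depositor and an exporter (no call to Zenodo is made, and the DOI is a placeholder string).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProcessorChainDemo {

    // Hypothetical stand-in for a processor, NOT the library's API:
    // it receives the metadata accumulated so far and returns the metadata it produces.
    interface Processor {
        Map<String, String> process(Map<String, String> metadata);
    }

    // Runs the processors in declaration order; each one sees the input metadata
    // plus the metadata produced by all the processors executed before it.
    static Map<String, String> runChain(Map<String, String> input, List<Processor> processors) {
        Map<String, String> accumulated = new LinkedHashMap<>(input);
        for (Processor p : processors) {
            accumulated.putAll(p.process(accumulated));
        }
        return accumulated;
    }

    public static void main(String[] args) {
        Map<String, String> artifact = new LinkedHashMap<>();
        artifact.put("version", "1.0.0");

        List<Processor> processors = new ArrayList<>();
        // Imitates a depositor: produces a placeholder DOI instead of calling Zenodo.
        processors.add(m -> Collections.singletonMap("version_doi_url", "placeholder-doi-" + m.get("version")));
        // Imitates an exporter: consumes the value produced by the previous processor.
        processors.add(m -> Collections.singletonMap("bib_note", "DOI: " + m.get("version_doi_url")));

        System.out.println(runChain(artifact, processors));
    }
}
```

Using a LinkedHashMap keeps the insertion order, so the printed metadata reflects the order in which each processor contributed its values.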
Each software artifact can contain an arbitrary set of metadata. Each processor uses the metadata it needs and ignores the others.
While analyzing each software artifact, the Analyser links it to the previously elaborated software artifact. Relating an artifact to the previous one is helpful when a processor needs to connect them somehow. It is up to the processor's logic to connect them, using specific metadata values as conditions.
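As an illustration of such a condition, the sketch below links each artifact to its predecessor and then lets a processor-side check decide whether the two belong together. The Artifact holder class, the method names, and the choice of concept_doi_url as the condition are assumptions made for this example, not the library's actual classes or rules; the DOI values are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreviousLinkDemo {

    // Hypothetical holder: metadata plus a link to the previously elaborated artifact.
    static final class Artifact {
        final Map<String, String> metadata;
        Artifact previous; // null for the first artifact

        Artifact(Map<String, String> metadata) { this.metadata = metadata; }
    }

    // Links each artifact to the one elaborated just before it.
    static void linkSequentially(List<Artifact> artifacts) {
        Artifact previous = null;
        for (Artifact a : artifacts) {
            a.previous = previous;
            previous = a;
        }
    }

    // Assumed processor-side condition: the current artifact is a new version
    // of the previous one only if both share the same concept_doi_url.
    static boolean isNewVersionOfPrevious(Artifact a) {
        if (a.previous == null) return false;
        String concept = a.metadata.get("concept_doi_url");
        return concept != null && concept.equals(a.previous.metadata.get("concept_doi_url"));
    }

    static Artifact artifact(String version, String conceptDoiUrl) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("version", version);
        m.put("concept_doi_url", conceptDoiUrl);
        return new Artifact(m);
    }

    public static void main(String[] args) {
        List<Artifact> artifacts = new ArrayList<>();
        artifacts.add(artifact("1.0.0", "concept-A"));
        artifacts.add(artifact("1.1.0", "concept-A"));
        artifacts.add(artifact("2.0.0", "concept-B")); // new major version, different concept
        linkSequentially(artifacts);
        for (Artifact a : artifacts) {
            System.out.println(a.metadata.get("version") + " -> new version of previous? "
                    + isNewVersionOfPrevious(a));
        }
    }
}
```

The link itself is unconditional; only the processor decides, from the metadata, whether the link is meaningful for its target.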
The library dynamically calculates the value of the metadata of a software artifact using the following features:
- A property value can contain a reference to another property, written as a variable using the referenced property name (e.g., {{version}});
- The library merges the metadata of the software artifact with the metadata defined in the configuration;
- The library calculates the final metadata values, replacing the variables only after merging the properties.
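The order of the two steps matters: merging first means a variable in a default value can pick up an artifact-specific value. The sketch below shows this merge-then-substitute approach under stated assumptions: the class and method names are hypothetical, variables are assumed to use the {{name}} syntax seen in the example that follows, and unknown variables are left untouched. The sample values (title, version, group) are taken from the gcat example discussed below, although placing title in the defaults is only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetadataResolverDemo {

    // Matches {{name}} placeholders such as "{{version}}".
    private static final Pattern VARIABLE = Pattern.compile("\\{\\{([^{}]+)\\}\\}");

    // Step 1: merge; artifact values override the defaults from the configuration.
    static Map<String, String> merge(Map<String, String> defaults, Map<String, String> artifact) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(artifact);
        return merged;
    }

    // Step 2: replace each {{name}} with the merged value of "name"; unknown names stay as-is.
    static String substitute(String value, Map<String, String> properties) {
        Matcher m = VARIABLE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = properties.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Merge first, then resolve every value against the merged map.
    static Map<String, String> resolve(Map<String, String> defaults, Map<String, String> artifact) {
        Map<String, String> merged = merge(defaults, artifact);
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : merged.entrySet()) {
            resolved.put(e.getKey(), substitute(e.getValue(), merged));
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("title", "gCube Catalogue (gCat) Service {{version}}");
        defaults.put("group", "data-catalogue");

        Map<String, String> artifact = new LinkedHashMap<>();
        artifact.put("version", "1.0.0");
        artifact.put("group", "data-publishing");

        Map<String, String> result = resolve(defaults, artifact);
        System.out.println(result.get("title")); // gCube Catalogue (gCat) Service 1.0.0
        System.out.println(result.get("group")); // data-publishing
    }
}
```

This sketch performs a single substitution pass, so a variable whose value itself contains another variable is not expanded further; the real library may behave differently.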
The following file shows an example configuration:
./examples/gcat-doc.json
This JSON contains two top-level properties:
- configuration: a JSON Object that contains the processors configuration list mentioned above and a set of properties to be used as defaults for each artifact;
- artifacts: a JSON Array containing the list of artifacts to be processed.
In this example, the artifacts property contains different versions of the same software, but different artifacts can also be processed together.
As the reader can notice, a property can be used as a variable inside the value of another property. For example:
{
"title": "gCube Catalogue (gCat) Service {{version}}"
}
The title value is evaluated while analyzing each artifact.
Please note that there is no single right place to define a property. Remember that the metadata contained in the global configuration is merged with the artifact's metadata: if the same property is defined in both, the artifact's value is used (it is more specific).
After the merge, the values containing references to other values are replaced.
For this reason, the title for gcat 1.0.0 will be gCube Catalogue (gCat) Service 1.0.0.
As you can see from the example, the concept DOI of gcat 1.X.X differs from the concept DOI of gcat 2.X.X. This means that Zenodo will have two different concepts, each with its own versions.
Moreover, the group for gcat 1.X.X is data-publishing, which differs from the default value coming from the global configuration (data-catalogue).
ZenodoExporter
At the end of the processing phase, the library produces a file containing:
- the configuration (JSON Object);
- the list of software artifacts described by their metadata (JSON Array), updated with the metadata produced during the elaboration.
The output of the elaboration is the following:
./examples/gcat.json
The output produced by this processor is essentially the same as the input JSON, except for the concept_doi_url and version_doi_url properties.
When the library deposits a concept on Zenodo, it creates the concept_doi_url and version_doi_url properties. The output file can be used as input in a future run to update the deposit.
BibLaTeXExporter
This processor produces an output file using the computed metadata plus the metadata obtained from ZenodoExporter (i.e., version_doi_url).
The format of the output is defined in this template:
../src/main/resources/biblatex.template
Please note that each entry uses three braces:
author = {{{author}}}
The first two are printed as-is because of the bib format, and the third is used as a variable container, as in the values of the JSON properties.
The output generated by BibLaTeXExporter is:
./examples/gcat.bib