uoa-validator-engine2

This engine validates metadata records against OpenAIRE's Guidelines. It supports both XML records (OAI-PMH harvested) and JSON documents (e.g. Crossref Works API responses).

It relies on model definitions from the uoa-validator-engine2-result-models software, included as a dependency.

Supported Profiles

XML Profiles (OAI-PMH / DataCite / Dublin Core)

| Profile class | Guideline set | Description |
|---|---|---|
| `LiteratureGuidelinesV4Profile` | OpenAIRE Literature v4 | Full 32-guideline validation of literature repository records. |
| `LiteratureGuidelinesV3Profile` | OpenAIRE Literature v3 | Legacy v3 validation. |
| `FAIR_Literature_GuidelinesV4Profile` | FAIR Literature v4 | FAIR-principle checks derived from the Literature v4 guidelines. |
| `FAIR_Literature_GuidelinesV3Profile` | FAIR Literature v3 | FAIR-principle checks derived from the Literature v3 guidelines. |
| `DataArchiveGuidelinesV2Profile` | OpenAIRE Data v2 | Validation of data archive records. |
| `FAIR_Data_GuidelinesProfile` | FAIR Data | FAIR-principle checks for data records. |
| `CrisPublicationV111Profile` | OpenAIRE CRIS v1.1.1 | Eleven CRIS entity profiles (Publication, Project, Person, OrgUnit, etc.). |

JSON Profiles

| Profile class | Source format | Description |
|---|---|---|
| `CrossrefApiV4Profile` | Crossref Works API | Full 32-guideline OpenAIRE Literature v4 validation for Crossref JSON documents. See details below. |
| `DataCiteApiV4Profile` | DataCite REST API | Full 32-guideline OpenAIRE Literature v4 validation for DataCite JSON documents. See details below. |
| `DcatDataV4Profile` | DCAT / JSON-LD | Validation of DCAT dataset metadata (OpenAIRE Data Guidelines v4). |
| `SchemaOrgProfile` | Schema.org JSON | Minimal schema.org validation (name, author). |

CrossrefApiV4Profile

eu.dnetlib.validator2.validation.json.CrossrefApiV4Profile

A comprehensive JSON validation profile that maps all 32 OpenAIRE Literature v4 guidelines to Crossref Works API JSON document fields. It is the JSON counterpart of LiteratureGuidelinesV4Profile.

Guideline mapping

| # | Guideline | Requirement level | Crossref JSON field(s) | Notes |
|---|---|---|---|---|
| 1 | Title | MANDATORY | `$.title` | Non-empty string array. |
| 2 | Creator | MANDATORY | `$.author` (fallback `$.editor`) | At least one contributor entry. |
| 3 | Contributor | MANDATORY_IF_APPLICABLE | `$.editor`, `$.contributor` | Validated when present; each entry must have a name. |
| 4 | Funding Reference | MANDATORY_IF_APPLICABLE | `$.funder` | Validated when present; each funder must have name. |
| 5 | Alternate Identifier | RECOMMENDED | `$.ISSN`, `$.ISBN`, `$.alternative-id` | Passes when all three are absent. |
| 6 | Related Identifier | RECOMMENDED | `$.relation` | Non-empty relation object when present. |
| 7 | Embargo Period Date | MANDATORY_IF_APPLICABLE | | Not available in Crossref API; always passes. |
| 8 | Publication Date | MANDATORY | `$.issued`, `$.created`, `$.published-online`, `$.published-print` | Year extracted from `date-parts`. |
| 9 | Language | MANDATORY_IF_APPLICABLE | `$.language` | ISO 639-1 string; validated when present. |
| 10 | Publisher | MANDATORY_IF_APPLICABLE | `$.publisher` | Non-empty string; validated when present. |
| 11 | Resource Type | MANDATORY | `$.type` | Must be a known Crossref type (see vocabulary below). |
| 12 | Description | MANDATORY_IF_APPLICABLE | `$.abstract` | Non-empty string; validated when present. |
| 13 | Format | RECOMMENDED | `$.link[*].content-type` | At least one MIME-type entry when `$.link` is present. |
| 14 | Resource Identifier | MANDATORY | `$.DOI` | Primary persistent identifier. |
| 15 | Access Rights | MANDATORY_IF_APPLICABLE | | Not available in Crossref API; always passes. |
| 16 | Source | RECOMMENDED | `$.source`, `$.container-title` | Passes when both are absent. |
| 17 | Subject | MANDATORY_IF_APPLICABLE | `$.subject` | Non-empty array; validated when present. |
| 18 | License Condition | RECOMMENDED | `$.license[*].URL` | At least one entry with a URL; passes when absent. |
| 19 | Coverage | RECOMMENDED | | Not available in Crossref API; always passes. |
| 20 | Size | OPTIONAL | | Not available in Crossref API; always passes. |
| 21 | Geo Location | OPTIONAL | | Not available in Crossref API; always passes. |
| 22 | Resource Version | RECOMMENDED | | Not a standard Crossref field; always passes. |
| 23 | File Location | MANDATORY_IF_APPLICABLE | `$.link` | Each link must have a non-empty URL; validated when present. |
| 24 | Citation Title | RECOMMENDED | `$.container-title` | Validated when present; passes when absent. |
| 25 | Citation Volume | RECOMMENDED | `$.volume` | Validated when present; passes when absent. |
| 26 | Citation Issue | RECOMMENDED | `$.issue` | Validated when present; passes when absent. |
| 27 | Citation Start Page | RECOMMENDED | `$.page` | Extracted as the part before `-` in `start-end` format. |
| 28 | Citation End Page | RECOMMENDED | `$.page` | Extracted as the part after `-`; passes if no range present. |
| 29 | Citation Edition | RECOMMENDED | `$.edition-number` | Validated when present; passes when absent. |
| 30 | Citation Conference Place | RECOMMENDED | `$.event.location` | Validated when `$.event` is present. |
| 31 | Citation Conference Date | RECOMMENDED | `$.event.start` | Year extracted from `date-parts`; validated when `$.event` is present. |
| 32 | Audience | OPTIONAL | | Not available in Crossref API; always passes. |
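
Rows 27 and 28 both read `$.page` and split it on `-`. The following is a minimal sketch of that split, not the engine's actual code: the class `PageRangeSketch` and its methods are hypothetical, and treating a value without `-` as a start page with no end page is an assumption.

```java
import java.util.Optional;

/** Hypothetical helper illustrating the $.page split; not part of the engine's API. */
final class PageRangeSketch {
    private PageRangeSketch() {}

    /** Part before the first '-'; the whole value when there is no range (assumption). */
    static Optional<String> startPage(String page) {
        if (page == null || page.trim().isEmpty()) {
            return Optional.empty();
        }
        int dash = page.indexOf('-');
        String start = (dash < 0 ? page : page.substring(0, dash)).trim();
        return start.isEmpty() ? Optional.empty() : Optional.of(start);
    }

    /** Part after the first '-'; empty when the value is not a range. */
    static Optional<String> endPage(String page) {
        if (page == null) {
            return Optional.empty();
        }
        int dash = page.indexOf('-');
        if (dash < 0) {
            return Optional.empty();
        }
        String end = page.substring(dash + 1).trim();
        return end.isEmpty() ? Optional.empty() : Optional.of(end);
    }
}
```

With this sketch, `"123-145"` yields start page `123` and end page `145`, while `"77"` yields a start page only, which corresponds to the "passes if no range present" case for guideline 28.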

Crossref type vocabulary (guideline 11)

The $.type value is validated against the official Crossref type vocabulary and mapped to the corresponding OpenAIRE/COAR resource type category:

| Crossref type(s) | OpenAIRE general type |
|---|---|
| `journal-article`, `book`, `book-chapter`, `proceedings-article`, `edited-book`, `monograph`, `report`, `dissertation`, `peer-review`, `posted-content`, `reference-entry`, … | literature |
| `dataset` | dataset |
| `software` | software |
| `component`, `grant`, `standard`, `standard-series`, `other` | other research product |
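
One way to picture this lookup is a plain map from Crossref type to general type. The sketch below is illustrative only: `CrossrefTypeSketch` is a hypothetical class, it contains just the types named in the table (the literature row is abbreviated there, so this list is incomplete), and returning an empty result for an unknown type is an assumption about how a failed lookup might be represented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup table; covers only the types listed in the table above. */
final class CrossrefTypeSketch {
    private static final Map<String, String> GENERAL_TYPE = new HashMap<>();

    private static void register(String generalType, String... crossrefTypes) {
        for (String type : crossrefTypes) {
            GENERAL_TYPE.put(type, generalType);
        }
    }

    static {
        register("literature",
                "journal-article", "book", "book-chapter", "proceedings-article",
                "edited-book", "monograph", "report", "dissertation",
                "peer-review", "posted-content", "reference-entry");
        register("dataset", "dataset");
        register("software", "software");
        register("other research product",
                "component", "grant", "standard", "standard-series", "other");
    }

    private CrossrefTypeSketch() {}

    /** Returns the general type, or empty when the value is not in the table. */
    static Optional<String> generalType(String crossrefType) {
        return Optional.ofNullable(crossrefType).map(GENERAL_TYPE::get);
    }
}
```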

Behavior conventions

  • RECOMMENDED fields: follow the same "pass when absent, validate when present" convention as forRecommendedRepeatableElement in the XML profiles — absence does not reduce the score.
  • MANDATORY_IF_APPLICABLE fields not available in the Crossref API (rows 7, 15) use a predicate that always returns true, meaning they are treated as not applicable and always pass.
  • Access Rights (#15): the OpenAIRE spec designates this as MANDATORY; since the Crossref API does not carry access-rights data, the guideline is modelled as MANDATORY_IF_APPLICABLE (always passes) to avoid penalising valid Crossref records for data their source cannot provide.
  • Publication date: the profile accepts any of the four Crossref date fields (issued, created, published-online, published-print) and requires only that a year can be extracted from the date-parts array.
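
The publication date rule can be sketched as follows. This is an illustration under stated assumptions, not the profile's implementation: the record is modelled as already-parsed `Map`/`List` values rather than a JsonPath `DocumentContext`, the class `PublicationYearSketch` is hypothetical, and the order in which the four fields are tried is assumed. It relies on Crossref's date shape, in which `date-parts` is an array of arrays whose first element starts with the year.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of year extraction from Crossref date fields. */
final class PublicationYearSketch {
    // The four accepted fields; the lookup order here is an assumption.
    private static final List<String> DATE_FIELDS =
            Arrays.asList("issued", "created", "published-online", "published-print");

    private PublicationYearSketch() {}

    /** Returns the first year found in any accepted field's date-parts array. */
    static Optional<Integer> extractYear(Map<String, Object> record) {
        for (String field : DATE_FIELDS) {
            Object date = record.get(field);
            if (!(date instanceof Map)) {
                continue;
            }
            Object parts = ((Map<?, ?>) date).get("date-parts");
            if (!(parts instanceof List) || ((List<?>) parts).isEmpty()) {
                continue;
            }
            Object first = ((List<?>) parts).get(0);
            if (!(first instanceof List) || ((List<?>) first).isEmpty()) {
                continue;
            }
            Object year = ((List<?>) first).get(0);
            if (year instanceof Number) {
                return Optional.of(((Number) year).intValue());
            }
        }
        return Optional.empty();
    }
}
```

A record that carries only `created` with `date-parts` of `[[2021, 3, 9]]` would satisfy the guideline in this sketch, because a year is all that is required.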

Usage example

```java
import com.jayway.jsonpath.DocumentContext;
import eu.dnetlib.validator2.result_models.ValidationResult;
import eu.dnetlib.validator2.validation.json.CrossrefApiV4Profile;
import eu.dnetlib.validator2.validation.json.JsonUtils;

String crossrefJson = /* fetch from https://api.crossref.org/works/{doi} */;
DocumentContext dc = JsonUtils.parse(crossrefJson);

CrossrefApiV4Profile profile = new CrossrefApiV4Profile();
ValidationResult result = profile.validate("10.1234/example", dc);

System.out.printf("Score: %.1f %%\n", result.getScore());
result.getResults().forEach((name, r) ->
    System.out.printf("  %-30s %s\n", name, r.getStatus()));
```

DataCiteApiV4Profile

eu.dnetlib.validator2.validation.json.DataCiteApiV4Profile

A comprehensive JSON validation profile that maps all 32 OpenAIRE Literature v4 guidelines to fields of the DataCite REST API attributes object. It is the JSON counterpart of LiteratureGuidelinesV4Profile for DataCite-hosted records.

Expected input: pass the data.attributes object (not the full API response wrapper) to validate().

Because DataCite's metadata schema is the same one used by OpenAIRE, the mapping is nearly one-to-one. DataCite carries rich information for fields that are absent in Crossref: Access Rights, Size, Geo Location, Resource Version, Embargo Period dates, and most citation fields via $.container.

Guideline mapping

| # | Guideline | Requirement level | DataCite attributes field(s) | Notes |
|---|---|---|---|---|
| 1 | Title | MANDATORY | `$.titles[*].title` | At least one non-empty title. |
| 2 | Creator | MANDATORY | `$.creators[*].name` / `givenName`+`familyName` | At least one creator with a name. |
| 3 | Contributor | MANDATORY_IF_APPLICABLE | `$.contributors` | Each entry needs `name` + `contributorType`. |
| 4 | Funding Reference | MANDATORY_IF_APPLICABLE | `$.fundingReferences` | Each entry needs `funderName`. |
| 5 | Alternate Identifier | RECOMMENDED | `$.alternateIdentifiers` (v4.5+) / `$.identifiers` (legacy) | Passes when absent. |
| 6 | Related Identifier | RECOMMENDED | `$.relatedIdentifiers` | Each entry needs `relatedIdentifier`, `relatedIdentifierType`, `relationType`. |
| 7 | Embargo Period Date | MANDATORY_IF_APPLICABLE | `$.dates[dateType=Available/Accepted]` | Applicable when Available date exists; both dates must be ISO 8601. |
| 8 | Publication Date | MANDATORY | `$.publicationYear` or `$.dates[dateType=Issued]` | Year integer or ISO 8601 date. |
| 9 | Language | MANDATORY_IF_APPLICABLE | `$.language` | Validated when present. |
| 10 | Publisher | MANDATORY_IF_APPLICABLE | `$.publisher` (string or `{name}`) | Supports legacy string and v4.5+ object format. |
| 11 | Resource Type | MANDATORY | `$.types.resourceTypeGeneral` | Must be a known DataCite type (see vocabulary below). |
| 12 | Description | MANDATORY_IF_APPLICABLE | `$.descriptions[*].description` | At least one non-empty description. |
| 13 | Format | RECOMMENDED | `$.formats[*]` | MIME-type strings; passes when absent. |
| 14 | Resource Identifier | MANDATORY | `$.doi` | Primary persistent identifier. |
| 15 | Access Rights | MANDATORY | `$.rightsList` | Must be present and non-empty (COAR access right entries expected). |
| 16 | Source | RECOMMENDED | `$.container.title` | Passes when `$.container` is absent. |
| 17 | Subject | MANDATORY_IF_APPLICABLE | `$.subjects[*].subject` | At least one non-empty subject. |
| 18 | License Condition | RECOMMENDED | `$.rightsList[*].rightsUri` | At least one entry with a URI; passes when absent. |
| 19 | Coverage | RECOMMENDED | | No DataCite equivalent; always passes. |
| 20 | Size | OPTIONAL | `$.sizes[*]` | Non-empty string array; validated when present. |
| 21 | Geo Location | OPTIONAL | `$.geoLocations[*]` | Each entry must have at least one geo sub-field. |
| 22 | Resource Version | RECOMMENDED | `$.version` | Validated when present; passes when absent. |
| 23 | File Location | MANDATORY_IF_APPLICABLE | `$.url` | Landing page URL; validated when present. |
| 24 | Citation Title | RECOMMENDED | `$.container.title` | Validated when present; passes when absent. |
| 25 | Citation Volume | RECOMMENDED | `$.container.volume` | Validated when present; passes when absent. |
| 26 | Citation Issue | RECOMMENDED | `$.container.issue` | Validated when present; passes when absent. |
| 27 | Citation Start Page | RECOMMENDED | `$.container.firstPage` | Validated when present; passes when absent. |
| 28 | Citation End Page | RECOMMENDED | `$.container.lastPage` | Validated when present; passes when absent. |
| 29 | Citation Edition | RECOMMENDED | | No container edition field; always passes. |
| 30 | Citation Conference Place | RECOMMENDED | | No dedicated DataCite field; always passes. |
| 31 | Citation Conference Date | RECOMMENDED | | No dedicated DataCite field; always passes. |
| 32 | Audience | OPTIONAL | | No DataCite equivalent; always passes. |
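
Row 7 ties applicability to the presence of an Available date and then requires ISO 8601 values. A rough sketch of such a check is shown below. It is not taken from the engine: `EmbargoDateSketch` is a hypothetical class, `$.dates` is simplified to a map from `dateType` to date string, the regular expression checks only the shape `YYYY`, `YYYY-MM` or `YYYY-MM-DD` (not calendar validity), and failing when the Accepted date is missing is an assumption.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of the embargo date rule; $.dates is simplified to dateType -> date. */
final class EmbargoDateSketch {
    // Shape check only: YYYY, YYYY-MM or YYYY-MM-DD. A value like 2023-02-31 still matches.
    private static final Pattern ISO_DATE = Pattern.compile(
            "\\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\\d|3[01]))?)?");

    private EmbargoDateSketch() {}

    static boolean isIsoDate(String value) {
        return value != null && ISO_DATE.matcher(value).matches();
    }

    /** Passes when not applicable (no Available date) or when both dates are well-formed. */
    static boolean embargoGuidelinePasses(Map<String, String> datesByType) {
        if (!datesByType.containsKey("Available")) {
            return true; // not applicable, so the guideline passes
        }
        // Assumption: a missing Accepted date fails, since both dates must be valid.
        return isIsoDate(datesByType.get("Available"))
                && isIsoDate(datesByType.get("Accepted"));
    }
}
```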

DataCite resourceTypeGeneral vocabulary (guideline 11)

| DataCite type(s) | OpenAIRE general type |
|---|---|
| `JournalArticle`, `Book`, `BookChapter`, `ConferencePaper`, `ConferenceProceeding`, `DataPaper`, `Dissertation`, `Journal`, `PeerReview`, `Preprint`, `Report`, `Text`, … | literature |
| `Dataset`, `DataPaper` | dataset |
| `Software`, `ComputationalNotebook` | software |
| `Audiovisual`, `Collection`, `Event`, `Image`, `Instrument`, `InteractiveResource`, `Model`, `OutputManagementPlan`, `PhysicalObject`, `Service`, `Sound`, `Standard`, `StudyRegistration`, `Workflow`, `Other` | other research product |

Behavior conventions

  • RECOMMENDED fields: pass when absent, validate when present (same as Crossref profile).
  • Embargo Period Date: applicable only when $.dates contains a dateType=Available entry; when applicable both Available and Accepted dates must be valid ISO 8601 (YYYY, YYYY-MM, or YYYY-MM-DD).
  • Access Rights (MANDATORY): requires $.rightsList to be present and non-empty. Records without rights information fail this guideline. COAR Access Right concept URIs (open access, embargoed access, restricted access, metadata only access) are the expected values in each entry.
  • License Condition (RECOMMENDED): also reads $.rightsList, but checks that at least one entry has a non-empty rightsUri. Passes when $.rightsList is absent.
  • Publisher: accepts the legacy string format and the v4.5+ {name: ...} object format.
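
The conventions above can be contrasted in a few lines of plain Java. This is a sketch under assumptions rather than the profile's code: `DataCiteConventionSketch` and its methods are hypothetical, JSON values are modelled as already-parsed `Map`/`List`/`String` objects, and failing License Condition for a present but empty `$.rightsList` is an assumption the text above does not settle.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of three DataCite conventions; not the engine's API. */
final class DataCiteConventionSketch {
    private DataCiteConventionSketch() {}

    /** MANDATORY: rightsList must be present and non-empty. */
    static boolean accessRightsPasses(List<Map<String, Object>> rightsList) {
        return rightsList != null && !rightsList.isEmpty();
    }

    /** RECOMMENDED: passes when absent; otherwise needs one non-empty rightsUri. */
    static boolean licenseConditionPasses(List<Map<String, Object>> rightsList) {
        if (rightsList == null) {
            return true; // absent, so nothing to validate
        }
        for (Map<String, Object> entry : rightsList) {
            Object uri = entry.get("rightsUri");
            if (uri instanceof String && !((String) uri).trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    /** Accepts the legacy string form and the v4.5+ {name: ...} object form. */
    static Optional<String> publisherName(Object publisher) {
        Object name = publisher instanceof Map
                ? ((Map<?, ?>) publisher).get("name")
                : publisher;
        if (name instanceof String && !((String) name).trim().isEmpty()) {
            return Optional.of(((String) name).trim());
        }
        return Optional.empty();
    }
}
```

The same missing `$.rightsList` therefore fails Access Rights but passes License Condition, which is the difference between a MANDATORY and a RECOMMENDED guideline reading the same field.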

Usage example

```java
import com.jayway.jsonpath.DocumentContext;
import eu.dnetlib.validator2.result_models.ValidationResult;
import eu.dnetlib.validator2.validation.json.DataCiteApiV4Profile;
import eu.dnetlib.validator2.validation.json.JsonUtils;

// Extract attributes from the DataCite API response:
// String attrJson = fullResponse.get("data").get("attributes").toString();
String attrJson = /* data.attributes JSON string */;
DocumentContext dc = JsonUtils.parse(attrJson);

DataCiteApiV4Profile profile = new DataCiteApiV4Profile();
ValidationResult result = profile.validate("10.1234/example", dc);

System.out.printf("Score: %.1f %%\n", result.getScore());
result.getResults().forEach((name, r) ->
    System.out.printf("  %-30s %s\n", name, r.getStatus()));
```

Documentation

  • Code Overview: A high-level overview of the most important packages and classes in the project.
  • Logic Flow: A detailed description of the validation process, from the entry point to the final result.
  • Extending the Engine: A guide on how to create custom rules and profiles.
  • Usage Guide: Examples of how to use the validator, both from the command line and programmatically.

Install and run instructions

  • Install JDK 8 and Maven.
  • Build with `mvn clean install -U`.
  • Run with `java -jar target/uoa-validator-engine2-<VERSION>.jar`.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.