# uoa-validator-engine2

This engine validates metadata records against OpenAIRE's Guidelines. It supports both XML records (OAI-PMH harvested) and JSON documents (e.g. Crossref Works API responses).

It relies on model definitions from the uoa-validator-engine2-result-models software, included as a dependency.

## Supported Profiles

### XML Profiles (OAI-PMH / DataCite / Dublin Core)

| Profile class | Guideline set | Description |
|---|---|---|
| `LiteratureGuidelinesV4Profile` | OpenAIRE Literature v4 | Full 32-guideline validation of literature repository records. |
| `LiteratureGuidelinesV3Profile` | OpenAIRE Literature v3 | Legacy v3 validation. |
| `FAIR_Literature_GuidelinesV4Profile` | FAIR Literature v4 | FAIR-principle checks derived from the Literature v4 guidelines. |
| `FAIR_Literature_GuidelinesV3Profile` | FAIR Literature v3 | FAIR-principle checks derived from the Literature v3 guidelines. |
| `DataArchiveGuidelinesV2Profile` | OpenAIRE Data v2 | Validation of data archive records. |
| `FAIR_Data_GuidelinesProfile` | FAIR Data | FAIR-principle checks for data records. |
| `CrisPublicationV111Profile` … | OpenAIRE CRIS v1.1.1 | Eleven CRIS entity profiles (Publication, Project, Person, OrgUnit, etc.). |

### JSON Profiles

| Profile class | Source format | Description |
|---|---|---|
| `CrossrefApiV4Profile` | Crossref Works API | Full 32-guideline OpenAIRE Literature v4 validation for Crossref JSON documents. See details below. |
| `DataCiteApiV4Profile` | DataCite REST API | Full 32-guideline OpenAIRE Literature v4 validation for DataCite JSON documents. See details below. |
| `DcatDataV4Profile` | DCAT / JSON-LD | Validation of DCAT dataset metadata (OpenAIRE Data Guidelines v4). |
| `SchemaOrgProfile` | Schema.org JSON | Minimal schema.org validation (name, author). |

## CrossrefApiV4Profile

`eu.dnetlib.validator2.validation.json.CrossrefApiV4Profile`

A comprehensive JSON validation profile that maps all 32 OpenAIRE Literature v4 guidelines to
Crossref Works API JSON document fields.
It is the JSON counterpart of `LiteratureGuidelinesV4Profile`.

### Guideline mapping

| # | Guideline | Requirement level | Crossref JSON field(s) | Notes |
|---|---|---|---|---|
| 1 | Title | MANDATORY | `$.title` | Non-empty string array. |
| 2 | Creator | MANDATORY | `$.author` (fallback `$.editor`) | At least one contributor entry. |
| 3 | Contributor | MANDATORY_IF_APPLICABLE | `$.editor`, `$.contributor` | Validated when present; each entry must have a name. |
| 4 | Funding Reference | MANDATORY_IF_APPLICABLE | `$.funder` | Validated when present; each funder must have a name. |
| 5 | Alternate Identifier | RECOMMENDED | `$.ISSN`, `$.ISBN`, `$.alternative-id` | Passes when all three are absent. |
| 6 | Related Identifier | RECOMMENDED | `$.relation` | Non-empty relation object when present. |
| 7 | Embargo Period Date | MANDATORY_IF_APPLICABLE | — | Not available in the Crossref API; always passes. |
| 8 | Publication Date | MANDATORY | `$.issued`, `$.created`, `$.published-online`, `$.published-print` | Year extracted from `date-parts`. |
| 9 | Language | MANDATORY_IF_APPLICABLE | `$.language` | ISO 639-1 string; validated when present. |
| 10 | Publisher | MANDATORY_IF_APPLICABLE | `$.publisher` | Non-empty string; validated when present. |
| 11 | Resource Type | MANDATORY | `$.type` | Must be a known Crossref type (see vocabulary below). |
| 12 | Description | MANDATORY_IF_APPLICABLE | `$.abstract` | Non-empty string; validated when present. |
| 13 | Format | RECOMMENDED | `$.link[*].content-type` | At least one MIME-type entry when `$.link` is present. |
| 14 | Resource Identifier | MANDATORY | `$.DOI` | Primary persistent identifier. |
| 15 | Access Rights | MANDATORY_IF_APPLICABLE | — | Not available in the Crossref API; always passes. |
| 16 | Source | RECOMMENDED | `$.source`, `$.container-title` | Passes when both are absent. |
| 17 | Subject | MANDATORY_IF_APPLICABLE | `$.subject` | Non-empty array; validated when present. |
| 18 | License Condition | RECOMMENDED | `$.license[*].URL` | At least one entry with a URL; passes when absent. |
| 19 | Coverage | RECOMMENDED | — | Not available in the Crossref API; always passes. |
| 20 | Size | OPTIONAL | — | Not available in the Crossref API; always passes. |
| 21 | Geo Location | OPTIONAL | — | Not available in the Crossref API; always passes. |
| 22 | Resource Version | RECOMMENDED | — | Not a standard Crossref field; always passes. |
| 23 | File Location | MANDATORY_IF_APPLICABLE | `$.link` | Each link must have a non-empty URL; validated when present. |
| 24 | Citation Title | RECOMMENDED | `$.container-title` | Validated when present; passes when absent. |
| 25 | Citation Volume | RECOMMENDED | `$.volume` | Validated when present; passes when absent. |
| 26 | Citation Issue | RECOMMENDED | `$.issue` | Validated when present; passes when absent. |
| 27 | Citation Start Page | RECOMMENDED | `$.page` | Extracted as the part before `-` in `start-end` format. |
| 28 | Citation End Page | RECOMMENDED | `$.page` | Extracted as the part after `-`; passes if no range is present. |
| 29 | Citation Edition | RECOMMENDED | `$.edition-number` | Validated when present; passes when absent. |
| 30 | Citation Conference Place | RECOMMENDED | `$.event.location` | Validated when `$.event` is present. |
| 31 | Citation Conference Date | RECOMMENDED | `$.event.start` | Year extracted from `date-parts`; validated when `$.event` is present. |
| 32 | Audience | OPTIONAL | — | Not available in the Crossref API; always passes. |

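Guidelines 27 and 28 both read `$.page`. The following is a minimal sketch of the split, assuming that the first `-` separates the start and end pages and that a value without `-` is a single start page. `CrossrefPages` and its methods are hypothetical names, not the profile's actual code:

```java
import java.util.Optional;

/** Hypothetical helper: splits a Crossref "page" value such as "123-130". */
final class CrossrefPages {

    private CrossrefPages() {}

    /** Citation Start Page (#27): text before the first '-', or the whole value if there is no '-'. */
    static Optional<String> startPage(String page) {
        if (page == null) {
            return Optional.empty(); // $.page absent: RECOMMENDED, so the guideline passes
        }
        int dash = page.indexOf('-');
        String start = (dash < 0 ? page : page.substring(0, dash)).trim();
        if (start.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(start);
    }

    /** Citation End Page (#28): text after the first '-'; empty when no range is present. */
    static Optional<String> endPage(String page) {
        if (page == null || page.indexOf('-') < 0) {
            return Optional.empty(); // no range: nothing to validate, the guideline passes
        }
        String end = page.substring(page.indexOf('-') + 1).trim();
        if (end.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(end);
    }
}
```

Under these assumptions, `"123-130"` yields `123` and `130`, while an article number such as `"e1234"` yields a start page and no end page, which fits the "passes if no range is present" note for guideline 28.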
### Crossref type vocabulary (guideline 11)

The `$.type` value is validated against the official Crossref type vocabulary and mapped to the
corresponding OpenAIRE/COAR resource type category:

| Crossref type(s) | OpenAIRE general type |
|---|---|
| `journal-article`, `book`, `book-chapter`, `proceedings-article`, `edited-book`, `monograph`, `report`, `dissertation`, `peer-review`, `posted-content`, `reference-entry`, … | literature |
| `dataset` | dataset |
| `software` | software |
| `component`, `grant`, `standard`, `standard-series`, `other` | other research product |

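The lookup described by the table can be sketched with a plain `Map`. `CrossrefTypeVocabulary` is a hypothetical name, and the sketch contains only the types listed above; the literature row is abbreviated with `…` in this README, so the profile's real vocabulary is larger.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from a Crossref $.type value to an OpenAIRE general type. */
final class CrossrefTypeVocabulary {

    private static final Map<String, String> GENERAL_TYPE = new HashMap<>();

    private static void register(String generalType, String... crossrefTypes) {
        for (String t : crossrefTypes) {
            GENERAL_TYPE.put(t, generalType);
        }
    }

    static {
        // Only the literature types named in the README table; that row ends with "…".
        register("literature", "journal-article", "book", "book-chapter", "proceedings-article",
                "edited-book", "monograph", "report", "dissertation", "peer-review",
                "posted-content", "reference-entry");
        register("dataset", "dataset");
        register("software", "software");
        register("other research product", "component", "grant", "standard", "standard-series", "other");
    }

    private CrossrefTypeVocabulary() {}

    /** An empty result means the type is unknown, so Resource Type (#11) fails. */
    static Optional<String> generalType(String crossrefType) {
        return Optional.ofNullable(GENERAL_TYPE.get(crossrefType)); // HashMap tolerates a null key
    }
}
```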
### Behavior conventions

- RECOMMENDED fields: follow the same "pass when absent, validate when present" convention as `forRecommendedRepeatableElement` in the XML profiles; absence does not reduce the score.
- MANDATORY_IF_APPLICABLE fields not available in the Crossref API (rows 7 and 15) use a predicate that always returns `true`, meaning they are treated as not applicable and always pass.
- Access Rights (#15): the OpenAIRE spec designates this guideline as MANDATORY. Because the Crossref API does not carry access-rights data, the guideline is modelled as MANDATORY_IF_APPLICABLE (always passes) to avoid penalising valid Crossref records for data their source cannot provide.
- Publication date: the profile accepts any of the four Crossref date fields (`issued`, `created`, `published-online`, `published-print`) and requires only that a year can be extracted from the `date-parts` array.

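A minimal sketch of the publication-date rule and the RECOMMENDED convention, assuming the Crossref work has already been parsed into nested `Map` and `List` values. `CrossrefDateCheck`, `yearOf`, `hasPublicationYear` and `recommended` are hypothetical names; the profile itself operates on a JsonPath `DocumentContext`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helpers; the input is a Crossref work already parsed into Map/List values. */
final class CrossrefDateCheck {

    /** The four date fields accepted for Publication Date (#8). */
    static final List<String> DATE_FIELDS =
            Arrays.asList("issued", "created", "published-online", "published-print");

    private CrossrefDateCheck() {}

    /** Year from a Crossref date object such as {"date-parts": [[2021, 3, 4]]}. */
    static Optional<Integer> yearOf(Object dateObject) {
        if (!(dateObject instanceof Map)) {
            return Optional.empty();
        }
        Object parts = ((Map<?, ?>) dateObject).get("date-parts");
        if (!(parts instanceof List) || ((List<?>) parts).isEmpty()) {
            return Optional.empty();
        }
        Object first = ((List<?>) parts).get(0);
        if (!(first instanceof List) || ((List<?>) first).isEmpty()) {
            return Optional.empty();
        }
        Object year = ((List<?>) first).get(0);
        if (!(year instanceof Number)) {
            return Optional.empty();
        }
        return Optional.of(((Number) year).intValue());
    }

    /** Publication Date (#8) passes when any of the four fields yields a year. */
    static boolean hasPublicationYear(Map<String, Object> work) {
        return DATE_FIELDS.stream()
                .map(work::get)
                .map(CrossrefDateCheck::yearOf)
                .anyMatch(Optional::isPresent);
    }

    /** RECOMMENDED convention: an absent value passes; a present one must satisfy the check. */
    static <T> boolean recommended(T value, Predicate<T> check) {
        return value == null || check.test(value);
    }
}
```

In this sketch a `date-parts` entry whose first element is not a number yields no year, so that field does not count toward guideline 8; the other three fields can still satisfy it.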
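### Usage example

```java
import com.jayway.jsonpath.DocumentContext;
import eu.dnetlib.validator2.result_models.ValidationResult;
import eu.dnetlib.validator2.validation.json.CrossrefApiV4Profile;
import eu.dnetlib.validator2.validation.json.JsonUtils;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CrossrefExample {
    public static void main(String[] args) throws Exception {
        // JSON previously fetched from https://api.crossref.org/works/{doi} (file name is illustrative)
        String crossrefJson = new String(
                Files.readAllBytes(Paths.get("crossref-work.json")), StandardCharsets.UTF_8);
        DocumentContext dc = JsonUtils.parse(crossrefJson);
        CrossrefApiV4Profile profile = new CrossrefApiV4Profile();
        ValidationResult result = profile.validate("10.1234/example", dc);
        System.out.printf("Score: %.1f %%%n", result.getScore());
        // One line per guideline: name and status
        result.getResults().forEach((name, r) ->
                System.out.printf(" %-30s %s%n", name, r.getStatus()));
    }
}
```
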
## DataCiteApiV4Profile

`eu.dnetlib.validator2.validation.json.DataCiteApiV4Profile`

A comprehensive JSON validation profile that maps all 32 OpenAIRE Literature v4 guidelines to
fields of the DataCite REST API `attributes` object.
It is the JSON counterpart of `LiteratureGuidelinesV4Profile` for DataCite-hosted records.

Expected input: pass the `data.attributes` object (not the full API response wrapper) to `validate()`.

Because the OpenAIRE Literature v4 guidelines largely reuse the DataCite metadata schema, the mapping is nearly
one-to-one. DataCite carries rich information for fields that are absent in Crossref:
Access Rights, Size, Geo Location, Resource Version, Embargo Period dates, and most citation
fields via `$.container`.

### Guideline mapping

| # | Guideline | Requirement level | DataCite attributes field(s) | Notes |
|---|---|---|---|---|
| 1 | Title | MANDATORY | `$.titles[*].title` | At least one non-empty title. |
| 2 | Creator | MANDATORY | `$.creators[*].name` or `givenName` + `familyName` | At least one creator with a name. |
| 3 | Contributor | MANDATORY_IF_APPLICABLE | `$.contributors` | Each entry needs `name` and `contributorType`. |
| 4 | Funding Reference | MANDATORY_IF_APPLICABLE | `$.fundingReferences` | Each entry needs `funderName`. |
| 5 | Alternate Identifier | RECOMMENDED | `$.alternateIdentifiers` (v4.5+) / `$.identifiers` (legacy) | Passes when absent. |
| 6 | Related Identifier | RECOMMENDED | `$.relatedIdentifiers` | Each entry needs `relatedIdentifier`, `relatedIdentifierType` and `relationType`. |
| 7 | Embargo Period Date | MANDATORY_IF_APPLICABLE | `$.dates[dateType=Available/Accepted]` | Applicable when an Available date exists; both dates must be ISO 8601. |
| 8 | Publication Date | MANDATORY | `$.publicationYear` or `$.dates[dateType=Issued]` | Year integer or ISO 8601 date. |
| 9 | Language | MANDATORY_IF_APPLICABLE | `$.language` | Validated when present. |
| 10 | Publisher | MANDATORY_IF_APPLICABLE | `$.publisher` (string or `{name}`) | Supports both the legacy string and the v4.5+ object format. |
| 11 | Resource Type | MANDATORY | `$.types.resourceTypeGeneral` | Must be a known DataCite type (see vocabulary below). |
| 12 | Description | MANDATORY_IF_APPLICABLE | `$.descriptions[*].description` | At least one non-empty description. |
| 13 | Format | RECOMMENDED | `$.formats[*]` | MIME-type strings; passes when absent. |
| 14 | Resource Identifier | MANDATORY | `$.doi` | Primary persistent identifier. |
| 15 | Access Rights | MANDATORY | `$.rightsList` | Must be present and non-empty (COAR access right entries expected). |
| 16 | Source | RECOMMENDED | `$.container.title` | Passes when `$.container` is absent. |
| 17 | Subject | MANDATORY_IF_APPLICABLE | `$.subjects[*].subject` | At least one non-empty subject. |
| 18 | License Condition | RECOMMENDED | `$.rightsList[*].rightsUri` | At least one entry with a URI; passes when absent. |
| 19 | Coverage | RECOMMENDED | — | No DataCite equivalent; always passes. |
| 20 | Size | OPTIONAL | `$.sizes[*]` | Non-empty string array; validated when present. |
| 21 | Geo Location | OPTIONAL | `$.geoLocations[*]` | Each entry must have at least one geo sub-field. |
| 22 | Resource Version | RECOMMENDED | `$.version` | Validated when present; passes when absent. |
| 23 | File Location | MANDATORY_IF_APPLICABLE | `$.url` | Landing page URL; validated when present. |
| 24 | Citation Title | RECOMMENDED | `$.container.title` | Validated when present; passes when absent. |
| 25 | Citation Volume | RECOMMENDED | `$.container.volume` | Validated when present; passes when absent. |
| 26 | Citation Issue | RECOMMENDED | `$.container.issue` | Validated when present; passes when absent. |
| 27 | Citation Start Page | RECOMMENDED | `$.container.firstPage` | Validated when present; passes when absent. |
| 28 | Citation End Page | RECOMMENDED | `$.container.lastPage` | Validated when present; passes when absent. |
| 29 | Citation Edition | RECOMMENDED | — | No container edition field; always passes. |
| 30 | Citation Conference Place | RECOMMENDED | — | No dedicated DataCite field; always passes. |
| 31 | Citation Conference Date | RECOMMENDED | — | No dedicated DataCite field; always passes. |
| 32 | Audience | OPTIONAL | — | No DataCite equivalent; always passes. |

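The Creator rule (#2) reads either `name` or the `givenName`/`familyName` pair. A minimal sketch, assuming the attributes object is already parsed into `Map`/`List` values and assuming that both `givenName` and `familyName` are required when `name` is missing; `DataCiteCreatorCheck` is a hypothetical name:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the Creator (#2) check on a parsed DataCite attributes object. */
final class DataCiteCreatorCheck {

    private DataCiteCreatorCheck() {}

    private static boolean nonBlank(Object value) {
        return value instanceof String && !((String) value).trim().isEmpty();
    }

    /** Assumption: named if it has a non-blank name, or both a givenName and a familyName. */
    static boolean hasName(Map<?, ?> creator) {
        return nonBlank(creator.get("name"))
                || (nonBlank(creator.get("givenName")) && nonBlank(creator.get("familyName")));
    }

    /** MANDATORY: fails when $.creators is missing, empty, or has no named entry. */
    static boolean passes(Map<String, Object> attributes) {
        Object creators = attributes.get("creators");
        if (!(creators instanceof List)) {
            return false;
        }
        for (Object creator : (List<?>) creators) {
            if (creator instanceof Map && hasName((Map<?, ?>) creator)) {
                return true;
            }
        }
        return false;
    }
}
```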
### DataCite `resourceTypeGeneral` vocabulary (guideline 11)

| DataCite type(s) | OpenAIRE general type |
|---|---|
| `JournalArticle`, `Book`, `BookChapter`, `ConferencePaper`, `ConferenceProceeding`, `DataPaper`, `Dissertation`, `Journal`, `PeerReview`, `Preprint`, `Report`, `Text`, … | literature |
| `Dataset`, `DataPaper` | dataset |
| `Software`, `ComputationalNotebook` | software |
| `Audiovisual`, `Collection`, `Event`, `Image`, `Instrument`, `InteractiveResource`, `Model`, `OutputManagementPlan`, `PhysicalObject`, `Service`, `Sound`, `Standard`, `StudyRegistration`, `Workflow`, `Other` | other research product |

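Because `DataPaper` appears in both the literature and the dataset rows, a lookup that mirrors this table has to allow more than one general type per DataCite type. The following minimal sketch returns a set; `DataCiteTypeVocabulary` is a hypothetical name and, as in the table, the literature list is incomplete.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical lookup from resourceTypeGeneral to the OpenAIRE general type(s) in the table. */
final class DataCiteTypeVocabulary {

    private static final Map<String, Set<String>> GENERAL_TYPES = new HashMap<>();

    private static void register(String generalType, String... dataCiteTypes) {
        for (String t : dataCiteTypes) {
            GENERAL_TYPES.computeIfAbsent(t, k -> new LinkedHashSet<>()).add(generalType);
        }
    }

    static {
        // Literature row as listed in the README table (which ends with "…").
        register("literature", "JournalArticle", "Book", "BookChapter", "ConferencePaper",
                "ConferenceProceeding", "DataPaper", "Dissertation", "Journal", "PeerReview",
                "Preprint", "Report", "Text");
        register("dataset", "Dataset", "DataPaper");
        register("software", "Software", "ComputationalNotebook");
        register("other research product", "Audiovisual", "Collection", "Event", "Image",
                "Instrument", "InteractiveResource", "Model", "OutputManagementPlan",
                "PhysicalObject", "Service", "Sound", "Standard", "StudyRegistration",
                "Workflow", "Other");
    }

    private DataCiteTypeVocabulary() {}

    /** An empty set means the type is unknown, so Resource Type (#11) fails. */
    static Set<String> generalTypes(String resourceTypeGeneral) {
        Set<String> types = GENERAL_TYPES.get(resourceTypeGeneral);
        return types == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(types);
    }
}
```

How the profile itself resolves `DataPaper` is not stated in this README; the sketch simply reports both rows.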
### Behavior conventions

- RECOMMENDED fields: pass when absent, validate when present (same as the Crossref profile).
- Embargo Period Date: applicable only when `$.dates` contains a `dateType=Available` entry; when applicable, both the Available and Accepted dates must be valid ISO 8601 (`YYYY`, `YYYY-MM`, or `YYYY-MM-DD`).
- Access Rights (MANDATORY): requires `$.rightsList` to be present and non-empty. Records without rights information fail this guideline. COAR Access Right concept URIs (open access, embargoed access, restricted access, metadata only access) are the expected values in each entry.
- License Condition (RECOMMENDED): also reads `$.rightsList`, but checks that at least one entry has a non-empty `rightsUri`. Passes when `$.rightsList` is absent.
- Publisher: accepts both the legacy string format and the v4.5+ `{name: ...}` object format.

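The Embargo Period Date and Publisher conventions can be sketched as follows, again assuming `$.dates` and `$.publisher` have been parsed into `Map`/`List`/`String` values. `DataCiteChecks` and its methods are hypothetical names; the sketch accepts exactly the three ISO 8601 forms listed above and uses `java.time.LocalDate` (strict ISO parsing) to reject impossible dates such as `2021-02-30`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helpers sketching two DataCite behavior conventions. */
final class DataCiteChecks {

    // YYYY, YYYY-MM or YYYY-MM-DD
    private static final Pattern ISO_8601 = Pattern.compile("\\d{4}(-\\d{2}(-\\d{2})?)?");

    private DataCiteChecks() {}

    /** True when the value has one of the three forms and denotes a real calendar date. */
    static boolean isIsoDate(Object value) {
        if (!(value instanceof String) || !ISO_8601.matcher((String) value).matches()) {
            return false;
        }
        String s = (String) value;
        // Pad partial dates so LocalDate.parse can range-check the month and day.
        String full = s.length() == 4 ? s + "-01-01" : s.length() == 7 ? s + "-01" : s;
        try {
            LocalDate.parse(full);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    private static Optional<Map<?, ?>> entryOfType(List<?> dates, String dateType) {
        if (dates != null) {
            for (Object d : dates) {
                if (d instanceof Map && dateType.equals(((Map<?, ?>) d).get("dateType"))) {
                    return Optional.of((Map<?, ?>) d);
                }
            }
        }
        return Optional.empty();
    }

    /** Embargo Period Date (#7): not applicable without an Available entry; else both dates must be valid. */
    static boolean embargoPasses(List<?> dates) {
        Optional<Map<?, ?>> available = entryOfType(dates, "Available");
        if (!available.isPresent()) {
            return true; // not applicable
        }
        Optional<Map<?, ?>> accepted = entryOfType(dates, "Accepted");
        return isIsoDate(available.get().get("date"))
                && accepted.map(a -> isIsoDate(a.get("date"))).orElse(false);
    }

    /** Publisher (#10): legacy string or v4.5+ {"name": ...} object. */
    static Optional<String> publisherName(Object publisher) {
        Object name = publisher instanceof Map ? ((Map<?, ?>) publisher).get("name") : publisher;
        if (name instanceof String && !((String) name).trim().isEmpty()) {
            return Optional.of(((String) name).trim());
        }
        return Optional.empty();
    }
}
```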
### Usage example

```java
import com.jayway.jsonpath.DocumentContext;
import eu.dnetlib.validator2.result_models.ValidationResult;
import eu.dnetlib.validator2.validation.json.DataCiteApiV4Profile;
import eu.dnetlib.validator2.validation.json.JsonUtils;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DataCiteExample {
    public static void main(String[] args) throws Exception {
        // The file must hold only the data.attributes object of a DataCite API response,
        // not the full response wrapper (file name is illustrative).
        String attrJson = new String(
                Files.readAllBytes(Paths.get("datacite-attributes.json")), StandardCharsets.UTF_8);
        DocumentContext dc = JsonUtils.parse(attrJson);
        DataCiteApiV4Profile profile = new DataCiteApiV4Profile();
        ValidationResult result = profile.validate("10.1234/example", dc);
        System.out.printf("Score: %.1f %%%n", result.getScore());
        result.getResults().forEach((name, r) ->
                System.out.printf(" %-30s %s%n", name, r.getStatus()));
    }
}
```

## Documentation

- Code Overview: A high-level overview of the most important packages and classes in the project.
- Logic Flow: A detailed description of the validation process, from the entry point to the final result.
- Extending the Engine: A guide on how to create custom rules and profiles.
- Usage Guide: Examples of how to use the validator, both from the command line and programmatically.

## Install and run instructions

- Have JDK 8 and Maven installed.
- Build with `mvn clean install -U`.
- Run with `java -jar target/uoa-validator-engine2-<VERSION>.jar`.

## License

This project is licensed under the Apache License, Version 2.0. See the `LICENSE` file for details.