openaire-solr-docker-2.1.0 Stable
Released 2026-04-15 16:20:31 +02:00
Summary
This release introduces improvements in the Solr Docker setup and importer tooling, focusing on runtime consistency, documentation clarity, dependency updates, and operational support scripts.
Key changes
Docker and runtime environment
- Switched the base image from openjdk:17-bullseye to eclipse-temurin:17-jdk, improving long-term JDK support and consistency with current OpenAIRE runtime standards.
- Updated ZooKeeper images in docker-compose.yml from 3.6.2 to 3.9.2 across all nodes, ensuring compatibility with newer Solr 9 deployments.
Documentation improvements
- Cleaned up and clarified the README.md:
  - Fixed build instructions and directory paths.
  - Clarified that no sample data is shipped with the project.
  - Explicitly documented the expected input data layout inside the container.
  - Improved explanations around Docker Compose startup and container lifecycle.
  - Expanded guidance on how to interact with Solr from outside the Docker network, including rationale and client selection.
Solr client interaction
- Updated documentation to explicitly recommend LBHttpSolrClient for external access to the Dockerized Solr cluster, avoiding ZooKeeper-based discovery issues when connecting from outside the internal Docker network.
- Referenced the existing unit test as a concrete example of this interaction pattern.
Importer versioning and startup feedback
- Updated the openaire-solr-importer Maven coordinates and snapshot version to reflect the current development line.
- Added a clear startup message in init_solr.sh to signal when the import application has started.
Operational scripts for cluster management
- Introduced new administrative scripts under openaire-solr-importer/resources/scripts:
  - add_replicas.sh: a utility to dynamically add replicas per shard, with logic to:
    - Balance replicas across nodes,
    - Avoid placing replicas on leader nodes or the same physical host,
    - Support dry-run execution.
  - hostmap.conf: a mapping file that associates Solr nodes with physical hosts, enabling safer and more balanced replica placement strategies.
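The placement rules listed for add_replicas.sh can be illustrated with a small Python sketch. This is not the script itself, which is not reproduced in these notes. The function names, the input shapes, and the reading of "leader nodes" as "any node currently hosting a shard leader" are all assumptions made for illustration. The only Solr interface referenced is the Collections API ADDREPLICA action.

```python
from collections import Counter
from urllib.parse import urlencode


def plan_replicas(shards, hostmap, nodes, per_shard=1):
    """Return a list of (shard, node) placements (hypothetical helper).

    shards  : {shard: {"leader": node, "replicas": [node, ...]}}  (assumed shape)
    hostmap : {node: physical_host}, e.g. parsed from a hostmap.conf-like file
    nodes   : all live Solr node names
    """
    # Current replica count per node, used to balance new placements.
    load = Counter({n: 0 for n in nodes})
    for info in shards.values():
        for n in info["replicas"]:
            load[n] += 1

    # Assumed reading: skip every node that currently hosts a leader.
    leader_nodes = {info["leader"] for info in shards.values()}

    plan = []
    for shard in sorted(shards):
        info = shards[shard]
        # Physical hosts that already hold a copy of this shard.
        used_hosts = {hostmap[n] for n in info["replicas"]}
        used_hosts.add(hostmap[info["leader"]])
        for _ in range(per_shard):
            candidates = [
                n for n in nodes
                if n not in leader_nodes and hostmap[n] not in used_hosts
            ]
            if not candidates:
                break  # no eligible node left for this shard
            # Least-loaded node first; node name breaks ties deterministically.
            target = min(candidates, key=lambda n: (load[n], n))
            plan.append((shard, target))
            load[target] += 1
            used_hosts.add(hostmap[target])
    return plan


def addreplica_urls(plan, collection, base_url="http://localhost:8983/solr"):
    """Render the plan as ADDREPLICA request URLs without sending them."""
    return [
        f"{base_url}/admin/collections?"
        + urlencode({"action": "ADDREPLICA", "collection": collection,
                     "shard": shard, "node": node})
        for shard, node in plan
    ]
```

Printing the result of `addreplica_urls(...)` instead of issuing the requests corresponds to a dry run: the operator can review the planned placements before anything changes in the cluster.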
Schema changes
- Significantly reduced and streamlined the managed Solr schema by removing unused or redundant definitions, resulting in a leaner and more maintainable managed-schema.xml.
The ngramtext field type indexes tokens as ngrams with maxGramSize=10.
Tokens longer than 10 characters (e.g. "characterisation", "TopStringDT") are silently truncated at index time, producing no match when the full token is passed at query time. Symptoms reported in helpdesk ticket #7147:
- project title searches failing for words longer than 10 characters
- project acronym searches failing for acronyms longer than 10 characters
Existing field definitions are not modified to avoid breaking unknown dependents of the current ngramtext behaviour.
New field types:
- ngramtext_long: identical to ngramtext with maxGramSize raised to 30, for substring matching on long tokens
- keyword_ci: keyword tokenizer + lowercase + trim, for exact case-insensitive matching on atomic strings such as acronyms
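The effect of the two gram sizes can be reproduced outside Solr with a few lines of Python. This is a simplified model, not the Solr analysis chain: it assumes the query token is lowercased but not split into grams, and it uses a minimum gram size of 3, which is an assumption (the actual minGramSize of ngramtext is not stated in these notes).

```python
MIN_GRAM = 3  # assumed value; not taken from the schema


def ngrams(token, min_gram, max_gram):
    """All contiguous substrings of `token` with length in [min_gram, max_gram]."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def matches(indexed_token, query_token, max_gram):
    """True if the lowercased query token equals one of the indexed grams."""
    grams = ngrams(indexed_token.lower(), MIN_GRAM, max_gram)
    return query_token.lower() in grams


# maxGramSize=10: the full 16-character token is never emitted as a gram.
print(matches("characterisation", "characterisation", 10))
# maxGramSize=30: the full token is itself one of the grams.
print(matches("characterisation", "characterisation", 30))
```

Under these assumptions a query for a substring of up to 10 characters still matches with the smaller setting; only queries longer than the maximum gram size fall through, which is consistent with the symptoms described in the ticket.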
New companion fields (populated via copyField, no mapping layer changes):
- projecttitle_ngram (ngramtext_long) <- projecttitle
- projectacronym_ci (keyword_ci) <- projectacronym
The Graph API query layer must be updated to search across both the original field and its companion using qf= in an edismax query:
- title searches: qf=projecttitle projecttitle_ngram
- acronym searches: qf=projectacronym projectacronym_ci
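As a rough sketch of what the query-layer change amounts to, the request parameters for the two search types could be assembled as follows. The helper names, the base URL, and the collection name `public` are placeholders for illustration; only `defType=edismax`, `q`, and the `qf` field lists come from the notes above, and the real Graph API code may build its queries differently.

```python
from urllib.parse import urlencode

# qf values as listed above: original field plus its companion.
QF = {
    "title": "projecttitle projecttitle_ngram",
    "acronym": "projectacronym projectacronym_ci",
}


def edismax_params(kind, text):
    """Build edismax parameters for a 'title' or 'acronym' search (hypothetical helper)."""
    return {"defType": "edismax", "q": text, "qf": QF[kind]}


def select_url(base_url, collection, kind, text):
    """Render a /select URL; base_url and collection are assumed placeholders."""
    return f"{base_url}/{collection}/select?" + urlencode(edismax_params(kind, text))


print(select_url("http://localhost:8983/solr", "public", "title", "characterisation"))
```

Listing both fields in `qf` lets a match in either the original field or the companion field contribute to the result, so existing behaviour on short tokens is preserved while long tokens become searchable.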
A full re-index of the public collection is required after this change.
Overall impact
These changes do not alter the indexing semantics of existing fields (the new companion fields are purely additive) but improve operational robustness, maintainability, and clarity. The branch now provides:
- A more future-proof runtime stack,
- Clearer setup and usage instructions,
- Better support for Solr cluster operations,
- A cleaner and easier-to-evolve schema definition.
This makes the 2.1.0 release more aligned with current Solr 9 operational practices and easier to deploy and manage in both development and production environments.