Last updated by Claudio Atzori on 2024-11-21 15:17:43 +01:00 (commit 7fa2430966, "updated readme").

OpenAIRE SOLR Docker

This project defines a docker-compose based Solr cluster and a content indexing procedure. It consists of two main components:

  1. The Solr cluster, defined in the docker-compose.yml file. By default it consists of 3 Solr nodes backed by a 3-node ZooKeeper ensemble.
  2. The openaire-solr-importer, a Java application responsible for:
    1. Configuring the Solr cluster, i.e. uploading the configuration files (schema and solrconfig) to ZooKeeper
    2. Creating the data collection
    3. Feeding the collection with the input documents, applying the necessary conversions
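For reference, the three importer steps correspond roughly to the manual operations below. This is a minimal sketch, not what the Java importer actually runs: the configset name, collection name, shard layout and the ZooKeeper client port (2181) are hypothetical values. The helpers only print the `bin/solr zk upconfig` command and the Collections API / update URLs, so they can be reviewed before being executed (e.g. with docker exec inside one of the Solr containers, assuming the image ships the Solr CLI, or with curl).

```shell
# Hypothetical values: the names and shard layout used by the importer are not
# documented here, adjust them to your setup.
ZK_HOST="${ZK_HOST:-zoo1:2181,zoo2:2181,zoo3:2181}"   # assumes ZooKeeper's default client port
CONFIG_NAME="${CONFIG_NAME:-openaire}"
COLLECTION="${COLLECTION:-openaire}"

# Step 1: upload a configset directory (schema + solrconfig) to ZooKeeper.
upload_config_cmd() {
  printf 'bin/solr zk upconfig -n %s -d %s -z %s\n' "$CONFIG_NAME" "$1" "$ZK_HOST"
}

# Step 2: Collections API request creating the collection from that configset.
create_collection_url() {
  local base="$1" shards="$2" replicas="$3"
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s&collection.configName=%s\n' \
    "$base" "$COLLECTION" "$shards" "$replicas" "$CONFIG_NAME"
}

# Step 3: update handler the converted documents are POSTed to.
update_url() {
  printf '%s/%s/update\n' "$1" "$COLLECTION"
}
```

For example, `create_collection_url http://localhost:8983/solr 3 1` prints the CREATE request for a 3-shard collection with one replica per shard.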

Step-by-step guide

  1. Verify that Docker is installed

This guide was written using the following Docker version:

docker --version
Docker version 27.2.0, build 3ab4256
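The check can be scripted with a small helper that parses the output above. This is a sketch: the minimum major version is a hypothetical parameter, since the guide only states that 27.2.0 was used. `docker compose version` can be run the same way to confirm that the Compose plugin used later (`docker compose up`) is available.

```shell
# Extract the major version from `docker --version` output,
# e.g. "Docker version 27.2.0, build 3ab4256" -> 27.
# If no string is passed, the installed docker CLI is queried.
docker_major_version() {
  local out="${1:-$(docker --version)}"
  sed -E -n 's/^Docker version ([0-9]+)\..*/\1/p' <<<"$out"
}

# Succeed if the major version is at least the given (hypothetical) minimum.
docker_at_least() {
  local min="$1" major
  major="$(docker_major_version "${2:-}")"
  [[ -n "$major" && "$major" -ge "$min" ]]
}
```

Usage: `docker_at_least 27 || echo "please upgrade Docker"`.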

Docker Desktop is a one-click-install application for your Mac, Linux, or Windows environment that lets you build, share, and run containerized applications and microservices.

It provides a straightforward GUI (Graphical User Interface) that lets you manage your containers, applications, and images directly from your machine.

Docker Desktop reduces the time spent on complex setups so you can focus on writing code. It takes care of port mappings, file system concerns, and other default settings, and is regularly updated with bug fixes and security updates.

If Docker is not installed, get Docker Desktop at https://www.docker.com/products/docker-desktop

  2. Clone this project
git clone https://code-repo.d4science.org/D-Net/openaire-solr-docker.git
  3. Build the solr-importer Docker image
cd openaire-solr-docker/solr-importer

 docker build . -t openaire/solr-importer --no-cache
[+] Building 58.1s (16/16) FINISHED                                                                 docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                0.0s
 => => transferring dockerfile: 537B                                                                                0.0s
 => [internal] load metadata for docker.io/library/openjdk:17-bullseye                                              1.1s
 => [internal] load metadata for docker.io/library/maven:3.9-eclipse-temurin-17                                     1.7s
 => [internal] load .dockerignore                                                                                   0.0s
 => => transferring context: 2B                                                                                     0.0s
 => [builder 1/4] FROM docker.io/library/maven:3.9-eclipse-temurin-17@sha256:d4f3b77119ae1afcdf00276083416d58fd...  5.5s
 => => resolve docker.io/library/maven:3.9-eclipse-temurin-17@sha256:d4f3b77119ae1afcdf00276083416d58fd9c699294...  0.0s
 => => sha256:818a39a5051cd05f4158c0db68fbf2a5bf732b9b8d5ecde89db2a725a35aa731 9.77kB / 9.77kB                      0.0s
 => => sha256:bc78492c245adcc54dde954f84b79e98d71c8ac11fe26c874e1b1347f88550e2 159B / 159B                          0.6s
 => => sha256:d4f3b77119ae1afcdf00276083416d58fd9c69929400cea3595eba8965b6ae6f 7.86kB / 7.86kB                      0.0s
 => => sha256:004549c9d7e4108958645eea52f71a0e8ba6f18d72dd6dca3a47239ce74704dd 24.16MB / 24.16MB                    1.1s
 => => sha256:b3ad492a84cdc67ade7b4db435c7a511fd80135e165d0ab6683529afad3e9cf6 2.90kB / 2.90kB                      0.0s
 => => sha256:225cb5fd64f3a6f155a1301661e18261d430eba8637a6f7e08c50e2774434caa 143.37MB / 143.37MB                  3.4s
 => => sha256:b2ea5a28f7c23fb8f6b9e31e7612b3c542394162012defbebf1f640ad33a5b06 2.28kB / 2.28kB                      0.7s
 => => sha256:878f62a07109a4bc1c6e84f338d9c7d7472bd4ab79c45e369cc350d1be3b3ba8 22.61MB / 22.61MB                    1.7s
 => => extracting sha256:004549c9d7e4108958645eea52f71a0e8ba6f18d72dd6dca3a47239ce74704dd                           1.0s
 => => sha256:ba63355dfc695379f7f9b29e7583aabdb617a26027ae4b37d6bb782f8bb3f14a 9.17MB / 9.17MB                      1.6s
 => => sha256:30ff8964aee50bcf46c1b6a4b4c9fb180c64e917a793571ba3c5b1d1a4481dd1 849B / 849B                          1.8s
 => => sha256:4af42910298d8a6180fd68a7de28cf581859551fda87acf000b0142af306c8d0 154B / 154B                          1.9s
 => => sha256:b4d1d7ad89e02201ce3dd818bf534d160a5b44d1e53b47e57ed29ccd17d5ab31 360B / 360B                          1.9s
 => => extracting sha256:225cb5fd64f3a6f155a1301661e18261d430eba8637a6f7e08c50e2774434caa                           1.2s
 => => extracting sha256:bc78492c245adcc54dde954f84b79e98d71c8ac11fe26c874e1b1347f88550e2                           0.0s
 => => extracting sha256:b2ea5a28f7c23fb8f6b9e31e7612b3c542394162012defbebf1f640ad33a5b06                           0.0s
 => => extracting sha256:878f62a07109a4bc1c6e84f338d9c7d7472bd4ab79c45e369cc350d1be3b3ba8                           0.7s
 => => extracting sha256:ba63355dfc695379f7f9b29e7583aabdb617a26027ae4b37d6bb782f8bb3f14a                           0.1s
 => => extracting sha256:30ff8964aee50bcf46c1b6a4b4c9fb180c64e917a793571ba3c5b1d1a4481dd1                           0.0s
 => => extracting sha256:b4d1d7ad89e02201ce3dd818bf534d160a5b44d1e53b47e57ed29ccd17d5ab31                           0.0s
 => => extracting sha256:4af42910298d8a6180fd68a7de28cf581859551fda87acf000b0142af306c8d0                           0.0s
 => [internal] load build context                                                                                   0.0s
 => => transferring context: 261.51kB                                                                               0.0s
 => CACHED [stage-1 1/6] FROM docker.io/library/openjdk:17-bullseye@sha256:bd3113dee5dfa55c246067cdb20e5880003ed... 0.0s
 => [stage-1 2/6] RUN apt-get update                                                                                4.1s
 => [stage-1 3/6] RUN apt-get install -y zip                                                                        1.1s
 => [builder 2/4] COPY src /usr/src/app/src                                                                         0.0s
 => [builder 3/4] COPY pom.xml /usr/src/app                                                                         0.0s
 => [builder 4/4] RUN mvn -f /usr/src/app/pom.xml clean package -DskipTests                                        49.6s
 => [stage-1 4/6] COPY --from=builder /usr/src/app/target/*jar-with-dependencies.jar /usr/app/app.jar               0.3s
 => [stage-1 5/6] COPY resources/scripts /scripts/                                                                  0.0s
 => [stage-1 6/6] RUN chmod a+x /scripts/init_solr.sh                                                               0.2s
 => exporting to image                                                                                              0.3s
 => => exporting layers                                                                                             0.3s
 => => writing image sha256:d216745b985f2ce75e5c3b26fe0539707573e9593ef37b89d44c7f8acc9d7bbf                        0.0s
 => => naming to docker.io/openaire/solr-importer                                                                   0.0s

View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/wfzudmmkob4s0klc2mrj4pcdl

The solr-importer Docker image is now ready.
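A quick way to confirm that the build produced the image is `docker image inspect`, which exits with a non-zero status when the image is not present locally. The helper names below are hypothetical; the default image name is the one passed to `-t` in the build command.

```shell
# Succeed if the image exists locally; `docker image inspect` fails otherwise.
image_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Report on the image produced by the build step.
report_image() {
  local image="${1:-openaire/solr-importer}"
  if image_exists "$image"; then
    printf '%s: present\n' "$image"
  else
    printf '%s: missing, re-run docker build\n' "$image"
    return 1
  fi
}
```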

Note that the project does not ship any example data: the input records must be retrieved from ICM's OCEAN cluster. The import procedure expects to find them in the following path:

 tree solr-importer/resources/prod_xml_json
solr-importer/resources/prod_xml_json
├── part-00000-016f0fea-e0d0-4210-a4d6-ece83b3e3e18-c000.json.gz
├── part-00001-016f0fea-e0d0-4210-a4d6-ece83b3e3e18-c000.json.gz
├── part-00003-016f0fea-e0d0-4210-a4d6-ece83b3e3e18-c000.json.gz
└── part-00004-016f0fea-e0d0-4210-a4d6-ece83b3e3e18-c000.json.gz
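Before starting the cluster it can help to verify that the input folder is populated and that the archives are intact. This is a hypothetical helper, assuming the importer reads the part-*.json.gz files shown above; it only checks that at least one such file exists and that each one passes `gzip -t`.

```shell
# Verify that DIR contains at least one readable part-*.json.gz archive.
check_input_dir() {
  local dir="$1" count=0 f
  if [[ ! -d "$dir" ]]; then
    printf 'missing directory: %s\n' "$dir" >&2
    return 1
  fi
  for f in "$dir"/part-*.json.gz; do
    [[ -e "$f" ]] || continue            # the glob matched nothing
    if ! gzip -t "$f" 2>/dev/null; then  # integrity check of the gzip stream
      printf 'corrupt archive: %s\n' "$f" >&2
      return 1
    fi
    count=$((count + 1))
  done
  if (( count == 0 )); then
    printf 'no part-*.json.gz files in %s\n' "$dir" >&2
    return 1
  fi
  printf '%d input file(s) found\n' "$count"
}
```

Usage, from the same directory as the tree command above: `check_input_dir solr-importer/resources/prod_xml_json`.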
  4. Start the Solr cluster and the importer
 docker compose up -d
[+] Running 7/7
 ✔ Container zoo2           Started              0.3s
 ✔ Container zoo1           Started              0.3s
 ✔ Container zoo3           Started              0.3s
 ✔ Container solr3          Started              0.6s
 ✔ Container solr1          Started              0.6s
 ✔ Container solr2          Started              0.7s
 ✔ Container solr-importer  Started
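"Started" only means the containers are running; Solr may need a few more seconds before it accepts requests. A minimal sketch that polls the Collections API CLUSTERSTATUS action until it answers is shown below. It assumes that a Solr node publishes its default port 8983 on localhost, which should be checked against the port mappings in docker-compose.yml. The importer's progress can be followed meanwhile with `docker logs -f solr-importer`.

```shell
# Build the Collections API CLUSTERSTATUS URL for a Solr base URL.
cluster_status_url() {
  printf '%s/admin/collections?action=CLUSTERSTATUS&wt=json\n' "$1"
}

# Poll CLUSTERSTATUS once per second until it answers with HTTP 2xx,
# giving up after the given number of attempts (default 60).
wait_for_solr() {
  local base="$1" attempts="${2:-60}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS --max-time 5 -o /dev/null "$(cluster_status_url "$base")"; then
      return 0
    fi
    if (( i < attempts )); then
      sleep 1
    fi
  done
  return 1
}
```

Usage: `wait_for_solr http://localhost:8983/solr && echo "Solr is up"`.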

Interacting with Solr

The Solr and ZooKeeper nodes run inside Docker containers and communicate over a network named solr, declared in the docker-compose file. As a consequence, the cluster information stored in ZooKeeper (such as the addresses of the live Solr nodes) refers to hostnames on that network.

When org.apache.solr.client.solrj.impl.CloudSolrClient is used from outside the solr network, it discovers the Solr nodes through the information stored in ZooKeeper; those node addresses, however, cannot be resolved from outside the network, so the client cannot reach them.
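To see the problem concretely: the live_nodes list returned by CLUSTERSTATUS (and stored in ZooKeeper) contains names of the form host:port_context, from which ZooKeeper-aware clients derive base URLs. In this setup the host part is presumably a container hostname such as solr1. The sketch below performs that conversion (assuming the default http scheme and skipping URL-decoding of the context) and checks whether the host resolves locally; host_resolves relies on getent, which is available on Linux but usually not on macOS.

```shell
# Convert a SolrCloud live node name (host:port_context) into a base URL,
# e.g. solr1:8983_solr -> http://solr1:8983/solr.
# Simplifications: http scheme assumed, context part not URL-decoded.
live_node_to_url() {
  local node="$1"
  [[ "$node" == *_* ]] || return 1   # not a host:port_context name
  printf 'http://%s/%s\n' "${node%%_*}" "${node#*_}"
}

# Succeed only if the hostname resolves on this machine (Linux/getent only).
host_resolves() {
  getent hosts "$1" >/dev/null 2>&1
}
```

On the Docker host, `host_resolves solr1` is expected to fail, which is exactly why the URL derived from ZooKeeper is unusable there.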

The unit test eu.dnetlib.dhp.solr.SolrClientTest uses org.apache.solr.client.solrj.impl.LBHttp2SolrClient to show how to query the Solr cluster running inside Docker without relying on node discovery through ZooKeeper.
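The same idea can be tried from the command line: query a fixed list of externally reachable base URLs instead of the addresses registered in ZooKeeper. This is a shell sketch, not the Java test. It does plain sequential failover, which is simpler than LBHttp2SolrClient's round-robin with dead-server tracking. The host ports 8981 to 8983 and the collection name passed by the caller are hypothetical; check docker-compose.yml for the real port mappings.

```shell
# Hypothetical host ports for solr1..solr3.
SOLR_NODES=(http://localhost:8981/solr http://localhost:8982/solr http://localhost:8983/solr)

# Query COLLECTION on the first node that answers, trying the next one on failure.
query_any_node() {
  local collection="$1" query="$2" base
  for base in "${SOLR_NODES[@]}"; do
    # -G appends the --data-urlencode pairs to the URL as an encoded query string
    if curl -fsS --max-time 10 -G "$base/$collection/select" \
         --data-urlencode "q=$query" --data-urlencode "rows=10"; then
      return 0
    fi
    printf 'node %s failed, trying the next one\n' "$base" >&2
  done
  return 1
}
```

Usage: `query_any_node <collection> '*:*'`, replacing `<collection>` with the name of the collection created by the importer.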