Code Infrastructure Lab

This module defines the configuration for creating a Kubernetes cluster used to generate the OpenAIRE Research Graph.

Cluster definition

The Kubernetes cluster will include some essential services for testing OpenAIRE Graph generation:

  • Storage: Minio will be used as storage.
  • Workflow Orchestrator: Airflow will be used as the workflow orchestrator.
  • Processing Framework: Spark-Operator will be used as the processing framework.

Storage

Minio is an open-source object storage service that will be used to store the data needed to generate the intermediate versions of the OpenAIRE Research Graph.
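
As a quick sanity check once the cluster is provisioned, you can reach Minio from your workstation with a port-forward and the mc client. The service name, namespace, and credentials below are assumptions (they depend on your local.tfvars), so adjust them to match your deployment:

    # service name and namespace are assumptions; run the port-forward in a separate terminal
    kubectl port-forward svc/minio 9000:9000 -n minio
    mc alias set local http://localhost:9000 <ACCESS_KEY> <SECRET_KEY>
    mc ls local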

Workflow Orchestrator

Airflow is an open-source workflow management platform that will be used to orchestrate the generation of the OpenAIRE Research Graph. It is a powerful and flexible tool for automating complex workflows.
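
Once the cluster is provisioned, you can reach the Airflow web UI with a port-forward. The service and namespace names below are assumptions based on a typical Airflow deployment; adjust them to your setup:

    # service and namespace names are assumptions
    kubectl port-forward svc/airflow-webserver 8080:8080 -n airflow

Then open http://localhost:8080 in your browser.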

Processing Framework

Spark-Operator, the Kubernetes Operator for Apache Spark, aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing the status of Spark applications. For a complete reference of the custom resource definitions, please refer to the API Definition; for details on its design, please refer to the design doc.
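
In practice, a Spark job is described by a SparkApplication custom resource and submitted with kubectl. The repository ships a sample manifest, spark-run.yaml, that can serve as a starting point (the second command only works once the Spark-Operator custom resource definitions have been installed by Terraform, see below):

    kubectl apply -f spark-run.yaml
    kubectl get sparkapplications   # check the submission status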

How to use locally

Prerequisites

This section outlines the prerequisites for setting up a developer cluster on your local machine using Docker and Kind. A developer cluster is a small-scale Kubernetes cluster used for developing and testing applications; Docker and Kind are the tools used here to create and manage it locally.

Before you begin, you will need to install the following software on your local machine:

  1. Docker: Docker is a containerization platform that allows you to run applications in isolated environments. You can download Docker from https://docs.docker.com/get-docker/.

  2. Kind: Kind is a tool for creating local Kubernetes clusters using Docker container "nodes". You can download Kind from https://kind.sigs.k8s.io/docs/user/quick-start/.

  3. Terraform: Terraform is an infrastructure-as-code tool that lets you describe your infrastructure (virtual machines, storage, networking, Kubernetes resources) in declarative configuration files and then provisions the described resources for you. You can download Terraform from https://developer.hashicorp.com/terraform/install.
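
You can verify the installations by checking the versions. Note that the steps below also use kubectl to interact with the cluster, so make sure it is installed as well:

    docker --version
    kind version
    terraform version
    kubectl version --client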

Create Kubernetes cluster

To create the Kubernetes cluster, run the following command:

 kind create cluster --config clusters/local/kind-cluster-config.yaml 

This command will create a cluster named dnet-data-platform.
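
Kind registers the cluster in your kubeconfig under a context named kind-dnet-data-platform. You can verify that the cluster is reachable with:

    kubectl cluster-info --context kind-dnet-data-platform
    kubectl get nodes --context kind-dnet-data-platform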

Then create an Ingress controller. Ingress is a Kubernetes resource that allows you to manage external access to services running in the cluster (such as the Minio console or the Spark UI).

To enable ingress, run the following command:

kubectl apply --context kind-dnet-data-platform -f ./clusters/local/nginx-kind-deploy.yaml
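
The controller takes a moment to become ready. Assuming the manifest deploys ingress-nginx into the ingress-nginx namespace (as the file name suggests), you can wait for it with:

    kubectl wait --namespace ingress-nginx \
      --for=condition=ready pod \
      --selector=app.kubernetes.io/component=controller \
      --timeout=90s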

Define the cluster

  • Generate a Terraform variable file starting from local.tfvars.template and save it as local.tfvars
  • Initialize Terraform:
    terraform init

  • Provision the cluster services:
    terraform apply -var-file="local.tfvars"
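
Optionally, you can preview the changes before applying them and, afterwards, check that all pods come up:

    terraform plan -var-file="local.tfvars"
    kubectl get pods --all-namespaces --context kind-dnet-data-platform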