# Code Infrastructure Lab

This module defines configurations for creating a Kubernetes cluster for generating the OpenAIRE Research Graph.

## Cluster definition

The Kubernetes cluster includes the essential services for testing OpenAIRE Graph generation:

- Storage: MinIO is used as storage.
- Workflow Orchestrator: Airflow is used as the workflow orchestrator.
- Processing Framework: Spark-Operator is used as the processing framework.

### Storage

[MinIO](https://min.io/) is an open-source object storage service. It stores the data used to generate the intermediate version of the OpenAIRE Research Graph.

### Workflow Orchestrator

[Airflow](https://airflow.apache.org/) is an open-source workflow management platform for automating complex workflows. It orchestrates the generation of the OpenAIRE Research Graph.

### Processing Framework

[Spark-Operator](https://github.com/kubeflow/spark-operator), the Kubernetes Operator for Apache Spark, aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing the status of Spark applications. For a complete reference of the custom resource definitions, refer to the project's API Definition. For details on its design, refer to the project's design doc.

## How to use locally

### Prerequisites

This section outlines the prerequisites for setting up a developer cluster on your local machine using Docker and Kind. A developer cluster is a small-scale Kubernetes cluster for developing and testing applications.

Before you begin, install the following software on your local machine:

1. Docker: Docker is a containerization platform that allows you to run applications in isolated environments. You can download Docker from https://docs.docker.com/get-docker/.
2. Kind: Kind is a tool for creating local Kubernetes clusters using Docker container "nodes". You can download Kind from https://kind.sigs.k8s.io/docs/user/quick-start/.
3. Terraform: Terraform lets you describe your infrastructure (for example virtual machines, storage, and networking) in configuration files, and then provisions the resources that configuration declares. You can download Terraform from https://developer.hashicorp.com/terraform/install.

The Ingress step below also uses `kubectl`, so it must be available on your machine as well.

### Create Kubernetes cluster

To create the Kubernetes cluster, run the following command:

```console
kind create cluster --config clusters/local/kind-cluster-config.yaml
```

This command creates a cluster named `dnet-data-platform`.

Next, set up Ingress, a Kubernetes resource that manages external access to services running on the cluster (such as the MinIO console or the Spark UI). To enable Ingress, run:

```console
kubectl apply --context kind-dnet-data-platform -f ./clusters/local/nginx-kind-deploy.yaml
```

### Define the cluster

- Generate a Terraform variable file starting from `local.tfvars.template` and save it as `local.tfvars`.
- Initialize Terraform:

  ```console
  terraform init
  ```

- Create the cluster:

  ```console
  terraform apply -var-file="local.tfvars"
  ```
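
### Checking the prerequisites

Before running the steps above, it can help to confirm that the required tools are on your `PATH`. The following is a minimal sketch, not part of this repository: the function name `missing_tools` is hypothetical, and the tool list is an assumption derived from the commands used in this README (`docker`, `kind`, `kubectl`, `terraform`).

```shell
#!/bin/sh
# Sketch: report which of the tools used in this README are not installed.
# `missing_tools` is a hypothetical helper, not something this repository provides.

missing_tools() {
  # Print, one per line, every argument that is not a command found on PATH.
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Assumed tool list: taken from the commands shown in this README.
missing="$(missing_tools docker kind kubectl terraform)"

if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto a single line.
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```

The script only checks that each command exists. It does not verify versions or that the Docker daemon is running.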