# Code Infrastructure Lab
This module defines configurations for creating a Kubernetes cluster for generating the OpenAIRE Research Graph.
## Cluster definition
The Kubernetes cluster will include some essential services for testing OpenAIRE Graph generation:
- Storage: Minio will be used as storage.
- Workflow Orchestrator: Airflow will be used as the workflow orchestrator.
- Processing Framework: Spark-Operator will be used as the processing framework.
### Storage
[Minio](https://min.io/) is an open-source object storage service that will be used to store the data used to generate the intermediate version of the OpenAIRE Research Graph.
### Workflow Orchestrator
[Airflow](https://airflow.apache.org/) is an open-source workflow management platform that will be used to orchestrate the generation of the OpenAIRE Research Graph. Airflow is a powerful and flexible workflow orchestration tool that can be used to automate complex workflows.
### Processing Framework
[Spark-Operator](https://github.com/kubeflow/spark-operator), the Kubernetes Operator for Apache Spark, aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing the status of Spark applications. For a complete reference of the custom resource definitions and for details on its design, refer to the API definition and the design doc in the project's documentation.
## How to use locally
### Prerequisites
This section outlines the prerequisites for setting up a developer cluster on your local machine using Docker and Kind. A developer cluster is a small-scale Kubernetes cluster that can be used for developing and testing applications. Docker and Kind are two tools that can be used to create and manage Kubernetes clusters locally.
Before you begin, you will need to install the following software on your local machine:
1. Docker: Docker is a containerization platform that allows you to run applications in isolated environments. You can download Docker from https://docs.docker.com/get-docker/.
2. Kind: Kind is a tool for creating local Kubernetes clusters using Docker container "nodes". You can download Kind from https://kind.sigs.k8s.io/docs/user/quick-start/.
3. Terraform: Terraform lets you describe infrastructure (such as virtual machines, storage, and networking) in declarative configuration files, then provisions the resources those files define. You can download Terraform from https://developer.hashicorp.com/terraform/install.
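
Before continuing, it can help to confirm that the tools are on your `PATH`. The snippet below is a minimal sketch: the helper name `check_tools` is illustrative and not part of this repository, and `kubectl` is included as an assumption because later steps call it even though it is not listed above.

```shell
# Report which of the given command-line tools are available on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# kubectl is an assumption: it is used later in this README.
check_tools docker kind terraform kubectl || echo "Install the missing tools before continuing."
```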
### Create Kubernetes cluster
To create the Kubernetes cluster, run the following command:
```console
kind create cluster --config clusters/local/kind-cluster-config.yaml
```
This command creates a cluster named `dnet-data-platform`.
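
To confirm that the cluster exists before moving on, a check along these lines can be used. It relies only on `kind get clusters`, which prints one cluster name per line; the helper name `cluster_exists` is illustrative and not part of this repository.

```shell
# Succeeds if a kind cluster with exactly the given name exists.
cluster_exists() {
  kind get clusters 2>/dev/null | grep -qxF "$1"
}

if cluster_exists dnet-data-platform; then
  echo "cluster dnet-data-platform is up"
else
  echo "cluster dnet-data-platform not found"
fi
```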
Next, enable Ingress, the Kubernetes resource that manages external access to services running in the cluster (such as the Minio console or the Spark UI).
To enable Ingress, run the command:
```console
kubectl apply --context kind-dnet-data-platform -f ./clusters/local/nginx-kind-deploy.yaml
```
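
The ingress controller can take a little while to start. The sketch below waits for it with `kubectl wait`. It assumes that `nginx-kind-deploy.yaml` follows the upstream ingress-nginx manifest for Kind, meaning the controller runs in the `ingress-nginx` namespace and carries the label `app.kubernetes.io/component=controller`; adjust both if the manifest in this repository differs. The helper name `wait_for_ingress` is illustrative.

```shell
# Assumption: the controller pod lives in the `ingress-nginx` namespace and
# has the upstream label app.kubernetes.io/component=controller.
wait_for_ingress() {
  kubectl wait --context kind-dnet-data-platform \
    --namespace ingress-nginx \
    --for=condition=ready pod \
    --selector=app.kubernetes.io/component=controller \
    --timeout=90s
}

wait_for_ingress || echo "ingress controller is not ready yet"
```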
### Define the cluster
- Create a Terraform variable file by copying `local.tfvars.template` and saving the copy as `local.tfvars`
- Initialize terraform:
```console
terraform init
```
- Create the cluster:
```console
terraform apply -var-file="local.tfvars"
```
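
Once `terraform apply` finishes, one way to check the result is to list the pods in the cluster. This is a minimal sketch: the namespaces used for Minio, Airflow, and Spark-Operator are not stated here, so it lists pods across all namespaces, and the helper name `list_platform_pods` is illustrative.

```shell
# List every pod in the cluster; namespaces are not assumed,
# hence --all-namespaces.
list_platform_pods() {
  kubectl get pods --all-namespaces --context kind-dnet-data-platform
}

list_platform_pods || echo "could not reach the cluster"
```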