d4science_copernicus_cds/config_auth_cds.ipynb

d4science_copernicus_cds Library Setup and Example

This Jupyter notebook will guide you through setting up all dependencies and configuring the environment to use the d4science_copernicus_cds library. It also provides a comprehensive example of the library's features and capabilities, helping you to manage Climate Data Store (CDS) API authentication and make programmatic requests from the CDS.

The d4science_copernicus_cds library simplifies the authentication process. It prompts for credentials the first time, securely saves them in the workspace, and automatically retrieves them in future sessions—allowing seamless access to CDS data.

Obtain Your API Credentials

To begin, you'll need your CDS API credentials. Follow these steps to obtain them:

  1. Register or log in to the CDS at https://cds-beta.climate.copernicus.eu.
  2. Visit https://cds-beta.climate.copernicus.eu/how-to-api and copy the API key provided.

The library will prompt you to enter:

  • URL: The URL field is prefilled; simply press Enter to accept the default.
  • KEY: Insert the obtained API key when prompted, then confirm saving your credentials by pressing "y".

Once saved, the credentials will be automatically loaded in subsequent sessions, so there is no need to re-enter them.


With this setup, you'll be ready to explore the full functionality of d4science_copernicus_cds in a BlueCloud JupyterLab environment, where you can seamlessly authenticate and interact with the CDS API across multiple notebooks.

Install Required Dependencies

Before we begin using the d4science_copernicus_cds library to access the Climate Data Store (CDS), we need to install a few dependencies to ensure compatibility and functionality.

Run the following commands to install the necessary packages:

  • cdsapi: The official API client for the Climate Data Store, allowing programmatic data access.
  • attrs and typing_extensions: Required packages to support the latest functionality of cdsapi.
In [ ]:
!pip install -U cdsapi
!pip install -U attrs
!pip install -U typing_extensions
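
To confirm that the installation succeeded, the installed versions can be checked from Python. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of any of the packages above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report each dependency installed in the previous cell.
for name in ("cdsapi", "attrs", "typing_extensions"):
    print(name, installed_version(name) or "not installed")
```

If a package is reported as "not installed", re-run the corresponding pip command and restart the kernel before continuing.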

Install the d4science_copernicus_cds Library

Next, install the d4science_copernicus_cds library from the D4Science Git repository. This library will handle authentication for the Climate Data Store (CDS) API in the JupyterLab environment, allowing you to request data seamlessly across multiple notebooks.

Once installed, the library will be ready for use, and you can proceed with authenticating and configuring your environment.

In [ ]:
!pip install -U git+https://code-repo.d4science.org/D4Science/d4science_copernicus_cds.git

Import d4science_copernicus_cds Functions

With the d4science_copernicus_cds library installed, we can now import the main functions for managing CDS API authentication and configuration. These functions provide a range of capabilities for handling credentials and data directories.

In [ ]:
from d4science_copernicus_cds import (
    cds_authenticate,
    cds_get_credentials,
    cds_show_conf,
    cds_save_conf,
    cds_remove_conf,
    cds_remove_env,
    cds_datadir
)

Authenticate with the CDS API

To begin accessing data from the Climate Data Store (CDS), start by running the cds_authenticate() function to initialize authentication.

First-Time Setup

The first time you run this function, it will prompt you to enter your CDS API credentials:

  1. URL: The URL field is prefilled with the default CDS API endpoint. Simply press Enter to accept the default.
  2. KEY: Enter your personal API key, obtained as described in "Obtain Your API Credentials" above.
  3. Saving the Credentials: After entering the key, the function will ask if you want to save the credentials in a hidden configuration file in your workspace. Press "y" to confirm saving, which will allow future sessions to load the credentials automatically.

Subsequent Sessions

Once saved, cds_authenticate() will detect and load the credentials from the environment or configuration file automatically, without requiring further interaction. This setup enables seamless, secure access to the CDS API across sessions.

In [ ]:
client = cds_authenticate()
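
The lookup order described above (environment first, then a saved configuration file, then an interactive prompt) can be illustrated with a short sketch. This is not the library's implementation: the function name find_credentials is hypothetical, the variable names CDSAPI_URL and CDSAPI_KEY are the ones read by the cdsapi client and are assumed here, and the "url: ..." / "key: ..." file layout is assumed to follow the .cdsapirc convention.

```python
import os
from pathlib import Path


def find_credentials(conf_path, environ=os.environ):
    """Look up (url, key) in the environment first, then in a config file.

    Returns None when neither source holds both values, which is the
    point at which an interactive prompt would be needed.
    """
    # Assumed variable names: these are the ones the cdsapi client reads.
    url = environ.get("CDSAPI_URL")
    key = environ.get("CDSAPI_KEY")
    if url and key:
        return url, key

    conf = Path(conf_path)
    if conf.is_file():
        values = {}
        for line in conf.read_text().splitlines():
            if ":" in line:
                # Split on the first colon only, so URLs stay intact.
                name, _, value = line.partition(":")
                values[name.strip()] = value.strip()
        if "url" in values and "key" in values:
            return values["url"], values["key"]
    return None
```

Passing a plain dictionary as environ makes the function easy to try out without touching the real environment.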

View Current Configuration

The cds_show_conf() function displays the current configuration settings, including the credentials and any other parameters related to your Climate Data Store (CDS) API setup.

This function will output:

  • Environment-Based Credentials: If credentials are stored in environment variables, they will be displayed here.
  • Saved Configuration File: If a configuration file exists in your workspace, the function will show the credentials and settings retrieved from it.

This display helps verify that your credentials are correctly set up and allows you to check whether they are being loaded from the environment or from a saved configuration file.

In [ ]:
cds_show_conf()

Retrieve CDS API Credentials

The cds_get_credentials() function retrieves your CDS API credentials, returning both the URL and KEY used for authentication.

If the credentials are already set in the environment or saved in a configuration file, cds_get_credentials() will load them directly.

Automatic Authentication Check

If no credentials are found, cds_get_credentials() will automatically invoke cds_authenticate() to prompt you for your credentials. This ensures that you don't need to call cds_authenticate() separately beforehand, as cds_get_credentials() will handle it if necessary.

In [ ]:
URL, KEY = cds_get_credentials()
print("URL", URL)
print("KEY", KEY)
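
Printing the full key leaves it visible in the saved notebook output. When a notebook is shared, it may be preferable to show only the last few characters. The sketch below is one way to do that; mask_key is a hypothetical helper, not a function of d4science_copernicus_cds.

```python
def mask_key(key, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    if len(key) <= visible:
        # Too short to reveal anything safely: hide every character.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Placeholder value for illustration, not a real API key.
print("KEY", mask_key("0123456789abcdef"))
```

In the cell above, this would be used as print("KEY", mask_key(KEY)).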

Save CDS API Credentials

The cds_save_conf() function saves your CDS API credentials to a hidden configuration file in your workspace. This setup allows future sessions to load the credentials automatically, so you won't need to re-enter them. When executed, this function:

  • Retrieves your current credentials (if already set in the environment).
  • Prompts you to confirm saving them in a hidden file in your workspace.

Once saved, the credentials will be securely stored and automatically loaded in future sessions, ensuring seamless authentication with the CDS API without requiring additional input.

In [ ]:
cds_save_conf()
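
To illustrate what saving credentials to a hidden file can involve, here is a minimal sketch that writes the two values to a file readable only by its owner. The function name save_credentials, the file layout, and the permission handling are assumptions for illustration; the source does not describe how cds_save_conf() stores the file.

```python
import os
from pathlib import Path


def save_credentials(conf_path, url, key):
    """Write url/key to a file that only the owner can read or write."""
    conf = Path(conf_path)
    # Request mode 0o600 at creation time so a new file is never
    # readable by other users, even briefly.
    fd = os.open(conf, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"url: {url}\nkey: {key}\n")
    # Also tighten permissions when the file already existed.
    os.chmod(conf, 0o600)
    return conf
```

On POSIX systems the resulting file has mode 600; on Windows, os.chmod only controls the read-only flag, so the protection is weaker there.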

Remove Saved Configuration from Workspace

The cds_remove_conf() function removes the saved configuration file from your workspace. This is useful if you want to clear your stored credentials, so future sessions will require re-authentication.

To avoid unintentional execution, this line is commented out by default. Remove the comment symbol (#) to execute it.

When executed, this function permanently deletes the saved configuration file from your workspace, so the credentials will no longer be automatically loaded. You will be prompted to re-enter them next time you authenticate.

In [ ]:
# cds_remove_conf()

Remove Credentials from Environment Variables

The cds_remove_env() function removes the CDS API credentials from the environment variables. This is helpful if you want to clear the credentials from the current session without affecting any saved configuration files.

To prevent accidental execution, this line is commented out by default. Remove the comment symbol (#) to execute it.

When executed, this function clears the credentials stored in environment variables. This action will require you to re-authenticate in the current session or any future sessions if there is no saved configuration file.

In [ ]:
# cds_remove_env()
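
Clearing credentials from the environment amounts to deleting the relevant variables from the current process. The sketch below shows the idea under the assumption that the credentials live in CDSAPI_URL and CDSAPI_KEY (the names read by the cdsapi client); clear_cds_env is a hypothetical helper and the library may use different names internally.

```python
import os


def clear_cds_env(environ=os.environ):
    """Remove CDS credential variables and return the names that were set."""
    removed = []
    # Assumed variable names; adjust if the credentials are stored elsewhere.
    for name in ("CDSAPI_URL", "CDSAPI_KEY"):
        if environ.pop(name, None) is not None:
            removed.append(name)
    return removed
```

Changes made this way affect only the running kernel; a saved configuration file is left untouched.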

Verify Removal of Credentials with cds_show_conf()

After using cds_remove_env() and cds_remove_conf() to clear your credentials from both the environment and workspace, you can run cds_show_conf() to confirm that all credentials have been removed.

This function will display any remaining credentials in the environment or configuration file. If both cds_remove_env() and cds_remove_conf() have been successfully executed, cds_show_conf() should indicate that no credentials are currently set, confirming that your workspace and environment have been cleared.

In [ ]:
cds_show_conf()

Set or Get the Data Directory with cds_datadir()

The cds_datadir() function sets or retrieves a data directory for saving CDS downloads. Instead of directly using the specified folder name, it appends the provided label to a timestamp-based directory structure, making each directory unique.

For example, to create a timestamped data directory with the label "example", use:

datadir = cds_datadir("example")

This will create a directory with a timestamped format, such as: /home/jovyan/cds_dataDir/out_2024_11_04_13_58_38_example/

  • Timestamped Directory: The function appends a timestamp to the base directory, followed by the label "example". This ensures that each call to cds_datadir() creates a unique directory, ideal for organizing data downloads by session or task.
  • Custom Labels: Use labels like "example" to organize or categorize downloads. Each call to cds_datadir() with a different label or at a different time will create a new directory, keeping data isolated and organized.

This approach simplifies managing multiple data download sessions and ensures your data files are organized with minimal manual intervention.

In [14]:
datadir = cds_datadir("example")
datadir: %s /home/jovyan/cds_dataDir/out_2024_11_04_13_58_38_example/
In [ ]:
datadir_current = cds_datadir("current_example", basepath="./out")
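
The naming scheme seen in the output above (out_<timestamp>_<label>) can be reproduced with the standard library. This is a sketch of the pattern only, not the implementation of cds_datadir(); the function timestamped_dir and its now parameter are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def timestamped_dir(label, basepath, now=None):
    """Build the path <basepath>/out_<timestamp>_<label> and create it."""
    now = now or datetime.now()
    # Year, month, day, hour, minute, second, joined by underscores.
    stamp = now.strftime("%Y_%m_%d_%H_%M_%S")
    path = Path(basepath) / f"out_{stamp}_{label}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Because the timestamp has one-second resolution, two calls with the same label within the same second would resolve to the same directory.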

Set or Get the Data Directory with a Custom Base Path

The cds_datadir() function also allows specifying a custom base path for saving CDS data downloads. This function appends a timestamp and a label to the provided base path, creating a unique, organized directory structure.

For example, to set a custom base path "./out" with the label "current_example", use:

datadir_current = cds_datadir("current_example", basepath="./out")

This will create a directory structure with a timestamped format, such as:

./out/out_2024_11_04_13_58_38_current_example/

  • Timestamped Directory: The function automatically adds a timestamp and the provided label to create a unique directory. This is useful for organizing data by session or task.
  • Custom Base Path: By specifying basepath="./out", the data directory will be created within the specified location rather than the default path.

This setup provides flexibility, allowing you to easily organize data downloads across multiple directories and ensuring a clear, timestamped folder structure for each session.

In [15]:
datadir_current = cds_datadir("current_example", basepath="./out")
datadir: %s ./out/out_2024_11_04_14_02_53_current_example/
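
With an authenticated client and a data directory in place, a download can be directed into that directory. The sketch below assumes that the object returned by cds_authenticate() behaves like a cdsapi client with a retrieve(dataset, request, target) method; the helpers build_request and download_to are hypothetical, and the dataset name and request fields are illustrative and should be checked against the dataset's download form on the CDS.

```python
import os


def build_request(year, month, day):
    """Assemble a request for 2 m temperature at 12:00 on a single day."""
    return {
        "product_type": ["reanalysis"],
        "variable": ["2m_temperature"],
        "year": [f"{year:04d}"],
        "month": [f"{month:02d}"],
        "day": [f"{day:02d}"],
        "time": ["12:00"],
        "data_format": "netcdf",  # assumed field name; verify on the CDS form
    }


def download_to(client, datadir, filename, dataset, request):
    """Call client.retrieve() and return the path of the downloaded file."""
    target = os.path.join(datadir, filename)
    client.retrieve(dataset, request, target)
    return target


# Usage, assuming `client` and `datadir_current` from the cells above:
# download_to(client, datadir_current, "t2m.nc",
#             "reanalysis-era5-single-levels", build_request(2024, 1, 1))
```

Keeping the request in its own function makes it easy to inspect or adjust before any data is transferred.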