Documentation: Exporting Data from an Existing Cassandra Cluster

This process exports data from an existing Cassandra cluster by creating snapshots on each node and copying the data to a local directory.

Snapshot creation and data synchronization run in parallel across all nodes. This shortens the overall run and keeps the per-node snapshots close together in time, so the copied data represents one consistent backup of the keyspace.
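The parallel execution described above can be illustrated with a thread pool. This is a minimal sketch, not the project's actual implementation: `run_on_all_nodes` and `fake_snapshot` are hypothetical names, and the per-node task is a stand-in for the real snapshot and copy work.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_all_nodes(nodes, task, max_workers=None):
    """Run task(node) for every node concurrently and return {node: result}."""
    if not nodes:
        return {}
    # Default to one worker per node so every node starts at about the same time.
    workers = max_workers or len(nodes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with nodes.
        # An exception raised for any node propagates when its result is read.
        results = list(pool.map(task, nodes))
    return dict(zip(nodes, results))


def fake_snapshot(node):
    # Hypothetical stand-in for the real per-node snapshot-and-copy work.
    return f"snapshot done on {node}"


if __name__ == "__main__":
    print(run_on_all_nodes(["node1", "node2", "node3"], fake_snapshot))
```

Threads are a reasonable fit here because the per-node work is mostly waiting on remote commands and file transfers rather than local computation.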

Dump Process

The dump takes a snapshot of the keyspace on each Cassandra node, copies the snapshots to the local machine, and exports the keyspace schema. The per-node steps run in parallel.

  1. Clear Old Snapshots:

    • For each node, remove any existing snapshots with the specified tag to ensure a clean state.
  2. Create New Snapshots:

    • For each node, create a new snapshot with the specified tag.
  3. Synchronize Snapshots Locally:

    • Copy the snapshot data from each node to the local directory. Each table's data is copied into a directory named after the table.
  4. Export Keyspace Schema:

    • Export the keyspace schema from the first node and save it locally.
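The four steps above could map onto commands roughly as follows. This is a sketch under stated assumptions, not the project's actual script: it assumes the nodes are reachable over SSH, that `rsync` is used for the copy, and that `cqlsh` can connect to the first node. All function names are hypothetical. The functions only build argument lists; nothing is executed.

```python
import shlex


def clear_snapshot_cmd(keyspace, tag):
    """Step 1: remove any existing snapshot with this tag."""
    return ["nodetool", "clearsnapshot", "-t", tag, "--", keyspace]


def create_snapshot_cmd(keyspace, tag):
    """Step 2: take a fresh snapshot of the keyspace under this tag."""
    return ["nodetool", "snapshot", "-t", tag, "--", keyspace]


def sync_snapshot_cmd(node, remote_dir, local_dir):
    """Step 3: copy one table's snapshot directory to the local machine."""
    # The trailing slash on the source makes rsync copy the directory's
    # contents rather than the directory itself. Assumption: rsync over SSH.
    return ["rsync", "-a", f"{node}:{remote_dir}/", f"{local_dir}/"]


def export_schema_cmd(node, keyspace):
    """Step 4: print the keyspace schema; the caller redirects stdout to a file."""
    return ["cqlsh", node, "-e", f"DESCRIBE KEYSPACE {keyspace}"]


def over_ssh(node, cmd):
    """Wrap a command so it runs on the node (assumption: SSH access)."""
    return ["ssh", node, shlex.join(cmd)]


if __name__ == "__main__":
    for cmd in (
        over_ssh("node1", clear_snapshot_cmd("dev_keyspace_1", "dump_docker")),
        over_ssh("node1", create_snapshot_cmd("dev_keyspace_1", "dump_docker")),
        export_schema_cmd("node1", "dev_keyspace_1"),
    ):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failing step stops the dump instead of being silently ignored.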

Directory Structure on Server

  • Each table in the keyspace has its own directory under the keyspace directory, named after the table followed by a hyphen and an ID suffix.
  • Inside each table's directory, there is a snapshots directory.
  • The snapshots directory contains subdirectories for each snapshot, named according to the snapshot tag.
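The server-side layout described above can be expressed as a small path helper. This is an illustrative sketch: the function names are hypothetical, and the data root (`/data` in the example further down) is passed in as a parameter rather than assumed.

```python
from pathlib import PurePosixPath


def snapshot_dir(data_root, keyspace, table_dir, tag):
    """Build <data_root>/<keyspace>/<table_dir>/snapshots/<tag>."""
    return PurePosixPath(data_root) / keyspace / table_dir / "snapshots" / tag


def parse_snapshot_dir(path):
    """Split .../<keyspace>/<table_dir>/snapshots/<tag> into its parts."""
    p = PurePosixPath(path)
    if p.parent.name != "snapshots":
        raise ValueError(f"not a snapshot directory: {path}")
    return {
        "keyspace": p.parents[2].name,   # directory above the table directory
        "table_dir": p.parents[1].name,  # directory that holds "snapshots"
        "tag": p.name,                   # snapshot tag is the last component
    }


if __name__ == "__main__":
    d = snapshot_dir("/data", "dev_keyspace_1", "table1-abc123", "dump_docker")
    print(d)
    print(parse_snapshot_dir(d))
```

`PurePosixPath` is used so the paths are built with forward slashes regardless of the operating system the dump script runs on.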

Local Directory Structure

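  • The local directory is a flattened version of the server's structure, with one subdirectory per node.
  • Each table's snapshot files are stored directly in a directory named after the table (without the ID suffix), inside that node's directory in the local dump directory.
  • The exported keyspace schema is stored in a separate schema directory.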
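The mapping from a server-side table directory to its local destination can be sketched as below. The helper names are hypothetical, and the `schema/<keyspace>_schema.cql` location is taken from the local example layout in this document. The sketch relies on Cassandra table names not containing hyphens, so the last hyphen in the directory name separates the table name from its ID.

```python
from pathlib import PurePosixPath


def table_name(table_dir):
    """Drop the trailing "-<id>" from a server-side table directory name."""
    name, sep, _table_id = table_dir.rpartition("-")
    # If there is no hyphen at all, treat the whole string as the table name.
    return name if sep else table_dir


def local_table_dir(local_root, node, table_dir):
    """Local destination for one table's snapshot files from one node."""
    return PurePosixPath(local_root) / node / table_name(table_dir)


def local_schema_file(local_root, keyspace):
    """Local destination for the exported keyspace schema."""
    return PurePosixPath(local_root) / "schema" / f"{keyspace}_schema.cql"


if __name__ == "__main__":
    print(local_table_dir("data/dumps", "node1", "table1-abc123"))
    print(local_schema_file("data/dumps", "dev_keyspace_1"))
```

Keeping one directory per node avoids file name collisions, since SSTable files on different nodes can share names such as `nb-1-big-Data.db`.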

The result is a per-node copy of the keyspace's snapshot files plus the keyspace schema, which can be used to restore or migrate the data as needed.

Directory Structure Example

Server-Side Structure

On the server, the directory structure for the snapshots is organized as follows:

/data
└── dev_keyspace_1
    ├── table1-abc1234567890abcdef1234567890abcdef
    │   └── snapshots
    │       └── dump_docker
    │           ├── manifest.json
    │           ├── nb-1-big-CompressionInfo.db
    │           ├── nb-1-big-Data.db
    │           ├── nb-1-big-Digest.crc32
    │           ├── nb-1-big-Filter.db
    │           ├── nb-1-big-Index.db
    │           ├── nb-1-big-Statistics.db
    │           ├── nb-1-big-Summary.db
    │           └── schema.cql
    ├── table2-def4567890abcdef1234567890abcdef
    │   └── snapshots
    │       └── dump_docker
    │           ├── manifest.json
    │           ├── nb-1-big-CompressionInfo.db
    │           ├── nb-1-big-Data.db
    │           ├── nb-1-big-Digest.crc32
    │           ├── nb-1-big-Filter.db
    │           ├── nb-1-big-Index.db
    │           ├── nb-1-big-Statistics.db
    │           ├── nb-1-big-Summary.db
    │           └── schema.cql
    └── table3-ghi7890abcdef1234567890abcdef
        └── snapshots
            └── dump_docker
                ├── manifest.json
                ├── nb-1-big-CompressionInfo.db
                ├── nb-1-big-Data.db
                ├── nb-1-big-Digest.crc32
                ├── nb-1-big-Filter.db
                ├── nb-1-big-Index.db
                ├── nb-1-big-Statistics.db
                ├── nb-1-big-Summary.db
                └── schema.cql

Local Directory Structure

When copied locally, the directory structure is organized as follows:

data/dumps
├── schema
│   └── dev_keyspace_1_schema.cql
└── node1
    ├── table1
    │   ├── manifest.json
    │   ├── nb-1-big-CompressionInfo.db
    │   ├── nb-1-big-Data.db
    │   ├── nb-1-big-Digest.crc32
    │   ├── nb-1-big-Filter.db
    │   ├── nb-1-big-Index.db
    │   ├── nb-1-big-Statistics.db
    │   ├── nb-1-big-Summary.db
    │   └── schema.cql
    ├── table2
    │   ├── manifest.json
    │   ├── nb-1-big-CompressionInfo.db
    │   ├── nb-1-big-Data.db
    │   ├── nb-1-big-Digest.crc32
    │   ├── nb-1-big-Filter.db
    │   ├── nb-1-big-Index.db
    │   ├── nb-1-big-Statistics.db
    │   ├── nb-1-big-Summary.db
    │   └── schema.cql
    └── table3
        ├── manifest.json
        ├── nb-1-big-CompressionInfo.db
        ├── nb-1-big-Data.db
        ├── nb-1-big-Digest.crc32
        ├── nb-1-big-Filter.db
        ├── nb-1-big-Index.db
        ├── nb-1-big-Statistics.db
        ├── nb-1-big-Summary.db
        └── schema.cql