# Documentation: Exporting Data from Existing Cassandra Cluster
This process exports data from an existing Cassandra cluster by creating a snapshot on each node and copying the snapshot files to a local directory.
The result is a per-node backup of the keyspace data together with the keyspace schema.
Snapshot creation and data synchronization run in parallel across all nodes, which shortens the export and keeps the per-node snapshots close together in time.
## Dump Process
The data dump process involves taking a snapshot of the keyspace from each Cassandra node, copying the snapshots locally, and exporting the keyspace schema. This process is performed in parallel for efficiency.
1. **Clear Old Snapshots:**
   - For each node, remove any existing snapshots with the specified tag to ensure a clean state.
2. **Create New Snapshots:**
   - For each node, create a new snapshot with the specified tag.
3. **Synchronize Snapshots Locally:**
   - Copy the snapshot data from each node to the local directory. Each table's data is copied into a directory named after the table.
4. **Export Keyspace Schema:**
   - Export the keyspace schema from the first node and save it locally.
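
The steps above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual script: it assumes each node is reachable over `ssh` with `nodetool` on its path, and the function names (`node_commands`, `schema_command`, `snapshot_all`) are hypothetical. The `nodetool clearsnapshot`, `nodetool snapshot`, and `cqlsh -e` invocations use standard options.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def node_commands(host, keyspace, tag):
    """Return the snapshot commands for one node as argv lists.

    Assumption: nodetool is run on the node over ssh.
    """
    return [
        # Step 1: remove any old snapshot with this tag.
        ["ssh", host, "nodetool", "clearsnapshot", "-t", tag, "--", keyspace],
        # Step 2: create a fresh snapshot with the same tag.
        ["ssh", host, "nodetool", "snapshot", "-t", tag, keyspace],
    ]


def schema_command(host, keyspace):
    """Step 4: command that prints the keyspace schema from one node."""
    return ["cqlsh", host, "-e", f"DESCRIBE KEYSPACE {keyspace}"]


def snapshot_node(host, keyspace, tag, run=subprocess.run):
    """Run the clear and create steps on one node, in order."""
    for argv in node_commands(host, keyspace, tag):
        run(argv, check=True)
    return host


def snapshot_all(hosts, keyspace, tag, run=subprocess.run):
    """Run the per-node steps for all nodes in parallel, one thread per node."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return list(pool.map(lambda h: snapshot_node(h, keyspace, tag, run), hosts))
```

The `run` parameter exists only so the command sequence can be inspected without contacting a cluster; in real use the default `subprocess.run` executes each command and raises if one fails.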
### Directory Structure on Server
- Each table in the keyspace has its own directory.
- Inside each table's directory, there is a `snapshots` directory.
- The `snapshots` directory contains subdirectories for each snapshot, named according to the snapshot tag.
### Local Directory Structure
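- The local dump directory contains one directory per node, plus a `schema` directory for the exported keyspace schema.
- Inside each node directory, each table's snapshot files are stored in a directory named after the table, without the table ID suffix or the `snapshots/<tag>` levels used on the server.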
By following this process, a consistent and reliable backup of the Cassandra keyspace data is achieved, ensuring that the data can be restored or migrated as needed.
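
The mapping from a server-side snapshot directory to its local destination can be sketched as below. This is an illustration under assumptions: the helper names are hypothetical, the copy is shown with `rsync -a` (the source does not say which tool is used), and the table name is recovered by cutting the directory name at its last hyphen, which relies on Cassandra table names containing only letters, digits, and underscores.

```python
from pathlib import PurePosixPath


def table_name(table_dir):
    """Strip the table ID suffix: 'table1-abc123' -> 'table1'."""
    return table_dir.rsplit("-", 1)[0]


def remote_snapshot_path(data_root, keyspace, table_dir, tag):
    """Server side: <data_root>/<keyspace>/<table>-<id>/snapshots/<tag>."""
    return PurePosixPath(data_root) / keyspace / table_dir / "snapshots" / tag


def local_table_path(dump_root, node, table_dir):
    """Local side: <dump_root>/<node>/<table>."""
    return PurePosixPath(dump_root) / node / table_name(table_dir)


def rsync_command(host, data_root, keyspace, table_dir, tag, dump_root, node):
    """Build an rsync argv that copies one table's snapshot files (assumed tool)."""
    src = f"{host}:{remote_snapshot_path(data_root, keyspace, table_dir, tag)}/"
    dst = f"{local_table_path(dump_root, node, table_dir)}/"
    return ["rsync", "-a", src, dst]
```

The trailing slashes on the source and destination make `rsync` copy the contents of the snapshot directory rather than the directory itself, which matches the flattened local layout.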
## Directory Structure Example
### Server-Side Structure
On the server, the directory structure for the snapshots is organized as follows:
```plaintext
/data
└── dev_keyspace_1
    ├── table1-abc1234567890abcdef1234567890abcdef
    │   └── snapshots
    │       └── dump_docker
    │           ├── manifest.json
    │           ├── nb-1-big-CompressionInfo.db
    │           ├── nb-1-big-Data.db
    │           ├── nb-1-big-Digest.crc32
    │           ├── nb-1-big-Filter.db
    │           ├── nb-1-big-Index.db
    │           ├── nb-1-big-Statistics.db
    │           ├── nb-1-big-Summary.db
    │           └── schema.cql
    ├── table2-def4567890abcdef1234567890abcdef
    │   └── snapshots
    │       └── dump_docker
    │           ├── manifest.json
    │           ├── nb-1-big-CompressionInfo.db
    │           ├── nb-1-big-Data.db
    │           ├── nb-1-big-Digest.crc32
    │           ├── nb-1-big-Filter.db
    │           ├── nb-1-big-Index.db
    │           ├── nb-1-big-Statistics.db
    │           ├── nb-1-big-Summary.db
    │           └── schema.cql
    └── table3-ghi7890abcdef1234567890abcdef
        └── snapshots
            └── dump_docker
                ├── manifest.json
                ├── nb-1-big-CompressionInfo.db
                ├── nb-1-big-Data.db
                ├── nb-1-big-Digest.crc32
                ├── nb-1-big-Filter.db
                ├── nb-1-big-Index.db
                ├── nb-1-big-Statistics.db
                ├── nb-1-big-Summary.db
                └── schema.cql
```
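
Given this layout, the snapshot directories for one tag can be located with a glob over the keyspace directory. The sketch below is meant to run on the node itself (or on a mounted copy of its data directory); `find_snapshot_dirs` is a hypothetical helper name, and the tag is assumed to contain no glob metacharacters.

```python
from pathlib import Path


def find_snapshot_dirs(data_root, keyspace, tag):
    """Return every <table>-<id>/snapshots/<tag> directory under the keyspace."""
    base = Path(data_root) / keyspace
    # One match per table that has a snapshot with this tag.
    return sorted(p for p in base.glob(f"*/snapshots/{tag}") if p.is_dir())
```

For the example above, `find_snapshot_dirs("/data", "dev_keyspace_1", "dump_docker")` would return one path per table directory.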
### Local Directory Structure
When copied locally, the directory structure is organized as follows:
```plaintext
data/dumps
├── schema
│   └── dev_keyspace_1_schema.cql
└── node1
    ├── table1
    │   ├── manifest.json
    │   ├── nb-1-big-CompressionInfo.db
    │   ├── nb-1-big-Data.db
    │   ├── nb-1-big-Digest.crc32
    │   ├── nb-1-big-Filter.db
    │   ├── nb-1-big-Index.db
    │   ├── nb-1-big-Statistics.db
    │   ├── nb-1-big-Summary.db
    │   └── schema.cql
    ├── table2
    │   ├── manifest.json
    │   ├── nb-1-big-CompressionInfo.db
    │   ├── nb-1-big-Data.db
    │   ├── nb-1-big-Digest.crc32
    │   ├── nb-1-big-Filter.db
    │   ├── nb-1-big-Index.db
    │   ├── nb-1-big-Statistics.db
    │   ├── nb-1-big-Summary.db
    │   └── schema.cql
    └── table3
        ├── manifest.json
        ├── nb-1-big-CompressionInfo.db
        ├── nb-1-big-Data.db
        ├── nb-1-big-Digest.crc32
        ├── nb-1-big-Filter.db
        ├── nb-1-big-Index.db
        ├── nb-1-big-Statistics.db
        ├── nb-1-big-Summary.db
        └── schema.cql
```
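
After a dump, a quick sanity check of the local layout can catch partially copied tables. The sketch below is one possible check, not part of the documented process: it groups SSTable component files by their shared prefix (for example `nb-1-big`) and reports tables where a group has no `Data.db` file. The function names are hypothetical.

```python
from pathlib import Path


def sstable_components(table_dir):
    """Map each SSTable prefix (e.g. 'nb-1-big') to its set of component names."""
    groups = {}
    for entry in Path(table_dir).iterdir():
        # 'nb-1-big-Data.db' -> ('nb-1-big', '-', 'Data.db');
        # manifest.json and schema.cql have no hyphen and are skipped.
        prefix, sep, component = entry.name.rpartition("-")
        if entry.is_file() and sep:
            groups.setdefault(prefix, set()).add(component)
    return groups


def incomplete_tables(node_dir):
    """Return names of table directories with an SSTable group lacking Data.db."""
    bad = []
    for table_dir in sorted(Path(node_dir).iterdir()):
        if not table_dir.is_dir():
            continue
        groups = sstable_components(table_dir)
        if any("Data.db" not in comps for comps in groups.values()):
            bad.append(table_dir.name)
    return bad
```

Calling `incomplete_tables("data/dumps/node1")` returns an empty list when every copied SSTable has its data file. A table directory with no SSTable files at all is treated as complete, since an empty table is a legitimate state.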