Local cross-cluster DR baseline
This local DR baseline keeps the source, target, and shared trust services separated so backup, restore, unseal, and cutover all cross the same kinds of boundaries used in a real disaster-recovery event.
Validated coverage
- a snapshot can leave the source cluster, cross an object-storage boundary, and restore into a different target cluster
- the restored target can unseal only because it shares the same external Transit root of trust as the source
- restore verification can confirm both credential cutover and data cutover before any manual failover happens
- the operator's backup and restore workflows still work when source and target are split across real ingress and storage boundaries
This local disaster-recovery reference architecture uses k3d to validate the DR invariants for backup, restore, unseal, and manual cutover before moving to a cloud recovery pair.
Decision matrix
Baseline summary
| Surface | Choice | Why it matters |
|---|---|---|
| Cluster split | One infra cluster, one source cluster, one target cluster | Restore crosses a real cluster boundary instead of staying inside one namespace or one API server. |
| Seal path | Shared external Transit key | The target can unseal restored data only because it shares the same external seal root of trust as the source. |
| Transfer boundary | RustFS S3-compatible bucket | Snapshots move through a real object-storage boundary that both clusters can reach independently. |
| Edge model | Dedicated passthrough endpoints for infra, source, and target | The lane validates real ingress reachability without collapsing TLS termination into a single local shortcut. |
| Cutover model | Manual restore and manual client or DNS cutover | The baseline covers restore correctness and does not include automatic failover orchestration. |
Diagram
Baseline topology
The source cluster writes snapshots to shared storage, the target cluster restores from that storage, and both sides depend on the same external Transit key to make restored data usable.
Why this lane exists
Reference table
Key design choices
| Choice | What it protects | Why it stays in the lane |
|---|---|---|
| Shared seal root | Restored data is still decryptable after it lands in the target cluster. | This is the core DR invariant. Without a shared external seal root, the target can restore the snapshot data and still fail to unseal it. |
| Separate infra cluster for trust services | The shared Transit dependency stays independent from source and target failure domains. | The lane should treat trust services as an external dependency, not as part of either OpenBao cluster. |
| Shared RustFS bucket | The restore uses the same object-transfer shape as a real remote-storage path. | The lane exercises snapshot handoff through shared object storage instead of a local disk copy. |
| Manual cutover | Operators verify the target before traffic moves. | The lane proves correctness and operator workflow, not automated failover policy. |
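The failure-domain argument in the table above can be sketched as a small placement check. This is an illustrative Python model only: the service names, the `PLACEMENT` map, and `restore_survives_loss_of` are hypothetical and do not correspond to any operator API. It assumes that a restore needs the Transit service and the target OpenBao cluster to be available.

```python
from typing import Dict, Iterable

# Hypothetical placement map: service name -> cluster that hosts it.
# Mirrors the baseline split of one infra, one source, and one target cluster.
PLACEMENT: Dict[str, str] = {
    "transit": "infra",
    "openbao-source": "source",
    "openbao-target": "target",
}


def restore_survives_loss_of(
    lost_cluster: str,
    placement: Dict[str, str] = PLACEMENT,
    needed: Iterable[str] = ("transit", "openbao-target"),
) -> bool:
    """Return True if every service the restore needs lives outside the lost cluster."""
    return all(placement[service] != lost_cluster for service in needed)
```

With this placement, losing the source cluster leaves the restore path intact, while hosting Transit inside the source cluster would make the same loss fatal to unseal.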
Baseline requirements
- keep the source and target on the same OpenBao version for the restore event
- keep the source and target pointed at the same Transit address, CA bundle, SNI, and key name
- keep shared object storage reachable from both clusters before you start a backup or restore
- keep the target cluster created ahead of time with restore auth configured
- perform cutover only after credential and data verification succeeds on the restored target
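The first four requirements above can be expressed as a preflight check that runs before a backup or restore. The following is a minimal Python sketch, not part of the operator: `SealConfig`, `ClusterState`, and `preflight` are hypothetical names, and how each field gets populated (for example, by inspecting cluster configuration) is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SealConfig:
    """Transit seal settings a cluster is configured with (hypothetical shape)."""
    address: str
    ca_bundle_fingerprint: str  # a digest of the CA bundle, not the bundle itself
    sni: str
    key_name: str


@dataclass(frozen=True)
class ClusterState:
    """Facts about one cluster that the preflight check needs (hypothetical shape)."""
    name: str
    openbao_version: str
    seal: SealConfig
    storage_reachable: bool
    restore_auth_configured: bool = False  # only meaningful for the target


def preflight(source: ClusterState, target: ClusterState) -> List[str]:
    """Return the unmet baseline requirements; an empty list means none were found."""
    problems: List[str] = []
    if source.openbao_version != target.openbao_version:
        problems.append("source and target run different OpenBao versions")
    if source.seal != target.seal:
        # Compares address, CA bundle fingerprint, SNI, and key name together.
        problems.append("Transit seal settings differ between source and target")
    for cluster in (source, target):
        if not cluster.storage_reachable:
            problems.append(f"shared object storage unreachable from {cluster.name}")
    if not target.restore_auth_configured:
        problems.append("restore auth is not configured on the target")
    return problems
```

Comparing the whole `SealConfig` value rather than only the key name reflects the requirement that address, CA bundle, SNI, and key name all match.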
The local DR lane validated four things: source backup to RustFS, restore into a separate target cluster, target unseal with the shared Transit key, and post-restore checks confirming that source credentials and source data replaced the target bootstrap state.
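The post-restore checks can be read as a gate in front of manual cutover. The sketch below is one hedged way to model that gate in Python. `RestoreEvidence` and `ready_for_cutover` are hypothetical names, and the sentinel value and the "bootstrap credential no longer works" signal are assumptions about how an operator might gather evidence, not a description of the lane's actual checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestoreEvidence:
    """Observations collected from the restored target (hypothetical shape)."""
    source_credential_accepted: bool     # a credential issued on the source works on the target
    bootstrap_credential_accepted: bool  # the target's pre-restore credential still works
    sentinel_expected: str               # value written on the source before the backup
    sentinel_observed: Optional[str]     # value read from the restored target, if any


def ready_for_cutover(evidence: RestoreEvidence) -> bool:
    """Allow cutover only when both credentials and data reflect the source."""
    credentials_cut_over = (
        evidence.source_credential_accepted
        and not evidence.bootstrap_credential_accepted
    )
    data_cut_over = evidence.sentinel_observed == evidence.sentinel_expected
    return credentials_cut_over and data_cut_over
```

Keeping the two conditions separate makes it clear which half of the verification failed when the gate stays closed.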
This baseline does not define automatic failover or a cloud DR reference. It covers a validated manual recovery flow with explicit preconditions for backup, restore, and cutover.