Protect live clusters with scheduled snapshots and explicit restore requests.
Once the cluster is running in production, durability becomes its own lifecycle. The backup manager schedules and records snapshot Jobs, the restore manager handles destructive restore requests through a separate CRD path, and both rely on shared operation-lock coordination so they do not collide with upgrades.
At a glance
Starts with
- a live, initialized cluster and an optional `spec.backup` schedule
- object-storage configuration plus backup authentication
- explicit `OpenBaoRestore` requests when destructive recovery is needed
Primary owners
- `internal/service/backup`
- `internal/service/restore`
- `internal/service/opslifecycle`
Writes
- backup executor Jobs and `status.backup` timing and failure state
- `OpenBaoRestore` phase progression and cluster operation-lock ownership
- retention cleanup after successful uploads and restore Job launch/cleanup state
Hands off to
- normal steady-state operation when backups succeed
- post-restore follow-up when a restore request completes
- operator-facing backup, restore, and recovery procedures
Architectural Placement
Durability work is shared across two explicit operation surfaces:
- The backup manager lives on the adminops path and handles scheduled, manual, and pre-upgrade snapshot jobs.
- The restore manager runs through the dedicated `OpenBaoRestore` controller path so destructive recovery stays explicit and auditable.

`internal/service/opslifecycle` supplies shared lock and retry behavior so backups, restores, and upgrades coordinate instead of colliding.
That model keeps backup routine and restore exceptional, even though both exist in the same durability phase of the lifecycle.
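The split is visible directly in the API shape: routine backups are declared inline on the cluster object, while a restore is its own request object. A minimal sketch of that contrast, assuming illustrative API group, version, and field names (`schedule`, `targetCluster`, and `backupName` are placeholders, not the real CRD schema):

```yaml
# Routine: backup policy lives on the cluster resource itself.
apiVersion: openbao.org/v1alpha1   # assumed group/version
kind: OpenBao
metadata:
  name: vault-main
spec:
  backup:
    schedule: "0 2 * * *"          # nightly snapshot Job
---
# Exceptional: destructive recovery is a separate, auditable object.
apiVersion: openbao.org/v1alpha1   # assumed group/version
kind: OpenBaoRestore
metadata:
  name: vault-main-restore-20240601
spec:
  targetCluster: vault-main        # illustrative field name
  backupName: vault-main-20240601  # illustrative field name
```

Because the restore is a standalone object, it can be reviewed, applied, and retained in Git history independently of the cluster it targets.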
Diagram
Day N durability loop
Backups produce durable recovery points during live operation; restore consumes one of those recovery points through a separate request path when the cluster needs destructive recovery.
Reference table
Durability surfaces
| Surface | Primary owner | Purpose |
|---|---|---|
| `spec.backup` | Backup manager consumes it. | Declares the schedule, provider target, auth wiring, and retention policy for routine snapshots. |
| `status.backup` | Backup manager writes it. | Records the last attempt, next schedule, last success, and consecutive failures so durability is visible without inspecting Jobs. |
| `OpenBaoRestore` | Restore manager consumes and updates it. | Keeps restore explicit, immutable, and auditable instead of hiding destructive recovery inside cluster status. |
| `status.operationLock` | Shared via `opslifecycle`. | Blocks conflicting upgrade, backup, or restore work while one disruptive operation is in flight. |
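To make the status surfaces above concrete, here is a sketch of what `status.backup` and `status.operationLock` might look like on a healthy cluster mid-backup. Field names and timestamps are illustrative assumptions, not the published schema:

```yaml
# Illustrative status shape; field names are assumptions.
status:
  backup:
    lastAttemptTime: "2024-06-01T02:00:04Z"
    lastSuccessTime: "2024-06-01T02:01:37Z"
    nextScheduledTime: "2024-06-02T02:00:00Z"
    consecutiveFailures: 0
  operationLock:
    holder: backup               # only one disruptive operation at a time
    acquiredAt: "2024-06-01T02:00:04Z"
```

Reading durability health this way avoids listing and inspecting individual executor Jobs.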
Reference table
Safety boundaries
| Concern | Durability behavior |
|---|---|
| Backup during disruptive work | Scheduled backups do not start while an upgrade or restore is already active on the same cluster. |
| Authentication surface | Backup and restore use dedicated auth wiring such as JWT roles or explicit token references; root tokens are not the durability mechanism. |
| Restore visibility | Restore is modeled as a separate CRD-backed request so destructive recovery has its own audit trail and phase status. |
| Retention timing | Retention cleanup runs only after a successful backup so older recovery points are not removed before a new one exists. |
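Two of these boundaries, dedicated authentication and post-success retention, map to configuration on the backup spec. A hedged sketch, assuming illustrative field names (`auth.jwtRole`, `retention.maxCount` are placeholders for whatever the real CRD exposes):

```yaml
# Illustrative auth and retention wiring; field names are assumptions.
spec:
  backup:
    schedule: "0 2 * * *"
    auth:
      jwtRole: backup-executor   # dedicated role; root tokens are not used
    retention:
      maxCount: 7                # older snapshots pruned only after a new upload succeeds
```

The ordering guarantee in the table, cleanup only after a successful upload, means a run of failed backups can never shrink the set of usable recovery points.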