Version: next

Protect live clusters with scheduled snapshots and explicit restore requests.

Once the cluster is running in production, durability becomes its own lifecycle. The backup manager schedules and records snapshot Jobs, the restore manager handles destructive restore requests through a separate CRD path, and both rely on shared operation-lock coordination so they do not collide with each other or with upgrades.

At a glance

Starts with

a live initialized cluster and an optional spec.backup schedule
object-storage configuration plus backup authentication
explicit OpenBaoRestore requests when destructive recovery is needed

Primary owners

internal/service/backup
internal/service/restore
internal/service/opslifecycle

Writes

backup executor Jobs and status.backup timing and failure state
OpenBaoRestore phase progression and cluster operation-lock ownership
retention cleanup after successful uploads, plus restore Job launch and cleanup state

Hands off to

normal steady-state operation when backups succeed
post-restore follow-up when a restore request completes
operator-facing backup, restore, and recovery procedures

Architectural placement

Durability work is split across two explicit operation surfaces, backed by one shared coordination package:

The backup manager lives on the adminops path and handles scheduled, manual, and pre-upgrade snapshot jobs.
The restore manager runs through the dedicated OpenBaoRestore controller path so destructive recovery stays explicit and auditable.
internal/service/opslifecycle supplies shared lock and retry behavior so backups, restores, and upgrades coordinate instead of colliding.

That model keeps backup routine and restore exceptional, even though both exist in the same durability phase of the lifecycle.
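The shared lock described above can be pictured with a minimal Go sketch. It is not the operator's actual code: the type names, field names, and holder strings below are assumptions made for illustration, and only the idea of a single `status.operationLock` shared by upgrades, backups, and restores comes from this page.

```go
package main

import (
	"errors"
	"fmt"
)

// OperationKind names the disruptive operations that share one lock.
type OperationKind string

const (
	OpUpgrade OperationKind = "Upgrade"
	OpBackup  OperationKind = "Backup"
	OpRestore OperationKind = "Restore"
)

// OperationLock is a hypothetical stand-in for status.operationLock.
type OperationLock struct {
	Operation OperationKind
	Holder    string
}

// ClusterStatus carries only the field this sketch needs.
type ClusterStatus struct {
	OperationLock *OperationLock
}

// ErrLockHeld reports that another operation currently owns the lock.
var ErrLockHeld = errors.New("operation lock is held")

// Acquire takes the lock for (op, holder). Re-acquiring as the current
// owner is a no-op, so a reconcile loop can call it on every pass.
func Acquire(st *ClusterStatus, op OperationKind, holder string) error {
	if cur := st.OperationLock; cur != nil {
		if cur.Operation == op && cur.Holder == holder {
			return nil
		}
		return fmt.Errorf("%w: held by %s (%s)", ErrLockHeld, cur.Holder, cur.Operation)
	}
	st.OperationLock = &OperationLock{Operation: op, Holder: holder}
	return nil
}

// Release clears the lock only when (op, holder) owns it and reports
// whether anything was released.
func Release(st *ClusterStatus, op OperationKind, holder string) bool {
	cur := st.OperationLock
	if cur == nil || cur.Operation != op || cur.Holder != holder {
		return false
	}
	st.OperationLock = nil
	return true
}

func main() {
	st := &ClusterStatus{}
	fmt.Println(Acquire(st, OpUpgrade, "upgrade-manager"))
	// The backup manager is refused while the upgrade holds the lock.
	fmt.Println(Acquire(st, OpBackup, "backup-manager"))
	fmt.Println(Release(st, OpUpgrade, "upgrade-manager"))
	fmt.Println(Acquire(st, OpBackup, "backup-manager"))
}
```

In a real controller the lock would be persisted through a status update so that concurrent reconcilers see the same owner; the in-memory struct here only shows the ownership rule.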

Durability surfaces.
Surface	Primary owner	Purpose
`spec.backup`	Backup manager consumes it.	Declares schedule, provider target, auth wiring, and retention policy for routine snapshots.
`status.backup`	Backup manager writes it.	Records last attempt, next schedule, last success, and consecutive failures so durability is visible without inspecting Jobs.
`OpenBaoRestore`	Restore manager consumes and updates it.	Keeps restore explicit, immutable, and auditable instead of hiding destructive recovery inside cluster status.
`status.operationLock`	Shared via opslifecycle.	Blocks conflicting upgrade, backup, or restore work while one disruptive operation is in flight.
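To make the `status.backup` row concrete, the following Go sketch shows one way the recorded fields could be updated after each backup Job finishes. The struct layout, field names, and the fixed-interval schedule are assumptions for illustration; the page only states that last attempt, next schedule, last success, and consecutive failures are recorded.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// BackupStatus is a hypothetical shape for status.backup.
type BackupStatus struct {
	LastAttemptTime     time.Time
	LastSuccessTime     time.Time
	NextScheduledTime   time.Time
	ConsecutiveFailures int
	LastFailureMessage  string
}

// ErrUploadFailed is a sample Job failure used by this sketch.
var ErrUploadFailed = errors.New("snapshot upload failed")

// RecordAttempt folds the outcome of one backup Job into the status.
// A fixed interval stands in for whatever schedule spec.backup declares.
func RecordAttempt(st *BackupStatus, at time.Time, interval time.Duration, jobErr error) {
	st.LastAttemptTime = at
	st.NextScheduledTime = at.Add(interval)
	if jobErr != nil {
		// Failures accumulate until a backup succeeds.
		st.ConsecutiveFailures++
		st.LastFailureMessage = jobErr.Error()
		return
	}
	st.LastSuccessTime = at
	st.ConsecutiveFailures = 0
	st.LastFailureMessage = ""
}

func main() {
	st := &BackupStatus{}
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	RecordAttempt(st, start, time.Hour, ErrUploadFailed)
	RecordAttempt(st, start.Add(time.Hour), time.Hour, nil)
	fmt.Println(st.ConsecutiveFailures, st.LastSuccessTime, st.NextScheduledTime)
}
```

Keeping the failure counter in status is what lets an operator see a degrading backup schedule without listing Jobs.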

Safety boundaries.
Concern	Durability behavior
Backup during disruptive work	Scheduled backups do not start while an upgrade or restore is already active on the same cluster.
Authentication surface	Backup and restore use dedicated auth wiring such as JWT roles or explicit token references; root tokens are not the durability mechanism.
Restore visibility	Restore is modeled as a separate CRD-backed request so destructive recovery has its own audit trail and phase status.
Retention timing	Retention cleanup runs only after a successful backup so older recovery points are not removed before a new one exists.
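The retention-timing rule in the last row can be sketched in Go as a planning function that refuses to select anything for deletion unless the latest backup succeeded. The `Snapshot` type, the count-based `keep` policy, and the function name are assumptions for illustration, not the backup manager's actual retention logic.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is a hypothetical record of one uploaded backup object.
type Snapshot struct {
	Key     string
	Created int64 // Unix seconds
}

// PlanRetention returns the keys that may be deleted while keeping the
// newest `keep` snapshots. It returns nil when the latest backup did not
// succeed, so older recovery points survive a failed run. A keep value
// below 1 is treated as "delete nothing" rather than "delete everything".
func PlanRetention(snaps []Snapshot, keep int, lastBackupSucceeded bool) []string {
	if !lastBackupSucceeded || keep < 1 || len(snaps) <= keep {
		return nil
	}
	// Sort a copy, newest first, so the caller's slice is left untouched.
	sorted := append([]Snapshot(nil), snaps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Created > sorted[j].Created })

	var expired []string
	for _, s := range sorted[keep:] {
		expired = append(expired, s.Key)
	}
	return expired
}

func main() {
	snaps := []Snapshot{{"snap-1", 100}, {"snap-2", 200}, {"snap-3", 300}}
	fmt.Println(PlanRetention(snaps, 2, true))
	fmt.Println(PlanRetention(snaps, 2, false))
}
```

Separating the plan from the deletion itself keeps the safety check in one place and makes the rule easy to test without touching object storage.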

Related durability pages

Backup manager: Open the deep dive for scheduled backup execution, status, and retention details.
Restore manager: Open the deep dive for explicit restore requests, lock ownership, and destructive workflow handling.
Restore guide: Compare the internal durability model with the operator-facing restore and recovery procedures.
