Treat restore as a destructive, explicit, lock-aware workflow.
The restore manager keeps disaster recovery separate from normal cluster reconciliation. It models restore as an immutable CRD-backed request, coordinates execution through a dedicated controller path, and protects the cluster with explicit validation and lock ownership.
At a glance
Control path
- dedicated openbaorestore controller
- internal/app/openbaorestore
- internal/service/restore
Owns
- restore request validation
- operation lock lifecycle for restore
- restore job creation and terminal cleanup
Writes
- OpenBaoRestore phase progression
- OpenBaoCluster.status.operationLock for restore ownership
- restore job launch and cleanup state
Depends on
- snapshot source accessibility
- restore authentication and token strategy
- backup provider configuration and cluster lock state
Request Model
Restore request contract
| Contract | Why it exists |
|---|---|
| CRD-based request | Restore is visible, declarative, and auditable instead of being hidden inside OpenBaoCluster status or imperative scripts. |
| Immutable spec | Changing restore inputs requires a new request so the audit trail and execution intent stay stable. |
| Stateless controller | The controller polls the restore job rather than depending on broad watch permissions across every child object. |
| Operation lock ownership | Restore must block upgrades and backups while destructive data-plane changes are in flight. |
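The immutable-spec contract in the table above can be sketched as a simple update check. This is a minimal illustration, not the operator's actual code: `RestoreSpec`, its fields, `ValidateSpecUpdate`, and `ErrSpecImmutable` are hypothetical names chosen for the example, and the real OpenBaoRestore spec may carry different fields and enforce immutability elsewhere (for example at admission time).

```go
package main

import (
	"errors"
	"fmt"
)

// RestoreSpec is a hypothetical, simplified stand-in for the restore request spec.
// All field names here are assumptions made for illustration.
type RestoreSpec struct {
	Cluster        string // name of the target cluster
	SnapshotSource string // location the snapshot is read from
	Force          bool   // explicit override flag
}

// ErrSpecImmutable is returned when an update tries to change restore inputs.
var ErrSpecImmutable = errors.New("restore spec is immutable; create a new restore request instead")

// ValidateSpecUpdate rejects any change between the stored and the proposed spec.
// RestoreSpec contains only comparable fields, so != compares field by field.
func ValidateSpecUpdate(oldSpec, newSpec RestoreSpec) error {
	if oldSpec != newSpec {
		return ErrSpecImmutable
	}
	return nil
}

func main() {
	original := RestoreSpec{Cluster: "prod", SnapshotSource: "snapshots/one"}
	changed := original
	changed.SnapshotSource = "snapshots/two"

	fmt.Println("unchanged:", ValidateSpecUpdate(original, original))
	fmt.Println("changed:  ", ValidateSpecUpdate(original, changed))
}
```

Rejecting every change, rather than allowing a list of "safe" fields, keeps the stored request identical to what was actually executed, which is the audit property the contract is after.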
Restore Lifecycle
[Diagram: Restore lifecycle]
Restore validates first, acquires the cluster lock second, and only then launches a restore job. Terminal phases keep retrying lock cleanup until the cluster is no longer marked as restore-owned.
Restore phases
| Phase | Manager intent |
|---|---|
| Pending / Validating | Reject invalid target clusters, inaccessible snapshots, missing auth, and unsafe conflicting operations before anything destructive starts. |
| Running | Launch the restore job after the restore lock is owned and the request is known-good. |
| Completed | Release the lock and preserve the restore record as the audit trail of what happened. |
| Failed | Expose terminal failure while continuing lock cleanup on later reconciles until the cluster is no longer marked as restore-owned. |
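The phase progression above can be sketched as a small, pure transition function. This is an illustrative model under stated assumptions, not the controller's implementation: `State`, `Observed`, and `Step` are hypothetical names, and the sketch assumes the lock is acquired during validation, just before the move to Running.

```go
package main

import "fmt"

// Phase mirrors the phase names from the table above.
type Phase string

const (
	PhasePending    Phase = "Pending"
	PhaseValidating Phase = "Validating"
	PhaseRunning    Phase = "Running"
	PhaseCompleted  Phase = "Completed"
	PhaseFailed     Phase = "Failed"
)

// State is a hypothetical summary of a restore request's status.
type State struct {
	Phase    Phase
	LockHeld bool // true while the cluster is still marked as restore-owned
}

// Observed is a hypothetical snapshot of what one reconcile pass found.
type Observed struct {
	ValidationOK bool // target, snapshot, auth, and conflict checks passed
	LockAcquired bool // the restore lock was taken on the cluster
	JobSucceeded bool // the restore job finished successfully
	JobFailed    bool // the restore job finished with an error
	LockReleased bool // lock cleanup succeeded during this pass
}

// Step computes the next state from the current state and one observation.
func Step(s State, o Observed) State {
	switch s.Phase {
	case PhasePending:
		s.Phase = PhaseValidating
	case PhaseValidating:
		switch {
		case !o.ValidationOK:
			s.Phase = PhaseFailed // nothing destructive has started yet
		case o.LockAcquired:
			s.LockHeld = true
			s.Phase = PhaseRunning // the job is launched only once the lock is owned
		}
	case PhaseRunning:
		switch {
		case o.JobFailed:
			s.Phase = PhaseFailed
		case o.JobSucceeded:
			s.Phase = PhaseCompleted
		}
	case PhaseCompleted, PhaseFailed:
		// Terminal phases never change, but lock cleanup is retried
		// on every pass until it succeeds.
		if s.LockHeld && o.LockReleased {
			s.LockHeld = false
		}
	}
	return s
}

func main() {
	s := State{Phase: PhasePending}
	passes := []Observed{
		{},
		{ValidationOK: true, LockAcquired: true},
		{JobSucceeded: true},
		{LockReleased: false}, // cleanup fails once and is retried
		{LockReleased: true},
	}
	for _, o := range passes {
		s = Step(s, o)
		fmt.Printf("%s lockHeld=%t\n", s.Phase, s.LockHeld)
	}
}
```

Keeping the transition a pure function of the current state and the latest observation fits the stateless, polling controller described in the request contract: each reconcile can recompute the next step without remembering earlier passes.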
Safety Boundaries
Safety boundaries
| Concern | Manager behavior |
|---|---|
| Conflicting operations | Backups and upgrades are blocked by the restore operation lock while restore is active. |
| Emergency override | Override requires explicit force semantics rather than silently ignoring a stuck or conflicting lock. |
| Execution surface | The controller delegates the destructive work to a job instead of embedding restore logic in normal reconcile loops. |
| After restore | A completed restore may leave the cluster sealed or in need of follow-up recovery work; Completed only means the restore workflow itself finished. |
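The lock behavior in the table above can be sketched as follows. This is a simplified in-memory model, not the operator's API: `OperationLock`, `Cluster`, `AcquireRestoreLock`, `ReleaseRestoreLock`, and the `force` parameter are hypothetical names, and the real lock lives in `OpenBaoCluster.status.operationLock` with a shape this example does not try to reproduce.

```go
package main

import (
	"errors"
	"fmt"
)

// Operation names the kind of work that can own the cluster lock.
type Operation string

const (
	OpBackup  Operation = "Backup"
	OpUpgrade Operation = "Upgrade"
	OpRestore Operation = "Restore"
)

// OperationLock is a hypothetical, simplified model of the cluster's operation lock.
type OperationLock struct {
	Operation Operation
	Holder    string // identifies the request that owns the lock
}

// Cluster holds at most one operation lock at a time; nil means unlocked.
type Cluster struct {
	Lock *OperationLock
}

// ErrLockHeld reports that another operation owns the lock and force was not set.
var ErrLockHeld = errors.New("operation lock is held by another operation")

// AcquireRestoreLock takes the lock for a restore request.
// A conflicting lock is only replaced when force is explicitly true.
func AcquireRestoreLock(c *Cluster, holder string, force bool) error {
	if c.Lock != nil {
		if c.Lock.Operation == OpRestore && c.Lock.Holder == holder {
			return nil // already owned by this request; acquisition is idempotent
		}
		if !force {
			return fmt.Errorf("%w: %s by %q", ErrLockHeld, c.Lock.Operation, c.Lock.Holder)
		}
	}
	c.Lock = &OperationLock{Operation: OpRestore, Holder: holder}
	return nil
}

// ReleaseRestoreLock clears the lock only if this restore request still owns it.
func ReleaseRestoreLock(c *Cluster, holder string) {
	if c.Lock != nil && c.Lock.Operation == OpRestore && c.Lock.Holder == holder {
		c.Lock = nil
	}
}

func main() {
	c := &Cluster{Lock: &OperationLock{Operation: OpBackup, Holder: "nightly-backup"}}

	fmt.Println("without force:", AcquireRestoreLock(c, "restore-1", false))
	fmt.Println("with force:   ", AcquireRestoreLock(c, "restore-1", true))

	ReleaseRestoreLock(c, "restore-1")
	fmt.Println("unlocked:     ", c.Lock == nil)
}
```

Release checks ownership before clearing the lock, so a restore that retries cleanup cannot remove a lock that a later operation has since acquired.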
Restore is intentionally modeled outside the normal OpenBaoCluster lifecycle. The operator treats it as a destructive recovery operation with its own request object, its own controller path, and its own lock semantics.