Treat restore as a destructive, explicit, lock-aware workflow.
The restore manager keeps disaster recovery separate from normal cluster reconciliation. It models restore as an immutable CRD-backed request, coordinates execution through a dedicated controller path, and protects the cluster with explicit validation and lock ownership.
At a glance
Control path
- dedicated openbaorestore controller
- internal/app/openbaorestore
- internal/service/restore
Owns
- restore request validation
- operation lock lifecycle for restore
- restore job creation and terminal cleanup
Writes
- OpenBaoRestore phase progression
- OpenBaoCluster.status.operationLock for restore ownership
- restore job launch and cleanup state
Depends on
- snapshot source accessibility
- restore authentication and token strategy
- backup provider configuration and cluster lock state
Request Model
Restore request contract
| Contract | Why it exists |
|---|---|
| CRD-based request | Restore is visible, declarative, and auditable instead of being hidden inside OpenBaoCluster status or imperative scripts. |
| Immutable spec | Changing restore inputs requires a new request so the audit trail and execution intent stay stable. |
| Stateless controller | The controller polls the restore job rather than depending on broad watch permissions across every child object. |
| Operation lock ownership | Restore must block upgrades and backups while destructive data-plane changes are in flight. |
Restore Lifecycle
Restore validates first, acquires the cluster lock second, and only then launches a restore job. Terminal phases keep retrying lock cleanup until the cluster is no longer marked as restore-owned.
Restore phases
| Phase | Manager intent |
|---|---|
| Pending / Validating | Reject invalid target clusters, inaccessible snapshots, missing auth, and unsafe conflicting operations before anything destructive starts. |
| Running | Launch the restore job after the restore lock is owned and the request is known-good. |
| Completed | Release the lock and preserve the restore record as the audit trail of what happened. |
| Failed | Expose terminal failure while continuing lock cleanup on later reconciles until the cluster is no longer marked as restore-owned. |
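The phase ordering above can be sketched as a small state machine that advances at most one phase per reconcile. The phase names come from the table; `Cluster`, `Observed`, `Step`, and the `restoreOwner` marker are hypothetical simplifications, and the real controller works with Kubernetes objects and a job rather than in-memory structs. In this sketch the lock is released on the first reconcile after a terminal phase is reached, which is one plausible reading of the cleanup behavior, not a confirmed detail.

```go
package main

import "fmt"

// Phase names mirror the phases table; everything else is hypothetical.
type Phase string

const (
	Pending    Phase = "Pending"
	Validating Phase = "Validating"
	Running    Phase = "Running"
	Completed  Phase = "Completed"
	Failed     Phase = "Failed"
)

const restoreOwner = "restore"

// Cluster stands in for the target cluster; LockOwner is "" when unlocked.
type Cluster struct{ LockOwner string }

// Observed is what a single reconcile pass sees (assumed fields).
type Observed struct {
	RequestValid bool
	JobSucceeded bool
	JobFailed    bool
}

// Step validates first, takes the lock second, and only then treats the
// job as running. Terminal phases keep cleaning up the restore lock.
func Step(p Phase, c *Cluster, o Observed) Phase {
	switch p {
	case Pending:
		return Validating
	case Validating:
		conflict := c.LockOwner != "" && c.LockOwner != restoreOwner
		if !o.RequestValid || conflict {
			return Failed // nothing destructive has started yet
		}
		c.LockOwner = restoreOwner
		return Running
	case Running:
		if o.JobFailed {
			return Failed
		}
		if o.JobSucceeded {
			return Completed
		}
		return Running // keep polling the job
	case Completed, Failed:
		if c.LockOwner == restoreOwner {
			c.LockOwner = "" // retried on every later reconcile until clear
		}
		return p
	}
	return p
}

func main() {
	c := &Cluster{}
	p := Pending
	passes := []Observed{
		{RequestValid: true},
		{RequestValid: true},
		{RequestValid: true},
		{RequestValid: true, JobSucceeded: true},
		{RequestValid: true, JobSucceeded: true},
	}
	for _, o := range passes {
		p = Step(p, c, o)
		fmt.Printf("phase=%s lockOwner=%q\n", p, c.LockOwner)
	}
}
```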
Safety Boundaries
Safety boundaries
| Concern | Manager behavior |
|---|---|
| Conflicting operations | Backups and upgrades are blocked by the restore operation lock while restore is active. |
| Emergency override | Override requires explicit force semantics rather than silently ignoring a stuck or conflicting lock. |
| Execution surface | The controller delegates the destructive work to a job instead of embedding restore logic in normal reconcile loops. |
| After restore | The cluster may still require unseal or follow-up recovery work after a restore; completion means only that the restore workflow finished. |
Restore is intentionally modeled outside the normal OpenBaoCluster lifecycle. The operator treats it as a destructive recovery operation with its own request object, its own controller path, and its own lock semantics.
This version tracks a prerelease build. Features and behavior may change before the next stable release.