Coordinate locks, retries, and phase transitions across disruptive operations.
internal/service/opslifecycle is the shared service-layer contract behind backup, restore, and upgrade orchestration. It has no controller or CRD of its own. Instead, it keeps operation lock identity, retry timing, and phase audit logging consistent whenever a manager needs to take disruptive action against a cluster.
At a glance
Used by
- internal/service/backup
- internal/service/restore
- internal/service/upgrade
Owns
- operation-lock identity helpers for disruptive work
- retry intent classes and default requeue mapping
- phase-transition audit field normalization
Writes through
- internal/adapter/operationlock for status.operationLock updates
- audit event fields for phase transitions
- shared retry delays consumed by controller requeues
Depends on
- OpenBaoCluster.status.operationLock as the persisted mutex surface
- controller requeue behavior for long-running progress polling
- manager-specific phase names and audit metadata
Architectural Placement
Operation lifecycle coordination sits below the concrete managers and above the lock adapter:
- A manager such as backup, restore, or upgrade decides it needs to start or resume work.
- It uses internal/service/opslifecycle to acquire or release the expected lock identity, classify retry intent, and log phase changes.
- opslifecycle delegates the actual status patching to internal/adapter/operationlock.
That keeps the shared safety model in one place instead of scattering lock and retry semantics across several managers.
Diagram
Coordination model
Backup, restore, and upgrade do not each implement their own lock and retry policy. They share one coordination service that wraps the operation-lock adapter and keeps audit fields consistent.
Reference table
Shared primitives
| Primitive | What it standardizes | Why it exists |
|---|---|---|
| OperationLock | A stable holder + operation identity for a long-running action. | Managers need an exact lock identity so renew and release only succeed for the intended owner. |
| Acquire / Release | Status-based lock ownership via the adapter. | Controllers should not each patch status.operationLock differently or invent different lock messages. |
| IsLockHeld / HeldError / AddHeldAuditFields | A shared way to classify contention and enrich audit events with who currently owns the lock. | Contention should produce consistent diagnostics instead of manager-specific strings. |
| LogPhaseTransition | Stable phase_from / phase_to audit fields for long-running operations. | Audit streams stay comparable across backup, restore, and upgrade. |
Retry And Lock Model
Reference table
Retry classes
| Retry class | Default delay | Typical use |
|---|---|---|
| lock-contention | 5s | Another disruptive operation already owns the cluster lock, so the manager should requeue quickly and check again. |
| progress-poll | 5s | A Job or long-running operation is still in progress and the manager is waiting for the next observable state change. |
| standard | 1m by default, overridable with OPENBAO_REQUEUE_STANDARD | Background retry work that does not need tight polling. |
Reference table
Lock contract
| Concern | Shared behavior |
|---|---|
| Acquire vs renew | If the exact holder and operation already own the lock, acquisition renews the same lock instead of treating it as contention. |
| Exact-match release | Release succeeds only when holder and operation match the active lock, so one manager cannot accidentally clear another manager’s ownership. |
| Force override | Force semantics exist for explicit override paths only; normal long-running operations should not silently steal the lock. |
| Contention diagnostics | HeldError exposes the current operation and holder so audit events and logs can explain why a manager requeued. |
opslifecycle does not decide whether an upgrade should roll or blue-green, whether a restore request is valid, or whether a backup target is reachable. It only standardizes the lock, retry, and audit mechanics around those domain decisions.