Version: next

Operation lifecycle coordination

internal/service/opslifecycle is the shared service-layer contract behind backup, restore, and upgrade orchestration. It does not own a controller or CRD of its own. Instead, it keeps operation lock identity, retry timing, and phase audit logging consistent whenever a manager needs to take disruptive action against a cluster.

At a glance

Used by

internal/service/backup
internal/service/restore
internal/service/upgrade

Owns

operation-lock identity helpers for disruptive work
retry intent classes and default requeue mapping
phase-transition audit field normalization

Writes through

internal/service/opslifecycle for status.operationLock updates
audit event fields for phase transitions
shared retry delays consumed by controller requeues

Depends on

OpenBaoCluster.status.operationLock as the persisted mutex surface
controller requeue behavior for long-running progress polling
manager-specific phase names and audit metadata

Architectural placement

Operation lifecycle coordination sits below the concrete managers and owns the shared lock write path:

A manager such as backup, restore, or upgrade decides it needs to start or resume work.
It uses internal/service/opslifecycle to acquire or release the expected lock identity, classify retry intent, and log phase changes.
opslifecycle applies status.operationLock directly through the shared SSA lock plane.

That keeps the shared safety model in one place instead of scattering lock and retry semantics across several managers.

OpenBaoCluster status ownership planes.
Plane	Field manager	Owned status fields
Observed status	`openbao-status-controller`	`status.observedGeneration`, `status.phase`, `status.activeLeader`, `status.readyReplicas`, `status.currentVersion`, `status.lastBackupTime`, `status.conditions`
Workload status	`openbao-workload-controller`	`status.initialized`, `status.selfInitialized`, `status.workload`
AdminOps status	`openbao-adminops-controller`	`status.upgrade`, `status.upgradeRequests`, `status.backup`, `status.blueGreen`, `status.breakGlass`, `status.adminOps`
Operation lock status	`openbao-operationlock-controller`	`status.operationLock`

Shared primitives.
Primitive	What it standardizes	Why it exists
OperationLock	A stable holder + operation identity for a long-running action.	Managers need an exact lock identity so renew and release only succeed for the intended owner.
Acquire / Release	Status-based lock ownership through the adapter, backed by a gateway that performs a fresh read before each write.	Controllers should not each patch status.operationLock differently, rely on stale cached objects, or invent different lock messages.
IsLockHeld / HeldError / AddHeldAuditFields	A shared way to classify contention and enrich audit events with who currently owns the lock.	Contention should produce consistent diagnostics instead of manager-specific strings.
LogPhaseTransition	Stable phase_from / phase_to audit fields for long-running operations.	Audit streams stay comparable across backup, restore, and upgrade.

Retry and lock model

Retry classes.
Retry class	Default delay	Typical use
lock-contention	`5s`	Another disruptive operation already owns the cluster lock, so the manager should requeue quickly and check again.
progress-poll	`5s`	A Job or long-running operation is still in progress and the manager is waiting for the next observable state change.
standard	`1m` by default, overridable with `OPENBAO_REQUEUE_STANDARD`	Background retry work that does not need tight polling.

Lock contract.
Concern	Shared behavior
Acquire vs renew	If the exact holder and operation already own the lock, acquisition renews the same lock instead of treating it as contention.
Exact-match release	Release succeeds only when holder and operation match the active lock, so one manager cannot accidentally clear another manager’s ownership.
Legacy takeover	The adapter only forces ownership when a clear or explicit override hits an SSA ownership conflict, so normal lock renewals stay non-destructive.
Force override	Force semantics exist for explicit override paths only; normal long-running operations should not silently steal the lock.
Contention diagnostics	HeldError exposes the current operation and holder so audit events and logs can explain why a manager requeued.
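
The lock contract can be modeled in memory to show how renew, contention, exact-match release, and force override relate. This is a simplified sketch: the real lock is persisted in status.operationLock and written through SSA, which is not modeled here. `LockStore` and its method signatures are hypothetical, and treating release of an unheld lock as a no-op is an assumption.

```go
package main

import "fmt"

// OperationLock identifies a lock by holder and operation.
type OperationLock struct {
	Operation string
	Holder    string
}

// HeldError exposes the current owner when a request conflicts with it.
type HeldError struct {
	Current OperationLock
}

func (e *HeldError) Error() string {
	return fmt.Sprintf("operation lock held by %q for %q", e.Current.Holder, e.Current.Operation)
}

// LockStore is a hypothetical in-memory stand-in for the persisted lock.
type LockStore struct {
	active *OperationLock
}

// Acquire takes the lock when it is free, renews it when the exact same
// holder and operation already own it, and otherwise reports contention.
// force models the explicit override path only.
func (s *LockStore) Acquire(want OperationLock, force bool) error {
	if s.active == nil || *s.active == want || force {
		l := want
		s.active = &l
		return nil
	}
	return &HeldError{Current: *s.active}
}

// Release clears the lock only on an exact holder + operation match.
// Releasing an unheld lock is treated as a no-op (an assumption).
func (s *LockStore) Release(want OperationLock) error {
	if s.active == nil {
		return nil
	}
	if *s.active != want {
		return &HeldError{Current: *s.active}
	}
	s.active = nil
	return nil
}
```

In this model a second `Acquire` by the same owner succeeds as a renewal, while a `Release` by a different manager fails with a `HeldError` and leaves the lock in place.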

This is coordination, not orchestration

opslifecycle does not decide whether an upgrade should roll or blue-green, whether a restore request is valid, or whether a backup target is reachable. It only standardizes the lock, retry, and audit mechanics around those domain decisions.

Related deep dives

Upgrade manager: See how rolling and blue-green orchestration use shared lock and retry primitives during long-running transitions.
Backup manager: See how scheduled and manual snapshot flows reuse the same contention and requeue model.
Restore manager: See how destructive restore requests rely on the same lock identity and audit mechanics.
