Change workload versions without violating Raft safety.
The upgrade manager owns disruptive version changes. It keeps upgrade orchestration out of the workload loop, persists state in status so upgrades survive controller restarts, and prioritizes cluster availability over finishing quickly.
At a glance
Control path
- adminops reconciler
- internal/app/openbaocluster/adminops
- internal/service/upgrade/rolling and internal/service/upgrade/bluegreen
Owns
- rolling and blue-green orchestration state
- upgrade executor jobs and phase transitions
- status-backed retry and rollback coordination
Writes
- status.upgrade and status.blueGreen state
- partition changes, green revision resources, and executor jobs
- break-glass and failure state when rollback safety is compromised
Depends on
- target version policy and image alignment
- backup readiness and network egress for snapshot prerequisites
- operation lifecycle coordination for lock, retry, and phase timing
Architectural Placement
Upgrade execution belongs to the AdminOps orchestration path:
- internal/controller/openbaocluster receives an adminops reconcile event.
- The controller delegates to internal/app/openbaocluster.
- AdminOps orchestration invokes either the rolling or blue-green upgrade manager flow.
That keeps upgrade state machines out of the workload loop and lets long-running transitions own their own retry model.
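The delegation above can be pictured as a small dispatch step. This is a minimal sketch, not the operator's actual code: the Strategy values, the UpgradeFlow interface, and the choice to treat an empty strategy as rolling are all assumptions made for illustration.

```go
package main

import "fmt"

// Strategy names the upgrade flow. The string values are assumptions.
type Strategy string

const (
	StrategyRolling   Strategy = "RollingUpdate"
	StrategyBlueGreen Strategy = "BlueGreen"
)

// UpgradeFlow is a hypothetical interface standing in for the rolling and
// blue-green managers under internal/service/upgrade.
type UpgradeFlow interface {
	Name() string
}

type rollingFlow struct{}
type blueGreenFlow struct{}

func (rollingFlow) Name() string   { return "rolling" }
func (blueGreenFlow) Name() string { return "bluegreen" }

// selectFlow picks exactly one flow per reconcile, so the workload loop
// never has to know about upgrade state machines.
func selectFlow(s Strategy) (UpgradeFlow, error) {
	switch s {
	case StrategyRolling, "":
		// Assumption: an unset strategy falls back to rolling.
		return rollingFlow{}, nil
	case StrategyBlueGreen:
		return blueGreenFlow{}, nil
	default:
		return nil, fmt.Errorf("unknown upgrade strategy %q", s)
	}
}
```

Returning an error for an unknown strategy, rather than silently defaulting, keeps a typo in the spec from starting the wrong kind of upgrade.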
Decision matrix
Strategy selection
| Strategy | Best fit | Primary tradeoff |
|---|---|---|
| Rolling update | Default upgrades with minimal extra infrastructure. | Lower resource cost, but each pod replacement must preserve Raft health and leader safety. |
| Blue-green | High-control cutovers with explicit promotion and rollback phases. | More orchestration and roughly double storage during the transition. |
Diagram
Rolling update flow
Rolling upgrades use StatefulSet partitioning and leader step-down so each pod can be replaced while Raft remains healthy.
Rolling safety controls
- StatefulSet partitioning pauses Kubernetes-driven rollout until the manager explicitly advances each ordinal.
- Reverse ordinal updates and forced leader step-down protect Raft availability during pod replacement.
- Finalization only happens after the StatefulSet revision and observed workload health fully converge.
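The controls above can be sketched as a single decision function. This is a simplified model under stated assumptions: the Action names and the nextStep signature are hypothetical, health is reduced to one boolean, and the only Kubernetes fact relied on is that a StatefulSet partition leaves pods with an ordinal below the partition on the old revision.

```go
package main

// Action is what the manager would do next in this simplified model.
type Action string

const (
	ActionWait     Action = "wait"      // health is ambiguous: hold the partition
	ActionStepDown Action = "step-down" // the next pod to replace is the Raft leader
	ActionAdvance  Action = "advance"   // lower the partition by one ordinal
	ActionFinalize Action = "finalize"  // every ordinal is on the new revision
)

// nextStep returns the next action and the partition to apply.
// Pods with ordinal >= partition are already on the new revision, so
// lowering the partition by one replaces the highest remaining ordinal
// (reverse ordinal order).
func nextStep(partition, leaderOrdinal int, healthy bool) (Action, int) {
	if !healthy {
		return ActionWait, partition
	}
	if partition == 0 {
		return ActionFinalize, 0
	}
	next := partition - 1
	if next == leaderOrdinal {
		// Move leadership away first; the partition stays put until a
		// later pass observes a different leader.
		return ActionStepDown, partition
	}
	return ActionAdvance, next
}
```

For a three-replica cluster whose leader is pod 0, the sequence is advance to 2, advance to 1, step down, then advance to 0 once another pod holds leadership.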
Blue-green creates a second revision and needs roughly double storage capacity for the duration of the transition.
Diagram
Blue-green flow
Blue-green creates a parallel revision, promotes it through explicit phases, then switches traffic only after leadership and voter transitions are safe.
Blue-green safety controls
- The service selector switches to green only in cleanup.
- Manual promotion, manual rollback, and validation-hook failures all route through explicit phase handling in status.
- If rollback consensus repair fails late, the manager enters break-glass and stops risky automation.
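One way to read the controls above is as a small phase machine. The sketch below is illustrative only: every Phase and Event name is hypothetical, the real flow likely has more phases, and combinations not modeled here simply keep the current phase.

```go
package main

// Phase and Event names are assumptions made for this sketch.
type Phase string

const (
	PhaseDeployingGreen Phase = "DeployingGreen"
	PhasePromoting      Phase = "Promoting"
	PhaseCleanup        Phase = "Cleanup"
	PhaseComplete       Phase = "Complete"
	PhaseRollingBack    Phase = "RollingBack"
	PhaseRolledBack     Phase = "RolledBack"
	PhaseBreakGlass     Phase = "BreakGlass"
)

type Event string

const (
	EventStepSucceeded        Event = "StepSucceeded"
	EventValidationFailed     Event = "ValidationFailed"
	EventManualRollback       Event = "ManualRollback"
	EventRollbackRepairFailed Event = "RollbackRepairFailed"
)

// nextPhase routes every outcome through an explicit phase.
// BreakGlass has no outgoing transitions: automation stops there.
func nextPhase(p Phase, e Event) Phase {
	switch p {
	case PhaseDeployingGreen, PhasePromoting:
		switch e {
		case EventStepSucceeded:
			if p == PhaseDeployingGreen {
				return PhasePromoting
			}
			return PhaseCleanup
		case EventValidationFailed, EventManualRollback:
			return PhaseRollingBack
		}
	case PhaseCleanup:
		if e == EventStepSucceeded {
			return PhaseComplete
		}
	case PhaseRollingBack:
		switch e {
		case EventStepSucceeded:
			return PhaseRolledBack
		case EventRollbackRepairFailed:
			return PhaseBreakGlass
		}
	}
	return p
}

// maySwitchSelector reports whether the service selector may move to green.
func maySwitchSelector(p Phase) bool {
	return p == PhaseCleanup
}
```

Because the current phase is the only input besides the event, persisting it in status is enough to resume after a controller restart.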
State And Recovery Model
Reference table
Status-backed upgrade state
| State surface | What it preserves |
|---|---|
| status.upgrade | Rolling partition progress, completed pods, and finalization gating. |
| status.blueGreen.phase | The active blue-green phase and whether promotion, cleanup, or rollback is in progress. |
| lastErrorReason / lastErrorMessage | Why the current attempt failed and what must change before retry. |
| status.breakGlass | The nonce and diagnostic state when late rollback automation can no longer continue safely. |
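As a rough picture of how these surfaces might look as Go types, the sketch below mirrors the table. Only the paths named in the table (status.upgrade, status.blueGreen.phase, lastErrorReason, lastErrorMessage, status.breakGlass, currentVersion) come from this page; every other field name, the placement of the error fields under upgrade, and the roundTrip helper are assumptions.

```go
package main

import "encoding/json"

// UpgradeStatus sketches status.upgrade. Field names other than the
// error fields are hypothetical.
type UpgradeStatus struct {
	TargetVersion    string  `json:"targetVersion,omitempty"`
	CurrentPartition int32   `json:"currentPartition"`
	CompletedPods    []int32 `json:"completedPods,omitempty"`
	// Assumption: the error fields live under status.upgrade.
	LastErrorReason  string `json:"lastErrorReason,omitempty"`
	LastErrorMessage string `json:"lastErrorMessage,omitempty"`
}

// BlueGreenStatus sketches status.blueGreen.
type BlueGreenStatus struct {
	Phase string `json:"phase,omitempty"`
}

// BreakGlassStatus sketches status.breakGlass.
type BreakGlassStatus struct {
	Nonce  string `json:"nonce,omitempty"`
	Reason string `json:"reason,omitempty"` // hypothetical diagnostic field
}

// ClusterStatus groups the surfaces from the table.
type ClusterStatus struct {
	CurrentVersion string            `json:"currentVersion,omitempty"`
	Upgrade        *UpgradeStatus    `json:"upgrade,omitempty"`
	BlueGreen      *BlueGreenStatus  `json:"blueGreen,omitempty"`
	BreakGlass     *BreakGlassStatus `json:"breakGlass,omitempty"`
}

// roundTrip simulates a controller restart: the status is serialized,
// then reloaded by a fresh process that has no in-memory state.
func roundTrip(s ClusterStatus) (ClusterStatus, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return ClusterStatus{}, err
	}
	var out ClusterStatus
	err = json.Unmarshal(raw, &out)
	return out, err
}
```

Pointer fields with omitempty let "no upgrade in progress" be represented by absence rather than by a zero-valued struct that could be mistaken for real progress.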
Reference table
Safety boundaries
| Concern | Manager behavior |
|---|---|
| Availability over progress | Rolling pauses or retries when health is ambiguous; blue-green aborts during early phases and rolls back during later phases instead of forcing completion. |
| Version policy and image alignment | Invalid semantic versions, downgrades, and conflicting image/version inputs are rejected before orchestration begins. |
| Backup prerequisites | Snapshot prerequisites and backup authentication must already be valid before upgrade safety checks pass. |
| Atomic completion | Rolling finalization updates upgrade state and currentVersion together so status does not split across two truths. |
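The version policy row lends itself to a small pre-flight check. This is a sketch under assumptions, not the manager's real validation: parseVersion and validateTarget are hypothetical names, only plain MAJOR.MINOR.PATCH versions are handled (no pre-release or build metadata), and the image input is reduced to a bare tag string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion accepts MAJOR.MINOR.PATCH with an optional leading "v".
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid semantic version %q", v)
	}
	for i, p := range parts {
		// Reject empty parts and anything that is not a plain digit run.
		if p == "" || strings.Trim(p, "0123456789") != "" {
			return out, fmt.Errorf("invalid semantic version %q", v)
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid semantic version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// validateTarget rejects bad input before any orchestration starts.
// imageTag may be empty when no explicit image override is given.
func validateTarget(current, target, imageTag string) error {
	cur, err := parseVersion(current)
	if err != nil {
		return err
	}
	tgt, err := parseVersion(target)
	if err != nil {
		return err
	}
	for i := range cur {
		if tgt[i] > cur[i] {
			break // target is newer at this position
		}
		if tgt[i] < cur[i] {
			return fmt.Errorf("downgrade from %s to %s is not allowed", current, target)
		}
	}
	if imageTag != "" && strings.TrimPrefix(imageTag, "v") != strings.TrimPrefix(target, "v") {
		return fmt.Errorf("image tag %q conflicts with target version %q", imageTag, target)
	}
	return nil
}
```

Running every check before the first partition change or green resource is created means a rejected request leaves no partial upgrade state to clean up.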
This version tracks a prerelease build. Features and behavior may change before the next stable release.