Version: next

At a glance

Control path

  • adminops reconciler
  • internal/app/openbaocluster/adminops
  • internal/service/upgrade/rolling and internal/service/upgrade/bluegreen
  • shared seams in internal/service/upgrade/core, snapshot, and raftops

Owns

  • strategy-specific rolling and blue-green phase orchestration
  • shared lock, status, metrics, and root-lifecycle mechanics
  • upgrade executor jobs, snapshot prerequisites, and Raft coordination

Writes

  • status.upgrade, status.blueGreen, and status.breakGlass through shared status helpers
  • partition changes, green revision resources, and executor jobs
  • break-glass and failure state when rollback safety is compromised

Depends on

  • target version policy and image alignment
  • backup readiness and network egress for snapshot prerequisites
  • operation lifecycle coordination for lock, retry, and phase timing

Architectural Placement

Upgrade execution belongs to the AdminOps orchestration path:

  1. internal/controller/openbaocluster receives an adminops reconcile event.
  2. The controller delegates to internal/app/openbaocluster.
  3. AdminOps orchestration invokes either the rolling or blue-green upgrade manager flow.
  4. Strategy packages delegate shared mechanics to internal/service/upgrade/core, internal/service/upgrade/snapshot, internal/service/upgrade/raftops, and internal/platform/statusapply.

This placement keeps upgrade state machines out of the workload loop and lets long-running transitions use their own retry model.
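The delegation in steps 3 and 4 can be pictured as a small dispatch on the requested strategy. The Go sketch below is illustrative only: the `Strategy` values, the `Manager` interface, and `selectManager` are hypothetical names chosen for this example, not the operator's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Strategy names an upgrade strategy. The string values are assumptions.
type Strategy string

const (
	StrategyRolling   Strategy = "Rolling"
	StrategyBlueGreen Strategy = "BlueGreen"
)

// Manager is a hypothetical interface for a strategy-specific upgrade flow.
// Reconcile reports whether the caller should requeue the operation.
type Manager interface {
	Reconcile(cluster string) (requeue bool, err error)
}

// rollingManager stands in for the rolling state machine.
type rollingManager struct{}

func (rollingManager) Reconcile(cluster string) (bool, error) { return true, nil }

// blueGreenManager stands in for the blue-green phase machine.
type blueGreenManager struct{}

func (blueGreenManager) Reconcile(cluster string) (bool, error) { return true, nil }

// ErrUnknownStrategy is returned when no manager matches the request.
var ErrUnknownStrategy = errors.New("unknown upgrade strategy")

// selectManager picks the strategy flow that AdminOps orchestration would invoke.
func selectManager(s Strategy) (Manager, error) {
	switch s {
	case StrategyRolling:
		return rollingManager{}, nil
	case StrategyBlueGreen:
		return blueGreenManager{}, nil
	default:
		return nil, fmt.Errorf("%w: %q", ErrUnknownStrategy, s)
	}
}
```

In this sketch the orchestration layer only knows the `Manager` interface, which mirrors how the strategy packages keep workflow ownership while the caller stays strategy-agnostic.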

Package Shape

The upgrade subsystem is split so strategy packages keep workflow ownership while shared mechanics live behind narrower seams:

  • internal/service/upgrade keeps the root helpers that both strategies share and that belong to neither a single strategy nor the executor, such as request parsing, version and image policy, shared metrics types, pod client helpers, and root lifecycle helpers.
  • internal/service/upgrade/rolling owns the rolling state machine: partition progression, leader step-down sequencing, per-pod rollout, convergence, and rolling-specific retry/failure handling.
  • internal/service/upgrade/bluegreen owns the blue-green phase machine: green deployment, sync/promotion/cutover, rollback, and break-glass handling.
  • internal/service/upgrade/core owns shared lifecycle mechanics used by strategy code, including upgrade locks, common status mutators, metrics session bookkeeping, and blue-green status/state helpers that are not tied to a single phase.
  • internal/service/upgrade/snapshot owns shared pre-upgrade snapshot preparation: prerequisite validation, runtime bootstrap, Job state modeling, and common existing-Job result handling.
  • internal/service/upgrade/raftops owns executor-side Raft and OpenBao coordination such as leader discovery, leader transfer, peer join/promote/demote/remove, and autopilot capability fallback.
  • internal/platform/statusapply owns the shared AdminOps status apply and merge-patch helpers so upgrade, backup, and adminops flows use the same status-subresource ownership rules.

Decision matrix

Strategy selection

Strategy | Best fit | Primary tradeoff
Blue-green | High-control cutovers with explicit promotion and rollback phases. | More orchestration and roughly double storage during the transition.

Diagram

Rolling update flow

Rolling upgrades use StatefulSet partitioning and leader step-down so each pod can be replaced while Raft remains healthy.

Rolling safety controls

  • StatefulSet partitioning pauses Kubernetes-driven rollout until the manager explicitly advances each ordinal.
  • Reverse ordinal updates and forced leader step-down protect Raft availability during pod replacement.
  • Finalization happens only after the StatefulSet revision and the observed workload health have fully converged.
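The three controls above can be read as one decision loop over the StatefulSet partition. The following Go sketch is a simplified model under stated assumptions: `podState`, `action`, and `nextAction` are hypothetical names, and the only Kubernetes behavior relied on is that a StatefulSet updates pods whose ordinal is greater than or equal to the partition.

```go
package main

// podState is a hypothetical, simplified view of one StatefulSet pod.
type podState struct {
	Ordinal int
	Leader  bool // pod currently holds Raft leadership
	Healthy bool // pod is ready and caught up
	Updated bool // pod runs the target revision
}

// action is the next step the rolling manager would take in this model.
type action string

const (
	actionWait     action = "wait"
	actionStepDown action = "step-down"
	actionAdvance  action = "advance"
	actionFinalize action = "finalize"
)

// findPod returns the pod with the given ordinal, if present.
func findPod(pods []podState, ordinal int) (podState, bool) {
	for _, p := range pods {
		if p.Ordinal == ordinal {
			return p, true
		}
	}
	return podState{}, false
}

// nextAction decides the next step and the partition value to apply.
func nextAction(partition int, pods []podState) (action, int) {
	// Every pod already released by the partition must be updated and healthy.
	for _, p := range pods {
		if p.Ordinal >= partition && !(p.Updated && p.Healthy) {
			return actionWait, partition
		}
	}
	// Partition 0 with all pods converged means the rollout can be finalized.
	if partition == 0 {
		return actionFinalize, 0
	}
	// Reverse ordinal order: the next pod to release is partition-1.
	next := partition - 1
	if p, ok := findPod(pods, next); ok && p.Leader {
		// Move leadership away before the leader's pod is replaced.
		return actionStepDown, partition
	}
	return actionAdvance, next
}
```

Starting with the partition equal to the replica count, repeated calls walk the ordinals from highest to lowest and never release a pod while an earlier replacement is still unhealthy.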

State And Recovery Model

Reference table

Status-backed upgrade state

State surface | What it preserves
status.blueGreen.phase | The active blue-green phase and whether promotion, cleanup, or rollback is in progress.
lastErrorReason / lastErrorMessage | Why the current attempt failed and what must change before retry.
status.breakGlass | The nonce and diagnostic state when late rollback automation can no longer continue safely.
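Because this state lives in status, a restarted reconciler can decide what to do next from the recorded fields alone. The Go sketch below illustrates that idea; the `upgradeStatus` struct, its flattened field names, the `decision` values, and the precedence order are all assumptions made for the example, not the operator's actual types or rules.

```go
package main

// upgradeStatus is a hypothetical, flattened view of the status surfaces.
type upgradeStatus struct {
	BlueGreenPhase   string // mirrors status.blueGreen.phase
	LastErrorReason  string // mirrors lastErrorReason
	LastErrorMessage string // mirrors lastErrorMessage
	BreakGlassNonce  string // mirrors the nonce held in status.breakGlass
}

// decision is what a reconciler would do after reading status in this model.
type decision string

const (
	decisionIdle       decision = "idle"
	decisionResume     decision = "resume-phase"
	decisionBlocked    decision = "blocked-until-fixed"
	decisionBreakGlass decision = "halt-break-glass"
)

// decide applies an assumed precedence: break-glass first, then a recorded
// failure, then an in-progress phase, and otherwise nothing to do.
func decide(s upgradeStatus) decision {
	switch {
	case s.BreakGlassNonce != "":
		return decisionBreakGlass
	case s.LastErrorReason != "":
		return decisionBlocked
	case s.BlueGreenPhase != "":
		return decisionResume
	default:
		return decisionIdle
	}
}
```

The point of the sketch is that no in-memory state is needed to recover: every branch is derived from fields that survive a controller restart.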

Reference table

Safety boundaries

Concern | Manager behavior
Version policy and image alignment | Invalid semantic versions, downgrades, and conflicting image/version inputs are rejected before orchestration begins.
Backup prerequisites | Snapshot prerequisites and backup authentication must already be valid before upgrade safety checks pass.
Atomic completion | Rolling finalization updates upgrade state and currentVersion together so status does not split across two truths.
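The version policy row can be illustrated with a small pre-flight check. This Go sketch is a simplification under stated assumptions: it only understands plain `MAJOR.MINOR.PATCH` versions (no pre-release or build metadata), treats the image tag as the text after the last colon, and uses hypothetical names (`validateTarget`, `ErrDowngrade`, `ErrImageMismatch`) that are not taken from the operator.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// version holds major, minor, and patch numbers.
type version [3]int

// parseVersion accepts "1.2.3" or "v1.2.3" and rejects anything else.
func parseVersion(s string) (version, error) {
	var v version
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid semantic version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("invalid semantic version %q", s)
		}
		v[i] = n
	}
	return v, nil
}

// less reports whether v orders strictly before o.
func (v version) less(o version) bool {
	for i := range v {
		if v[i] != o[i] {
			return v[i] < o[i]
		}
	}
	return false
}

var (
	ErrDowngrade     = errors.New("downgrade is not allowed")
	ErrImageMismatch = errors.New("image tag conflicts with target version")
)

// validateTarget rejects invalid versions, downgrades, and image/version conflicts.
func validateTarget(current, target, image string) error {
	cur, err := parseVersion(current)
	if err != nil {
		return err
	}
	tgt, err := parseVersion(target)
	if err != nil {
		return err
	}
	if tgt.less(cur) {
		return ErrDowngrade
	}
	// Treat text after the last ':' as a tag unless it is a registry port.
	if i := strings.LastIndex(image, ":"); i >= 0 && !strings.Contains(image[i+1:], "/") {
		if strings.TrimPrefix(image[i+1:], "v") != strings.TrimPrefix(target, "v") {
			return ErrImageMismatch
		}
	}
	return nil
}
```

Running such a check before any lock is taken or Job is created matches the table's intent: bad inputs fail fast, before orchestration begins.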
