Version: next

At a glance

Starts with

  • an initialized cluster with steady-state workload reconciliation
  • version drift, backup schedules, or explicit maintenance requests
  • operation lifecycle coordination available for lock and retry management

Primary owners

  • adminops controller path
  • internal/service/upgrade
  • internal/service/backup and internal/service/opslifecycle

Writes

  • status.upgrade, status.blueGreen, and operation-lock state
  • upgrade and backup executor Jobs plus green revision resources when needed
  • maintenance annotations or pause-driven no-op behavior depending on user intent

Hands off to

  • backup and restore flows once a cluster needs ongoing durability
  • troubleshooting and recovery guides when automation must pause
  • steady-state workload reconciliation after an operation completes

Architectural Placement

Day 2 work is intentionally separated from the high-churn workload loop:

  1. Workload reconciliation continues to own the steady-state pod, Service, and config contract.
  2. Admin operations orchestration takes over when a change requires long-running coordination such as upgrade or backup.
  3. internal/service/opslifecycle keeps disruptive operations consistent in how they handle lock ownership, retry timing, and audit fields.

That separation prevents upgrades, backups, and other long-running workflows from blocking normal workload repair.

Diagram: Day 2 control-plane handoff

Once the cluster is live, disruptive operations route through the admin operations path instead of staying inside the high-churn workload controller.

Day 2 operation families

| Operation family | Primary owner | Lifecycle role |
| --- | --- | --- |
| Upgrade orchestration | Upgrade manager via adminops | Handles version drift, strategy-specific state, and Raft-aware cutover logic. |
| Backup scheduling | Backup manager via adminops | Runs snapshot Jobs and updates backup status without moving data through the controller. |
| Manual intervention gates | User-driven pause and maintenance settings | Limits or reshapes automation when an operator needs to intervene directly. |

Rolling path

  • Version drift triggers pre-upgrade checks on semver, cluster health, and optional snapshot prerequisites.
  • The upgrade manager uses StatefulSet partitioning and leader step-down to replace one pod at a time in reverse ordinal order.
  • Progress is preserved in status so a failed step can stop cleanly and later resume from an explicit retry request.
  • Completion updates currentVersion and clears the transient rolling-upgrade state once the workload fully converges.

Operational control surfaces

| Control | What it does | When to use it |
| --- | --- | --- |
| spec.maintenance.enabled=true | Keeps reconciliation running, but marks resources for controlled disruptive changes allowed by policy. | Use when the controller should continue known-safe automation during maintenance work. |
| spec.breakGlassAck | Acknowledges an issued nonce before risky late-stage recovery automation can continue. | Use only after an operator has reviewed a break-glass condition and explicitly accepts the next step. |


