Version: 0.1.0-rc.5

At a glance

Starts with

  • an initialized cluster with steady-state workload reconciliation
  • version drift, backup schedules, or explicit maintenance requests
  • operation lifecycle coordination available for lock and retry management

Primary owners

  • adminops controller path
  • internal/service/upgrade
  • internal/service/backup and internal/service/opslifecycle

Writes

  • status.upgrade, status.blueGreen, and operation-lock state
  • upgrade and backup executor Jobs plus green revision resources when needed
  • maintenance annotations or pause-driven no-op behavior depending on user intent

Hands off to

  • backup and restore flows once a cluster needs ongoing durability
  • troubleshooting and recovery guides when automation must pause
  • steady-state workload reconciliation after an operation completes

Architectural Placement

Day 2 work is intentionally separated from the high-churn workload loop:

  1. Workload reconciliation continues to own the steady-state pod, Service, and config contract.
  2. Admin operations orchestration takes over when a change requires long-running coordination such as upgrade or backup.
  3. internal/service/opslifecycle keeps disruptive operations consistent around lock ownership, retry timing, and audit fields.

That separation prevents upgrades, backups, and other long-running workflows from blocking normal workload repair.

Diagram

Day 2 control-plane handoff

Once the cluster is live, disruptive operations route through the admin operations path instead of staying inside the high-churn workload controller.

Reference table

Day 2 operation families

| Operation family | Primary owner | Lifecycle role |
| --- | --- | --- |
| Upgrade orchestration | Upgrade manager via adminops | Handles version drift, strategy-specific state, and Raft-aware cutover logic. |
| Backup scheduling | Backup manager via adminops | Runs snapshot Jobs and updates backup status without moving data through the controller. |
| Manual intervention gates | User-driven pause and maintenance settings | Limit or reshape automation when an operator needs to intervene directly. |
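A minimal Go sketch of how a controller might decide which operation family owns the next disruptive change. `ClusterState`, `Family`, and `Classify` are hypothetical names, and the precedence between families is an assumption; the point is only that steady-state workload reconciliation never waits on this decision.

```go
package main

// ClusterState is a hypothetical, simplified view of the inputs that start
// Day 2 work; none of these field names come from the real API.
type ClusterState struct {
	CurrentVersion     string
	DesiredVersion     string
	BackupDue          bool // a backup schedule has come due
	MaintenanceRequest bool // an explicit maintenance request is present
	Paused             bool // the user paused automation
}

// Family names the operation family that owns the next disruptive change.
type Family int

const (
	FamilyNone       Family = iota // no Day 2 work: workload reconciliation only
	FamilyManualGate               // user-driven pause and maintenance settings
	FamilyUpgrade                  // upgrade manager via adminops
	FamilyBackup                   // backup manager via adminops
)

func (f Family) String() string {
	switch f {
	case FamilyManualGate:
		return "manual-gate"
	case FamilyUpgrade:
		return "upgrade"
	case FamilyBackup:
		return "backup"
	default:
		return "none"
	}
}

// Classify picks the family for the next disruptive change. Workload
// reconciliation keeps running in its own loop whatever this returns.
// The precedence below (manual gates, then upgrade, then backup) is an
// assumption made for illustration.
func Classify(s ClusterState) Family {
	switch {
	case s.Paused || s.MaintenanceRequest:
		return FamilyManualGate
	case s.DesiredVersion != "" && s.DesiredVersion != s.CurrentVersion:
		return FamilyUpgrade
	case s.BackupDue:
		return FamilyBackup
	default:
		return FamilyNone
	}
}
```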

Rolling path

  • Version drift triggers pre-upgrade validation that covers semver compatibility, cluster health, and optional snapshot prerequisites.
  • The upgrade manager uses StatefulSet partitioning and leader step-down to replace one pod at a time in reverse ordinal order.
  • Progress is preserved in status so a failed step can stop cleanly and later resume from an explicit retry request.
  • Completion updates currentVersion and clears the transient rolling-upgrade state once the workload fully converges.

Reference table

Operational control surfaces

| Control | What it does | When to use it |
| --- | --- | --- |
| spec.maintenance.enabled=true | Keeps reconciliation running, but marks resources for controlled disruptive changes allowed by policy. | Use when the operator should continue known-safe automation during maintenance work. |
| spec.breakGlassAck | Acknowledges an issued nonce before risky late-stage recovery automation can continue. | Use only after an operator has reviewed a break-glass condition and accepts the next step explicitly. |
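The two controls can be modeled as simple gates. In this Go sketch only the field paths `spec.maintenance.enabled` and `spec.breakGlassAck` come from this page; the `Spec` and `BreakGlassStatus` types, the boolean policy input, the random hex nonce, and the rule that one acknowledgement authorizes exactly one step are all assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// Spec mirrors only the two control fields from the table; the Go field
// names are hypothetical.
type Spec struct {
	MaintenanceEnabled bool   // spec.maintenance.enabled
	BreakGlassAck      string // spec.breakGlassAck
}

// AllowDisruptive reports whether a disruptive change may proceed while
// reconciliation keeps running: maintenance must be enabled and policy
// (modeled here as a plain bool) must allow this specific change.
func AllowDisruptive(spec Spec, policyAllows bool) bool {
	return spec.MaintenanceEnabled && policyAllows
}

// BreakGlassStatus is a hypothetical record of the nonce the controller issued.
type BreakGlassStatus struct {
	PendingNonce string
}

// IssueNonce stores a fresh random nonce that an operator must copy into
// spec.breakGlassAck after reviewing the break-glass condition.
func (s *BreakGlassStatus) IssueNonce() (string, error) {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	s.PendingNonce = hex.EncodeToString(b)
	return s.PendingNonce, nil
}

// MayProceed is true only when a nonce is pending and the spec echoes it
// exactly, so an acknowledgement for an older nonce does not unlock the step.
func (s *BreakGlassStatus) MayProceed(spec Spec) bool {
	return s.PendingNonce != "" && spec.BreakGlassAck == s.PendingNonce
}

// Consume clears the pending nonce after the acknowledged step runs, so one
// acknowledgement authorizes one step.
func (s *BreakGlassStatus) Consume() {
	s.PendingNonce = ""
}
```

Tying the acknowledgement to a specific nonce, rather than a boolean flag, keeps a leftover `spec.breakGlassAck` value from silently approving a later break-glass condition.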


Prerelease documentation

This version tracks a prerelease build. Features and behavior may change before the next stable release.
