Hand off from cluster creation into upgrades, maintenance, and long-running operational work.
Day 2 starts once the cluster is initialized and the workload path is steady. From that point on, long-running operations such as upgrades and backups move through the admin operations path, while maintenance controls gate how much automation is allowed to continue during manual intervention.
At a glance
Starts with
- an initialized cluster with steady-state workload reconciliation
- version drift, backup schedules, or explicit maintenance requests
- operation lifecycle coordination available for lock and retry management
Primary owners
- adminops controller path
- internal/service/upgrade
- internal/service/backup and internal/service/opslifecycle
Writes
- status.upgrade, status.blueGreen, and operation-lock state
- upgrade and backup executor Jobs plus green revision resources when needed
- maintenance annotations or pause-driven no-op behavior depending on user intent
Hands off to
- backup and restore flows once a cluster needs ongoing durability
- troubleshooting and recovery guides when automation must pause
- steady-state workload reconciliation after an operation completes
Architectural placement
Day 2 work is intentionally separated from the high-churn workload loop:
- Workload reconciliation continues to own the steady-state pod, Service, and config contract.
- Admin operations orchestration takes over when a change requires long-running coordination such as upgrade or backup.
- internal/service/opslifecycle keeps disruptive operations consistent around lock ownership, retry timing, and audit fields.
That separation prevents upgrades, backups, and other long-running workflows from blocking normal workload repair.
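The lock-ownership discipline that opslifecycle enforces can be sketched as a single-holder, re-entrant lock: one disruptive operation owns the cluster at a time, while routine workload repair never contends for it. This is an illustrative model under assumed names (`OpLock`, `Acquire`, `Release`), not the operator's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// OpLock models single-holder operation coordination: only one
// long-running operation (upgrade, backup, ...) may own the cluster
// at a time. Names are illustrative, not the operator's real types.
type OpLock struct {
	holder string // empty means unlocked
}

var ErrLocked = errors.New("another long-running operation holds the lock")

// Acquire succeeds if the lock is free or already held by the same
// operation, which keeps retries of one workflow re-entrant.
func (l *OpLock) Acquire(op string) error {
	if l.holder != "" && l.holder != op {
		return ErrLocked
	}
	l.holder = op
	return nil
}

// Release clears the lock only if the caller still owns it.
func (l *OpLock) Release(op string) {
	if l.holder == op {
		l.holder = ""
	}
}

func main() {
	var lock OpLock
	fmt.Println(lock.Acquire("upgrade")) // <nil>
	fmt.Println(lock.Acquire("backup"))  // conflicting acquire fails
	lock.Release("upgrade")
	fmt.Println(lock.Acquire("backup")) // <nil>
}
```

Because workload repair never calls `Acquire`, a stuck upgrade cannot starve pod or Service reconciliation, which is the separation the section above describes.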
Diagram: Day 2 control-plane handoff. Once the cluster is live, disruptive operations route through the admin operations path instead of staying inside the high-churn workload controller.
Reference table
Day 2 operation families
| Operation family | Primary owner | Lifecycle role |
|---|---|---|
| Routine workload repair | Workload reconcile path. | Keeps StatefulSets, Services, ConfigMaps, and Secrets converged without entering the long-running adminops model. |
| Upgrade orchestration | Upgrade manager via adminops. | Handles version drift, strategy-specific state, and Raft-aware cutover logic. |
| Backup scheduling | Backup manager via adminops. | Runs snapshot Jobs and updates backup status without moving data through the controller. |
| Manual intervention gates | User-driven pause and maintenance settings. | Limit or reshape automation when an operator needs to intervene directly. |
Upgrades follow one of two strategies: rolling or blue-green.
Rolling path
- Version drift triggers pre-upgrade validation around semver, health, and optional snapshot prerequisites.
- The upgrade manager uses StatefulSet partitioning and leader step-down to replace one pod at a time in reverse ordinal order.
- Progress is preserved in status so a failed step can stop cleanly and later resume from an explicit retry request.
- Completion updates currentVersion and clears the transient rolling-upgrade state once the workload fully converges.
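The reverse ordinal ordering in the rolling path falls out of StatefulSet partitioning: lowering `spec.updateStrategy.rollingUpdate.partition` from the replica count down to 0 releases one more pod per step, highest ordinal first. A minimal sketch of that ordering, with an assumed helper name `rollingOrder`:

```go
package main

import "fmt"

// rollingOrder returns pod ordinals in the order a partitioned
// StatefulSet rollout replaces them: highest ordinal first, because
// pods with ordinal >= partition receive the new revision as the
// partition value is stepped down. Illustrative sketch only.
func rollingOrder(replicas int) []int {
	order := make([]int, 0, replicas)
	for partition := replicas - 1; partition >= 0; partition-- {
		// Lowering the partition to this value releases one more pod.
		order = append(order, partition)
	}
	return order
}

func main() {
	fmt.Println(rollingOrder(3)) // [2 1 0]
}
```

Stepping the partition one value at a time is also what makes the resume behavior cheap: the last partition value recorded in status tells the manager exactly which pod is next after a retry.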
Blue-green path
- A parallel green revision is created and joined as non-voters before any traffic cutover happens.
- Promotion, demotion, cleanup, and rollback all move through explicit phases stored in status.
- The Service selector changes only during cleanup, after a green leader is confirmed and blue peers are ready to leave.
- If rollback safety breaks down late, the manager enters break-glass instead of continuing risky automation blindly.
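The blue-green bullets above describe a phase machine persisted in status. The sketch below models those transitions, including the break-glass terminal branch; the phase identifiers are assumptions mirroring the prose, not the operator's real status values.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase names mirror the prose above (provision, join as non-voters,
// promote, cleanup, rollback, break-glass) but are illustrative.
type Phase string

const (
	PhaseProvisionGreen Phase = "ProvisionGreen" // create parallel green revision
	PhaseJoinNonVoters  Phase = "JoinNonVoters"  // green members join as non-voters
	PhasePromote        Phase = "Promote"        // promote green, demote blue
	PhaseCleanup        Phase = "Cleanup"        // flip Service selector, retire blue
	PhaseRollback       Phase = "Rollback"
	PhaseBreakGlass     Phase = "BreakGlass"
)

// next returns the phase to advance to. Rollback is only reachable
// before Cleanup; once rollback safety is gone, the machine enters
// BreakGlass instead of continuing automation.
func next(p Phase, rollback bool) (Phase, error) {
	switch p {
	case PhaseProvisionGreen:
		return PhaseJoinNonVoters, nil
	case PhaseJoinNonVoters, PhasePromote:
		if rollback {
			return PhaseRollback, nil
		}
		if p == PhaseJoinNonVoters {
			return PhasePromote, nil
		}
		return PhaseCleanup, nil
	case PhaseCleanup:
		if rollback {
			// Too late to roll back safely: stop and wait for a human.
			return PhaseBreakGlass, nil
		}
		return PhaseCleanup, nil
	}
	return p, errors.New("no transition defined")
}

func main() {
	p := PhaseProvisionGreen
	for i := 0; i < 3; i++ {
		p, _ = next(p, false)
		fmt.Println(p)
	}
}
```

Storing the phase in status, as the real manager does, means a controller restart resumes from the recorded phase rather than re-deriving progress from cluster state.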
Reference table
Operational control surfaces
| Control | What it does | When to use it |
|---|---|---|
| spec.paused=true | Short-circuits reconcilers so the operator stops mutating managed resources for the cluster. | Use when you need manual intervention and want automation to stop entirely. |
| spec.maintenance.enabled=true | Keeps reconciliation running, but marks resources for controlled disruptive changes allowed by policy. | Use when the operator should continue known-safe automation during maintenance work. |
| spec.breakGlassAck | Acknowledges an issued nonce before risky late-stage recovery automation can continue. | Use only after an operator has reviewed a break-glass condition and accepts the next step explicitly. |