Version: next

Decision matrix

Choose the right control

| Control | Use it when | Operator behavior | Watch for |
| --- | --- | --- | --- |
| Replica scaling | You need more capacity, stronger fault tolerance, or a deliberate reduction after a change. | The operator grows or shrinks the StatefulSet and manages peer membership as the replica count changes. | Do not treat scale-down as a harmless cost-saving action on a production Raft cluster. |
| Maintenance mode | Admission policy requires the openbao.org/maintenance=true signal before restarts or controlled deletes. | The operator annotates managed resources so callers with maintenance permission on the owning OpenBaoCluster can perform planned restarts or deletes. | Grant the custom maintenance verb on the owning OpenBaoCluster before using this path. |
| Pause reconciliation | You need a short-lived window where the operator stops mutating the cluster while you inspect or repair it. | The operator stops normal reconciliation until you resume it. | Pausing stops normal reconciliation, but safe-mode incidents still require the dedicated recovery flow. |

Drain nodes without breaking quorum

For clusters with three or more replicas, the operator creates a PodDisruptionBudget with maxUnavailable: 1. That is the main guardrail that keeps a normal node drain from evicting too many Pods at once.
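As a rough illustration of how that guardrail can be checked before a drain, the sketch below gates on the standard `status.disruptionsAllowed` field of each PodDisruptionBudget. The helper name `drain_is_safe` is hypothetical, and the usage line assumes every PDB in the namespace matters for the drain.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one disruptionsAllowed value per line on stdin.
# Returns 0 if every budget allows at least one disruption, 1 if any budget
# is exhausted, and 2 if no budget was found at all.
drain_is_safe() {
  local allowed seen=0
  while read -r allowed; do
    [ -z "$allowed" ] && continue
    seen=1
    if [ "$allowed" -lt 1 ]; then
      echo "blocked: a PDB currently allows 0 disruptions" >&2
      return 1
    fi
  done
  if [ "$seen" -eq 0 ]; then
    echo "no PDB found: nothing limits evictions here" >&2
    return 2
  fi
  return 0
}

# Usage against a live cluster (placeholders as elsewhere on this page):
#   kubectl get pdb -n <namespace> \
#     -o jsonpath='{range .items[*]}{.status.disruptionsAllowed}{"\n"}{end}' \
#     | drain_is_safe
```

A return code of 2 is worth treating as a warning rather than a pass, since one- and two-replica clusters have no budget to consult.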

Reference table

Pod disruption behavior by replica count

| Replicas | PDB created | What it means |
| --- | --- | --- |
| 1 | No | There is no redundancy. Any disruption takes the service down. |
| 2 | No | A two-node Raft cluster cannot tolerate one unavailable voter cleanly enough for a safe maxUnavailable: 1 policy. |
| 5 | Yes | The operator still uses a conservative one-at-a-time disruption model. |
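The table follows from ordinary Raft majority arithmetic, which can be sketched as below. The function names are illustrative only and are not part of the operator; the sketch assumes every replica is a voter.

```bash
#!/usr/bin/env bash
# Majority quorum for an n-voter Raft cluster: floor(n/2) + 1.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# Number of voters that can be unavailable while quorum still holds.
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 5; do
  printf 'replicas=%d quorum=%d tolerates=%d\n' \
    "$n" "$(quorum_size "$n")" "$(fault_tolerance "$n")"
done
# replicas=1 quorum=1 tolerates=0
# replicas=2 quorum=2 tolerates=0
# replicas=3 quorum=2 tolerates=1
# replicas=5 quorum=3 tolerates=2
```

With two replicas the tolerance is zero, so a budget that permits one unavailable Pod would permit losing quorum.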

Verify

Check the disruption budget before a drain

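```bash
# Confirm the PodDisruptionBudget exists and inspect its allowed disruptions
kubectl get pdb -n <namespace>
# Evict Pods from the node, respecting the budget
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```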

If more than one OpenBao Pod runs on the same node, the drain can take longer: with maxUnavailable: 1, Kubernetes evicts those Pods one at a time and holds each further eviction until the budget allows it again.
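One way to spot that situation ahead of time is to count OpenBao Pods per node. The following is a minimal sketch; `crowded_nodes` is a hypothetical helper that expects `pod node` pairs on stdin, and the usage line reuses the openbao.org/cluster label shown elsewhere on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "<pod> <node>" lines and prints every node that
# hosts more than one of the listed Pods, followed by the Pod count.
crowded_nodes() {
  awk 'NF >= 2 { count[$2]++ }
       END { for (node in count) if (count[node] > 1) print node, count[node] }' | sort
}

# Usage against a live cluster:
#   kubectl get pods -n <namespace> -l openbao.org/cluster=<name> \
#     -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName --no-headers \
#     | crowded_nodes
```

Empty output means no node carries more than one of the cluster's Pods.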

The PDB covers only voluntary disruption

Node drains, autoscaler evictions, and direct eviction API calls are guarded. Node crashes, OOM kills, or kernel failures are not. Those rely on normal Raft quorum behavior instead of disruption-budget enforcement.

Scale the cluster deliberately

Use scaling as an intentional operational change, not a quick patch to quiet a temporary issue.

Configure

Increase the replica count

```yaml
spec:
  replicas: 5
```

The operator creates the new Pods, waits for them to join the Raft cluster, and updates the PDB to match the new size.
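If you prefer to apply the change from the command line rather than by editing a manifest, a merge patch is one option. The sketch below only builds the patch body; `replicas_patch` is a hypothetical helper, and the usage line assumes the resource is addressable as `openbaocluster`, as in the verification commands on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON merge patch that sets spec.replicas.
# Rejects anything that is not a positive integer without leading zeros.
replicas_patch() {
  local n=$1
  case "$n" in
    ''|*[!0-9]*|0*)
      echo "replicas must be a positive integer" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"replicas":%s}}\n' "$n"
}

# Usage against a live cluster:
#   kubectl patch openbaocluster <name> -n <namespace> \
#     --type merge -p "$(replicas_patch 5)"
```

Validating the number before patching keeps a typo from turning into an unintended scale-down.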

Use maintenance mode for controlled restarts

Enable maintenance mode when your admission policies require a deliberate maintenance signal before managed Pods or the StatefulSet can be restarted, deleted, or otherwise touched during planned work.

Configure

Enable maintenance mode

```yaml
spec:
  maintenance:
    enabled: true
```

In this mode, the operator annotates managed Pods and the StatefulSet with openbao.org/maintenance=true. Callers still need normal Kubernetes RBAC on the target resource plus the custom maintenance verb on the owning OpenBaoCluster.
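To confirm that the annotation has actually landed before starting planned work, something like the following sketch can help. `pods_missing_maintenance` is a hypothetical helper that expects tab-separated `pod`, `annotation value` pairs on stdin; the usage line assumes the openbao.org/cluster label selector used elsewhere on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "<pod>\t<annotation value>" lines and prints the
# names of Pods whose openbao.org/maintenance annotation is not "true".
pods_missing_maintenance() {
  awk -F'\t' '$1 != "" && $2 != "true" { print $1 }'
}

# Usage against a live cluster (dots in the annotation key are escaped):
#   kubectl get pods -n <namespace> -l openbao.org/cluster=<name> \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.openbao\.org/maintenance}{"\n"}{end}' \
#     | pods_missing_maintenance
```

Empty output suggests every listed Pod carries the signal; any name printed is a Pod the admission policy would still protect.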

This mode is also required for some day-2 changes that need a controlled restart path, such as finishing filesystem expansion after increasing spec.storage.size.

Trigger a rolling restart

Use spec.runtime.restartAt when you need the workload to roll because an external dependency changed, such as a certificate chain, secret material, or another input that should force a controlled refresh.

Configure

Request a rolling restart

```yaml
spec:
  runtime:
    restartAt: "2026-01-19T00:00:00Z"
```

This request is independent of maintenance authorization. Enable maintenance mode only when you need disruptive work on managed resources or an operator flow that explicitly requires the maintenance gate.

Use spec.runtime.restartAt for new configurations. The older spec.maintenance.restartAt path remains temporarily for compatibility.
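A restart request can also be generated on the fly. The sketch below builds a merge patch for spec.runtime.restartAt, defaulting to the current UTC time; `restart_patch` is a hypothetical helper, and it assumes the field takes a UTC timestamp in the same format as the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON merge patch that sets
# spec.runtime.restartAt. With no argument it uses the current UTC time.
restart_patch() {
  local ts=${1:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  # Accept only the YYYY-MM-DDTHH:MM:SSZ shape used in the example above.
  if ! printf '%s' "$ts" |
      grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'; then
    echo "timestamp must look like 2026-01-19T00:00:00Z" >&2
    return 1
  fi
  printf '{"spec":{"runtime":{"restartAt":"%s"}}}\n' "$ts"
}

# Usage against a live cluster:
#   kubectl patch openbaocluster <name> -n <namespace> \
#     --type merge -p "$(restart_patch)"
```

The shape check is deliberately loose: it catches typos but does not validate calendar ranges.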

When a leader Pod must be restarted or evicted, the operator handles graceful step-down automatically before termination so the cluster can elect a new leader cleanly.

Verify the cluster before and after the window

Verify

Inspect health before and after maintenance

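```bash
# Cluster phase as reported by the operator
kubectl get openbaocluster <name> -n <namespace> -o jsonpath='{.status.phase}{"\n"}'
# Readiness of the Pods that belong to the cluster
kubectl get pods -n <namespace> -l openbao.org/cluster=<name>
# Raft peer set as seen from one Pod
kubectl exec -n <namespace> -it <pod-name> -- bao operator raft list-peers
```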

The important end state is a clean phase, Ready Pods, and a Raft peer set that matches the intended topology after the maintenance action finishes.
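Comparing the peer set against the intended topology can be scripted. The sketch below is a rough aid, not an official check: `raft_peer_summary` is a hypothetical helper, and it assumes the default table output of list-peers puts the peer state (leader or follower) in the third column.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarizes "bao operator raft list-peers" table output.
# ASSUMPTION: columns are Node, Address, State, Voter, so State is field 3.
raft_peer_summary() {
  awk '$3 == "leader"   { leaders++; peers++ }
       $3 == "follower" { peers++ }
       END { printf "peers=%d leaders=%d\n", peers, leaders }'
}

# Usage against a live cluster (no -it, so the output pipes cleanly):
#   kubectl exec -n <namespace> <pod-name> -- bao operator raft list-peers \
#     | raft_peer_summary
```

After maintenance, the peer count should equal the intended replica count and there should be exactly one leader.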

Next release documentation

You are reading the unreleased main docs. Use the version menu for the newest published release, or check the release notes for what is already out.
