Version: 0.1.0

Decision matrix

Choose the right control

| Control | Use it when | Operator behavior | Watch for |
| --- | --- | --- | --- |
| Replica scaling | You need more capacity, stronger fault tolerance, or a deliberate reduction after a change. | The operator grows or shrinks the StatefulSet and manages peer membership as the replica count changes. | Do not treat scale-down as a harmless cost-saving action on a production Raft cluster. |
| Maintenance mode | Admission policy requires the openbao.org/maintenance=true signal before restarts or controlled deletes. | The operator annotates managed resources so maintenance-only actions are allowed under the configured break-glass groups. | This is not a generic bypass for random edits. It is a controlled operational mode. |
| Pause reconciliation | You need a short-lived window where the operator stops mutating the cluster while you inspect or repair it. | The operator stops normal reconciliation until you resume it. | Pausing is not the same thing as recovery and is not enough for safe-mode incidents. |

Drain nodes without breaking quorum

For clusters with three or more replicas, the operator creates a PodDisruptionBudget with maxUnavailable: 1. That is the main guardrail that keeps a normal node drain from evicting too many Pods at once.

Reference table

Pod disruption behavior by replica count

| Replicas | PDB created | What it means |
| --- | --- | --- |
| 1 | No | There is no redundancy. Any disruption takes the service down. |
| 2 | No | A two-node Raft cluster needs both voters for quorum, so it cannot lose one and a maxUnavailable: 1 policy would not be safe. |
| 5 | Yes | The operator still uses a conservative one-at-a-time disruption model. |
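The replica thresholds above follow from Raft majority arithmetic. The sketch below is illustrative only: the `quorum` and `tolerated` helper names are made up for this example, it assumes every replica is a voter, and it does not talk to a cluster.

```bash
#!/usr/bin/env bash
# Majority needed for a Raft cluster of N voters: floor(N / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of voters that can be lost while a majority still remains.
tolerated() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  printf 'replicas=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(tolerated "$n")"
done
# replicas=1 quorum=1 tolerated_failures=0
# replicas=2 quorum=2 tolerated_failures=0
# replicas=3 quorum=2 tolerated_failures=1
# replicas=5 quorum=3 tolerated_failures=2
```

A two-replica cluster tolerates zero voter failures, which is consistent with the operator not creating a PDB below three replicas.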

Verify

Check the disruption budget before a drain

bash

kubectl get pdb -n <namespace>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

If more than one OpenBao Pod runs on the node being drained, the drain may take longer because the PDB allows only one eviction at a time, and each further eviction waits until the previous Pod is available again.
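One way to see whether Pods are concentrated on a node before draining it is to count them per node. This is a minimal sketch: `NAMESPACE` and `CLUSTER` are placeholder variables you set yourself, the helper names are hypothetical, and the `openbao.org/cluster` label selector is the one used in the verification commands on this page.

```bash
#!/usr/bin/env bash
# Read node names on stdin (one per line) and print "<node> <pod-count>".
count_per_node() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Look up the node of every Pod in the cluster and count Pods per node.
# NAMESPACE and CLUSTER are placeholders, not operator-defined variables.
pods_per_node() {
  kubectl get pods -n "$NAMESPACE" -l "openbao.org/cluster=$CLUSTER" \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
    | count_per_node
}
```

Any node that reports a count above 1 is a candidate for a slower drain.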

The PDB covers only voluntary disruption

Node drains, autoscaler evictions, and direct eviction API calls are guarded. Involuntary disruptions such as node crashes, OOM kills, and kernel failures are not. Surviving those depends on normal Raft quorum behavior rather than disruption-budget enforcement.

Scale the cluster deliberately

Use scaling as an intentional operational change, not a quick patch to quiet a temporary issue.

Configure

Increase the replica count

yaml

spec:
  replicas: 5

The operator creates the new Pods, waits for them to join the Raft cluster, and updates the PDB to match the new size.
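If you prefer to change the replica count from the command line instead of editing the manifest, a merge patch against the same field is one option. This is a sketch under assumptions: `replicas_patch` and `scale_cluster` are hypothetical helpers, `NAMESPACE` is a placeholder variable, and the `openbaocluster` resource name is the one used in the verification commands on this page.

```bash
#!/usr/bin/env bash
# Build a JSON merge patch that sets spec.replicas to the given integer.
replicas_patch() {
  printf '{"spec":{"replicas":%d}}' "$1"
}

# Apply the patch to an OpenBaoCluster resource.
# Usage: NAMESPACE=my-namespace scale_cluster my-cluster 5
scale_cluster() {
  kubectl patch openbaocluster "$1" -n "$NAMESPACE" \
    --type merge -p "$(replicas_patch "$2")"
}
```

After the change, `bao operator raft list-peers` is the place to confirm that the new Pods joined the peer set.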

Use maintenance mode for controlled restarts

Enable maintenance mode when your admission policies require a deliberate maintenance signal before managed Pods or the StatefulSet can be restarted, deleted, or otherwise modified during planned work.

Configure

Enable maintenance mode

yaml

spec:
  maintenance:
    enabled: true

In this mode, the operator annotates managed Pods and the StatefulSet with openbao.org/maintenance=true. By default, maintenance-only bypass is limited to callers in the Kubernetes group system:masters unless you changed the configured break-glass groups at install time.

This mode is also required for some day 2 changes that need a controlled restart path, such as finishing filesystem expansion after increasing spec.storage.size.
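Before starting maintenance-only actions, it can help to confirm that the openbao.org/maintenance=true annotation has actually landed on the managed Pods. The following is a minimal sketch, not an operator feature: the helper names are hypothetical, and `NAMESPACE` and `CLUSTER` are placeholder variables.

```bash
#!/usr/bin/env bash
# Read "name<TAB>annotation-value" lines and print the names
# whose value is anything other than "true" (including empty).
missing_maintenance() {
  awk -F '\t' '$2 != "true" { print $1 }'
}

# List managed Pods that do not yet carry openbao.org/maintenance=true.
# Dots in the annotation key are escaped for kubectl's JSONPath parser.
pods_missing_maintenance() {
  kubectl get pods -n "$NAMESPACE" -l "openbao.org/cluster=$CLUSTER" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.openbao\.org/maintenance}{"\n"}{end}' \
    | missing_maintenance
}
```

Empty output from `pods_missing_maintenance` means every listed Pod carries the annotation.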

Trigger a rolling restart

Use restartAt when you need the workload to roll because an external dependency changed, such as a certificate chain, secret material, or another input that should force a controlled refresh.

Configure

Request a rolling restart

yaml

spec:
  maintenance:
    restartAt: "2026-01-19T00:00:00Z"

When a leader Pod must be restarted or evicted, the operator handles graceful step-down automatically before termination so the cluster can elect a new leader cleanly.
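To set restartAt to the current time without hand-writing a timestamp, something like the following could work. It is a sketch under assumptions: the helper names are hypothetical, `NAMESPACE` is a placeholder variable, and it assumes that any new timestamp in the same format as the example is enough to request a roll.

```bash
#!/usr/bin/env bash
# Current time in UTC, formatted like the restartAt example value.
utc_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Build a JSON merge patch that sets spec.maintenance.restartAt.
restart_patch() {
  printf '{"spec":{"maintenance":{"restartAt":"%s"}}}' "$1"
}

# Usage: NAMESPACE=my-namespace request_restart my-cluster
request_restart() {
  kubectl patch openbaocluster "$1" -n "$NAMESPACE" \
    --type merge -p "$(restart_patch "$(utc_now)")"
}
```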

Verify the cluster before and after the window

Verify

Inspect health before and after maintenance

bash

kubectl get openbaocluster <name> -n <namespace> -o jsonpath='{.status.phase}{"\n"}'
kubectl get pods -n <namespace> -l openbao.org/cluster=<name>
kubectl exec -n <namespace> -it <pod-name> -- bao operator raft list-peers

The important end state is a clean phase, Ready Pods, and a Raft peer set that matches the intended topology after the maintenance action finishes.
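Part of that end-state check can be scripted by comparing the number of Ready Pods with the intended replica count. This is a rough sketch only: the helper names are hypothetical, `NAMESPACE` and `CLUSTER` are placeholder variables, and it does not inspect the phase or the Raft peer set, so the commands above are still needed.

```bash
#!/usr/bin/env bash
# Count lines on stdin that are exactly "True" (one line per Pod).
count_ready() {
  grep -c '^True$' || true
}

# Succeed only when the number of Ready Pods equals spec.replicas.
ready_matches_spec() {
  local want have
  want=$(kubectl get openbaocluster "$CLUSTER" -n "$NAMESPACE" \
    -o jsonpath='{.spec.replicas}')
  have=$(kubectl get pods -n "$NAMESPACE" -l "openbao.org/cluster=$CLUSTER" \
    -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | count_ready)
  echo "ready=$have intended=$want"
  [ "$have" = "$want" ]
}
```

Running `ready_matches_spec` before and after the window gives a quick comparison point alongside the peer list.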
