Use safe mode to stop risky automation and recover control.
Break glass, also called safe mode, is the operator's explicit stop signal for situations where continuing rollback automation could make availability or Raft safety worse. Use this page to inspect the break-glass state, stabilize the cluster, repair the failure, and only then let automation resume.
## What safe mode means

The decision matrix below summarizes the signals you will see:
| Signal | What the operator is doing | Why it matters |
|---|---|---|
| Risky automation halted | The operator stops the affected upgrade or rollback workflow instead of pushing forward blindly. | This is the point where a human has to evaluate whether the live cluster is still repairable. |
| `status.breakGlass` populated | The cluster status contains the reason, message, nonce, and suggested next checks. | You should diagnose from that status first instead of guessing which internal job failed. |
| Manual acknowledgment required | Automation stays paused until `spec.breakGlassAck` matches the current nonce. | Acknowledgment is the explicit signal that you have repaired the issue and accept resumed automation. |
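The pause condition in the last row can be checked directly. A minimal sketch, using the same `<name>`/`<namespace>` placeholders as the commands on this page:

```shell
# Compare the live nonce with the acknowledged value; automation stays
# paused while they differ.
NONCE=$(kubectl get openbaocluster <name> -n <namespace> \
  -o jsonpath='{.status.breakGlass.nonce}')
ACK=$(kubectl get openbaocluster <name> -n <namespace> \
  -o jsonpath='{.spec.breakGlassAck}')
if [ "$NONCE" = "$ACK" ]; then
  echo "acknowledged: automation may resume"
else
  echo "paused: nonce not yet acknowledged"
fi
```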
## Inspect the break-glass state

Read the break-glass status:

```shell
kubectl get openbaocluster <name> -n <namespace> -o jsonpath='{.status.breakGlass}' | jq
```
A typical break-glass payload looks like this:

```json
{
  "active": true,
  "reason": "RollbackConsensusRepairFailed",
  "message": "Rollback consensus repair Job upgrade-prod-cluster-rollback-retry-1 failed; manual intervention required.",
  "nonce": "abc-123-def-456",
  "steps": [
    "Inspect rollback Job logs",
    "Inspect pod status",
    "Perform any required Raft recovery steps, then acknowledge the nonce"
  ]
}
```
The `reason`, `message`, and `steps` fields are the fastest way to decide whether you are looking at an upgrade rollback problem, a Raft recovery problem, or a broader cluster-health issue.
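For quick triage you can pull just those fields, reusing the `jq` dependency already assumed above:

```shell
# Print only the triage fields from the break-glass payload,
# one per line: reason, message, then each suggested step.
kubectl get openbaocluster <name> -n <namespace> \
  -o jsonpath='{.status.breakGlass}' \
  | jq -r '.reason, .message, .steps[]'
```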
## Repair the underlying issue before you acknowledge
Start with the operator-visible status and the last failed job, then move into the narrower runbook that matches the cluster state.
Capture the current failure surface:

```shell
# Cluster conditions: type, status, and reason for each
kubectl get openbaocluster <name> -n <namespace> \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.reason}{"\n"}{end}'
# Last recorded blue/green job failure
kubectl get openbaocluster <name> -n <namespace> \
  -o jsonpath='{.status.blueGreen.lastJobFailure}{"\n"}'
# Logs from the job named in the status above
kubectl logs -n <namespace> job/<job-from-status>
# Pod placement and readiness for this cluster
kubectl get pods -n <namespace> -l openbao.org/cluster=<name> -o wide
# Raft membership as seen from inside a pod
kubectl exec -n <namespace> -it <pod-name> -- bao operator raft list-peers
```
These commands tell you whether the failure is still centered on the rollback job, whether the Pods are actually healthy, and whether Raft membership is still coherent enough for a safe retry.
If you need to restart or delete managed Pods while admission policies require the `openbao.org/maintenance=true` signal, enable maintenance mode first and follow Run Planned Maintenance.
If the cluster needs a deeper incident path, move directly into the matching incident runbook instead of staying in generic safe mode.
## Acknowledge and resume automation
Only acknowledge the nonce after the cluster is healthy enough for the operator to continue the paused workflow.
Apply the acknowledgment by patching the current nonce into the spec:

```shell
kubectl patch openbaocluster <name> -n <namespace> --type merge -p '{
  "spec": {
    "breakGlassAck": "<NONCE_FROM_STATUS>"
  }
}'
```
If the operator re-enters break glass later, it issues a new nonce. Always use the current value from `status.breakGlass.nonce`, never a previously copied one.
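If you prefer not to copy the nonce by hand, a small sketch (assuming a POSIX shell; only run it after confirming the cluster is healthy) reads the live nonce and acknowledges it in one step:

```shell
# Read the current nonce from status and patch it into the spec.
NONCE=$(kubectl get openbaocluster <name> -n <namespace> \
  -o jsonpath='{.status.breakGlass.nonce}')
kubectl patch openbaocluster <name> -n <namespace> --type merge \
  -p "{\"spec\":{\"breakGlassAck\":\"$NONCE\"}}"
```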