Break Glass / Safe Mode
Critical State: Automation Halted
The Operator has entered Safe Mode because it detected a high-risk failure (e.g., loss of quorum during an upgrade). Automation for the affected OpenBaoCluster is paused to prevent data loss.
Overview
Safe Mode (also known as "Break Glass") is a safety mechanism. When the Operator encounters a situation where continuing an automated workflow (such as a rolling upgrade or rollback) could compromise data integrity or availability, it stops and waits for a human to intervene.
Common triggers:
- Blue/Green rollback failure (risk of split-brain).
- Quorum loss during critical reconfiguration.
When active:
- Automation Stops: The Operator stops reconciling the affected OpenBaoCluster.
- Status Updates: The status.breakGlass field is populated with diagnostic information.
- Manual Ack Required: You must explicitly "break the glass" to resume automation.
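To find every cluster that is currently in Safe Mode, you can list the status.breakGlass fields across all namespaces. The snippet below is a minimal sketch, assuming the resource name openbaocluster used in the patch command in step 3; the function name list_safe_mode is hypothetical. Clusters without a break-glass status show <none> in the SAFE_MODE and REASON columns.

```shell
# Hypothetical helper: list every OpenBaoCluster with its Safe Mode state.
# Assumes the resource name "openbaocluster" (as used in step 3).
# Clusters with no status.breakGlass show <none> in SAFE_MODE and REASON.
list_safe_mode() {
  kubectl get openbaocluster --all-namespaces \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,SAFE_MODE:.status.breakGlass.active,REASON:.status.breakGlass.reason'
}
```

Source the snippet in bash and run list_safe_mode; rows with SAFE_MODE set to true need the steps below.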
1. Inspect the Situation
Check whether your cluster is in Safe Mode by inspecting the status.breakGlass field of its OpenBaoCluster resource.
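A minimal way to read that field with kubectl is sketched below. It assumes the namespace security and cluster name prod-cluster from the example in step 3; the helper name break_glass_status is hypothetical.

```shell
# Hypothetical helper: print status.breakGlass of one cluster as JSON.
# Defaults match the example namespace and name used in step 3.
break_glass_status() {
  local ns="${1:-security}" name="${2:-prod-cluster}"
  kubectl -n "$ns" get openbaocluster "$name" -o jsonpath='{.status.breakGlass}'
  echo  # jsonpath output has no trailing newline
}
```

For example, break_glass_status security prod-cluster | jq . pretty-prints the result, if jq is installed.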
Example Output:
{
"active": true,
"reason": "QuorumRisk",
"message": "Detected split-brain potential during rollback. Manual intervention required.",
"nonce": "abc-123-def-456",
"steps": "1. Verify network connectivity. 2. Restore quorum manually. 3. Acknowledge."
}
2. Fix the Underlying Issue
Follow the guidance in the message and steps fields of status.breakGlass.
- If quorum is lost: see Recovering from No Leader.
- If the cluster is sealed: see Recovering from Sealed Cluster.
- If the network is partitioned: verify the CNI plugin and network policies.
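Before choosing a runbook, it can help to confirm the seal state and Raft membership from inside a server pod. This sketch assumes pods are named <cluster>-<ordinal> (so prod-cluster-0 is a hypothetical name), that the bao CLI in the container is already configured to reach the local server, and that a token allowed to read the Raft configuration is available as BAO_TOKEN inside the container for the second command. Add -c <container> to kubectl exec if the pod runs more than one container.

```shell
# Hypothetical helper: check seal state and Raft peers from inside a server pod.
# ASSUMPTIONS: pods are named <cluster>-<ordinal>; the bao CLI in the container
# already points at the local server (BAO_ADDR, TLS); BAO_TOKEN is set in the
# container for the list-peers call.
bao_diagnose() {
  local ns="${1:-security}" pod="${2:-prod-cluster-0}"
  # Seal state, HA mode and active node as this pod sees them.
  # bao status exits with code 2 when sealed, so keep going on failure.
  kubectl -n "$ns" exec "$pod" -- bao status || true
  # Raft membership: lists each peer, its state (leader/follower) and voter flag.
  kubectl -n "$ns" exec "$pod" -- bao operator raft list-peers
}
```

Running bao_diagnose against more than one pod (for example prod-cluster-1) shows whether the servers agree on a leader.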
3. Acknowledge and Resume
Once you have completed the manual repairs, tell the Operator it is safe to proceed by acknowledging the unique nonce: set spec.breakGlassAck to the nonce reported in status.breakGlass.
Action Required
Copy the nonce from step 1 and use it in the command below.
# Replace 'abc-123-def-456' with your actual nonce, and 'security'/'prod-cluster'
# with your namespace and cluster name.
kubectl -n security patch openbaocluster prod-cluster --type merge \
-p '{"spec":{"breakGlassAck":"abc-123-def-456"}}'
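Copying the nonce by hand is error-prone. The hypothetical helper below reads the current nonce, refuses to proceed unless it matches the one you inspected in step 1, and only then applies the same merge patch. The match check guards against acknowledging a new Safe Mode event that appeared after your diagnosis. It is a sketch, not part of the Operator's own tooling.

```shell
# Hypothetical helper: acknowledge Safe Mode only if the nonce is unchanged.
# Usage: ack_break_glass <namespace> <cluster> <nonce-you-inspected>
ack_break_glass() {
  local ns="$1" name="$2" expected="$3" current
  current="$(kubectl -n "$ns" get openbaocluster "$name" \
    -o jsonpath='{.status.breakGlass.nonce}')"
  # An empty or different nonce means the state changed since step 1.
  if [ -z "$expected" ] || [ "$current" != "$expected" ]; then
    echo "Nonce mismatch on $ns/$name (current: '$current', expected: '$expected'); re-inspect before acknowledging." >&2
    return 1
  fi
  kubectl -n "$ns" patch openbaocluster "$name" --type merge \
    -p "{\"spec\":{\"breakGlassAck\":\"${current}\"}}"
}
```

For example: ack_break_glass security prod-cluster abc-123-def-456.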
If the issue persists, the Operator may re-enter Safe Mode with a new nonce, requiring you to repeat the diagnosis.
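To see whether automation actually resumed, you can poll the status after acknowledging. The sketch below assumes (this runbook does not specify it) that status.breakGlass.active stops being true, or status.breakGlass disappears, once reconciliation resumes; a nonce different from the one you acknowledged is treated as a new Safe Mode event. The helper name and its retry defaults are hypothetical.

```shell
# Hypothetical helper: poll after acknowledging to see whether automation resumed.
# ASSUMPTION: the Operator sets status.breakGlass.active to something other than
# "true" (or removes status.breakGlass) once it resumes reconciling.
# Usage: wait_for_resume <namespace> <cluster> <acked-nonce> [tries] [interval-seconds]
wait_for_resume() {
  local ns="$1" name="$2" acked="$3" tries="${4:-30}" interval="${5:-10}"
  local i=0 state active nonce
  while [ "$i" -lt "$tries" ]; do
    # Read both fields in one call so they come from the same object version.
    if state="$(kubectl -n "$ns" get openbaocluster "$name" \
        -o jsonpath='{.status.breakGlass.active}|{.status.breakGlass.nonce}')"; then
      active="${state%%|*}"
      nonce="${state#*|}"
      if [ "$active" != "true" ]; then
        echo "resumed"
        return 0
      fi
      # A different nonce means the Operator re-entered Safe Mode: diagnose again.
      if [ -n "$nonce" ] && [ "$nonce" != "$acked" ]; then
        echo "re-entered Safe Mode with nonce $nonce"
        return 2
      fi
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "still in Safe Mode after $tries checks"
  return 1
}
```

wait_for_resume security prod-cluster abc-123-def-456 returns 0 on resume, 2 on a new nonce (go back to step 1), and 1 if Safe Mode is still active after the last check. Failed kubectl calls count as a check rather than as a false "resumed".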
Related Runbooks
- Recovering from No Leader: Recovery steps when the Raft cluster loses consensus.
- Recovering from Sealed Cluster: How to unseal a cluster manually or diagnose auto-unseal failures.
- Specific steps for handling a failed Blue/Green rollback.