Recovering From a Sealed Cluster
This runbook applies when OpenBao pods are in the `Running` state but remain sealed, which prevents the application from starting.
Symptoms
- `kubectl get openbaocluster` reports `Sealed=True`.
- Pods are ready `0/1` in `kubectl get pods`.
- `bao status` shows `Sealed: true`.
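The last symptom can be checked from a script. The following is a minimal sketch, assuming the default table output of `bao status`; the helper name `sealed_value` is illustrative and not part of any tool, and the namespace in the usage comment is a placeholder.

```shell
# Reads `bao status` table output on stdin and prints the value of the
# Sealed row ("true" or "false"), or "unknown" if no such row is found.
# Accepts both "Sealed   true" and "Sealed: true" spellings.
sealed_value() {
  awk '
    $1 == "Sealed" || $1 == "Sealed:" { print $2; found = 1; exit }
    END { if (!found) print "unknown" }
  '
}

# Usage (pod name and namespace are placeholders):
#   kubectl exec -n <namespace> prod-cluster-0 -- bao status | sealed_value
```

Parsing the text output keeps the sketch free of extra dependencies; if `jq` is available, a JSON output format may be more robust.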
Troubleshooting Flow
graph TD
Start(["Start"]) --> CheckStatus{"Sealed?"}
CheckStatus -- No --> Done(["Healthy"])
CheckStatus -- Yes --> IdentifyMode{"Unseal Mode?"}
IdentifyMode -- "Static" --> CheckSecret["Check Secret"]
IdentifyMode -- "Auto-Unseal" --> CheckLogs["Check Logs"]
CheckSecret -- "Missing" --> CreateSecret["Create Secret"]
CheckLogs -- "403/Auth" --> FixIAM["Fix IAM Permissions"]
CheckLogs -- "Timeout" --> FixNet["Fix Network/DNS"]
FixIAM --> ManualTry["Restart Pods"]
FixNet --> ManualTry
ManualTry -- "Still Fails" --> ManualUnseal["Manual Unseal (Emergency)"]
Diagnostics by Mode
Identify your unseal mode in the OpenBaoCluster configuration:
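The field that selects the unseal mode is not shown in this runbook, so the sketch below dumps the resource and filters for unseal-related lines rather than assuming a field path. The helper name `show_unseal_config` and both arguments are illustrative placeholders.

```shell
# Prints the lines of the OpenBaoCluster manifest that mention "unseal",
# plus a few lines of context after each match.
# $1 = cluster name, $2 = namespace (both placeholders).
show_unseal_config() {
  kubectl get openbaocluster "$1" -n "$2" -o yaml | grep -i -A 5 'unseal'
}

# Usage:
#   show_unseal_config prod-cluster <namespace>
```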
In Static mode, the operator assumes a Kubernetes Secret named <cluster-name>-unseal-key contains the key.
Common Failure: The Secret is missing or has the wrong key name.
- Verify Secret Existence:
- Verify Key Format:
  The Secret must have a key named `bao-root` (or as configured).
Fix: If missing, you must provide the unseal key (e.g., from a backup).
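The two Static-mode checks above can be combined into one helper. This is a sketch under stated assumptions: the Secret name follows the `<cluster-name>-unseal-key` pattern described above, the key name defaults to `bao-root`, and `check_unseal_secret` is a hypothetical function name. The key test is a coarse substring match on the Secret's data map, not a strict lookup.

```shell
# Prints "ok", "missing-secret", or "missing-key".
# $1 = cluster name, $2 = namespace, $3 = expected key (default: bao-root).
check_unseal_secret() {
  cluster=$1
  namespace=$2
  key=${3:-bao-root}
  secret="${cluster}-unseal-key"

  # A non-zero exit from kubectl is treated as "Secret not found".
  if ! data=$(kubectl get secret "$secret" -n "$namespace" -o jsonpath='{.data}' 2>/dev/null); then
    echo "missing-secret"
    return 1
  fi

  # Coarse check: the key name must appear somewhere in the data map.
  case $data in
    *"$key"*) echo "ok" ;;
    *) echo "missing-key"; return 1 ;;
  esac
}

# Usage:
#   check_unseal_secret prod-cluster <namespace>
```

The helper never prints the Secret's contents, so it is safe to run in a shared terminal session.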
In Auto-Unseal mode, OpenBao connects to a remote KMS (AWS, GCP, Azure, OCI). Failures are usually caused by identity (IAM) or network problems.
1. Check OpenBao Logs
Inspect the logs for "failed to unseal" messages.
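One way to pull those messages out of a pod's log is sketched below. The helper name `unseal_failures`, the default of 200 lines, and the arguments are illustrative; a container name may also be needed if the pod runs more than one container.

```shell
# Prints recent log lines that contain "failed to unseal" (case-insensitive).
# $1 = pod name, $2 = namespace, $3 = number of lines to scan (default: 200).
unseal_failures() {
  kubectl logs "$1" -n "$2" --tail="${3:-200}" | grep -i 'failed to unseal'
}

# Usage:
#   unseal_failures prod-cluster-0 <namespace>
```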
Common Errors:
| Log Message | Root Cause | Fix |
|---|---|---|
| `403 Forbidden` / `AccessDeniedPath` | The IAM Role / ServiceAccount lacks permission to Decrypt. | Grant `kms:Decrypt` (AWS) or `cloudkms.cryptoKeyVersions.useToDecrypt` (GCP) to the role. |
| `context deadline exceeded` | Network connectivity to the KMS endpoint is blocked. | Check NetworkPolicies (egress), Istio Sidecars, or Firewall rules blocking HTTPS (443). |
| `Internal (500)` | The Cloud Provider is experiencing an outage. | Check configured Region status. |
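The table can be turned into a small triage helper. The sketch below is illustrative only: `classify_unseal_error` and its category labels are made-up names, and the patterns are plain substring matches against the messages listed in the table, so real log lines may need broader patterns.

```shell
# Maps a single log line to a coarse root-cause category from the table.
# $1 = log line. Prints "iam", "network", "provider-outage", or "unknown".
classify_unseal_error() {
  case $1 in
    *403*|*AccessDenied*)          echo "iam" ;;
    *"context deadline exceeded"*) echo "network" ;;
    *"Internal (500)"*)            echo "provider-outage" ;;
    *)                             echo "unknown" ;;
  esac
}

# Usage:
#   classify_unseal_error "failed to unseal: context deadline exceeded"
```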
Emergency Only
Use this only if automation is permanently broken and you need immediate access.
If the Operator cannot unseal the pods, you can manually unseal them using the bao CLI (if you have the unseal keys/shares).
- Exec into Pod 0:
- Run Unseal:
- Repeat:
  You must perform this on every pod in the cluster (`prod-cluster-1`, `prod-cluster-2`, ...).
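The manual steps above can be sketched as a loop over the pods. This assumes pods are named `<cluster>-<ordinal>` starting at 0, as in the `prod-cluster-1` example; `unseal_all`, the namespace, and the replica count are placeholders. `bao operator unseal` is run without arguments so that it prompts for the key instead of leaving it in shell history.

```shell
# Runs `bao operator unseal` interactively in each pod of the cluster.
# $1 = cluster name, $2 = namespace, $3 = number of pods.
unseal_all() {
  cluster=$1
  namespace=$2
  replicas=$3
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "Unsealing ${cluster}-${i}"
    # -it attaches a terminal so the CLI can prompt for the unseal key.
    kubectl exec -it -n "$namespace" "${cluster}-${i}" -- bao operator unseal
    i=$((i + 1))
  done
}

# Usage:
#   unseal_all prod-cluster <namespace> 3
```

If the key is split into shares, each pod needs the command repeated with different shares until the threshold is reached, so the loop may have to be run more than once.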
Post-Recovery
Once unsealed, verify the cluster is initialized and active.
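A quick way to confirm this is to summarize the relevant rows of `bao status`. The sketch below assumes the default table output with `Initialized`, `Sealed`, and `HA Mode` rows; `status_summary` is an illustrative name, and `HA Mode` is reported as `n/a` when the row is absent.

```shell
# Reads `bao status` table output on stdin and prints a one-line summary.
status_summary() {
  awk '
    $1 == "Initialized"        { init = $2 }
    $1 == "Sealed"             { sealed = $2 }
    $1 == "HA" && $2 == "Mode" { ha = $3 }
    END {
      printf "initialized=%s sealed=%s ha_mode=%s\n", init, sealed, (ha == "" ? "n/a" : ha)
    }
  '
}

# Usage (pod name and namespace are placeholders):
#   kubectl exec -n <namespace> prod-cluster-0 -- bao status | status_summary
```

A healthy cluster would be expected to show `initialized=true sealed=false` on every pod, with exactly one pod reporting `ha_mode=active`.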
If the cluster unsealed successfully but every pod stays in Standby (no active leader), see the No Leader guide.