Version: 0.1.0

Decision matrix

Production gates

| Gate | What must be true | Why it matters | Go deeper |
| --- | --- | --- | --- |
| Observability | Metrics, logs, and alerts reach the systems operators actually watch. | Incidents are slower and riskier when the first debugging session starts after go-live. | Observability and network egress configuration. |
| Cluster readiness | The status conditions show healthy convergence and no unresolved integration blockers. | A production launch should start from a stable status surface, not an optimistic assumption. | Use the final verification commands on this page. |

Lock down the security baseline

  • Set spec.profile: Hardened so the workload starts from the strict controller posture rather than the evaluation defaults.
  • Use a non-static external seal such as Transit, cloud KMS, ocikms, kmip, or pkcs11. Do not keep long-lived unseal keys in Kubernetes Secrets for the production path.
  • Confirm your Kubernetes cluster already encrypts Secrets at rest. The operator cannot compensate for an unencrypted control plane.
  • Use ACME or External TLS for public or shared edges. Avoid OperatorManaged certificates for public-facing production entry points.
  • Enable spec.selfInit and configure real user authentication in spec.selfInit.requests so the first operator-driven bootstrap does not end in a lockout.
  • If you rely on operator lifecycle auth for backups and upgrades, enable spec.selfInit.oidc.enabled: true or deliberately provision the equivalent JWT roles yourself.
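The baseline above can be sketched as an OpenBaoCluster fragment. Only `spec.profile`, `spec.selfInit`, `spec.selfInit.requests`, and `spec.selfInit.oidc.enabled` are taken from the list; the `apiVersion` and the seal field names are illustrative assumptions, so check them against your installed CRD:

```yaml
apiVersion: openbao.org/v1alpha1      # illustrative; verify against your installed CRD
kind: OpenBaoCluster
metadata:
  name: prod
spec:
  profile: Hardened                   # strict controller posture, not evaluation defaults
  seal:                               # illustrative field name; the point is a non-static
    type: transit                     # external seal (Transit, cloud KMS, ocikms, kmip, pkcs11)
  selfInit:
    enabled: true
    requests: []                      # configure real user authentication here to avoid lockout
    oidc:
      enabled: true                   # required if operator lifecycle auth drives backups/upgrades
```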
Do not stop at install success

A cluster that initializes successfully is not automatically ready for production. The production gate is the combination of security hardening, backup readiness, and clean status conditions, not the fact that pods started once.

Enforce the tenant guardrails

  • Verify the ValidatingAdmissionPolicies and related guardrails are installed and enforced, including:
    • openbao-validate-openbaocluster
    • openbao-validate-openbao-tenant
    • openbao-validate-openbaorestore
    • openbao-lock-controller-statefulset-mutations
    • openbao-lock-managed-resource-mutations
    • openbao-enforce-managed-image-digests
    • openbao-restrict-provisioner-rbac
    • openbao-restrict-provisioner-namespace-mutations
    • openbao-restrict-provisioner-tenant-governance
    • openbao-restrict-controller-rbac
    • openbao-restrict-controller-secret-writes
  • Confirm that the operator namespace, tenant onboarding flow, and shared-controller trust boundaries match the tenancy model you chose during Get Started.


Inspect the control-plane baseline

```bash
kubectl get validatingadmissionpolicy | grep openbao
kubectl get deploy -n <operator-namespace>
kubectl get openbaotenant -A
```

The exact number of policies and controller Deployments depends on the features you enabled, but the OpenBao guardrail set should be visible before you bring real tenants onto the platform.

Make the cluster durable

  • Set explicit CPU and memory requests and limits. A cluster that only works under zero pressure is not production-ready.
  • Choose a low-latency StorageClass and set spec.storage.storageClassName explicitly for new clusters. The effective storage class is not something you want to discover by accident after PVC creation.
  • Use at least three replicas for a highly available Raft cluster and verify the Kubernetes nodes span the intended zones or failure domains.
  • Configure scheduled backups and test a restore path before the first risky upgrade.
  • Confirm spec.network.egressRules allow the cluster to reach the services it really depends on: cloud KMS, OIDC discovery, backup storage, and any external gateway edges.
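The durability items above can be sketched as a spec fragment. `spec.storage.storageClassName` and `spec.network.egressRules` come from the list; `replicas`, `resources`, `backup`, and the egress rule shape are illustrative assumptions, and every host name is a placeholder:

```yaml
spec:
  replicas: 3                          # minimum for a highly available Raft cluster
  resources:                           # illustrative field name; set explicit requests and limits
    requests: { cpu: "1", memory: 2Gi }
    limits: { cpu: "2", memory: 4Gi }
  storage:
    storageClassName: fast-ssd         # set explicitly; do not discover the default by accident
  backup:                              # illustrative field name; schedule and test restores
    schedule: "0 */6 * * *"
  network:
    egressRules:                       # allow the real dependencies only (shape is assumed)
      - to: kms.example.cloud          # cloud KMS
      - to: issuer.example.com         # OIDC discovery
      - to: backups.example.net        # backup storage
```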
Backups are part of the production gate

Treat backup success and restore confidence as part of the launch checklist, not as follow-up work for a later sprint.

Prove observability and operational response

  • Configure metrics scraping through Prometheus Operator (ServiceMonitor) or VictoriaMetrics Operator (VMServiceScrape).
  • Grant the scraping identity permission to read /metrics and keep TLS verification strict in production.
  • Make sure structured logs including cluster_name and cluster_namespace reach the log system your operators actually use.
  • Alert on backup staleness, degradation, reconciliation failures, and other conditions that should wake a human before tenants feel the failure.
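For the Prometheus Operator path, the scrape configuration is a `ServiceMonitor`; a minimal sketch follows, where the selector label and the `metrics` port name are assumptions that must match the Service your cluster actually exposes:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: openbao
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: openbao   # assumed label; match your cluster's Service
  endpoints:
    - port: metrics                     # assumed port name on the Service
      path: /metrics
      scheme: https
      tlsConfig:
        insecureSkipVerify: false       # keep TLS verification strict in production
```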

Verify the cluster before routing traffic


Inspect the final readiness surface

```bash
kubectl describe openbaocluster <name> -n <namespace>
kubectl get openbaocluster <name> -n <namespace> -o jsonpath='{.status.phase}{"\n"}{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
```

Run both commands against the target namespace so you can see the reconciler phase, recent events, and the final condition set in one pass.

Reference table

Signals to see before go-live

| Signal | Healthy state | Why it is important |
| --- | --- | --- |
| Available | True | The workload is up and the operator believes the service is available to consumers. |
| ProductionReady | True | This is the clearest signal that the cluster passed the production-readiness gate. |
| Integration-specific conditions | Healthy for the features you enabled, such as CloudUnsealIdentityReady, GatewayIntegrationReady, APIServerNetworkReady, or BackupConfigurationReady. | These conditions expose dependency problems that may not show up as plain pod readiness failures. |
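The go-live check on these signals can be expressed as a small script; `production_ready` is a hypothetical helper, not part of the operator, and it reads the same `.status.conditions` array that the jsonpath query prints:

```python
def production_ready(status: dict, extra_conditions: tuple = ()) -> bool:
    """True when Available, ProductionReady, and any integration-specific
    conditions you enabled all report status "True"."""
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    required = ("Available", "ProductionReady", *extra_conditions)
    return all(conditions.get(t) == "True" for t in required)


# Example shape, as returned by `kubectl get openbaocluster ... -o json`
status = {
    "phase": "Running",
    "conditions": [
        {"type": "Available", "status": "True"},
        {"type": "ProductionReady", "status": "True"},
        {"type": "BackupConfigurationReady", "status": "True"},
    ],
}
print(production_ready(status, ("BackupConfigurationReady",)))  # True
```

Pass the integration-specific condition types you enabled as `extra_conditions`; a missing condition counts as unhealthy, which is the conservative default for a launch gate.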

