Rotation
This runbook covers OpenBao Transit key rotation for bao-kms-provider. Transit key rotation and Kubernetes storage migration are separate operations. The plugin observes Transit key versions and exposes a new Kubernetes key_id only after the rotation state machine decides the new version is stable. Operators must then rewrite Kubernetes resources so that data still encrypted under the old version is re-encrypted under the new one.
For the design rationale behind the rotation state machine, including the flip-flop guards and observation thresholds, see Architecture: Rotation Model.
Rotation changes the active Transit version under an existing Transit key. It must not change the provider name, cluster ID, OpenBao instance ID, Transit mount ID, key lineage ID, mount path, or key name. Those fields are identity-bearing, and changing them requires a migration plan; see Configuration: Identity-Bearing Fields.
Preview Boundary
Current preview tooling reports local registry state and OpenBao Transit metadata. It does not enumerate Kubernetes resources, inspect etcd, prove that every targeted object was rewritten, or prove that retained backups no longer require old Transit versions.
Treat verify-rotation as a local preflight signal. Rewrite proof,
backup-retention proof, and any recommendation to raise min_decryption_version
remain operator-controlled until a proof-producing command exists.
Before Rotation
Verify:
- OpenBao backup is current.
- etcd backup is current.
- The plugin is healthy on every control-plane node.
- All nodes report the same active key_id hash.
- bao-kms-provider doctor --config /etc/openbao-kms/config.yaml --encryption-config /etc/kubernetes/encryption-config.yaml passes on every control-plane node.
- No identity fallback remains unexpectedly in the API server EncryptionConfiguration.
- OpenBao min_decryption_version allows every version still present in etcd and backups.
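The doctor check in the list above has to pass on every control-plane node. A minimal sketch of that loop follows, assuming SSH access to each node. The helper name run_doctor_everywhere and the node names are hypothetical; the doctor command and its paths are the ones given in this runbook.

```shell
# Sketch only: run the doctor preflight on each control-plane node over SSH.
# Assumptions (not from the runbook): SSH reachability, and node names
# supplied by the operator as arguments.
run_doctor_everywhere() {
  local node failed=0
  for node in "$@"; do
    if ssh "$node" bao-kms-provider doctor \
         --config /etc/openbao-kms/config.yaml \
         --encryption-config /etc/kubernetes/encryption-config.yaml </dev/null; then
      echo "ok: $node"
    else
      echo "FAILED: $node" >&2
      failed=1
    fi
  done
  # Non-zero when any node failed, so the caller can stop before rotating.
  return "$failed"
}
```

Called as run_doctor_everywhere cp-1 cp-2 cp-3, it visits every node even after a failure, so one run lists all failing nodes.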
Record:
- the current Kubernetes key_id hash,
- the current Transit key version,
- the OpenBao backup ID,
- the etcd backup ID,
- the plugin version,
- the control-plane node list.
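One way to capture this record is a small helper that prints key=value lines for the change ticket. This is a sketch under stated assumptions: the function name and the output field names are hypothetical, and the key_id hash, plugin version, and backup IDs are passed in by the operator because this sketch does not assume a command that reports them. The Transit version is read with bao read from the key path used in this runbook.

```shell
# Sketch only: print the pre-rotation record as key=value lines.
# Args: key_id_hash plugin_version openbao_backup_id etcd_backup_id node...
# Assumption: the operator supplies every argument; only the Transit
# version is read live from OpenBao.
record_prerotation_state() {
  local key_id_hash="$1" plugin_version="$2"
  local openbao_backup_id="$3" etcd_backup_id="$4"
  shift 4
  local transit_version
  transit_version="$(bao read -field=latest_version transit/keys/k8s-workload-a-etcd)" || return 1
  printf 'recorded_at=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf 'key_id_hash=%s\n' "$key_id_hash"
  printf 'transit_latest_version=%s\n' "$transit_version"
  printf 'openbao_backup_id=%s\n' "$openbao_backup_id"
  printf 'etcd_backup_id=%s\n' "$etcd_backup_id"
  printf 'plugin_version=%s\n' "$plugin_version"
  printf 'control_plane_nodes=%s\n' "$*"
}
```

Redirect the output to a file that is stored with the backup IDs it names.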
Rotate The Transit Key
Rotation is performed by an operator with OpenBao administrative rights:
bao write -f transit/keys/k8s-workload-a-etcd/rotate
The plugin token must not have rotate permission. The provisioned policy excludes this capability by design; see Reference: Transit Policy Examples.
Observe Promotion
After rotation:
- The plugin background probe observes the new Transit latest version.
- The plugin waits for rotation.requireStableObservationCount successful observations.
- The plugin waits rotation.activationDelay.
- The plugin promotes a new active key snapshot.
- KMS Status.key_id changes.
- New encrypt operations use the explicit Transit key_version for the new version.
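The two waits in the sequence above act as successive gates. The sketch below illustrates that ordering only; it is not the plugin's code. It assumes the activation delay is measured from the moment the stable observation count was reached, which this runbook does not state. The function name and argument layout are hypothetical.

```shell
# Illustration only, not the plugin's implementation.
# Assumption: the activation delay starts once the required number of
# stable observations has been reached.
# Args: observed_count required_count stable_since_epoch now_epoch delay_seconds
promotion_state() {
  local observed="$1" required="$2" stable_since="$3" now="$4" delay="$5"
  if [ "$observed" -lt "$required" ]; then
    echo "pending: $observed/$required stable observations"
  elif [ $((now - stable_since)) -lt "$delay" ]; then
    echo "pending: activation delay, $((delay - (now - stable_since)))s left"
  else
    echo "active"
  fi
}
```

For example, promotion_state 3 3 100 130 60 reports a pending activation delay with 30 seconds left, and the same call with now set to 160 reports active.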
Watch the rotation state from the CLI:
bao-kms-provider rotation-plan --config /etc/openbao-kms/config.yaml
Watch the metrics on each control-plane node:
curl -sf http://127.0.0.1:8081/metrics \
| grep -E 'openbao_kms_status_key_id_hash|openbao_kms_key_version|openbao_kms_rotation_state'
Expected state:
- the old version remains decryptable,
- the new version becomes active once the stability window passes,
- every control-plane node converges to the same key_id hash,
- no node flips back to the old key_id.
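Convergence can be checked by comparing the key_id hash metric across saved metrics dumps, one per control-plane node. The following is a minimal sketch: the function name and file names are hypothetical, and the exact sample format of openbao_kms_status_key_id_hash is not assumed, so whole metric lines are compared instead of parsed.

```shell
# Sketch only: succeed when every dump contains the same
# openbao_kms_status_key_id_hash line(s).
# Input: one saved /metrics dump per node (file names are hypothetical).
# Lines starting with "#" (HELP/TYPE comments) never match the pattern.
key_id_hash_converged() {
  local file line first=""
  for file in "$@"; do
    line="$(grep '^openbao_kms_status_key_id_hash' "$file" | sort)" || true
    if [ -z "$line" ]; then
      echo "missing key_id hash metric in $file" >&2
      return 1
    fi
    if [ -z "$first" ]; then
      first="$line"
    elif [ "$line" != "$first" ]; then
      echo "key_id hash mismatch in $file" >&2
      return 1
    fi
  done
  # Fail when no files were given at all.
  [ -n "$first" ]
}
```

Each dump can be captured on its node by redirecting the curl command above to a file. A missing metric is treated as a failure, so a node that exposes nothing cannot pass as converged.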
If latest_version jumps over one or more Transit versions, the provider
requires OpenBao to report creation metadata for every skipped version. Complete
metadata lets the provider retain skipped versions as decrypt-only historical
snapshots. Missing intermediate metadata fails closed because another
control-plane node may have already encrypted data under a skipped version.
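The fail-closed rule for skipped versions can be illustrated as follows. This is not the provider's code. The function name is hypothetical, and the list of versions with creation metadata is passed in as plain arguments instead of being read from OpenBao.

```shell
# Illustration only, not the provider's implementation.
# Args: active_version latest_version, then every version for which
# OpenBao reported creation metadata.
check_skipped_versions() {
  local active="$1" latest="$2"
  shift 2
  local v skipped=""
  v=$((active + 1))
  # Walk only the versions strictly between the active and latest version.
  while [ "$v" -lt "$latest" ]; do
    case " $* " in
      *" $v "*) skipped="$skipped $v" ;;
      *) echo "fail closed: no creation metadata for version $v" >&2
         return 1 ;;
    esac
    v=$((v + 1))
  done
  # Skipped versions with complete metadata are kept as decrypt-only.
  echo "decrypt-only:${skipped:- none}"
}
```

With active version 3 and latest version 6, metadata for 4, 5, and 6 yields "decrypt-only: 4 5", while metadata for only 4 and 6 fails on version 5.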
The rotation metric is intentionally bounded to state="active", state="pending", and state="unknown". Use rotation-plan for the detailed promotion reason and timing.
Migrate Kubernetes Data
Rewrite targeted resources after Status exposes the new key_id. Define the
complete resource list from the API server EncryptionConfiguration before
starting migration, and keep the command output, timestamps, and resource list
for recovery records.
For Secrets:
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
Repeat for each configured resource type. The pattern is kubectl get <resource> --all-namespaces -o json | kubectl replace -f -.
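The per-resource pattern can be wrapped in a loop that also keeps the timestamps and command output needed for recovery records. This is a sketch under stated assumptions: the function name and log path are hypothetical, and the resource types must be taken from the API server EncryptionConfiguration, not from this example.

```shell
# Sketch only: rewrite each configured resource type and log the result.
# Args: log_file resource...
# Assumption: the operator passes exactly the resource types listed in the
# EncryptionConfiguration.
rewrite_resources() {
  local log="$1" resource
  shift
  for resource in "$@"; do
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) start $resource" >> "$log"
    # Without pipefail, the status checked here is that of kubectl replace.
    if kubectl get "$resource" --all-namespaces -o json \
         | kubectl replace -f - >> "$log" 2>&1; then
      echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) done $resource" >> "$log"
    else
      echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) FAILED $resource" >> "$log"
      return 1
    fi
  done
}
```

A call such as rewrite_resources /var/tmp/rewrite.log secrets configmaps stops at the first failing resource type, so the log shows exactly how far the rewrite progressed.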
Verify Rotation
bao-kms-provider verify-rotation --config /etc/openbao-kms/config.yaml
This command confirms the provider’s local registry and Transit metadata view. When it succeeds, it still reports limited confidence because it does not scan Kubernetes resources, inspect etcd, or evaluate retained backups.
Then collect independent verification:
- run bao-kms-provider doctor --config /etc/openbao-kms/config.yaml --encryption-config /etc/kubernetes/encryption-config.yaml,
- restart one API server and verify reads succeed,
- verify new writes carry the new key_id,
- verify every configured resource type was included in the rewrite procedure,
- check the provider decrypt-error metrics on every control-plane node,
- check the OpenBao decrypt error rate,
- check API server encryption metrics where available,
- compare openbao_kms_status_key_id_hash across all control-plane nodes,
- confirm retained backup sets either still have old Transit versions available or no longer need them,
- inspect etcd in a controlled environment if required.
For the metric and log catalog used during these checks, see Reference: Observability.
min_decryption_version
Do not raise OpenBao min_decryption_version until:
- every targeted live object has been rewritten,
- old backups have expired or are known not to need the old version,
- restore testing has proved that the remaining backup set can decrypt,
- a human-reviewed change record identifies the exact Transit versions that remain required and the rollback plan.
verify-rotation is not a recommendation engine for this setting. It cannot
prove that old ciphertext no longer exists in Kubernetes, etcd snapshots, or
retained backups.
Raising min_decryption_version too early can make old Kubernetes data unreadable even when the Transit key still exists. Lowering the value again may help only when the old key version still exists and policy allows it. Treat lowering as an emergency recovery step, not a rollback plan.
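A quick guard against an unexpectedly raised value is to compare the current setting with the oldest Transit version the change record still requires. This is a minimal sketch: the function name is hypothetical, the oldest required version must come from the human-reviewed change record, and the key path is the one used in this runbook. It does not prove that old ciphertext is gone.

```shell
# Sketch only: flag a min_decryption_version above the oldest version that
# the change record says is still required.
# Arg: oldest_required (taken from the human-reviewed change record).
min_decryption_version_is_safe() {
  local oldest_required="$1" current
  current="$(bao read -field=min_decryption_version transit/keys/k8s-workload-a-etcd)" || return 1
  if [ "$current" -le "$oldest_required" ]; then
    echo "ok: min_decryption_version=$current, oldest required=$oldest_required"
  else
    echo "UNSAFE: min_decryption_version=$current exceeds oldest required version $oldest_required" >&2
    return 1
  fi
}
```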
Rollback
If new encrypt or decrypt behavior fails before migration completes:
- If the rotation state machine has not yet activated the new version, stop promotion by restoring the previous known-good plugin configuration and version.
- Keep old Transit key versions decryptable. Do not raise min_decryption_version.
- Restore the previous plugin version and configuration if the failure is plugin-related.
- Do not delete the new Transit version.
- Do not recreate the Transit key.
- Use doctor, rotation-plan, and the metric catalog in Reference: Observability to identify the failing layer.
If objects have already been rewritten with the new version, rollback still requires the new Transit version to remain decryptable.
Stop Rotation If
Abort rotation and consult Operations: Troubleshooting when:
- nodes report different active key_id hashes,
- Status flips old to new to old,
- a back-to-back Transit rotation occurs before every node converges,
- unknown key_id decrypt errors appear in metrics or logs,
- AAD mismatch errors appear,
- OpenBao metadata reads are inconsistent or missing intermediate Transit version creation metadata,
- min_decryption_version was changed unexpectedly,
- any control-plane API server cannot restart cleanly.