Upgrade
This runbook covers upgrading bao-kms-provider across a control-plane fleet. Provider upgrades are operational events on the API server boot path, so they follow a one-node-at-a-time pattern with explicit verification between nodes.
For wire-format compatibility expectations and the upgrade-window history, see Reference: Release Policy and Reference: Compatibility.
Before You Start
Verify:
- the new binary or image has been fetched and verified per Install: Verify Release Artifacts,
- the cluster is not mid-rotation (check `bao-kms-provider rotation-plan --config /etc/openbao-kms/config.yaml`),
- OpenBao is healthy and the configured auth credentials on every node are valid,
- the existing plugin reports a stable `key_id` hash on every control-plane node,
- `bao-kms-provider doctor --config /etc/openbao-kms/config.yaml --encryption-config /etc/kubernetes/encryption-config.yaml` passes,
- the previous binary or image is still available on every node in case of rollback.
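The verification list above can be partly scripted as a per-node preflight. The following is a minimal bash sketch, assuming the default config paths used on this page. `ROLLBACK_BINARY` is a hypothetical location for the retained previous release; adjust it to your layout. The script treats a non-zero exit from `doctor` as a failed check, and it only prints the `rotation-plan` output for the operator to review, because this page does not describe that output's format.

```shell
#!/usr/bin/env bash
# Per-node preflight (sketch).
# Assumptions: default config paths from this runbook; ROLLBACK_BINARY is a
# hypothetical path where the previous release is retained.
set -u

BAO_KMS_PROVIDER="${BAO_KMS_PROVIDER:-bao-kms-provider}"
KMS_CONFIG="${KMS_CONFIG:-/etc/openbao-kms/config.yaml}"
ENCRYPTION_CONFIG="${ENCRYPTION_CONFIG:-/etc/kubernetes/encryption-config.yaml}"
ROLLBACK_BINARY="${ROLLBACK_BINARY:-/usr/local/lib/bao-kms-provider/previous/bao-kms-provider}"

preflight() {
  local failed=0

  # Print the rotation plan; judging whether a rotation is in flight is left
  # to the operator because the output format is not specified here.
  echo "== rotation-plan (review manually) =="
  if ! "$BAO_KMS_PROVIDER" rotation-plan --config "$KMS_CONFIG"; then
    echo "FAIL: rotation-plan exited non-zero" >&2
    failed=1
  fi

  # doctor must pass before any upgrade step.
  echo "== doctor =="
  if ! "$BAO_KMS_PROVIDER" doctor --config "$KMS_CONFIG" \
       --encryption-config "$ENCRYPTION_CONFIG"; then
    echo "FAIL: doctor reported failed checks" >&2
    failed=1
  fi

  # The previous binary must remain on the node for rollback.
  if [ ! -x "$ROLLBACK_BINARY" ]; then
    echo "FAIL: rollback binary missing or not executable: $ROLLBACK_BINARY" >&2
    failed=1
  fi

  return "$failed"
}
```

Run `preflight` on each node and continue only if it returns 0. OpenBao health and credential validity are not checked here; confirm them separately.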
Record:
- current plugin version,
- current Kubernetes `key_id` hash,
- current Transit key version,
- wire-format expectations from the new release notes.
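One way to record this state is to write it to a timestamped file per node. The following is a minimal bash sketch, assuming metrics are served on `127.0.0.1:8081` as in the verification command used during the upgrade. `RECORD_DIR` is hypothetical. The plugin version and Transit key version are passed in by the operator because this page does not name a command that prints them. The sketch keeps every `openbao_kms_status_key_id_hash` line verbatim rather than assuming which label or value carries the hash.

```shell
#!/usr/bin/env bash
# Capture pre-upgrade state for this node (sketch).
# Assumptions: metrics on 127.0.0.1:8081; RECORD_DIR is hypothetical;
# versions are supplied by the operator as arguments.
set -u

METRICS_URL="${METRICS_URL:-http://127.0.0.1:8081/metrics}"
RECORD_DIR="${RECORD_DIR:-/var/tmp/bao-kms-upgrade}"

fetch_metrics() {
  curl -sf "$METRICS_URL"
}

# Usage: record_state <plugin-version> <transit-key-version>
# Prints the path of the record file it wrote.
record_state() {
  local plugin_version="$1" transit_version="$2"
  local out hash_lines
  # Keep only the key_id hash series, exactly as exposed.
  hash_lines=$(fetch_metrics | grep '^openbao_kms_status_key_id_hash') || {
    echo "no openbao_kms_status_key_id_hash series found" >&2
    return 1
  }
  mkdir -p "$RECORD_DIR"
  out="$RECORD_DIR/$(uname -n)-$(date -u +%Y%m%dT%H%M%SZ).txt"
  {
    echo "plugin_version=$plugin_version"
    echo "transit_key_version=$transit_version"
    echo "# key_id hash series:"
    echo "$hash_lines"
  } > "$out"
  echo "$out"
}
```

Wire-format expectations from the release notes are prose and are best recorded by hand alongside this file.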
Upgrade Procedure
Upgrade one control-plane node at a time. Do not upgrade all plugin instances simultaneously unless the cluster is in a controlled maintenance window and recovery of both OpenBao and the API servers has been tested.
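The one-node-at-a-time rule can be enforced by a driver that stops at the first node that fails verification, so at most one plugin instance is ever in an unverified state. The following is a minimal bash sketch. `upgrade_one_node` is a hypothetical hook whose body you replace with your real per-node procedure (for example, over ssh). By default it fails, so running the driver unmodified does nothing harmful.

```shell
#!/usr/bin/env bash
# Serial rollout driver (sketch).
# upgrade_one_node is a hypothetical hook: replace its body with the real
# per-node procedure. It must return non-zero if verification fails.
set -u

upgrade_one_node() {
  echo "upgrade_one_node not implemented for $1" >&2
  return 1
}

# Usage: rollout <node> [<node> ...]
rollout() {
  local node
  for node in "$@"; do
    echo "== upgrading $node =="
    if ! upgrade_one_node "$node"; then
      echo "STOP: $node failed; remaining nodes left untouched" >&2
      return 1
    fi
    echo "== $node verified =="
  done
}
```

For example, `rollout cp-1 cp-2 cp-3` upgrades the nodes in order and never touches `cp-3` if `cp-2` fails.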
For each node:
1. Run `bao-kms-provider doctor --config /etc/openbao-kms/config.yaml --encryption-config /etc/kubernetes/encryption-config.yaml` with the new binary or image. Resolve any failed checks before continuing.
2. Stop the plugin on the target node.
3. Replace the binary or image with the new version.
4. Start the plugin.
5. Verify KMS Status and the active `key_id` hash:

   ```shell
   curl -sf http://127.0.0.1:8081/metrics \
     | grep -E 'openbao_kms_status_key_id_hash|openbao_kms_grpc_requests_total'
   ```

6. Confirm `/ready` returns HTTP 200 on the target node.
7. Confirm the `key_id` hash matches the other control-plane nodes.
8. Restart the local API server only if required by the new release notes.
9. Repeat for the next node.
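The post-start checks above can be combined into a wait-then-inspect step. The following is a minimal bash sketch. It assumes `/ready` is served on the same `127.0.0.1:8081` listener as `/metrics`, which this page does not state explicitly. `READY_TIMEOUT` is a hypothetical budget in seconds. The sketch polls `/ready` once per second until it returns HTTP 200, then prints the local `key_id` hash series for comparison with the other nodes.

```shell
#!/usr/bin/env bash
# Post-start verification for the target node (sketch).
# Assumptions: /ready and /metrics share 127.0.0.1:8081; READY_TIMEOUT is a
# hypothetical budget in seconds.
set -u

BASE_URL="${BASE_URL:-http://127.0.0.1:8081}"
READY_TIMEOUT="${READY_TIMEOUT:-60}"

ready_status() {
  # Prints the HTTP status of /ready; curl prints "000" if it cannot connect.
  curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/ready"
}

wait_ready() {
  local waited=0
  until [ "$(ready_status)" = "200" ]; do
    if [ "$waited" -ge "$READY_TIMEOUT" ]; then
      echo "/ready did not return 200 within ${READY_TIMEOUT}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

local_key_id_hash() {
  # Emits this node's key_id hash series, verbatim.
  curl -sf "$BASE_URL/metrics" | grep '^openbao_kms_status_key_id_hash'
}
```

A typical sequence after starting the plugin is `wait_ready && local_key_id_hash`, followed by comparing that output with the other nodes.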
After every node has been upgraded, confirm that the `openbao_kms_status_key_id_hash` metric reports the same hash on every node.
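If each node's metrics output is saved to a file (for example, by running the curl command from the upgrade procedure on every node), the final comparison can be automated. The following is a minimal bash sketch. How the files are collected is left open. The sketch compares the sorted `openbao_kms_status_key_id_hash` lines as whole strings rather than assuming whether the hash appears in a label or in the sample value.

```shell
#!/usr/bin/env bash
# Fleet consistency check (sketch).
# Input: one file per node containing that node's /metrics output.
# Collection method (ssh, config management, ...) is an assumption left open.
set -u

key_id_series() {
  grep '^openbao_kms_status_key_id_hash' "$1" | sort
}

# Usage: same_key_id_hash <node1-metrics> <node2-metrics> [...]
same_key_id_hash() {
  if [ "$#" -lt 2 ]; then
    echo "need metrics files from at least two nodes" >&2
    return 2
  fi
  local ref f rc=0
  ref=$(key_id_series "$1")
  if [ -z "$ref" ]; then
    echo "$1: no openbao_kms_status_key_id_hash series" >&2
    return 1
  fi
  for f in "${@:2}"; do
    if [ "$(key_id_series "$f")" != "$ref" ]; then
      echo "MISMATCH: $f differs from $1" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Sorting makes the comparison independent of series order. Timestamps, if the exporter emits them, would cause spurious mismatches and would need to be stripped first.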
Rollback
Rollback is safe only when the older version understands every active key_id, annotation, and AAD format currently present in etcd. Mixed wire formats during rollback can produce unknown-key_id decrypt errors.
Before rolling back:
- verify the older version supports the current `key_id` and AAD formats by reviewing the release notes for wire-format changes,
- confirm the older version can decrypt the current set of objects on a non-production probe, if possible,
- keep the new binary or image available until rollback is fully verified.
For each node:
- Stop the plugin on the target node.
- Replace the binary or image with the previous version.
- Run `bao-kms-provider doctor --config /etc/openbao-kms/config.yaml --encryption-config /etc/kubernetes/encryption-config.yaml` with the previous version.
- Start the plugin.
- Verify KMS Status and confirm the `key_id` hash matches the rest of the fleet.
- Restart the API server only if required.
If rollback produces unknown-`key_id` errors, return to the newer version and investigate per Operations: Troubleshooting.
For systemd deployments, rollback means replacing the host binary or package and restarting bao-kms-provider.service. For static-pod deployments, rollback means restoring the previous image digest in the manifest and verifying that image is already present or pullable on the node.
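For systemd deployments, the per-node rollback steps can be sketched as below. This is a minimal bash sketch, not a supported tool. `INSTALL_PATH` and `PREVIOUS_BINARY` are hypothetical paths. The new binary is kept next to the install path as `.new`. If `doctor` fails on the previous version, the sketch restores the new binary and restarts the service, following the "return to the newer version" guidance above. Static-pod rollback, which restores the previous image digest in the manifest, is not covered.

```shell
#!/usr/bin/env bash
# systemd rollback for one node (sketch).
# Assumptions (hypothetical paths): the active binary is INSTALL_PATH and the
# previous release is retained at PREVIOUS_BINARY.
set -u

UNIT="bao-kms-provider.service"
INSTALL_PATH="${INSTALL_PATH:-/usr/local/bin/bao-kms-provider}"
PREVIOUS_BINARY="${PREVIOUS_BINARY:-/usr/local/lib/bao-kms-provider/previous/bao-kms-provider}"
KMS_CONFIG="${KMS_CONFIG:-/etc/openbao-kms/config.yaml}"
ENCRYPTION_CONFIG="${ENCRYPTION_CONFIG:-/etc/kubernetes/encryption-config.yaml}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

rollback_systemd() {
  if [ ! -x "$PREVIOUS_BINARY" ]; then
    echo "previous binary not found: $PREVIOUS_BINARY" >&2
    return 1
  fi

  "$SYSTEMCTL" stop "$UNIT" || return 1

  # Keep the new binary on the node until the rollback is fully verified.
  cp -p "$INSTALL_PATH" "$INSTALL_PATH.new" || return 1
  install -m 0755 "$PREVIOUS_BINARY" "$INSTALL_PATH" || return 1

  # Run doctor with the previous version before starting the service.
  if ! "$INSTALL_PATH" doctor --config "$KMS_CONFIG" \
       --encryption-config "$ENCRYPTION_CONFIG"; then
    echo "doctor failed on the previous version; restoring the new binary" >&2
    install -m 0755 "$INSTALL_PATH.new" "$INSTALL_PATH"
    "$SYSTEMCTL" start "$UNIT"
    return 1
  fi

  "$SYSTEMCTL" start "$UNIT"
}
```

After a successful run, verify KMS Status and the `key_id` hash on the node as in the rollback steps before moving on.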
When Not To Upgrade
Defer the upgrade if:
- a Transit key rotation is in progress or has not yet been confirmed stable,
- OpenBao is in an HA failover or restore window,
- the new release notes call out wire-format changes and storage migration has not been planned,
- backup and restore drills have not been completed for the current release.
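When upgrades are automated, these conditions can be turned into a hard gate that operators must confirm explicitly. The following is a minimal bash sketch, assuming a plain `key=yes` checklist file. The file and the key names are hypothetical and simply mirror the list above. Any key that is missing or not set to `yes` blocks the rollout.

```shell
#!/usr/bin/env bash
# Go/no-go gate (sketch). Checklist file format and key names are
# hypothetical; each key mirrors one "defer the upgrade" condition.
set -u

REQUIRED_CONFIRMATIONS=(
  no_transit_rotation_in_progress_or_unstable
  openbao_not_in_failover_or_restore
  wire_format_migration_planned_or_not_needed
  backup_restore_drill_completed
)

# Usage: upgrade_gate <checklist-file>
upgrade_gate() {
  local checklist="$1" key missing=0
  for key in "${REQUIRED_CONFIRMATIONS[@]}"; do
    # Require an exact "key=yes" line; anything else counts as unconfirmed.
    if ! grep -qx "${key}=yes" "$checklist" 2>/dev/null; then
      echo "NO-GO: ${key} not confirmed in ${checklist}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling the gate at the start of a rollout driver keeps the decision with a human while preventing an unattended run from skipping it.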
When Not To Roll Back
A rollback is unsafe and must not be attempted if:
- the new version introduced a new wire format and ciphertext is already encrypted under it,
- the older version cannot interpret current annotations or AAD shape,
- a Transit key rotation completed under the new version and the older version did not promote the new active snapshot,
- release notes call out a one-way upgrade.
Apply a patch to the new version in these cases. Do not attempt rollback.