Cluster Upgrades
The Operator supports two upgrade strategies. Rolling Update (the default) replaces pods one at a time with minimal extra resources. Blue/Green brings up a parallel cluster and switches traffic to it only after it passes verification.
One-Time Setup
To perform upgrades safely, the Operator runs a temporary "Upgrade Executor" Job, which needs permission to call the OpenBao API.
Prerequisite: Configure Upgrade Authentication
The Upgrade Executor needs a JWT Auth Role to authenticate with the cluster during upgrades.
1. Configure OpenBao (via selfInit)
```yaml
spec:
  selfInit:
    requests:
      # Create policy for upgrade operations
      - name: create-upgrade-policy
        operation: update
        path: sys/policies/acl/upgrade
        policy:
          policy: |
            path "sys/health" { capabilities = ["read"] }
            path "sys/step-down" { capabilities = ["update"] }
            path "sys/storage/raft/snapshot" { capabilities = ["read"] }
      # Create JWT role bound to the upgrade ServiceAccount
      - name: create-upgrade-jwt-role
        operation: update
        path: auth/jwt/role/upgrade
        data:
          role_type: jwt
          bound_audiences: ["openbao-internal"]
          bound_claims:
            kubernetes.io/namespace: openbao
            kubernetes.io/serviceaccount/name: upgrade-cluster-upgrade-serviceaccount
          token_policies: upgrade
          ttl: 1h
```
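The three policy paths correspond to the upgrade steps described later: health checks, leader step-down, and the optional pre-upgrade snapshot. The sketch below is a rough illustration only and does not reflect how OpenBao evaluates policies. It pairs each step with the path and capability it is assumed to use, then reports any step the policy does not cover. The step-to-path mapping is an assumption. Matching here is exact-path only, whereas real ACL policies also support glob and `+` wildcard segments.

```python
# Sketch only: checks that the "upgrade" policy grants what each step is
# ASSUMED to need. This is not OpenBao's ACL engine (exact paths, no globs).

UPGRADE_POLICY = {
    "sys/health": {"read"},
    "sys/step-down": {"update"},
    "sys/storage/raft/snapshot": {"read"},
}

# Hypothetical mapping of upgrade steps to (path, capability).
REQUIRED = {
    "health check": ("sys/health", "read"),
    "leader step-down": ("sys/step-down", "update"),
    "pre-upgrade snapshot": ("sys/storage/raft/snapshot", "read"),
}


def missing_capabilities(policy, required):
    """Return the steps whose (path, capability) the policy does not grant."""
    return [
        step
        for step, (path, cap) in required.items()
        if cap not in policy.get(path, set())
    ]


print(missing_capabilities(UPGRADE_POLICY, REQUIRED))  # []
```

If you remove the snapshot line from the policy, the same call returns `['pre-upgrade snapshot']`.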
JWT audience
The upgrade Job uses the audience from OPENBAO_JWT_AUDIENCE (default: openbao-internal). Set the same value in the OpenBao role's bound_audiences. Then pass the environment variable to the Operator through controller.extraEnv and provisioner.extraEnv in the Helm values.
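The rule above can be modelled as a small local sanity check. The Job uses OPENBAO_JWT_AUDIENCE when it is set and openbao-internal otherwise. The role accepts the token only if that value appears in bound_audiences. This is a sketch under those stated assumptions, not the Operator's code. The function names and the "bao-prod" value are hypothetical.

```python
import os
from typing import Iterable, Mapping, Optional

DEFAULT_AUDIENCE = "openbao-internal"  # documented default for the upgrade Job


def job_audience(env: Optional[Mapping[str, str]] = None) -> str:
    """Audience the upgrade Job presents: OPENBAO_JWT_AUDIENCE, else the default."""
    env = os.environ if env is None else env
    return env.get("OPENBAO_JWT_AUDIENCE", DEFAULT_AUDIENCE)


def audience_accepted(env: Mapping[str, str], bound_audiences: Iterable[str]) -> bool:
    """True if the role's bound_audiences contains the Job's audience."""
    return job_audience(env) in bound_audiences


role_audiences = ["openbao-internal"]
print(audience_accepted({}, role_audiences))                                    # True
print(audience_accepted({"OPENBAO_JWT_AUDIENCE": "bao-prod"}, role_audiences))  # False
```

The second call shows the failure mode the note warns about: the operator's environment variable was changed, but the role's bound_audiences was not.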
2. Configure the Operator to use this role
Executing Upgrades
To upgrade, update the spec.version field. The spec.updateStrategy field determines how the change is rolled out.
Rolling Update (default)
Best for: standard upgrades, dev/test environments, and minimizing resource usage.
The Operator updates pods one by one, ensuring the active leader steps down gracefully before termination to maintain availability.
How it works:
1. Validation: Checks if the new version is valid.
2. Snapshot (Optional): Takes a pre-upgrade backup.
3. Rolling Replace: Updates Pod 0 -> Pod 1 -> Pod 2.
4. Leader Handling: If updating the active leader, triggers sys/step-down first.
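Steps 3 and 4 can be sketched as a loop that checks for leadership immediately before each replacement. The callables and the FakeCluster toy are hypothetical stand-ins, not the Operator's implementation. In the toy, a step-down simply hands leadership to the next pod. The point of the sketch is that the leader is re-read on every iteration. A step-down triggers a new election, and the new leader may be a pod that has not been upgraded yet.

```python
from typing import Callable, List


def rolling_upgrade(
    pods: List[str],
    get_leader: Callable[[], str],
    step_down: Callable[[str], None],
    replace: Callable[[str], None],
) -> None:
    """Replace pods one at a time, stepping the active leader down first."""
    for pod in pods:
        # Re-check every time: the previous step-down may have moved leadership here.
        if get_leader() == pod:
            step_down(pod)  # e.g. call sys/step-down on the active node
        replace(pod)  # e.g. recreate the pod on the new version


class FakeCluster:
    """Toy cluster: a step-down passes leadership to the next pod in the list."""

    def __init__(self, pods: List[str], leader: str):
        self.pods, self.leader, self.log = pods, leader, []

    def get_leader(self) -> str:
        return self.leader

    def step_down(self, pod: str) -> None:
        self.log.append(f"step-down {pod}")
        i = self.pods.index(pod)
        self.leader = self.pods[(i + 1) % len(self.pods)]

    def replace(self, pod: str) -> None:
        self.log.append(f"replace {pod}")


cluster = FakeCluster(["bao-0", "bao-1", "bao-2"], leader="bao-1")
rolling_upgrade(cluster.pods, cluster.get_leader, cluster.step_down, cluster.replace)
print(cluster.log)
# ['replace bao-0', 'step-down bao-1', 'replace bao-1', 'step-down bao-2', 'replace bao-2']
```

In this run, leadership moves from bao-1 to bao-2, which has not been upgraded yet, so bao-2 is also stepped down before it is replaced. A plan computed once up front would have missed that second step-down.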
Blue/Green
Best for: production-critical workloads, major version jumps, and fast rollback, since the Blue cluster stays up until cleanup.
The Operator spins up a parallel "Green" cluster, syncs data, validates it, and then switches traffic over atomically.
```mermaid
flowchart TB
    Start[Start Upgrade]
    subgraph Blue["Blue Revision (Current)"]
        B[Active Cluster]
    end
    subgraph Green["Green Revision (New)"]
        direction TB
        Deploy[1. Deploy Green Pods]
        Sync[2. Sync Data from Blue]
        Test[3. Run Verification]
    end
    Start --> Deploy
    Deploy --> Sync
    Sync --> Test
    Test -- "Success" --> Switch[4. Switch Traffic to Green]
    Switch --> Cleanup[5. Delete Blue Cluster]
    style Blue fill:transparent,stroke:#2979ff,stroke-width:2px
    style Green fill:transparent,stroke:#00e676,stroke-width:2px,color:#fff
    style Switch fill:transparent,stroke:#ffa726,stroke-width:2px
```
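The flowchart reduces to a linear sequence of phases, where traffic moves to Green only when the Switch phase succeeds. In this sketch, run_phase is a hypothetical callback that performs one phase and reports success. The phase names are taken from the diagram. Any failure stops the sequence, and Blue keeps serving if the failure happens before Switch.

```python
from typing import Callable, List, Tuple

# Phases named after the flowchart steps.
PHASES = ["Deploy", "Sync", "Verify", "Switch", "Cleanup"]


def blue_green_upgrade(run_phase: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Run phases in order; return (revision serving traffic, completed phases)."""
    serving = "blue"
    completed: List[str] = []
    for phase in PHASES:
        if not run_phase(phase):
            break  # stop at the first failure; Blue serves unless Switch already ran
        completed.append(phase)
        if phase == "Switch":
            serving = "green"  # cutover: traffic now goes to the Green revision
    return serving, completed


print(blue_green_upgrade(lambda p: True))
# ('green', ['Deploy', 'Sync', 'Verify', 'Switch', 'Cleanup'])
print(blue_green_upgrade(lambda p: p != "Verify"))
# ('blue', ['Deploy', 'Sync'])
```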
Configuration:
Advanced Upgrade Options
Verification Hooks
Run a custom container to "smoke test" the Green cluster before cutover.
```yaml
spec:
  updateStrategy:
    blueGreen:
      verification:
        prePromotionHook:
          image: curlimages/curl
          command: ["curl", "-f", "https://green-cluster:8200/v1/sys/health"]
```
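The hook above relies on `curl -f`, which exits non-zero when the server answers with an HTTP status of 400 or higher. If you prefer to write the check in Python, the sketch below applies the same pass/fail rule. The function name is hypothetical, and the URL is whatever your Green Service exposes. For HTTPS with a private CA, pass an SSL context built with `ssl.create_default_context(cafile=...)`. In the Vault-compatible `sys/health` API, sealed nodes answer 503 and unsealed standby nodes answer 429. Both therefore count as failures here, just as they would with `curl -f`.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def smoke_test(url: str, timeout: float = 5.0,
               context: Optional[ssl.SSLContext] = None) -> bool:
    """Return True if a GET on url succeeds with a status below 400 (like curl -f)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status < 400
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx/5xx answers, e.g. 503 from a sealed node.
        return False
    except OSError:
        # URLError (DNS, refused connection, TLS failure) and timeouts are OSErrors.
        return False
```

For example, `smoke_test("https://green-cluster:8200/v1/sys/health", context=ssl.create_default_context(cafile="/path/to/ca.crt"))` is one way to call it. The CA path there is a placeholder.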
Auto-Rollback
If the Green cluster fails validation or upgrade jobs fail during the early upgrade phases, the Operator can automatically roll back.
```yaml
spec:
  updateStrategy:
    blueGreen:
      autoRollback:
        enabled: true
        onJobFailure: true
        onValidationFailure: true
```
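The three flags combine into a simple decision. The sketch below assumes that "early upgrade phases" means the phases before traffic is switched (Deploy, Sync, Verify). That boundary is an assumption, as are the Python names. After cutover, this sketch never triggers an automatic rollback.

```python
from dataclasses import dataclass

# Assumption: "early upgrade phases" = the phases before traffic is switched.
ROLLBACK_PHASES = {"Deploy", "Sync", "Verify"}


@dataclass
class AutoRollback:
    """Mirrors spec.updateStrategy.blueGreen.autoRollback (camelCase in YAML)."""
    enabled: bool
    on_job_failure: bool         # onJobFailure
    on_validation_failure: bool  # onValidationFailure


def should_roll_back(cfg: AutoRollback, phase: str, failure: str) -> bool:
    """Decide whether a failure ("job" or "validation") in `phase` triggers a rollback."""
    if not cfg.enabled or phase not in ROLLBACK_PHASES:
        return False
    if failure == "job":
        return cfg.on_job_failure
    if failure == "validation":
        return cfg.on_validation_failure
    return False


cfg = AutoRollback(enabled=True, on_job_failure=True, on_validation_failure=True)
print(should_roll_back(cfg, "Verify", "validation"))  # True
print(should_roll_back(cfg, "Cleanup", "job"))        # False: already past cutover
```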
Gateway API and Blue/Green Upgrades
When Gateway API is enabled, the Operator creates an HTTPRoute that targets the cluster's main external Service (<cluster>-public). During cutover, the Operator updates that Service's selector to point at the Green revision, so the HTTPRoute itself does not need to change.
```yaml
spec:
  gateway:
    enabled: true
    hostname: bao.example.com
    gatewayRef:
      name: main-gateway
  updateStrategy:
    type: BlueGreen
    blueGreen:
      autoPromote: true
```
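The cutover works because a Kubernetes Service selects every pod whose labels include all of the selector's key/value pairs. Changing one selector value therefore moves all traffic behind the route at once. The sketch below models only that matching rule. The label keys, values, and pod names are hypothetical, since the Operator's actual labels are not listed here.

```python
def selected_pods(selector, pods):
    """Pods whose labels contain every key/value pair in the Service selector."""
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(k) == v for k, v in selector.items())
    )


# Hypothetical labels; the Operator's real label keys may differ.
pods = {
    "bao-blue-0": {"app": "bao", "revision": "blue"},
    "bao-blue-1": {"app": "bao", "revision": "blue"},
    "bao-green-0": {"app": "bao", "revision": "green"},
    "bao-green-1": {"app": "bao", "revision": "green"},
}
selector = {"app": "bao", "revision": "blue"}
print(selected_pods(selector, pods))  # ['bao-blue-0', 'bao-blue-1']

# Cutover: one selector update; the HTTPRoute backend (<cluster>-public) is untouched.
selector = {**selector, "revision": "green"}
print(selected_pods(selector, pods))  # ['bao-green-0', 'bao-green-1']
```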
Monitoring Progress
Track the upgrade status directly on the CR: