Testing
A KMS plugin failure can prevent the API server from starting or make encrypted Kubernetes resources unreadable. The test strategy therefore prioritizes negative paths, rotation, and recovery over a single happy-path encrypt and decrypt check.
This page explains what the test system must prove. For runnable E2E lanes and Make targets, see E2E Framework. For CI lanes and release evidence requirements, see CI And Supply Chain. For captured load and cold-start evidence, see Performance Evidence.
Test Priorities
| Priority | What must be proven |
|---|---|
| KMS v2 protocol correctness | Kubernetes accepts Status, Encrypt, and Decrypt responses and the provider preserves the Status.key_id == EncryptResponse.key_id invariant. |
| key_id stability | Rotation, rollback rejection, and decrypt lookup depend on deterministic, non-reused key_id values. |
| Decrypt compatibility | Data written before restart, upgrade, rollback, and Transit rotation remains readable. |
| Fail-closed behavior | OpenBao, auth material, socket, policy, and Transit key failures do not lead to plaintext exposure or unsafe fallback. |
| API server startup | The provider is available early enough for kube-apiserver restart and recovery paths. |
| Deployment behavior | systemd and static-pod modes have different boot, socket, filesystem, and rollback failure modes. |
| Disaster recovery | OpenBao, provider state, and etcd restore procedures preserve decryptability only when the correct backup pair is restored. |
| Observability and redaction | Logs, metrics, reports, and diagnostics never expose plaintext, JWTs, OpenBao tokens, full ciphertext, or raw Transit key material. |
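The key_id requirements above can be exercised with a small model before any OpenBao or API server is involved. The sketch below is hypothetical: `keyRegistry`, `promote`, `decryptVersion`, and the SHA-256 derivation are assumptions made for illustration, not the provider's key registry. Kubernetes only sees key_id as an opaque string. The sketch shows three properties a unit test would check. Status and Encrypt report the same key_id because both derive it from the current write version. Rotation produces a new key_id while older ones stay resolvable for decrypt. Promotion to an older version is rejected.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// keyRegistry is a hypothetical model of provider key state, not the real
// implementation. It tracks which Transit key versions have been promoted.
type keyRegistry struct {
	transitKey string
	current    int            // highest Transit version promoted for writes
	seen       map[string]int // key_id -> Transit version, for decrypt lookup
}

func newKeyRegistry(transitKey string) *keyRegistry {
	return &keyRegistry{transitKey: transitKey, seen: map[string]int{}}
}

// keyIDFor derives a deterministic key_id from the Transit key name and
// version. The SHA-256 scheme is an assumption made for this sketch.
func (r *keyRegistry) keyIDFor(version int) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s/v%d", r.transitKey, version)))
	return hex.EncodeToString(sum[:8])
}

// promote makes a newer Transit version the write key. Older or equal
// versions are rejected so restored or tampered state cannot roll back.
func (r *keyRegistry) promote(version int) error {
	if version <= r.current {
		return fmt.Errorf("rollback rejected: version %d <= current %d", version, r.current)
	}
	id := r.keyIDFor(version)
	if _, reused := r.seen[id]; reused {
		return fmt.Errorf("key_id %s already assigned to another version", id)
	}
	r.seen[id] = version
	r.current = version
	return nil
}

// StatusKeyID and EncryptKeyID both derive from the current write version,
// which keeps Status.key_id == EncryptResponse.key_id between rotations.
func (r *keyRegistry) StatusKeyID() string  { return r.keyIDFor(r.current) }
func (r *keyRegistry) EncryptKeyID() string { return r.keyIDFor(r.current) }

// decryptVersion resolves a stored key_id to its Transit version. Unknown
// key_ids fail closed instead of falling back to the current key.
func (r *keyRegistry) decryptVersion(keyID string) (int, error) {
	v, ok := r.seen[keyID]
	if !ok {
		return 0, fmt.Errorf("unknown key_id %q", keyID)
	}
	return v, nil
}
```

A test would call `promote(1)`, compare the two key_id accessors, rotate with `promote(2)`, confirm the old key_id still resolves to version 1, and expect `promote(1)` to fail afterwards.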
Test Layers
| Layer | Purpose | Canonical location |
|---|---|---|
| Unit and golden tests | Validate key registry decisions, AAD construction, config validation, socket handling, logging redaction, and stable wire-format fixtures. | go test ./..., golden fixtures under testdata/ |
| KMS v2 conformance | Start the real Unix-socket server path with fake OpenBao behavior and exercise Kubernetes KMS v2 protobuf requests. | fast local and PR checks |
| OpenBao client integration | Verify Transit, auth login, TLS, policy diagnostics, and OpenBao error handling without external credentials. | hermetic integration tests and OpenBao CI E2E |
| Operator CLI E2E | Run provider image CLI diagnostics against real OpenBao/config/state and assert redacted hardening failures. | provider CLI E2E |
| Kubernetes API server E2E | Prove real API server encryption, raw etcd envelope storage, restart readback, and multi-control-plane convergence. | Kind lanes and local kubeadm VM validation |
| Rotation and compatibility | Prove Transit version promotion, old ciphertext readback, new ciphertext write path, historical decryptability guards, missing-state fail-closed behavior, and rollback rejection. | OpenBao rotation E2E |
| Failure injection | Exercise OpenBao outage, sealed state, failover, bad policy, expired or identity-drifted auth material, missing Transit key, stale socket, and startup failures. | provider failure and HA E2E lanes |
| Performance and load | Bound Status, Encrypt, Decrypt, direct decrypt soak, startup decrypt, and resource growth behavior. | provider load lanes and performance evidence |
| Security and supply chain | Run redaction checks, fuzz targets, static analysis, vulnerability scan, license check, SBOM, and vendor verification. | make ci-core, security CI, release workflow |
| Disaster recovery | Validate OpenBao raft restore, provider state rehydration, etcd restore pairing, and Kubernetes readback after replacement. | Kind DR, OpenBao restore, and local VM validation |
Negative Path Bias
Encrypt and decrypt working once is not enough. The test suite must prove that the provider fails safely when:
- OpenBao is down, sealed, unavailable during failover, or restored from an old backend,
- the JWT expires, rotates, has the wrong claims, or cannot be read,
- the certificate is expired, identity-drifted, unavailable from PKCS#11 or SPIFFE, or cannot be validated,
- the Transit key is missing, soft-deleted, recreated under the same name, or has unsafe version bounds,
- key_id values are unknown, malformed, rolled back, or associated with changed identity scope,
- AAD annotations are missing, modified, or from another provider or cluster,
- the Unix socket is stale, has the wrong ownership, or points at an unsafe filesystem object,
- systemd, kubelet, static pods, images, and host paths fail during API server restart or node recovery.
Every negative test should assert both the returned error behavior and the absence of sensitive values in logs, metrics, and artifacts.
Performance Model
Performance targets reflect Kubernetes behavior and the provider’s role in API server startup:
| Path | Initial target | Reason |
|---|---|---|
| Status | p99 under 5 ms and no OpenBao call | kube-apiserver polls Status frequently. |
| Encrypt | p95 under 100 ms, p99 under 250 ms | writes call OpenBao Transit and must tolerate real network latency. |
| Decrypt | p95 under 10 ms, p99 under 50 ms | API server startup can perform many decrypts. |
| Startup storm | bounded memory and goroutines | recovery paths must not deadlock or leak resources. |
These are test targets, not published service-level guarantees. The release evidence can adjust thresholds when OpenBao or network behavior justifies it, but the trade-off must remain explicit.
Decrypt micro-batching is not implemented in the provider runtime. Current direct-path evidence did not show provider or OpenBao decrypt fan-out proportional to Kubernetes object count. A production coalescer should move into release scope only if sustained direct decrypt soak or local kubeadm VM cold-start evidence shows a release-blocking need.
Release Evidence
Release evidence is assembled from the test layers above and the supply-chain controls documented in CI And Supply Chain. In summary:
- every PR should prove deterministic logic, conformance, redaction, formatting, static analysis, vendor integrity, and fast parser fuzz smoke;
- main and nightly lanes should add OpenBao, Kind, rotation, failure injection, restore, load, and supply-chain checks;
- release candidates should add exact-pinned version matrices, local kubeadm VM validation, recovery evidence, and release artifact evidence.
The exact Kubernetes patch versions, Kind node images, OpenBao image, and tool versions live in .ci/versions.yaml. Kubernetes 1.36 is tracked as the intended next validation line until a digest-pinned Kind node image exists. Additional Kubernetes or OpenBao versions remain candidates until exact-pinned lanes and release evidence exist. See Reference: Compatibility for the support boundary.
Local Fuzz Campaigns
make ci-core runs short fuzz smoke campaigns with FUZZTIME=10s by default. To spend more time on the curated parser and preflight targets without changing CI defaults, run:

```sh
FUZZTIME=1m make fuzz
```

For a single target, use Go’s native fuzz command, for example:

```sh
go test ./internal/keyregistry -run '^$' -fuzz '^FuzzStateFileDecode$' -fuzztime=5m
go test ./internal/aad -run '^$' -fuzz '^FuzzPrepareDecrypt$' -fuzztime=5m
```
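When adding a new parser target, a native Go fuzz target typically has the shape below. This is a hypothetical sketch. `stateFile`, `decodeState`, and `FuzzStateDecodeSketch` are illustrative stand-ins, not the contents of internal/keyregistry, and the JSON format is an assumption. The target asserts two properties. Decoding arbitrary bytes never panics. Any state that decodes successfully survives a re-encode and decode unchanged. In a real package the Fuzz function lives in a _test.go file so `go test -fuzz` can discover it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"slices"
	"testing"
)

// stateFile is a hypothetical provider state format used only for illustration.
type stateFile struct {
	Version int      `json:"version"`
	KeyIDs  []string `json:"key_ids"`
}

// decodeState rejects unknown versions and empty or duplicate key_ids.
func decodeState(data []byte) (*stateFile, error) {
	var s stateFile
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("decode state: %w", err)
	}
	if s.Version != 1 {
		return nil, fmt.Errorf("unsupported state version %d", s.Version)
	}
	seen := make(map[string]bool, len(s.KeyIDs))
	for _, id := range s.KeyIDs {
		if id == "" || seen[id] {
			return nil, errors.New("empty or duplicate key_id in state")
		}
		seen[id] = true
	}
	return &s, nil
}

// FuzzStateDecodeSketch belongs in a _test.go file in a real package.
func FuzzStateDecodeSketch(f *testing.F) {
	f.Add([]byte(`{"version":1,"key_ids":["a","b"]}`)) // valid seed
	f.Add([]byte(`{"version":2}`))                     // rejected seed
	f.Fuzz(func(t *testing.T, data []byte) {
		s, err := decodeState(data)
		if err != nil {
			return // rejecting input is fine; panicking is not
		}
		out, err := json.Marshal(s)
		if err != nil {
			t.Fatalf("re-encode: %v", err)
		}
		s2, err := decodeState(out)
		if err != nil || s2.Version != s.Version || !slices.Equal(s2.KeyIDs, s.KeyIDs) {
			t.Fatalf("round trip changed state: %q -> %q", data, out)
		}
	})
}
```

Once moved into a _test.go file, it runs like the targets above, for example with `go test -run '^$' -fuzz '^FuzzStateDecodeSketch$' -fuzztime=1m`.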