Testing

A KMS plugin failure can prevent the API server from starting or make encrypted Kubernetes resources unreadable. The test strategy therefore prioritizes negative paths, rotation, and recovery over a single happy-path encrypt and decrypt check.

This page explains what the test system must prove. For runnable E2E lanes and Make targets, see E2E Framework. For CI lanes and release evidence requirements, see CI And Supply Chain. For captured load and cold-start evidence, see Performance Evidence.

Test Priorities

| Priority | What must be proven |
| --- | --- |
| KMS v2 protocol correctness | Kubernetes accepts Status, Encrypt, and Decrypt responses and the provider preserves the Status.key_id == EncryptResponse.key_id invariant. |
| key_id stability | Rotation, rollback rejection, and decrypt lookup depend on deterministic, non-reused key_id values. |
| Decrypt compatibility | Data written before restart, upgrade, rollback, and Transit rotation remains readable. |
| Fail-closed behavior | OpenBao, auth material, socket, policy, and Transit key failures do not lead to plaintext exposure or unsafe fallback. |
| API server startup | The provider is available early enough for kube-apiserver restart and recovery paths. |
| Deployment behavior | systemd and static-pod modes have different boot, socket, filesystem, and rollback failure modes. |
| Disaster recovery | OpenBao, provider state, and etcd restore procedures preserve decryptability only when the correct backup pair is restored. |
| Observability and redaction | Logs, metrics, reports, and diagnostics never expose plaintext, JWTs, OpenBao tokens, full ciphertext, or raw Transit key material. |

Test Layers

| Layer | Purpose | Canonical location |
| --- | --- | --- |
| Unit and golden tests | Validate key registry decisions, AAD construction, config validation, socket handling, logging redaction, and stable wire-format fixtures. | go test ./..., golden fixtures under testdata/ |
| KMS v2 conformance | Start the real Unix-socket server path with fake OpenBao behavior and exercise Kubernetes KMS v2 protobuf requests. | fast local and PR checks |
| OpenBao client integration | Verify Transit, auth login, TLS, policy diagnostics, and OpenBao error handling without external credentials. | hermetic integration tests and OpenBao CI E2E |
| Operator CLI E2E | Run provider image CLI diagnostics against real OpenBao/config/state and assert redacted hardening failures. | provider CLI E2E |
| Kubernetes API server E2E | Prove real API server encryption, raw etcd envelope storage, restart readback, and multi-control-plane convergence. | Kind lanes and local kubeadm VM validation |
| Rotation and compatibility | Prove Transit version promotion, old ciphertext readback, new ciphertext write path, historical decryptability guards, missing-state fail-closed behavior, and rollback rejection. | OpenBao rotation E2E |
| Failure injection | Exercise OpenBao outage, sealed state, failover, bad policy, expired or identity-drifted auth material, missing Transit key, stale socket, and startup failures. | provider failure and HA E2E lanes |
| Performance and load | Bound Status, Encrypt, Decrypt, direct decrypt soak, startup decrypt, and resource growth behavior. | provider load lanes and performance evidence |
| Security and supply chain | Run redaction checks, fuzz targets, static analysis, vulnerability scan, license check, SBOM, and vendor verification. | make ci-core, security CI, release workflow |
| Disaster recovery | Validate OpenBao raft restore, provider state rehydration, etcd restore pairing, and Kubernetes readback after replacement. | Kind DR, OpenBao restore, and local VM validation |

Negative Path Bias

Encrypt and decrypt working once is not enough. The test suite must prove that the provider fails safely when:

  • OpenBao is down, sealed, unavailable during failover, or restored from an old backend,
  • the JWT expires, rotates, has the wrong claims, or cannot be read,
  • the certificate is expired, identity-drifted, unavailable from PKCS#11 or SPIFFE, or cannot be validated,
  • the Transit key is missing, soft-deleted, recreated under the same name, or has unsafe version bounds,
  • key_id values are unknown, malformed, rolled back, or associated with changed identity scope,
  • AAD annotations are missing, modified, or from another provider or cluster,
  • the Unix socket is stale, has the wrong ownership, or points at an unsafe filesystem object,
  • systemd, kubelet, static pods, images, or host paths fail during API server restart or node recovery.

Every negative test should assert both the returned error behavior and the absence of sensitive values in logs, metrics, and artifacts.
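
The redaction half of that assertion can be sketched as a helper that scans captured output for known sensitive values. This is an illustrative sketch only: `FindLeaks` and the sample values are hypothetical and are not taken from the provider's test suite.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FindLeaks returns the sorted labels of sensitive values that occur verbatim
// in output. Empty values are skipped because every string contains "".
func FindLeaks(output string, sensitive map[string]string) []string {
	var leaked []string
	for label, value := range sensitive {
		if value != "" && strings.Contains(output, value) {
			leaked = append(leaked, label)
		}
	}
	sort.Strings(leaked)
	return leaked
}

func main() {
	// Hypothetical values that a negative test injected into the request path.
	sensitive := map[string]string{
		"plaintext": "s3cr3t-payload",
		"jwt":       "eyJhbGciOi.example.sig",
		"token":     "hypothetical-openbao-token",
	}
	// Hypothetical log output captured while the failure was injected.
	logs := `level=error msg="decrypt failed" key_id=unknown reason="key not found"`

	if leaks := FindLeaks(logs, sensitive); len(leaks) > 0 {
		fmt.Println("leaked:", leaks)
		return
	}
	fmt.Println("no sensitive values found")
}
```

A verbatim substring match does not catch encoded or truncated forms of a secret, so a real check would likely also scan for base64 and other encodings of each value and run over metrics and artifacts as well as logs.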

Performance Model

Performance targets reflect Kubernetes behavior and the provider’s role in API server startup:

| Path | Initial target | Reason |
| --- | --- | --- |
| Status | p99 under 5 ms and no OpenBao call | kube-apiserver polls Status frequently. |
| Encrypt | p95 under 100 ms, p99 under 250 ms | Writes call OpenBao Transit and must tolerate network reality. |
| Decrypt | p95 under 10 ms, p99 under 50 ms | API server startup can perform many decrypts. |
| Startup storm | bounded memory and goroutines | Recovery paths must not deadlock or leak resources. |

These are test targets, not published service-level guarantees. The release evidence can adjust thresholds when OpenBao or network behavior justifies it, but the trade-off must remain explicit.

Decrypt micro-batching is not implemented in the provider runtime. The current direct-path evidence did not show provider or OpenBao decrypt fan-out proportional to Kubernetes object count. A production coalescer should move into the release scope only if sustained direct decrypt soak or local kubeadm VM cold-start evidence shows a release-blocking need.

Release Evidence

Release evidence is assembled from the test layers above and the supply-chain controls documented in CI And Supply Chain. In summary:

  • every PR should prove deterministic logic, conformance, redaction, formatting, static analysis, vendor integrity, and fast parser fuzz smoke;
  • main and nightly lanes should add OpenBao, Kind, rotation, failure injection, restore, load, and supply-chain checks;
  • release candidates should add exact-pinned version matrices, local kubeadm VM validation, recovery evidence, and release artifact evidence.

The exact Kubernetes patch versions, Kind node images, OpenBao image, and tool versions live in .ci/versions.yaml. Kubernetes 1.36 is tracked as the intended next validation line until a digest-pinned Kind node image exists. Additional Kubernetes or OpenBao versions remain candidates until exact-pinned lanes and release evidence exist. See Reference: Compatibility for the support boundary.

Local Fuzz Campaigns

make ci-core runs short fuzz smoke campaigns with FUZZTIME=10s by default. To spend more time on the curated parser and preflight targets without changing CI defaults, run:

FUZZTIME=1m make fuzz

For a single target, use Go’s native fuzz command, for example:

go test ./internal/keyregistry -run '^$' -fuzz '^FuzzStateFileDecode$' -fuzztime=5m
go test ./internal/aad -run '^$' -fuzz '^FuzzPrepareDecrypt$' -fuzztime=5m