Initialize one node first, then scale only after the cluster is safe to join.
The init manager owns the first-boot contract for a new OpenBaoCluster. It keeps bootstrap on a single node, handles operator-driven or self-init flows, stores or suppresses root material appropriately, and configures Raft autopilot before the workload expands to full replica count.
At a glance
Control path
- workload reconciler
- internal/controller/openbaocluster
- internal/service/init
Owns
- bootstrap detection and first init call behavior
- root token handling or self-init completion detection
- initial autopilot configuration immediately after cluster initialization
Writes
- status.initialized and status.selfInitialized
- root token Secret when self-init is disabled
- autopilot configuration through the OpenBao API once initialization completes
Depends on
- single-replica bootstrap from the infrastructure path
- pod readiness and TLS Secret availability before init proceeds
- self-init requests and auth bootstrap configuration when self-init is enabled
Architectural Placement
Initialization stays on the workload-side controller path while the cluster is not yet ready for normal steady-state reconciliation:
- internal/controller/openbaocluster keeps the cluster on the uninitialized path.
- The controller calls internal/service/init once the first pod and TLS prerequisites are ready.
- The init manager marks initialization state, configures autopilot, and only then allows the infrastructure path to scale to the requested replica count.
That separation prevents first-boot logic from leaking into every steady-state reconcile.
Bootstrap Flow
Diagram: Initialize, then scale
The infrastructure path holds the workload at one replica until the init manager confirms the cluster is initialized. Only then does the cluster expand to the requested replica count.
Initialization Phases
- Bootstrap one node
- Initialize safely
- Scale after success
Bootstrap contract
- A new cluster starts at one replica even when spec.replicas is greater than one.
- The infrastructure manager keeps the StatefulSet capped until status.initialized becomes true.
- This avoids race conditions where multiple uninitialized pods could compete to become the first Raft leader.
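The bootstrap cap can be sketched as a single decision: hold the StatefulSet at one replica until status.initialized flips. This is a minimal illustration, assuming hypothetical names; the operator's actual types and field paths differ.

```go
package main

import "fmt"

// effectiveReplicas is an illustrative sketch of the bootstrap cap: until
// status.initialized is true, the infrastructure path holds the StatefulSet
// at a single replica regardless of spec.replicas.
func effectiveReplicas(specReplicas int32, initialized bool) int32 {
	if !initialized {
		// A new cluster always bootstraps on one node so a single Raft
		// leader can form without competing uninitialized peers.
		return 1
	}
	return specReplicas
}

func main() {
	fmt.Println(effectiveReplicas(5, false)) // capped until init completes
	fmt.Println(effectiveReplicas(5, true))  // full scale-out after init
}
```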
Initialization contract
- The manager first checks for an already initialized cluster and skips the init call when status or health proves bootstrap already happened.
- When self-init is disabled, it performs the init call, captures the root material once, and stores the root token in a Secret without logging the response.
- When self-init is enabled, it treats pod readiness and initialization signals as the completion boundary and does not create a root-token Secret.
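The three branches of the initialization contract reduce to one decision. The sketch below uses assumed names and collapses the manager's status and health checks into two booleans; it is not the operator's real API.

```go
package main

import "fmt"

// initAction enumerates the three first-boot behaviors described above.
type initAction string

const (
	actionSkip          initAction = "skip"            // already initialized: no init call
	actionInitAndStore  initAction = "init-and-store"  // call init, store root token Secret
	actionAwaitSelfInit initAction = "await-self-init" // wait for readiness, no Secret
)

// decideInit is an illustrative reduction of the contract; the real manager
// inspects status fields and live cluster health, not two flags.
func decideInit(alreadyInitialized, selfInitEnabled bool) initAction {
	switch {
	case alreadyInitialized:
		// Status or health already proves bootstrap happened.
		return actionSkip
	case selfInitEnabled:
		// Pod readiness and initialization signals are the completion
		// boundary; no root-token Secret is created.
		return actionAwaitSelfInit
	default:
		// Perform the init call once and store the root token in a
		// Secret without logging the response.
		return actionInitAndStore
	}
}

func main() {
	fmt.Println(decideInit(false, false))
	fmt.Println(decideInit(false, true))
	fmt.Println(decideInit(true, true))
}
```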
Scale-out contract
- The manager sets status.initialized after the cluster is known-good and, when relevant, also sets status.selfInitialized.
- Autopilot defaults are configured immediately after initialization so day-2 health policy exists before the cluster grows.
- Only after that handoff does the workload path expand the StatefulSet and let additional pods join through retry_join.
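The ordering of that handoff matters: status first, autopilot second, scale-out last. A minimal sketch, with assumed status types and a stubbed autopilot call standing in for the OpenBao API:

```go
package main

import "fmt"

// clusterStatus and cluster are hypothetical stand-ins for the operator's
// real status types.
type clusterStatus struct {
	Initialized     bool
	SelfInitialized bool
}

type cluster struct {
	SelfInit bool
	Status   clusterStatus
}

// configureAutopilot stands in for the call that applies the autopilot
// defaults through the OpenBao API.
func configureAutopilot(c *cluster) error { return nil }

// completeInit illustrates the scale-out contract: mark status, configure
// autopilot, and only then report that the StatefulSet may grow.
func completeInit(c *cluster) (scaleAllowed bool, err error) {
	c.Status.Initialized = true
	if c.SelfInit {
		c.Status.SelfInitialized = true
	}
	if err := configureAutopilot(c); err != nil {
		// Day-2 health policy must exist before the cluster grows.
		return false, err
	}
	// The infrastructure path may now raise the replica count; new pods
	// join the existing leader through retry_join.
	return true, nil
}

func main() {
	c := &cluster{SelfInit: true}
	ok, _ := completeInit(c)
	fmt.Println(ok, c.Status.Initialized, c.Status.SelfInitialized)
}
```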
Autopilot Defaults
Autopilot configuration defaults
| Setting | Default behavior | Why the init manager sets it early |
|---|---|---|
| cleanup_dead_servers | Enabled by default, but forced off when minQuorum < 3 and the user did not explicitly override it. | The rendered policy must remain valid for small clusters before steady-state operations begin. |
| dead_server_last_contact_threshold | 5m | Dead-peer cleanup should wait long enough to avoid reacting to short network turbulence. |
| last_contact_threshold | 10s | Autopilot needs a consistent heartbeat tolerance before higher replica counts join. |
| server_stabilization_time | 10s | New members should remain stable briefly before being treated as healthy participants. |
| max_trailing_logs | 1000 | Replication lag needs a default budget before dead-server or readiness logic starts treating peers as unhealthy. |
| min_quorum | Hardened profile defaults to 3, or replicas when replicas > 3; other profiles use max(1, replicas). | The cleanup policy and quorum safety model must be aligned from the first initialized reconcile. |
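The min_quorum and cleanup_dead_servers rows interact, and the table's rules can be restated as two small functions. This is an illustrative reconstruction under assumed names (profile strings, signatures), not the operator's implementation.

```go
package main

import "fmt"

// minQuorum mirrors the table: the hardened profile never advertises a
// quorum below 3, while other profiles track replicas with a floor of 1.
func minQuorum(profile string, replicas int32) int32 {
	if profile == "hardened" {
		if replicas > 3 {
			return replicas
		}
		return 3
	}
	if replicas < 1 {
		return 1
	}
	return replicas
}

// cleanupDeadServers defaults to enabled but is forced off for small-quorum
// configurations that OpenBao would reject, unless the user explicitly
// overrode the setting.
func cleanupDeadServers(userSet, userValue bool, quorum int32) bool {
	if userSet {
		return userValue
	}
	return quorum >= 3
}

func main() {
	fmt.Println(minQuorum("hardened", 2)) // floor of 3
	fmt.Println(minQuorum("hardened", 5)) // replicas when > 3
	fmt.Println(minQuorum("default", 0))  // floor of 1
	fmt.Println(cleanupDeadServers(false, false, 1)) // forced off
}
```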
If the manager detects an already-initialized cluster, it follows the initialized-cluster path, which exists to recover an operator-managed cluster. It is not a generic import path for arbitrary unmanaged OpenBao clusters.
Safety boundaries
| Concern | Manager behavior |
|---|---|
| Split-brain at bootstrap | Single-pod bootstrap stays in force until initialization is confirmed, so the first Raft leader forms in a controlled way. |
| Root material handling | The init response is used in-memory only for the current request and is not logged; self-init intentionally avoids creating a root-token Secret. |
| TLS readiness | Initialization waits for the TLS server Secret when TLS is managed by the operator so the API path is not used before the workload is ready. |
| Invalid autopilot cleanup | The manager forces cleanupDeadServers off for small-cluster configurations that OpenBao would reject. |
Related deep dives