Version: 0.1.0-rc.5

Run Raft snapshots as stateless jobs and keep retention out of the data plane.

The backup manager owns scheduled and manual snapshot orchestration for OpenBaoCluster. It validates cluster readiness, acquires the operation lock, creates executor Jobs, and records backup state so backups stay auditable and resumable without embedding snapshot transport inside the controller.

At a glance

Control path

adminops reconciler
internal/app/openbaocluster/adminops
internal/service/backup

Owns

backup trigger detection for schedules and manual requests
preflight validation and operation-lock ownership for backup
retention evaluation after successful uploads

Writes

backup executor Jobs and job annotations
status.backup timing, success, and failure counters
operation lock state while backup is in progress

Depends on

cluster health and absence of conflicting upgrade or restore work
spec.backup target, authentication, and executor image configuration
object-storage reachability and trust configuration for the selected provider

Architectural Placement

Backup orchestration belongs to the AdminOps path:

1. internal/controller/openbaocluster receives an adminops reconcile event.
2. The controller delegates into internal/app/openbaocluster/adminops.
3. AdminOps orchestration invokes internal/service/backup to validate, launch, and observe backup execution.

That keeps the controller focused on reconcile plumbing while the backup manager owns timing, job launch, and retention decisions.

Owned surfaces.
Surface	What the manager decides	Why it matters
Backup trigger window	Whether a cron window, manual trigger, or pre-upgrade request should launch a new Job.	Backups need at-most-once behavior per scheduled window and predictable manual overrides.
Executor Job	Job name, annotations, auth wiring, and provider-specific environment for the backup binary.	The controller should schedule work, not stream snapshot data itself.
status.backup	Attempt timing, next schedule, last success, and consecutive failure state.	Operators need backup visibility without inspecting transient Jobs.
Retention policy	Which completed backups can be deleted after a successful upload.	Retention belongs to the control plane so cleanup stays consistent across providers.

Backup Flow

Preflight and status model.
Check	Manager behavior
Cluster readiness	Backup launches only when the cluster is in a stable running phase and the workload is not already mid-transition.
Conflicting operations	Restore and active upgrade state block backup launch; only one long-running operation may own the cluster lock at a time.
At-most-once scheduling	status.backup.lastAttemptScheduledTime and nextScheduledBackup prevent duplicate launches in the same cron window.
Failure accounting	Consecutive failures increase only when a terminal Job fails, not on every reconcile that notices the same failed Job.
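
The at-most-once and failure-accounting rows can be illustrated with a minimal sketch. It is not the manager's actual code: the BackupStatus struct is a trimmed-down stand-in for status.backup, the LastCountedFailedJob field is a hypothetical way to remember which failed Job was already counted, and the function names are invented for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// BackupStatus is a hypothetical, reduced stand-in for status.backup.
type BackupStatus struct {
	LastAttemptScheduledTime time.Time // cron window of the most recent launch
	NextScheduledBackup      time.Time // cron window that is due next
	ConsecutiveFailures      int
	LastCountedFailedJob     string // assumption: name of the last failed Job already counted
}

// ShouldLaunch reports whether the due window is open and has not been attempted yet.
func ShouldLaunch(s BackupStatus, now time.Time) bool {
	if now.Before(s.NextScheduledBackup) {
		return false // the window has not opened
	}
	// A launch for this window already happened if the last attempt
	// was recorded against the same (or a later) scheduled time.
	return s.LastAttemptScheduledTime.Before(s.NextScheduledBackup)
}

// RecordFailure counts a terminal failed Job once, no matter how many
// reconciles observe the same Job.
func RecordFailure(s *BackupStatus, jobName string) {
	if s.LastCountedFailedJob == jobName {
		return
	}
	s.ConsecutiveFailures++
	s.LastCountedFailedJob = jobName
}

func main() {
	window := time.Date(2024, 5, 1, 2, 0, 0, 0, time.UTC)
	s := BackupStatus{
		LastAttemptScheduledTime: window.Add(-24 * time.Hour),
		NextScheduledBackup:      window,
	}
	now := window.Add(30 * time.Second)

	fmt.Println("first reconcile launches:", ShouldLaunch(s, now))
	s.LastAttemptScheduledTime = window // recorded when the Job is created
	fmt.Println("second reconcile launches:", ShouldLaunch(s, now))

	RecordFailure(&s, "backup-job-a")
	RecordFailure(&s, "backup-job-a") // same Job seen again on a later reconcile
	fmt.Println("consecutive failures:", s.ConsecutiveFailures)
}
```

Recording the scheduled time of the window, rather than the wall-clock launch time, is what makes the check idempotent across repeated reconciles in this sketch.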

Provider And Retention Surfaces

Provider integration surfaces.
Provider family	Auth patterns the manager supports	What stays the same
S3-compatible	Static access keys, explicit web identity, ambient workload identity, or ServiceAccount annotation-driven identity.	The manager still creates one executor Job and records status the same way after upload completes.
GCS	Service account key, Application Default Credentials, or Workload Identity metadata on the generated pod identity.	Upload and retention stay job-driven; only the credential wiring changes.
Azure Blob Storage	Account key, connection string, or managed identity/workload identity defaults.	Retention and backup naming stay provider-agnostic at the manager boundary.

Backups are stored under a stable object prefix so restore workflows can locate artifacts without reverse-engineering Job names:

<pathPrefix>/<namespace>/<cluster>/<timestamp>-<short-uuid>.snap
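
As a rough sketch of how such a key could be assembled, the following builds the layout above from its parts. The BackupRef type, the compact UTC timestamp layout, and the eight-character short ID are assumptions made for illustration; the source only fixes the overall shape of the key.

```go
package main

import (
	"fmt"
	"path"
	"strings"
	"time"
)

// BackupRef is a hypothetical description of one snapshot artifact.
type BackupRef struct {
	Namespace string
	Cluster   string
	Taken     time.Time
	ShortID   string // assumption: a short UUID fragment supplied by the caller
}

// ClusterPrefix returns the listing prefix shared by all snapshots of one cluster.
func ClusterPrefix(pathPrefix, namespace, cluster string) string {
	return path.Join(strings.Trim(pathPrefix, "/"), namespace, cluster) + "/"
}

// ObjectKey renders <pathPrefix>/<namespace>/<cluster>/<timestamp>-<short-uuid>.snap.
// The timestamp layout is an assumption; it sorts lexicographically by time.
func (r BackupRef) ObjectKey(pathPrefix string) string {
	name := fmt.Sprintf("%s-%s.snap", r.Taken.UTC().Format("20060102T150405Z"), r.ShortID)
	return ClusterPrefix(pathPrefix, r.Namespace, r.Cluster) + name
}

func main() {
	ref := BackupRef{
		Namespace: "prod",
		Cluster:   "bao",
		Taken:     time.Date(2024, 5, 1, 2, 0, 0, 0, time.UTC),
		ShortID:   "ab12cd34",
	}
	fmt.Println(ref.ObjectKey("backups"))
	fmt.Println(ClusterPrefix("backups", "prod", "bao"))
}
```

A restore workflow in this sketch only needs the namespace and cluster name to list candidates under ClusterPrefix; it never has to know which Job produced a given object.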

Safety boundaries.
Concern	Manager behavior
No data-plane coupling	The controller never handles snapshot bytes directly; the executor Job performs authentication, snapshot, and upload work.
Retention timing	Retention runs only after a successful upload so cleanup never removes older recovery points before a new one exists.
Upgrade coordination	Pre-upgrade snapshots reuse backup job machinery rather than creating a second snapshot implementation in the upgrade manager.
Local buffering risk	The backup path is designed around streaming to object storage rather than writing large transient snapshot files inside the controller.
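
The retention-timing rule can be sketched as a pure selection function that refuses to pick anything unless the latest upload succeeded. The RetentionPolicy fields (MaxCount, MaxAge) and the function name are hypothetical; the sketch only illustrates the ordering guarantee described above, not the manager's real policy options.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Snapshot is a hypothetical listing entry for one stored backup object.
type Snapshot struct {
	Key     string
	Created time.Time
}

// RetentionPolicy is an assumed policy shape; zero values disable a rule.
type RetentionPolicy struct {
	MaxCount int           // keep at most this many of the newest snapshots
	MaxAge   time.Duration // delete snapshots older than this
}

// SelectForDeletion returns the keys that may be removed. It returns nothing
// unless the latest upload succeeded, and it never selects the newest snapshot.
func SelectForDeletion(snaps []Snapshot, p RetentionPolicy, now time.Time, uploadSucceeded bool) []string {
	if !uploadSucceeded || len(snaps) == 0 {
		return nil
	}
	// Sort a copy newest-first so the caller's slice is left untouched.
	sorted := append([]Snapshot(nil), snaps...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Created.After(sorted[j].Created)
	})

	var doomed []string
	for i, s := range sorted {
		if i == 0 {
			continue // always keep the most recent recovery point
		}
		tooMany := p.MaxCount > 0 && i >= p.MaxCount
		tooOld := p.MaxAge > 0 && now.Sub(s.Created) > p.MaxAge
		if tooMany || tooOld {
			doomed = append(doomed, s.Key)
		}
	}
	return doomed
}

func main() {
	now := time.Date(2024, 5, 4, 0, 0, 0, 0, time.UTC)
	snaps := []Snapshot{
		{Key: "day1.snap", Created: now.Add(-72 * time.Hour)},
		{Key: "day2.snap", Created: now.Add(-48 * time.Hour)},
		{Key: "day3.snap", Created: now.Add(-24 * time.Hour)},
	}
	policy := RetentionPolicy{MaxCount: 2}

	fmt.Println("after failed upload:", SelectForDeletion(snaps, policy, now, false))
	fmt.Println("after successful upload:", SelectForDeletion(snaps, policy, now, true))
}
```

Keeping the selection free of provider calls mirrors the provider-agnostic boundary described earlier: only the deletion of the returned keys would need provider-specific code.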

Related deep dives

Restore manager: Restore consumes the snapshot contract that backup writes and protects with lock ownership.

Upgrade manager: Pre-upgrade snapshots depend on the same backup execution surface instead of a separate snapshot implementation.

Backups guide: Compare the internal backup orchestration model with the user-facing schedule, provider, and restore instructions.

Prerelease documentation

This version tracks a prerelease build. Features and behavior may change before the next stable release.
