Day N: Backups & Disaster Recovery
Day N operations ensure data durability through regular backups and disaster recovery procedures.
Backups
- User configures a backup schedule (`spec.backup.schedule`) and target object storage in the `OpenBaoCluster` spec. Supported providers:
    - S3: AWS S3 or S3-compatible storage (MinIO, Ceph, etc.)
    - GCS: Google Cloud Storage
    - Azure: Azure Blob Storage
- User configures authentication method:
    - JWT Auth (Preferred): Set `spec.backup.jwtAuthRole` and configure the role in OpenBao
    - Static Token (Fallback): For all clusters, set `spec.backup.tokenSecretRef` pointing to a backup token Secret (root tokens are not used)
- Backup Manager (adminops controller) schedules backups using cron expressions (e.g., `"0 3 * * *"` for daily at 3 AM).
- On schedule, Backup Manager:
    - Creates a Kubernetes Job with the backup executor container
    - Job uses `<cluster-name>-backup-serviceaccount` (automatically created by operator)
- Backup executor:
    - Authenticates to OpenBao using JWT Auth (via projected ServiceAccount token) or static token
    - Discovers the current Raft leader via OpenBao API
    - Streams `GET /v1/sys/storage/raft/snapshot` directly to object storage (no disk buffering)
    - Names backups predictably: `<prefix>/<namespace>/<cluster>/<timestamp>-<uuid>.snap`
    - Verifies upload completion
- Backup status is recorded in `Status.Backup`:
    - `LastBackupTime`, `NextScheduledBackup` for visibility
    - `ConsecutiveFailures` for alerting
- Optional retention policies (`spec.backup.retention`) automatically delete old backups:
    - `MaxCount`: Keep only the N most recent backups
    - `MaxAge`: Delete backups older than a specified duration
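The naming and retention rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the operator's implementation: `backup_key`, `Backup`, and `select_expired` are hypothetical names, the timestamp layout is assumed (the source only specifies `<timestamp>`), and treating `MaxCount` and `MaxAge` as independent triggers for deletion is also an assumption.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def backup_key(prefix: str, namespace: str, cluster: str,
               now: datetime, backup_id: uuid.UUID) -> str:
    """Build a key shaped like <prefix>/<namespace>/<cluster>/<timestamp>-<uuid>.snap."""
    # Assumed timestamp layout: compact UTC, which sorts lexicographically.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Drop an empty prefix so the key never starts with a slash.
    parts = [p for p in (prefix.strip("/"), namespace, cluster) if p]
    return "/".join(parts) + f"/{stamp}-{backup_id}.snap"


@dataclass(frozen=True)
class Backup:
    """Hypothetical record for one stored snapshot."""
    key: str
    created: datetime


def select_expired(backups: List[Backup], now: datetime,
                   max_count: Optional[int] = None,
                   max_age: Optional[timedelta] = None) -> List[Backup]:
    """Return the backups that a MaxCount/MaxAge policy would delete."""
    newest_first = sorted(backups, key=lambda b: b.created, reverse=True)
    expired = []
    for index, backup in enumerate(newest_first):
        # MaxCount: everything beyond the N most recent backups.
        over_count = max_count is not None and index >= max_count
        # MaxAge: anything strictly older than the allowed duration.
        too_old = max_age is not None and now - backup.created > max_age
        if over_count or too_old:  # assumption: either rule alone is enough
            expired.append(backup)
    return expired
```

Sorting newest first keeps the `MaxCount` check to a simple index comparison, and leaving both limits as `None` expires nothing, which matches retention being optional.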
Backup Limitations
Backups are skipped during upgrades to avoid inconsistent snapshots. Backups are optional for all clusters. If backups are enabled, either jwtAuthRole or tokenSecretRef must be configured. Root tokens are not used for backup operations.
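The constraints above amount to two small checks, sketched here in Python. The `BackupSpec` class and the function names are hypothetical stand-ins for the operator's own types; only the rules themselves (backups are optional, they are skipped during upgrades, and one of `jwtAuthRole` or `tokenSecretRef` is required) come from the text. Preferring JWT when both are set is an assumption based on JWT Auth being described as preferred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupSpec:
    """Hypothetical mirror of the relevant spec.backup fields."""
    schedule: str
    jwt_auth_role: Optional[str] = None
    token_secret_ref: Optional[str] = None


def auth_method(spec: BackupSpec) -> str:
    """Pick the authentication method, or reject a spec that has none."""
    if spec.jwt_auth_role:
        return "jwt"  # assumption: JWT wins when both fields are set
    if spec.token_secret_ref:
        return "token"
    raise ValueError("backup requires jwtAuthRole or tokenSecretRef")


def should_run_backup(spec: Optional[BackupSpec], upgrade_in_progress: bool) -> bool:
    """Backups are optional, and are skipped while an upgrade is running."""
    if spec is None:
        return False
    if upgrade_in_progress:
        return False
    auth_method(spec)  # raises if neither auth option is configured
    return True
```

For example, `should_run_backup(BackupSpec("0 3 * * *", jwt_auth_role="backup"), upgrade_in_progress=True)` returns `False`, while the same spec outside an upgrade returns `True`.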
Sequence Diagram
sequenceDiagram
autonumber
participant U as User
participant K as Kubernetes API
participant Op as OpenBao Operator
participant Job as Backup Job Pod
participant Bao as OpenBao API
participant Storage as Object Storage
U->>K: Configure backup schedule and target in OpenBaoCluster
K-->>Op: Watch OpenBaoCluster (backup spec)
Op->>Op: Schedule backup via cron
Op->>K: Create Job/<cluster>-backup
K-->>Job: Start backup executor Pod
Job->>Bao: Authenticate (JWT or token)
Job->>Bao: GET /v1/sys/storage/raft/snapshot
Job->>Storage: Stream snapshot to object storage
Job-->>Op: Exit status (success/failure)
Op->>K: Update OpenBaoCluster.status.backup (last backup, failures)
Op->>Storage: Apply retention policies (via backup manager, if configured)
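The status update near the end of the diagram can be pictured as a small state transition. The Python sketch below is an assumption-laden illustration: `BackupStatus` and `record_result` are hypothetical names, the fields mirror `LastBackupTime`, `NextScheduledBackup`, and `ConsecutiveFailures`, and resetting the failure counter on success is assumed rather than confirmed by the source.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class BackupStatus:
    """Hypothetical mirror of the fields recorded in Status.Backup."""
    last_backup_time: Optional[datetime] = None
    next_scheduled_backup: Optional[datetime] = None
    consecutive_failures: int = 0


def record_result(status: BackupStatus, succeeded: bool,
                  finished_at: datetime, next_run: datetime) -> BackupStatus:
    """Return the status after one backup Job exits."""
    if succeeded:
        # Assumption: a success resets the failure streak used for alerting.
        return replace(status, last_backup_time=finished_at,
                       next_scheduled_backup=next_run, consecutive_failures=0)
    # A failure leaves the last successful backup time untouched.
    return replace(status, next_scheduled_backup=next_run,
                   consecutive_failures=status.consecutive_failures + 1)


# Example: two failed daily runs followed by a successful one.
start = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
status = BackupStatus()
for day, ok in enumerate([False, False, True]):
    finished = start + timedelta(days=day)
    status = record_result(status, ok, finished, finished + timedelta(days=1))
```

An alert keyed on `consecutive_failures` would fire during the failing streak and clear once a run succeeds.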