OpenBao observability model

Use this explainer to understand how the reference architecture turns OpenBao metrics, logs, audit logs, and platform state into dashboards, alerts, and runbooks. It is for operators who need to reason about what each signal can prove before they depend on it.

Why this matters

OpenBao observability is not one dashboard or one scrape job. You operate a security-critical service by combining source signals, derived signals, and response guidance.

Each signal answers a different class of question. Metrics show health, rate, latency, saturation, and trends. Operational logs show server behavior. Audit logs show API activity that passed through the audit system. Platform signals show whether the runtime environment is healthy enough for OpenBao to operate.

Mental model

Read the reference architecture as a signal pipeline.

OpenBao deployment
  -> source signals
  -> collectors and scrapes
  -> recording rules and log queries
  -> dashboards, alerts, and runbooks
  -> operator decisions

The project keeps source signals and derived signals separate. You validate source metrics and log streams first. You then use recording rules, alert rules, and generated dashboards to keep operator-facing views stable.
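Here is a minimal Python sketch of the "validate sources first" step. It assumes that each derived rule depends on a single source metric base name, which the deployment emits with either a `vault_` or an `openbao_` prefix. The function name `missing_sources` is hypothetical. The metric names in the example are placeholders modeled on Vault-style telemetry names, so check them against your own scrape before you rely on them.

```python
"""Sketch: check that source metrics exist before trusting derived rules.

Assumption: every derived rule depends on one source metric base name that
may appear with either supported prefix. Names here are illustrative only.
"""

SOURCE_PREFIXES = ("vault_", "openbao_")


def missing_sources(required_bases, scraped_names):
    """Return base names that appear under neither supported prefix."""
    scraped = set(scraped_names)
    return [
        base
        for base in required_bases
        if not any(prefix + base in scraped for prefix in SOURCE_PREFIXES)
    ]


if __name__ == "__main__":
    # Hypothetical scrape result and rule dependencies.
    scraped = ["vault_core_unsealed", "openbao_audit_log_request_failure"]
    required = ["core_unsealed", "audit_log_request_failure", "raft_state_leader"]
    print(missing_sources(required, scraped))
```

Running the example prints `['raft_state_leader']`. In this pipeline, any rule that depends on that base name should be treated as unvalidated.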

Source signals

Signal	Use it for	Do not use it for
Metrics	Health, request latency, HA state, Raft state, runtime pressure, token counts, lease counts, and audit-device failures.	Full request reconstruction or compliance evidence.
Operational logs	Startup, shutdown, listener, storage, Raft, plugin, and process troubleshooting.	Security audit trails or durable evidence by themselves.
Audit logs	API request and response security records for audited paths.	General application debugging or platform troubleshooting.
Platform signals	Pod, container, host, volume, network, and service discovery state.	OpenBao internal semantics such as seal state or audit-device health.

Dashboards combine these signals for interpretation, but they are not a source of truth in their own right.

Derived signals

The repository uses contracts and generators so dashboards and alerts depend on explicit signal definitions.

Derived signal	Purpose
Recording rules	Normalize OpenBao source metrics into stable `openbao:` series.
Alert rules	Turn critical and warning conditions into named operational events.
Dashboard contracts	Define the panels, queries, variables, and data-source expectations for generated Grafana dashboards.
Runbooks	Describe how you respond when an alert fires.

This separation matters because OpenBao deployments can emit either `vault_*` or `openbao_*` Prometheus metrics, and live label sets vary by scrape profile. The dashboards consume normalized rules where possible, while validation still checks the raw source signals.
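One way to express that normalization is a recording rule whose expression accepts either prefix. The Python sketch below builds such a rule. The `openbao:<base>` naming, the `max by (instance)` aggregation, and the function name `normalized_rule` are assumptions for illustration, not the repository's actual generator. Aggregating both sides by the same labels matters because PromQL `or` matches series by label set. If the two prefixes arrived with different labels (for example, a different `job`), both would otherwise pass through for the same node.

```python
"""Sketch: build a recording rule that accepts either metric prefix.

Assumptions: the openbao:<base> rule name, max aggregation, and instance
grouping are illustrative choices, not a documented convention.
"""


def normalized_rule(base, by=("instance",), agg="max"):
    """Return a Prometheus recording rule (as a dict) for one source metric."""
    grouping = ", ".join(by)
    # Prefer openbao_* when present; fall back to vault_* only for label sets
    # the left-hand side does not already cover (PromQL `or` semantics).
    expr = (
        f"{agg} by ({grouping}) (openbao_{base})"
        f" or {agg} by ({grouping}) (vault_{base})"
    )
    return {"record": f"openbao:{base}", "expr": expr}


if __name__ == "__main__":
    print(normalized_rule("core_unsealed"))
```

For a per-node gauge such as an unsealed flag, `max` is a reasonable choice. Counters would need `rate()` inside the aggregation instead, so treat the aggregation as a per-metric decision.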

OpenBao behavior

OpenBao telemetry exposes counters, gauges, and summaries. High-cardinality usage gauges, such as token, entity, and secret counts, are collected once per `usage_gauge_period` interval rather than on every scrape.
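Usage gauges refresh only once per period, so the value on a panel can be older than the scrape interval suggests. A rough worst-case age is the gauge period plus the scrape interval. If you read the value through a recording rule, add the rule evaluation interval as well. The sketch below does that arithmetic. It ignores the time OpenBao takes to compute the gauges. The 10-minute period in the example is an assumed setting, not a claim about the default.

```python
from datetime import timedelta


def worst_case_gauge_age(usage_gauge_period, scrape_interval, rule_interval=timedelta(0)):
    """Approximate upper bound on how stale a usage gauge can look on a panel.

    Ignores gauge computation time and Prometheus query lookback.
    """
    return usage_gauge_period + scrape_interval + rule_interval


if __name__ == "__main__":
    age = worst_case_gauge_age(
        usage_gauge_period=timedelta(minutes=10),  # assumed configuration
        scrape_interval=timedelta(seconds=30),
        rule_interval=timedelta(seconds=30),
    )
    print(age)  # 0:11:00
```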

OpenBao audit devices write request and response entries for audited API paths. Some system paths bypass the audit system entirely, including the health, seal, unseal, leader, and initialization paths. The `sys/metrics`, `sys/pprof/*`, and `sys/in-flight-req` paths also bypass audit when the listener configuration allows unauthenticated access to them.
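When you pick an audit canary path, encoding the bypass rules lets a misconfigured canary fail review. The sketch below is a minimal, conservative check. Its path lists come from this page and are not exhaustive. Rejecting every `sys/` path is an assumption of the sketch, not OpenBao behavior. The example canary path `secret/data/audit-canary` is hypothetical.

```python
from fnmatch import fnmatchcase

# Paths this page names as bypassing audit. Partial list; check the OpenBao
# audit device documentation for the complete, version-specific set.
ALWAYS_UNAUDITED = ("sys/health", "sys/seal", "sys/unseal", "sys/leader", "sys/init")
# Unaudited only when the listener allows unauthenticated access to them.
CONDITIONALLY_UNAUDITED = ("sys/metrics", "sys/pprof/*", "sys/in-flight-req")


def normalize(path):
    """Strip slashes and an optional v1/ API prefix."""
    path = path.strip("/")
    return path[len("v1/"):] if path.startswith("v1/") else path


def known_to_bypass_audit(path):
    """True if the path matches one of the (partial) bypass lists above."""
    p = normalize(path)
    return any(fnmatchcase(p, pat) for pat in ALWAYS_UNAUDITED + CONDITIONALLY_UNAUDITED)


def usable_as_canary(path):
    """Conservative rule: reject known bypass paths and anything under sys/."""
    p = normalize(path)
    return not known_to_bypass_audit(p) and not p.startswith("sys/")


if __name__ == "__main__":
    for candidate in ("/v1/sys/health", "sys/pprof/heap", "secret/data/audit-canary"):
        print(candidate, usable_as_canary(candidate))
```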

Completed request logging in OpenBao is separate from audit devices. It is disabled by default. Whether completed-request entries appear depends on both `log_requests_level` and the server's main `log_level`.
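The two settings interact through ordinary log-level filtering. A completed-request entry is emitted at `log_requests_level` and must then pass the server's `log_level` filter. The sketch below models that reasoning and assumes OpenBao uses the usual trace/debug/info/warn/error ordering. Confirm the exact accepted values in the completed request logging documentation.

```python
# Least to most severe; assumed to match OpenBao's log level names.
LEVELS = ("trace", "debug", "info", "warn", "error")


def completed_requests_visible(log_requests_level, log_level):
    """Return True if completed-request entries would survive log filtering.

    An empty or missing log_requests_level models the default (disabled).
    Unknown level names raise ValueError.
    """
    if not log_requests_level:
        return False
    request_rank = LEVELS.index(log_requests_level.lower())
    server_rank = LEVELS.index(log_level.lower())
    # The server drops entries less severe than its configured log_level.
    return request_rank >= server_rank


if __name__ == "__main__":
    print(completed_requests_visible("info", "info"))   # True
    print(completed_requests_visible("debug", "info"))  # False
    print(completed_requests_visible(None, "trace"))    # False
```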

Design recommendations

Use metrics for fast health and trend detection. Alert on source metrics or normalized recording rules when the condition has clear operational meaning.

Use operational logs for server behavior and troubleshooting. Keep them separate from audit logs so operational dashboards do not require broad access to security records.

Use audit logs for security investigation and canary validation. Keep sensitive fields out of Loki labels and parse them at query time in restricted dashboards.
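Here is a minimal sketch of that split in Python. It assumes a JSON audit format with top-level `type`, `request`, and `auth` objects, so verify the field names against your own audit fixtures. The allowlisted label keys and the function names are hypothetical. Stream labels carry only static, low-cardinality identity. Investigation fields are read from the line body when a restricted query runs, which is the role a LogQL `json` parser stage plays in Loki.

```python
import json

# Hypothetical allowlist: static, low-cardinality stream identity only.
ALLOWED_LABEL_KEYS = frozenset({"job", "cluster", "environment", "log_type"})


def stream_labels(candidate):
    """Drop any key that is not allowlisted before it becomes a Loki label."""
    return {k: v for k, v in candidate.items() if k in ALLOWED_LABEL_KEYS}


def fields_at_query_time(line):
    """Parse sensitive investigation fields from the log body on demand."""
    entry = json.loads(line)
    request = entry.get("request") or {}
    auth = entry.get("auth") or {}
    return {
        "type": entry.get("type"),
        "path": request.get("path"),
        "operation": request.get("operation"),
        "remote_address": request.get("remote_address"),
        "accessor": auth.get("accessor"),
    }


if __name__ == "__main__":
    proposed = {"job": "openbao-audit", "cluster": "prod", "path": "secret/data/app"}
    print(stream_labels(proposed))  # the path key is dropped
    line = '{"type": "request", "request": {"path": "secret/data/app", "operation": "read"}}'
    print(fields_at_query_time(line)["path"])
```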

Use platform signals to explain why OpenBao cannot serve traffic, write audit files, or expose metrics. Do not replace OpenBao metrics with platform health checks.

Use all-node metrics scraping when you need standby, follower, or per-node Raft visibility. Use authenticated active-node scraping as the secure baseline when you only need cluster-level health.
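The two profiles differ in which nodes report their own metrics. The sketch below models that difference with hypothetical host names and a hypothetical active-node service address. It does not describe how the repository generates Prometheus scrape configuration. It also assumes the active-node service routes only to the current active node.

```python
# Hypothetical three-node Raft cluster and active-node service address.
NODES = ("openbao-0:8200", "openbao-1:8200", "openbao-2:8200")
ACTIVE_SERVICE = "openbao-active:8200"


def scrape_targets(profile, nodes=NODES, active_service=ACTIVE_SERVICE):
    """Return the targets a scrape job would hit under each profile."""
    if profile == "active-node":
        return [active_service]  # one authenticated target, routed to the active node
    if profile == "all-node":
        return list(nodes)  # every node, including standbys and followers
    raise ValueError(f"unknown profile: {profile!r}")


def nodes_with_own_metrics(profile, active_node, nodes=NODES):
    """Which nodes' per-node HA and Raft state a profile can show."""
    if profile == "active-node":
        return {active_node}
    if profile == "all-node":
        return set(nodes)
    raise ValueError(f"unknown profile: {profile!r}")


if __name__ == "__main__":
    missing = set(NODES) - nodes_with_own_metrics("active-node", "openbao-1:8200")
    print(sorted(missing))  # standbys invisible to active-node scraping
```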

Common mistakes

Treating an imported dashboard as the observability design.
Treating Loki as a compliance archive without an approved retention and access-control design.
Using request paths, secret paths, request IDs, entity IDs, token accessors, or client addresses as Prometheus or Loki labels.
Expecting active-node scraping to provide complete standby and follower visibility.
Using `sys/health`, `sys/leader`, or `sys/metrics` as an audit canary path.
Reading token and lease inventory gauges as real-time values without checking `usage_gauge_period`.

Evidence basis

Classification	Meaning in this project
Confirmed OpenBao docs behavior	OpenBao documents telemetry metric types, the `/sys/metrics` scrape endpoint, audit-device behavior, unaudited paths, and completed request logging.
Observed fixture behavior	The OpenBao 2.5.4 fixtures in this repository exercise HA/Raft metrics, audit streams, and both supported metric-prefix variants.
Design decision	This project normalizes source metrics into `openbao:` recording rules and keeps audit fields out of Loki labels.
To validate	Deployment-specific labels, scrape identities, retention controls, and OpenBao versions outside the validated fixture set.

What’s next

Use "Metrics, logs, and audit logs" to choose the right signal for a question.
Use "High-cardinality and label safety" before you change Prometheus or Loki labels.
Use "Active-node and all-node observability" to choose a metrics scrape profile.
Use "Audit logs as security records" before you expand audit-log access or retention.
Use "Token and lease observability" to interpret token and lease panels.
Use "OpenBao overview dashboard" to read the generated first-stop dashboard.
Use "Configure a secure metrics scrape" for the authenticated active-node baseline.
Use "Configure an all-node metrics scrape" when you need per-node HA and Raft visibility.
Use "Configure declarative audit devices" for repeatable audit-log setup.
Use "Understand metric prefixes and recording rules" to map source metrics to generated rules.

Source: OpenBao documents telemetry behavior in the OpenBao telemetry documentation and the OpenBao telemetry metrics overview. OpenBao documents audit-device behavior and unaudited paths in the OpenBao audit device documentation. OpenBao documents completed request logging in the OpenBao completed request logging documentation.