OpenBao overview dashboard

Use this explainer to read the generated OpenBao overview dashboard. It is for operators who need a fast health view before they move into HA/Raft, audit, auth, token, lease, log, or secret-engine drilldowns.

What this dashboard is for

Use the overview dashboard as the first stop during routine checks and incidents. It combines scrape health, cluster state, request health, HA/Raft state, runtime pressure, storage context, audit-log presence, and operational errors.

The dashboard answers these questions:

Can Prometheus scrape OpenBao?
Does the cluster have exactly one active node?
How many nodes report as unsealed?
Are regular requests, login requests, or token checks getting slower?
Does Raft report expected peer and Autopilot health?
Are barrier latency, cache hit ratio, or runtime memory signals changing?
Are audit events and operational errors visible in Loki?

What this dashboard is not for

Do not use the overview dashboard as the only investigation surface. It is a triage view, not a forensic record.

Use a more specific dashboard when you need:

Per-node Raft detail.
Audit request ID drilldown.
Auth and identity activity.
Token and lease lifecycle detail.
Secret engine and mount activity.
Runtime and storage correlation.
Full operational log exploration.

Required data sources

The generated dashboard expects these Grafana data sources:

Data source	Expected UID	Used for
Prometheus	`prometheus`	OpenBao source metrics and normalized recording rules.
Loki	`loki`	Operational logs and audit logs.

Deploy the generated Prometheus recording rules before you rely on this dashboard. Most metric panels use normalized `openbao:` rules instead of raw `vault_*` or `openbao_*` source metrics.
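
One way to confirm the data source expectations is to scan an exported dashboard JSON for the UIDs it references. The following is a minimal sketch, assuming the export uses the object form of the `datasource` field (`{"type": ..., "uid": ...}`) on panels and targets. The function names are hypothetical and not part of the generated tooling.

```python
from typing import Any

# UIDs taken from the data source table above.
EXPECTED_UIDS = {"prometheus", "loki"}


def datasource_uids(dashboard: dict[str, Any]) -> set[str]:
    """Collect every data source UID referenced by panels and their targets."""
    found: set[str] = set()

    def visit(panels: list[dict[str, Any]]) -> None:
        for panel in panels:
            for holder in [panel, *panel.get("targets", [])]:
                ds = holder.get("datasource")
                if isinstance(ds, dict) and "uid" in ds:
                    found.add(ds["uid"])
            # Row panels can nest child panels.
            visit(panel.get("panels", []))

    visit(dashboard.get("panels", []))
    return found


def unexpected_uids(dashboard: dict[str, Any]) -> set[str]:
    """Return UIDs the dashboard uses that are not in EXPECTED_UIDS."""
    return datasource_uids(dashboard) - EXPECTED_UIDS
```

An empty result from `unexpected_uids` suggests the dashboard only references the two expected data sources. It does not prove that those data sources exist in your Grafana instance.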

Scrape profile assumptions

The dashboard works best when the scrape profile matches the question you ask.

Scrape profile	What works well	Limitation
Authenticated active-node scrape	Cluster-level health, active state, request latency, audit metrics, and runtime pressure on the active node.	Standby and follower node runtime state is limited.
Private all-node scrape	Per-node HA, Raft, Autopilot, runtime, and standby visibility.	Requires stronger network isolation and label review.
Local Docker Compose scrape	Reference-stack validation and dashboard development.	Not a production security model.

The scrape health panel currently queries `up{job="openbao"}`. If your Prometheus job label is different, update the dashboard contract or add a compatible recording rule before you treat scrape health as authoritative.
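
To see how the `job` label affects this panel, you can summarize the result of the same query outside Grafana. This sketch assumes you already hold the JSON body of a Prometheus instant query (`/api/v1/query`) for `up{job="openbao"}`. The function name `scrape_summary` is hypothetical.

```python
from typing import Any


def scrape_summary(response: dict[str, Any]) -> tuple[int, list[str]]:
    """Return (matched target count, instances whose latest `up` sample is not 1)."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    result = response["data"]["result"]
    down = []
    for series in result:
        _timestamp, value = series["value"]  # the sample value arrives as a string
        if float(value) != 1.0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return len(result), sorted(down)
```

A matched count of `0` means the selector found no series at all. That is the result to expect when your job label differs from `openbao`, and it is not the same as every target being down.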

How to read cluster health

Start with the top row:

Panel	Healthy interpretation
Scrape health	All expected OpenBao scrape targets return `up=1`.
Active nodes	Exactly one active node exists.
Unsealed nodes	The expected number of nodes report as unsealed.
Audit request failures	The value stays at `0` over the five-minute window.

Investigate immediately when the Active nodes panel shows 0 or more than 1. A sealed or unreachable active node changes the meaning of every other panel because the dashboard can only show what Prometheus and Loki still receive.
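
The top-row checks can be written down as a small rule set. This is a sketch only: the `TopRow` type, the function name, and the expected counts you pass in are hypothetical, and the expected counts depend on your scrape profile and cluster size.

```python
from dataclasses import dataclass


@dataclass
class TopRow:
    targets_up: int
    active_nodes: int
    unsealed_nodes: int
    audit_request_failures: int


def top_row_findings(row: TopRow, expected_targets: int, expected_unsealed: int) -> list[str]:
    """List deviations from the healthy interpretations in the table above."""
    findings = []
    # Active-node count comes first because it changes how to read everything else.
    if row.active_nodes != 1:
        findings.append(f"active nodes is {row.active_nodes}, expected exactly 1")
    if row.targets_up != expected_targets:
        findings.append(f"targets up is {row.targets_up}, expected {expected_targets}")
    if row.unsealed_nodes != expected_unsealed:
        findings.append(f"unsealed nodes is {row.unsealed_nodes}, expected {expected_unsealed}")
    if row.audit_request_failures != 0:
        findings.append(f"audit request failures is {row.audit_request_failures}, expected 0")
    return findings
```

An empty list means the top row matches the healthy interpretation for the counts you supplied.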

How to read request health

The request-health row shows:

In-flight requests.
Non-login request rate.
Non-login request latency.
Login request latency.
Token check latency.

Read these panels together. Higher latency with normal request rate points to server-side or dependency pressure. Higher request rate with stable latency often points to load growth. Rising token-check latency can affect many paths because token checks sit on the request path for authenticated operations.
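
The reading rules above can be sketched as a small decision function. The 25% threshold and both function names are assumptions chosen for illustration. The "both rising" branch is not covered by the guidance above and only reports what was observed.

```python
def relative_change(current: float, baseline: float) -> float:
    """Fractional change from baseline, for example 0.5 for a 50% increase."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


def read_request_health(rate_change: float, latency_change: float, threshold: float = 0.25) -> str:
    """Map relative changes in request rate and latency to a first hypothesis."""
    rate_up = rate_change > threshold
    latency_up = latency_change > threshold
    if latency_up and not rate_up:
        return "server-side or dependency pressure"
    if rate_up and not latency_up:
        return "load growth"
    if rate_up and latency_up:
        return "rate and latency both rising"
    return "no notable change"
```

For example, `read_request_health(relative_change(105, 100), relative_change(90, 50))` returns `"server-side or dependency pressure"`, because latency rose 80% while the request rate rose only 5%.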

Do not compare request latency across environments without checking OpenBao version, storage backend, auth methods, scrape interval, and load profile.

How to read HA and Raft health

The HA/Raft row shows peer count, Autopilot health, failure tolerance, and unhealthy Raft nodes. The Autopilot node health panel shows node-level health by Raft node ID.

Use these panels to decide whether to open the HA/Raft dashboard. The overview does not replace the dedicated HA/Raft dashboard because it does not show all Raft timing, contact, and peer-detail signals.

How to read runtime pressure

Lease count and goroutines are trend panels. Use them to spot changes in shape, not to declare an incident from one sample.

Lease count can lag real activity because OpenBao's high-cardinality usage gauges update once per `usage_gauge_period`, not on every scrape. Treat sudden growth as a prompt to open the token and lease lifecycle dashboard.
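
One way to tell a change in shape from a single odd sample is to require several consecutive samples above a baseline. This is a minimal sketch. The function name, the window of three samples, and the 1.5 factor are assumptions. Samples taken more often than `usage_gauge_period` may repeat the same value, so space them accordingly.

```python
from statistics import median


def sustained_growth(samples: list[float], recent: int = 3, factor: float = 1.5) -> bool:
    """True when each of the last `recent` samples exceeds `factor` times
    the median of the earlier samples."""
    if len(samples) <= recent:
        return False  # not enough history to form a baseline
    baseline = median(samples[:-recent])
    return all(value > factor * baseline for value in samples[-recent:])
```

With these defaults, a single spike such as `[100, 102, 98, 101, 400]` returns `False`, while `[100, 102, 98, 101, 160, 170, 180]` returns `True`.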

How to read storage and cache context

The storage and cache row shows barrier GET latency, barrier PUT latency, cache hit ratio, runtime heap objects, runtime system bytes, and mount table entries.

Read this row after request health. Barrier latency or cache-ratio changes are most useful when they line up with request latency, token-check latency, or operational log symptoms.
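
If you export two panel series over the same time range, a correlation coefficient gives a rough numeric sense of whether they line up. This sketch computes a plain Pearson coefficient. It assumes both series are sampled at the same timestamps, and the function name is hypothetical. Correlation does not establish cause.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally sampled series, in [-1, 1]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, with at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)
```

A value near `1` for barrier PUT latency against non-login request latency suggests the two moved together during the window, which is a reason to open the runtime and storage dashboard.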

Mount table entries use bounded aggregate labels in the generated recording rule, so the overview shows inventory context without exposing mount paths.

How to read audit and logs

The bottom row shows the recent audit stream and operational log entries that contain `error`.

Use the audit stream panel as a presence check only. A quiet audit stream does not prove that audit logging works, because quiet clusters can have no audited activity. Use the canary-backed audit alert and the audit overview dashboard when you need audit-pipeline confidence.

Use the operational errors panel to find process-level symptoms. Do not treat operational logs as audit evidence.
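
To reproduce these two views in Grafana Explore, you can build LogQL queries from the `log_stream` label values `openbao.audit` and `openbao.operational`. This is a sketch under assumptions: the helper names are hypothetical, and the exact queries in the generated panels may differ. The `|=` line filter is a case-sensitive substring match.

```python
import json


def stream_selector(log_stream: str) -> str:
    """Build a LogQL stream selector for one log_stream label value."""
    return "{log_stream=" + json.dumps(log_stream) + "}"


def line_contains(selector: str, needle: str) -> str:
    """Append a LogQL line filter that keeps lines containing `needle`."""
    return f"{selector} |= {json.dumps(needle)}"


def audit_stream_query() -> str:
    return stream_selector("openbao.audit")


def operational_errors_query() -> str:
    return line_contains(stream_selector("openbao.operational"), "error")
```

If either query returns nothing in Explore while OpenBao is serving traffic, check the `log_stream` labels in your log shipping configuration before you conclude that the logs are empty.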

Known limitations

The dashboard depends on generated recording rules for most metrics.
The scrape health panel assumes `job="openbao"`.
Active-node scraping does not provide complete standby visibility.
Loki panels depend on `log_stream="openbao.audit"` and `log_stream="openbao.operational"`.
Audit stream presence is not the same as audit canary success.
Inventory-style gauges can update on `usage_gauge_period` rather than every scrape.

What’s next

Use OpenBao observability model to understand the source and derived signals behind the dashboard.
Use Metrics, logs, and audit logs to choose the right follow-up signal.
Use OpenBao HA/Raft dashboard when HA, Raft, or Autopilot panels look unhealthy.
Use OpenBao audit overview dashboard when audit metrics or audit streams look unhealthy.
Use OpenBao audit investigation dashboard when you need request ID, path, operation, or node drilldown.
Use OpenBao operational logs dashboard when operational errors need deeper log context.
Use OpenBao runtime and storage dashboard when runtime, barrier, cache, or mount table signals need correlation.
Use Understand metric prefixes and recording rules when a metric panel is empty.

Source: OpenBao documents telemetry behavior in the OpenBao telemetry metrics overview. OpenBao documents audit-device behavior and unaudited paths in the OpenBao audit device documentation. This page describes the generated dashboard contract in `contracts/dashboards/openbao-overview.yaml`.