OpenBao SLO and availability dashboard

Use this explainer to read the generated OpenBao SLO and availability dashboard. It is for SRE and platform teams who need user-facing availability, synthetic probe, latency, scrape, and error-budget context for OpenBao.

What this dashboard is for

Use the SLO and availability dashboard when you need to separate user-facing availability from internal OpenBao health.

The dashboard answers these questions:

  • Are synthetic OpenBao probes succeeding?
  • What is the 30-day probe availability?
  • How much availability error budget remains?
  • Is the error budget burning at a high rate?
  • Are synthetic probe durations increasing?
  • Do OpenBao request, login, or token-check latency signals correlate with probe failures or slow probes?
  • Did Prometheus scrape availability change at the same time?

What this dashboard is not for

Do not use this dashboard as a compliance report. It is an operational SLO view over the metrics you collect.

Do not treat OpenBao scrape health as the same thing as user-facing availability. Scrape health tells you whether Prometheus can collect metrics. Synthetic probes tell you whether a selected user path works from a selected network location.

Required data sources

The generated dashboard expects these Grafana data sources:

Data source | Expected UID | Used for
Prometheus  | prometheus   | Synthetic probe metrics, OpenBao recording rules, and scrape target health.
Loki        | loki         | Operational log context for availability and latency symptoms.

Synthetic panels depend on probe_success and probe_duration_seconds, which are commonly exposed by Prometheus blackbox-style probes. If you do not deploy synthetic probes, those panels remain empty and synthetic-probe alerts do not fire.

Dashboard variables

The dashboard exposes these variables:

Variable                | Default     | Purpose
Synthetic probe job     | .*openbao.* | Selects synthetic probe scrape jobs.
Synthetic probe target  | .*          | Selects probe targets by instance.
OpenBao scrape job      | openbao.*   | Selects OpenBao metrics scrape jobs.
Availability SLO target | 0.999       | Sets the SLO target used for error-budget calculations.

Keep probe labels bounded. Do not put request paths, secret paths, token metadata, entity identifiers, or tenant identifiers into probe labels.

How to read availability

Start with probe success, 30-day availability, and error budget remaining.

probe_success is a binary signal: 1 means the probe succeeded, and 0 means it failed. Availability over a window is the average of this binary signal over that window.
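
As a rough illustration of that averaging, the following Python sketch computes availability from a list of probe_success samples. The function name and the sample window are hypothetical and are not part of the dashboard contract; the dashboard itself evaluates this in Prometheus.

```python
def availability(samples):
    """Return the fraction of successful probes in a window.

    `samples` holds probe_success values: 1 for success, 0 for failure.
    Returns None for an empty window, because missing data is neither
    0 percent nor 100 percent availability.
    """
    if not samples:
        return None
    return sum(samples) / len(samples)


# Hypothetical window: 1000 probe runs, 2 of which failed.
window = [1] * 998 + [0] * 2
print(availability(window))  # 0.998
```

The empty-window case matters in practice: a probe that is not deployed or not scraped produces no samples, which is different from a probe that fails.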

The dashboard calculates approximate error budget remaining as:

1 - ((1 - observed availability) / (1 - SLO target))

A value near 1 means most of the error budget remains. A value near 0 means the selected target has consumed almost all of its 30-day budget for the configured SLO target. Negative values mean the target has exceeded its budget.
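
To make the formula concrete, here is a minimal Python sketch that applies it to three example availability values. The function names and the example inputs are illustrative assumptions. The minutes figure assumes a 30-day window in which budget is spent as evenly spaced failed probes.

```python
def error_budget_remaining(observed_availability, slo_target=0.999):
    """Fraction of error budget left: 1 - (1 - observed) / (1 - target)."""
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0.0:
        raise ValueError("slo_target must be below 1.0")
    observed_failure = 1.0 - observed_availability
    return 1.0 - observed_failure / allowed_failure


def budget_minutes(slo_target=0.999, window_days=30):
    """Total failure time the target allows over the window, in minutes."""
    return (1.0 - slo_target) * window_days * 24 * 60


# Example inputs: above target, exactly on target, below target.
for observed in (0.9995, 0.999, 0.998):
    print(observed, round(error_budget_remaining(observed), 3))

print(round(budget_minutes(), 1))
```

With the default 0.999 target, these inputs give 0.5 (half the budget left), 0.0 (budget fully spent), and -1.0 (budget overspent by its own size). The full 30-day budget is roughly 43.2 minutes of failure.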

How to read burn rate

Burn rate compares the current failure ratio with the allowed failure ratio. With the default 0.999 target, the allowed failure ratio is 0.001.

Use short and medium windows together. A high five-minute burn rate on its own can reflect a brief transient failure. A high one-hour burn rate means the symptom persisted long enough to threaten the budget.
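
The following Python sketch shows one way to combine a short and a medium window. The function names, the failure ratios, and the threshold of 10 are hypothetical values chosen for illustration; they are not the thresholds used by the generated alerts.

```python
def burn_rate(failure_ratio, slo_target=0.999):
    """Current failure ratio divided by the allowed failure ratio.

    A burn rate of 1 spends the budget exactly over the SLO window.
    """
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0.0:
        raise ValueError("slo_target must be below 1.0")
    return failure_ratio / allowed_failure


def sustained_burn(short_ratio, medium_ratio, threshold, slo_target=0.999):
    """True only when both windows burn at or above the threshold."""
    return (
        burn_rate(short_ratio, slo_target) >= threshold
        and burn_rate(medium_ratio, slo_target) >= threshold
    )


THRESHOLD = 10.0  # hypothetical value, not taken from the alert contract

# Brief spike: 5% failures over five minutes, 0.05% over one hour.
print(sustained_burn(0.05, 0.0005, THRESHOLD))  # False

# Persistent symptom: 5% over five minutes, 2% over one hour.
print(sustained_burn(0.05, 0.02, THRESHOLD))  # True
```

Requiring both windows to exceed the threshold filters out spikes that end before they consume a meaningful share of the budget.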

The generated warning alerts use fixed 99.9 percent target math. If your production SLO target differs, update the alert contract before you use the alerts for paging.

How to read latency context

Probe duration shows end-to-end synthetic latency from the probe location. OpenBao request, login, and token-check latency show server-side internal timing from OpenBao telemetry.

Read them together:

  • Probe latency high and OpenBao latency normal usually points to network, DNS, load balancer, TLS, or probe-location issues.
  • Probe latency high and OpenBao request latency high points to OpenBao, storage, audit, auth, or runtime pressure.
  • OpenBao latency high while probes succeed can still be user-impacting for workloads that are not covered by the synthetic probe.
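
The reading guide above can be sketched as a small lookup. This is an illustrative Python helper, not part of the dashboard. The function name is hypothetical, and deciding whether a signal counts as "high" is left to your own thresholds. For simplicity, the third case treats "probes succeed" as "probe latency is not high".

```python
def latency_hint(probe_latency_high, openbao_latency_high):
    """Map the two latency signals to the area to investigate first."""
    if probe_latency_high and not openbao_latency_high:
        return "network, DNS, load balancer, TLS, or probe location"
    if probe_latency_high and openbao_latency_high:
        return "OpenBao, storage, audit, auth, or runtime pressure"
    if openbao_latency_high:
        # The probe looks fine, but uncovered workloads may still be slow.
        return "possible impact on workloads the probe does not cover"
    return "no latency symptom in either signal"


print(latency_hint(True, False))
print(latency_hint(True, True))
print(latency_hint(False, True))
```

Treat the result as a starting point for investigation, not as a diagnosis.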

How to read scrape context

Scrape availability is an observability-path signal. If probes fail and scrape availability is healthy, Prometheus can still observe OpenBao during the incident. If scrape availability is also degraded, you may have both a service problem and an observability problem.

Check scrape availability against the metrics scrape runbook before you assume the SLO dashboard shows complete data.

Common mistakes

  • Treating Prometheus scrape health as user-facing availability.
  • Probing unaudited or overly narrow paths and calling the result an OpenBao availability SLO.
  • Using one probe location to represent every client network.
  • Putting request paths or tenant identifiers into probe labels.
  • Using the default 99.9 percent alert math for an environment with a different approved SLO target.
  • Treating error-budget dashboards as compliance evidence without an approved SLO policy.

Known limitations

  • The synthetic probe contract is optional.
  • The generated warning alerts assume a 99.9 percent availability target.
  • The dashboard calculates availability from probe success, not from every OpenBao client request.
  • Probe labels and target names are deployment-specific.
  • The dashboard does not define incident severity policy. Your organization owns paging thresholds and error-budget policy.

Source: Prometheus documents blackbox-style multi-target probes with probe_success and probe_duration_seconds in the Prometheus multi-target exporter guide. Prometheus documents alerting rules and the for clause in the Prometheus alerting rules documentation. This page describes the generated dashboard contract in contracts/dashboards/openbao-slo-availability.yaml.