OpenBao metrics scrape failing

Use this runbook when the OpenBaoUnreachable alert fires because Prometheus cannot scrape one or more OpenBao metrics targets. The steps help you separate an OpenBao health problem from a scrape configuration, network, or token issue.

Before you begin

  • Get access to Prometheus or the metrics backend that evaluates the alert.
  • Get network access to the OpenBao metrics listener.
  • Get a metrics token when your profile protects /v1/sys/metrics with an OpenBao token.
  • Know whether your deployment uses the default vault metric prefix or an explicit openbao prefix.

Confirm the scrape failure

  1. Query the scrape result for all OpenBao targets.

    up{job=~"openbao.*"}
    
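  2. Check whether the target stayed down for the alert window.

    max_over_time(
      up{job=~"openbao.*"}[5m]
    )

    A value of 0 means every scrape in the window failed. A value of 1 means at least one scrape succeeded, so the failure is intermittent; min_over_time over the same range returns 0 if any single scrape failed.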

  3. Open the Prometheus targets page and inspect the error for the OpenBao job. The target error usually identifies DNS, TCP, TLS, HTTP status, or token failures.
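
The queries above can also be run from a terminal through the Prometheus HTTP API. The following is a minimal sketch, assuming Prometheus is reachable without authentication; PROM_URL and prom_query are illustrative names chosen for this example and are not part of Prometheus.

```shell
#!/bin/sh
# Sketch only: PROM_URL and prom_query are illustrative names.
# Assumes the Prometheus HTTP API is reachable without authentication.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query and print the raw JSON response.
# $1: PromQL expression.
prom_query() {
  # -G sends the URL-encoded data as a GET query string.
  curl -fsS -G --data-urlencode "query=$1" "${PROM_URL}/api/v1/query"
}

# Example (not run automatically): list only the OpenBao targets that are down.
# prom_query 'up{job=~"openbao.*"} == 0'
```

An empty result list for the == 0 query means no matching target is currently reported as down.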

Check OpenBao health

  1. Query the health endpoint on the affected node.

    curl -fsS http://<openbao_address>/v1/sys/health
    
    • <openbao_address>: Host and port of the OpenBao API listener on the affected node. Change http:// to https:// when the listener serves TLS.
  2. Interpret the health response.

    Status  Meaning
    200     The node is initialized, unsealed, and active.
    429     The node is initialized, unsealed, and standby.
    501     The node is not initialized.
    503     The node is sealed.
  3. If the node is sealed, switch to the "OpenBao sealed unexpectedly" runbook.
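
When you script this check, it can help to capture only the HTTP status code, because curl with -f reports 429, 501, and 503 as errors. The following is a minimal sketch that maps the status codes from the table above to node states; the function names describe_health_status and check_health are illustrative.

```shell
#!/bin/sh
# Sketch only: function names are illustrative.

# Map a /v1/sys/health HTTP status code to the node state.
# $1: HTTP status code (curl prints 000 when it received no response).
describe_health_status() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "initialized, unsealed, standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    000) echo "no HTTP response (connection, DNS, or TLS failure)" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Fetch only the status code and describe it.
# $1: base address including scheme, host, and port.
check_health() {
  # No -f here, so non-200 health codes are captured instead of failing the call.
  code=$(curl -sS -o /dev/null -w '%{http_code}' "$1/v1/sys/health")
  describe_health_status "$code"
}
```

For example, check_health "http://<openbao_address>" prints sealed when the node answers with 503.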

Check the metrics endpoint

  1. Query the metrics endpoint from the same network path that Prometheus uses.

    curl -fsS --header "X-Vault-Token: <metrics_token>" 'http://<openbao_address>/v1/sys/metrics?format=prometheus'
    
    • <metrics_token>: Token allowed to read metrics when the listener requires authentication.
    • <openbao_address>: Host and port of the OpenBao API or metrics listener on the affected node.
  2. If the endpoint returns 403, check the token policy and token expiration.

  3. If the endpoint returns 404 or an empty response, check the OpenBao listener configuration. The listener must allow metrics and must not set disallow_metrics = true on the listener that Prometheus scrapes.

  4. If the endpoint times out, check security groups, firewalls, NetworkPolicy, service selectors, and TLS settings between Prometheus and OpenBao.
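
The decision steps above can be sketched as a small triage helper. This is an illustration under stated assumptions, not a supported tool: the function names and the 10-second timeout are choices made for the example.

```shell
#!/bin/sh
# Sketch only: names and the 10-second timeout are illustrative choices.

# Map a metrics request outcome to the next check from the steps above.
# $1: HTTP status code (000 when curl received no response)
# $2: response body size in bytes
metrics_next_check() {
  status=$1
  size=$2
  case "$status" in
    000) echo "check DNS, network path, and TLS" ;;
    403) echo "check token policy and expiration" ;;
    404) echo "check listener metrics configuration" ;;
    200)
      if [ "$size" -eq 0 ]; then
        echo "check listener metrics configuration"
      else
        echo "metrics endpoint ok"
      fi
      ;;
    *) echo "unexpected status $status" ;;
  esac
}

# $1: base address including scheme, host, and port; $2: metrics token.
probe_metrics() {
  # --max-time bounds the request so a blocked path fails instead of hanging.
  result=$(curl -sS --max-time 10 -o /dev/null \
    -w '%{http_code} %{size_download}' \
    --header "X-Vault-Token: $2" "$1/v1/sys/metrics?format=prometheus")
  # Intentional word splitting: result is "<status> <bytes>".
  metrics_next_check $result
}
```

Run probe_metrics from the same network path that Prometheus uses so the result reflects what the scraper sees.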

Restore the scrape

  1. Fix the failing layer identified by the previous checks.

    Failure                         Action
    OpenBao is sealed               Unseal or restore the seal backend.
    Token is denied                 Rotate or reissue the metrics token with the required policy.
    DNS or service target is wrong  Correct the scrape target, Service, or service discovery labels.
    TLS verification fails          Correct the CA bundle, server name, or scrape scheme.
    Listener blocks metrics         Enable metrics on the private metrics listener.
  2. Reload or restart Prometheus if you changed the scrape configuration.

  3. Avoid exposing unauthenticated metrics on a broad network. If you use unauthenticated metrics access, keep the listener private and enforce access with network controls.
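
For the reload in step 2, Prometheus rereads its configuration on SIGHUP, or on an HTTP POST to /-/reload when it was started with --web.enable-lifecycle. The following is a minimal sketch, assuming the lifecycle endpoint is enabled and reachable; PROM_URL and reload_prometheus are illustrative names.

```shell
#!/bin/sh
# Sketch only: PROM_URL and reload_prometheus are illustrative names.
# The /-/reload endpoint works only when Prometheus runs with --web.enable-lifecycle.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Ask Prometheus to reload its configuration without a restart.
reload_prometheus() {
  curl -fsS -X POST "${PROM_URL}/-/reload"
}

# Alternative when the lifecycle endpoint is disabled: signal the process directly.
# kill -HUP "<prometheus_pid>"
```

If your Prometheus instance is managed by an operator or a configuration sidecar, that component may already trigger the reload, so check how your deployment applies configuration before calling the endpoint by hand.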

Verify the result

  1. Confirm that Prometheus sees the target as up.

    up{job=~"openbao.*"}
    
  2. Confirm that OpenBao metrics are present.

    ${p}_core_active
    
    • ${p}: Metric prefix for your deployment. Use vault for the OpenBao default prefix, or openbao if you configured metrics_prefix = "openbao".
  3. Wait for the alert window to pass and confirm that OpenBaoUnreachable resolves.
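
To confirm the metric prefix directly at the endpoint, you can search the Prometheus-format output for the core_active sample. The following is a minimal sketch; has_core_active is an illustrative helper name, and the exact label set on the sample depends on your deployment.

```shell
#!/bin/sh
# Sketch only: has_core_active is an illustrative name.
# Reads Prometheus text-format metrics on stdin and succeeds when a
# <prefix>_core_active sample line is present.
# $1: metric prefix (vault or openbao).
has_core_active() {
  # Sample lines start with the metric name, followed by a label set or a space.
  # "# HELP" and "# TYPE" comment lines do not match because of the ^ anchor.
  grep -Eq "^${1}_core_active(\{| )"
}

# Example (not run automatically):
# curl -fsS --header "X-Vault-Token: <metrics_token>" \
#   'http://<openbao_address>/v1/sys/metrics?format=prometheus' \
#   | has_core_active vault && echo "prefix is vault"
```

If the check fails for vault, repeat it with openbao before you conclude that the metric is missing.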

Troubleshooting

Health works but metrics fail

Check listener-specific metrics settings. OpenBao can expose health and API paths while the scraped listener still blocks /v1/sys/metrics.

Prometheus target is up but the alert still fires

Check whether another target in the same job remains down. The OpenBaoUnreachable alert evaluates every target in jobs that match openbao.*, so one healthy target does not clear it.

Metrics work from your workstation but not Prometheus

Run the same request from the Prometheus network path. Scrape failures often come from service discovery, NetworkPolicy, or TLS trust differences.

Source: OpenBao documents /v1/sys/health status codes in the OpenBao health API documentation. OpenBao documents Prometheus-format metrics in the OpenBao metrics API documentation.