The token manager (hotdata/_auth.py) treats any non-200 from the token-exchange endpoint (/v1/auth/jwt) as fatal: _mint raises TokenExchangeError immediately on the first failed response, with no retry.
This means a brief, transient server-side error (e.g. a momentary 500) on the token endpoint fails the caller outright, even though an immediate re-attempt would succeed. We hit this in CI: a single transient 500 during a mint failed one query, while every request around it succeeded.
Ask: retry token exchange on transient failures before giving up.
- Retry on 5xx responses and transport errors (connection/read errors).
- Do not retry on 4xx (e.g. 400/401 -- bad/expired credentials are not transient).
- Use a small bounded retry budget with exponential backoff + jitter (e.g. 2-3 attempts).
- Applies to both the api_token mint path and the refresh_token path (the refresh path already falls back to a re-mint, so retry should wrap the underlying request).
- Surface the final error as TokenExchangeError once retries are exhausted, preserving the last status/body.
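A rough sketch of the behavior requested above, not the actual hotdata/_auth.py code. Everything here other than the TokenExchangeError name is an assumption made for illustration: the exchange_with_retry helper, the TransportError marker, the send callable returning a (status, body) pair, the status/body attributes on the error, and the delay defaults.

```python
import random
import time
from typing import Callable, Optional, Tuple


class TokenExchangeError(Exception):
    """Stand-in for the SDK error; the real class's fields are assumed here."""

    def __init__(self, message: str, status: Optional[int] = None,
                 body: Optional[str] = None):
        super().__init__(message)
        self.status = status
        self.body = body


class TransportError(Exception):
    """Hypothetical marker for connection/read failures from the HTTP layer."""


def exchange_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 0.2,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it returns 200, retrying only transient failures."""
    last_status: Optional[int] = None
    last_body: Optional[str] = None
    last_exc: Optional[Exception] = None

    for attempt in range(max_attempts):
        try:
            status, body = send()
        except TransportError as exc:
            # Connection/read errors are treated as transient.
            last_status, last_body, last_exc = None, None, exc
        else:
            if status == 200:
                return body
            if not 500 <= status < 600:
                # 4xx (and any other non-5xx) is not transient: fail at once.
                raise TokenExchangeError(
                    f"token exchange failed with status {status}", status, body
                )
            last_status, last_body, last_exc = status, body, None

        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))

    # Retry budget exhausted: surface the last status/body to the caller.
    raise TokenExchangeError(
        f"token exchange failed after {max_attempts} attempts",
        last_status,
        last_body,
    ) from last_exc
```

Because the helper wraps only the underlying request, both the api_token mint and the refresh_token exchange could pass their own send callable, leaving the existing refresh-to-re-mint fallback unchanged.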
Context on the server-side transient errors that motivated this: hotdata-dev/monopoly#1128.