Symptom
Running hermes chat against a deployed CODA app returns HTTP 403: Invalid access token on the first call after a PAT rotation, then succeeds on retry. Captured trace:
⚠️ API call failed (attempt 1/3): PermissionDeniedError [HTTP 403]
🌐 Endpoint: https://adb-<workspace>.azuredatabricks.net/serving-endpoints/
📝 Error: HTTP 403: Invalid access token
What I confirmed empirically
- Workspace + PAT are good: databricks current-user me succeeds; databricks serving-endpoints list returns 7 endpoints; opus-4-6 is ready=READY.
- A fresh PAT against {host}/serving-endpoints/chat/completions (the URL Hermes constructs) returns 200 with a valid model response. So the URL pattern is correct and the endpoint serves OpenAI-style requests.
- Claude Code with the same workspace's PAT works fine (sonnet-4-6 via /anthropic).
- This is not a Geo Designated Services block — same PAT can call the same endpoint via curl.
Root cause
cli_auth.py:update_cli_tokens() is called by pat_rotator._persist_token() after every rotation. Each _update_* function performs a non-atomic read-modify-write:
    with open(path) as f:
        content = f.read()
    new_content = ...  # regex-replace api_key
    if new_content != content:
        with open(path, "w") as f:   # window starts here (file is truncated)
            f.write(new_content)     # window closes here
Hermes specifically re-reads ~/.hermes/config.yaml on every invocation. If the rotator is mid-write when Hermes opens the file, it sees a partial / empty / stale-token state → 403. Other CLIs (Claude / Codex / Gemini / OpenCode) read the token into their process env at startup and don't re-read, so they never observe the partial-write state — but they also never benefit from rotation within a long-running process, which is a separate bug.
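The window described above can be reproduced deterministically in a single process. The sketch below is only an illustration, not code from cli_auth.py: the function name, the file contents, and the token strings are made up, and the second open() stands in for a Hermes invocation that happens to land inside the window.

```python
import os
import tempfile


def read_during_truncating_write(path: str, new_content: str) -> str:
    """Return what a reader sees between open(path, "w") and write()."""
    with open(path, "w") as writer:      # the file is truncated to 0 bytes here
        with open(path) as reader:       # stand-in for a concurrent Hermes read
            observed = reader.read()
        writer.write(new_content)        # new content only lands after the read
    return observed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.yaml")
    with open(path, "w") as f:
        f.write("api_key: OLD_TOKEN\n")

    seen = read_during_truncating_write(path, "api_key: NEW_TOKEN\n")
    with open(path) as f:
        final = f.read()

    print(repr(seen))    # what the reader observed inside the window
    print(repr(final))   # what is on disk once the writer has finished
```

The reader gets an empty file even though the old token was valid a moment earlier and the new one is valid a moment later, which matches the "fails once, succeeds on retry" pattern in the trace.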
Compounding: every _update_* swallows OSError silently. If a write actually fails (perms, locked file, disk full), the file stays stale forever and the user just gets 403s with no log line to debug from.
Fix (PR coming)
Adds _atomic_write_text() — write to <path>.tmp, then os.replace(). POSIX rename is atomic, so concurrent readers see either the old file whole or the new file whole, never a partial state.
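A minimal sketch of the write-then-rename approach described above, assuming a plain text file and a temp file placed next to the target so both live on the same filesystem (os.replace() cannot move a file atomically across filesystems). The name atomic_write_text and the fsync/cleanup details are illustrative assumptions; the helper in the PR may differ.

```python
import os
import tempfile


def atomic_write_text(path: str, content: str) -> None:
    """Write content to <path>.tmp, then swap it into place with os.replace()."""
    tmp_path = path + ".tmp"             # same directory, so same filesystem
    try:
        with open(tmp_path, "w") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())         # push the bytes to disk before the swap
        os.replace(tmp_path, path)       # readers see the old or the new file, whole
    except BaseException:
        # Don't leave a stray temp file behind if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "config.yaml")
    with open(target, "w") as f:
        f.write("api_key: OLD_TOKEN\n")

    atomic_write_text(target, "api_key: NEW_TOKEN\n")

    with open(target) as f:
        result = f.read()
    tmp_left_behind = os.path.exists(target + ".tmp")
```

One caveat with a fixed <path>.tmp name: two writers running at once would share the temp file. If that can happen, a unique temp name (for example via tempfile.mkstemp in the same directory) is a possible variation.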
Replaces silent except OSError: pass with logger.warning(...). The "file doesn't exist yet" case (rotator firing during the window between app start and setup-script completion) is handled by an explicit os.path.exists() guard so it stays quiet.
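The guard-plus-warning behaviour could look roughly like the sketch below. Everything here is hypothetical: update_api_key, the logger name, the api_key: regex, and the config layout are stand-ins, since the real _update_* functions are not shown in this issue.

```python
import logging
import os
import re
import tempfile

logger = logging.getLogger("cli_auth_sketch")  # hypothetical logger name

# Matches a YAML-style "api_key: <value>" line; an assumed layout for illustration.
API_KEY_LINE = re.compile(r"(?m)^([ \t]*api_key:[ \t]*).*$")


def update_api_key(path: str, new_token: str) -> bool:
    """Return True if the file was rewritten with the new token."""
    if not os.path.exists(path):
        return False                     # setup hasn't created the file yet: stay quiet
    try:
        with open(path) as f:
            content = f.read()
        new_content = API_KEY_LINE.sub(lambda m: m.group(1) + new_token, content)
        if new_content == content:
            return False                 # nothing to change
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            f.write(new_content)
        os.replace(tmp_path, path)       # atomic swap, as in the fix
        return True
    except OSError as exc:
        # Previously swallowed with `pass`; now there is a log line to debug from.
        logger.warning("could not update token in %s: %s", path, exc)
        return False


with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "config.yaml")
    missing_result = update_api_key(cfg, "NEW_TOKEN")    # file absent: quiet no-op

    with open(cfg, "w") as f:
        f.write("model: opus\napi_key: OLD_TOKEN\n")
    updated_result = update_api_key(cfg, "NEW_TOKEN")
    with open(cfg) as f:
        updated_text = f.read()

    # Passing a directory makes open() raise an OSError subclass, which is logged.
    failed_result = update_api_key(tmp, "NEW_TOKEN")
```

The existence check runs before the try block so the expected "not set up yet" case never reaches the warning path, while real failures (permissions, locked file, disk full) still produce a log line.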
Applied to all 5 update functions, not just Hermes — the same race exists in _update_claude / _update_opencode and the dotenv helper. Hermes just exposed it first because of its read-on-every-call invocation pattern.
Out of scope (worth a follow-up)
- PAT rotator stops when active sessions hit zero. If a session is reaped and a new one starts >15 min later, PAT_v1 has expired but the rotator hasn't yet woken to mint v2. The first Hermes call still fails, and an atomic write doesn't help. The fix is to trigger an immediate _persist_token() on session creation. Will file a separate issue if this turns out to be hit in practice.
- Other agents don't refresh in-process. Claude / Codex / Gemini / OpenCode read the token at process startup. If you stay in a Claude Code session for >15 min, it's using a stale token and would 401 on its next call. Different problem from this issue (token-in-process vs token-on-disk); fixing here would be scope creep.