
Hermes 403 on first call after PAT rotation — non-atomic config writes + silent failures #22

@dgokeeffe

Symptom

Running hermes chat against a deployed CODA app returns HTTP 403: Invalid access token on the first call after a PAT rotation, then succeeds on retry. Captured trace:

⚠️  API call failed (attempt 1/3): PermissionDeniedError [HTTP 403]
   🌐 Endpoint: https://adb-<workspace>.azuredatabricks.net/serving-endpoints/
   📝 Error: HTTP 403: Invalid access token

What I confirmed empirically

  • Workspace + PAT are good — databricks current-user me succeeds; databricks serving-endpoints list returns 7 endpoints; opus-4-6 is ready=READY.
  • A request with a fresh PAT to {host}/serving-endpoints/chat/completions (the URL Hermes constructs) returns 200 with a valid model response, so the URL pattern is correct and the endpoint serves OpenAI-style requests.
  • Claude Code with the same workspace's PAT works fine (sonnet-4-6 via /anthropic).
  • This is not a Geo Designated Services block: the same PAT can call the same endpoint via curl.

Root cause

cli_auth.py:update_cli_tokens() is called by pat_rotator._persist_token() after every rotation. Each _update_* function performs a non-atomic read-modify-write:

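with open(path) as f:
    content = f.read()
new_content = ...  # regex-replace api_key
if new_content != content:
    with open(path, "w") as f:           # window opens: "w" truncates the file to 0 bytes
        f.write(new_content)             # buffered; readers may still see an empty file
    # window closes only here, when the with-block flushes and closes the file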

Hermes specifically re-reads ~/.hermes/config.yaml on every invocation. If the rotator is mid-write when Hermes opens the file, Hermes sees a partial, empty, or stale-token state → 403. Other CLIs (Claude / Codex / Gemini / OpenCode) read the token into their process env once at startup and don't re-read it, so they are exposed to the partial-write window only at launch rather than on every call. They also never benefit from rotation within a long-running process, which is a separate bug.
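A minimal, deterministic way to see the window on one machine, assuming CPython's default buffered text I/O: replay the rotator's `open(path, "w")` rewrite and open a second reader handle inside it. `observe_truncate_window` is a hypothetical helper written for this illustration, not part of cli_auth.py.

```python
import os
import tempfile


def observe_truncate_window(path: str, new_content: str) -> list[str]:
    """Replay the non-atomic rewrite and record what a concurrent reader sees."""
    seen = []
    with open(path, "w") as f:          # "w" truncates the file to 0 bytes immediately
        with open(path) as reader:
            seen.append(reader.read())  # a reader opening now gets an empty file
        f.write(new_content)            # small writes stay in Python's buffer...
        with open(path) as reader:
            seen.append(reader.read())  # ...so the file is usually still empty here
    with open(path) as reader:
        seen.append(reader.read())      # only after close() is the new content visible
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cfg = os.path.join(d, "config.yaml")
        with open(cfg, "w") as f:
            f.write("api_key: old-token\n")
        print(observe_truncate_window(cfg, "api_key: new-token\n"))
```

The first entry is always an empty string: any process that opens the config between truncation and close reads no api_key at all, which is exactly the state Hermes can hit.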

Compounding: every _update_* swallows OSError silently. If a write actually fails (perms, locked file, disk full), the file stays stale forever and the user just gets 403s with no log line to debug from.

Fix (PR coming)

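Adds _atomic_write_text(): write the new contents to <path>.tmp, then os.replace() it over the target. On POSIX, os.replace() is a rename(2), which atomically swaps the directory entry as long as source and target are on the same filesystem (true here, since the temp file sits next to the target). Concurrent readers therefore see either the whole old file or the whole new file, never a partial state.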

Replaces silent except OSError: pass with logger.warning(...). The "file doesn't exist yet" case (rotator firing during the window between app start and setup-script completion) is handled by an explicit os.path.exists() guard so it stays quiet.

Applied to all 5 update functions, not just Hermes — the same race exists in _update_claude / _update_opencode and the dotenv helper. Hermes just exposed it first because of its read-on-every-call invocation pattern.

Out of scope (worth a follow-up)

  • PAT rotator stops when active sessions hit zero. If a session is reaped and a new one starts >15 min later, PAT_v1 has expired but the rotator hasn't yet woken to mint v2. The first Hermes call still fails, and atomic writes don't help. The fix is to trigger an immediate _persist_token() on session creation. I'll file a separate issue if this turns out to be hit in practice too.
  • Other agents don't refresh in-process. Claude / Codex / Gemini / OpenCode read the token at process startup. If you stay in a Claude Code session for >15 min, it's using a stale token and would 401 on its next call. Different problem from this issue (token-in-process vs token-on-disk); fixing here would be scope creep.
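A sketch of the first follow-up, assuming the app exposes some session-created callback. `SessionTokenHook` and its method names are hypothetical; only `_persist_token()` comes from this issue, and here it is injected as a plain callable. Persisting unconditionally on every session start is the simplest form of the proposed fix.

```python
import threading
from typing import Callable


class SessionTokenHook:
    """Hypothetical hook: refresh the on-disk PAT the moment a session starts."""

    def __init__(self, persist_token: Callable[[], None]) -> None:
        self._persist_token = persist_token  # stands in for pat_rotator._persist_token()
        self._active = 0
        self._lock = threading.Lock()

    def on_session_created(self) -> None:
        with self._lock:
            self._active += 1
        # Mint and write a fresh PAT before the session's first CLI call, so a session
        # that starts after an idle gap never finds an expired token on disk.
        self._persist_token()

    def on_session_closed(self) -> None:
        with self._lock:
            self._active = max(0, self._active - 1)

    @property
    def active_sessions(self) -> int:
        with self._lock:
            return self._active
```

Calling the persist step outside the lock keeps a slow token mint from blocking other sessions' bookkeeping; a real version might skip the mint when the current PAT is still well within its lifetime.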

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests