Skip to content

fix(hermes): route via AI Gateway with auto-constructed URL#21

Open
dgokeeffe wants to merge 3 commits intomainfrom
fix/hermes-gateway-url
Open

fix(hermes): route via AI Gateway with auto-constructed URL#21
dgokeeffe wants to merge 3 commits intomainfrom
fix/hermes-gateway-url

Conversation

@dgokeeffe
Copy link
Copy Markdown
Collaborator

@dgokeeffe dgokeeffe commented May 6, 2026

Priority

P0 — Hermes Agent is fully broken on deploy. Every hermes chat call 404s on the primary model and 403s on the fallback because Hermes is hitting {workspace}/serving-endpoints without an OpenAI namespace, and main's setup_hermes.py doesn't auto-discover the AI Gateway from the workspace ID.


Summary

Fix for #20. Hermes Agent currently 404s on every workspace because setup_hermes.py configures the OpenAI-style endpoint as {DATABRICKS_HOST}/serving-endpoints (no API namespace). opus-4-7 isn't served at that path — only on the AI Gateway. Other CLIs work because they target namespaced paths (/anthropic, /openai, /gemini).

This PR routes Hermes through DATABRICKS_GATEWAY_HOST and auto-constructs the gateway URL from DATABRICKS_HOST so the same code works across workspaces and clouds.

Changes

utils.py (+33/-3):

  • get_gateway_host() derives workspace ID from DATABRICKS_HOST on Azure (adb-{ws}.{region}.azuredatabricks.net) when DATABRICKS_WORKSPACE_ID isn't set.
  • Builds the cloud-specific gateway URL: {ws}.0.ai-gateway.azuredatabricks.net on Azure, existing {ws}.ai-gateway.cloud.databricks.com on AWS.
  • Probe-then-cache logic unchanged; just adds two helpers (_derive_workspace_id_from_host, _build_gateway_candidate) and threads DATABRICKS_HOST into resolution.
  • Falls back to direct serving endpoints if the gateway probe fails.

app.yaml (+7):

  • Adds HERMES_MODEL (databricks-claude-opus-4-6 — what most gateways serve via OpenAI-style routing), HERMES_FALLBACK_MODEL, ENABLE_HERMES. Mirrors app.yaml.template.

Why opus-4-6 not opus-4-7 for Hermes

Hermes uses OpenAI-style API format. The Databricks AI Gateway typically maps opus-4-6 onto the /openai/{model}/chat/completions path; opus-4-7 is served via /anthropic/v1/messages (the path Claude Code uses, not Hermes). So opus-4-6 is the correct default for Hermes specifically.

ANTHROPIC_MODEL (Claude Code) stays on opus-4-7 — that path serves it correctly.

Out of scope

  • 403 invalid access token on fallback after this PR resolves the 404. That's a workspace-config issue (gateway ACL not granting the workspace's PAT). Not addressed here; will need separate workspace-level fix.
  • Did not include the workspace-specific ANTHROPIC_MODEL opus-4-7 → opus-4-6 flip from my local branch — ANTHROPIC_MODEL is correctly served via /anthropic on most gateways.

Test plan

  • Deploy to a workspace with AI Gateway available; run hermes chat, send a prompt, confirm a response.
  • Verify get_gateway_host() returns the right URL on both an Azure workspace (adb-...) and an AWS workspace (dbc-...).
  • Deploy to a workspace WITH the gateway blocked at the network level — confirm fallback to direct serving endpoints still works (Hermes will still 404 in that case, but no regression vs status quo).

Closes #20

This pull request and its description were written by Isaac.


Test Evidence (verified on the live deployment 2026-05-06)

Captured from the user running hermes chat on https://coding-agents-7405613340366915.15.azure.databricksapps.com:

⚠️  API call failed (attempt 1/3): NotFoundError [HTTP 404]
   🔌 Provider: custom  Model: databricks-claude-opus-4-7
   🌐 Endpoint: https://adb-7405613340366915.15.azuredatabricks.net/serving-endpoints
   📝 Error: HTTP 404: The given endpoint does not exist
   📋 Details: {'error_code': 'ENDPOINT_NOT_FOUND'}

The endpoint URL has no namespace — Hermes hits /serving-endpoints directly, which doesn't expose opus-4-7. Other CLIs (Claude, Codex, Gemini, OpenCode) work because they hit namespaced paths (/serving-endpoints/anthropic, .../openai, .../gemini). The fix routes Hermes through the AI Gateway and auto-constructs the gateway URL from DATABRICKS_HOST.

Direct probe shows the underlying model serving works fine when the URL is right:

$ curl -X POST "$HOST/serving-endpoints/chat/completions" \
       -H "Authorization: Bearer $PAT" \
       -d '{"model":"databricks-claude-opus-4-6","messages":[{"role":"user","content":"hi"}]}'
{"model":"au.anthropic.claude-opus-4-6-v1","choices":[...],"usage":{...}}
HTTP 200

So this is purely a URL-routing fix in setup_hermes.py + utils.py's gateway resolver. Workspace permissions, PAT validity, and model availability all check out independently.

dgokeeffe added 3 commits May 6, 2026 16:50
…pp.yaml

Mirrors app.yaml.template so the deployed Databricks App config exposes
Hermes settings explicitly in the Apps UI rather than relying on the
Python defaults baked into setup_hermes.py.

Co-authored-by: Isaac
setup_hermes.py's direct-serving path is `{host}/serving-endpoints`
with no API namespace, so the OpenAI-style /chat/completions append
404s on Databricks. The other CLIs work direct because they target
namespaced paths (/serving-endpoints/anthropic, /openai, /gemini).

Setting DATABRICKS_GATEWAY_HOST routes all five CLIs through the AI
Gateway uniformly, matching the recommendation in app.yaml.template.
For daveok the gateway is workspace-id 7405611674437990 on Azure.

Co-authored-by: Isaac
utils.py:
- get_gateway_host() now derives the workspace ID from DATABRICKS_HOST
  on Azure (host pattern: adb-{ws}.{region}.azuredatabricks.net) when
  DATABRICKS_WORKSPACE_ID isn't set.
- Builds the cloud-specific gateway URL by checking for azuredatabricks.net
  in the host (Azure: {ws}.0.ai-gateway.azuredatabricks.net; AWS keeps
  the existing {ws}.ai-gateway.cloud.databricks.com pattern).
- Probe-then-cache logic is unchanged; this just adds two helpers
  (_derive_workspace_id_from_host, _build_gateway_candidate) and threads
  DATABRICKS_HOST into the resolution.

app.yaml:
- Drop the explicit DATABRICKS_GATEWAY_HOST hardcode added in bce6dab.
  Auto-construction now picks up the right Azure URL from DATABRICKS_HOST,
  so the env var is redundant on daveok and would be wrong on other
  workspaces.
- HERMES_MODEL: opus-4-7 -> opus-4-6 (only opus-4-6 is on this gateway
  for OpenAI/MLflow-style routing).

Co-authored-by: Isaac
@dgokeeffe
Copy link
Copy Markdown
Collaborator Author

@datasciencemonkey — flagging for priority review. P0: Hermes Agent is fully broken on every deploy until this lands. Captured trace + a direct curl probe showing the underlying model serving works at 200 when the URL is constructed correctly are in the PR body. The fix is two files (utils.py + app.yaml).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hermes Agent fails with 404 on AI Gateway — opus-4-7 not on /serving-endpoints

1 participant