fix(hermes): route via AI Gateway with auto-constructed URL #21
Open
…pp.yaml
Mirrors app.yaml.template so the deployed Databricks App config exposes Hermes settings explicitly in the Apps UI rather than relying on the Python defaults baked into setup_hermes.py.
Co-authored-by: Isaac
setup_hermes.py's direct-serving path is `{host}/serving-endpoints`
with no API namespace, so the OpenAI-style /chat/completions append
404s on Databricks. The other CLIs work direct because they target
namespaced paths (/serving-endpoints/anthropic, /openai, /gemini).
Setting DATABRICKS_GATEWAY_HOST routes all five CLIs through the AI
Gateway uniformly, matching the recommendation in app.yaml.template.
For daveok the gateway is workspace-id 7405611674437990 on Azure.
Co-authored-by: Isaac
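
To make the 404 described in this commit message concrete, here is a minimal sketch of the string construction only. The host value and the `chat_completions_url` helper are hypothetical and are not taken from `setup_hermes.py`; which paths Databricks actually serves is as stated in the commit message, not verified here.

```python
from urllib.parse import urlsplit


def chat_completions_url(base_url: str) -> str:
    """Append the OpenAI-style route the way an OpenAI-compatible client would.

    Hypothetical helper for illustration; not part of setup_hermes.py.
    """
    return base_url.rstrip("/") + "/chat/completions"


# Hypothetical Azure workspace host, used only for illustration.
host = "https://adb-1234567890123456.7.azuredatabricks.net"

# Direct-serving base as described in the commit message: no API namespace.
direct = chat_completions_url(f"{host}/serving-endpoints")

# Namespaced bases of the kind the other CLIs are said to target.
namespaced_bases = [
    f"{host}/serving-endpoints/{ns}" for ns in ("anthropic", "openai", "gemini")
]

print(urlsplit(direct).path)  # /serving-endpoints/chat/completions
for base in namespaced_bases:
    print(urlsplit(base).path)
```

The first printed path is the one the commit message reports as returning 404; the namespaced bases are listed only for contrast.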
utils.py:
- get_gateway_host() now derives the workspace ID from DATABRICKS_HOST
on Azure (host pattern: adb-{ws}.{region}.azuredatabricks.net) when
DATABRICKS_WORKSPACE_ID isn't set.
- Builds the cloud-specific gateway URL by checking for azuredatabricks.net
in the host (Azure: {ws}.0.ai-gateway.azuredatabricks.net; AWS keeps
the existing {ws}.ai-gateway.cloud.databricks.com pattern).
- Probe-then-cache logic is unchanged; this just adds two helpers
(_derive_workspace_id_from_host, _build_gateway_candidate) and threads
DATABRICKS_HOST into the resolution.
app.yaml:
- Drop the explicit DATABRICKS_GATEWAY_HOST hardcode added in bce6dab.
Auto-construction now picks up the right Azure URL from DATABRICKS_HOST,
so the env var is redundant on daveok and would be wrong on other
workspaces.
- HERMES_MODEL: opus-4-7 -> opus-4-6 (only opus-4-6 is on this gateway
for OpenAI/MLflow-style routing).
Co-authored-by: Isaac
@datasciencemonkey: flagging for priority review. P0: Hermes Agent is fully broken on every deploy until this lands. The PR body contains a captured trace and a direct curl probe showing that the underlying model serving returns 200 when the URL is constructed correctly. The fix is two files (…
Priority
P0: Hermes Agent is fully broken on deploy. Every `hermes chat` call 404s on the primary model and 403s on the fallback because Hermes is hitting `{workspace}/serving-endpoints` without an OpenAI namespace, and main's `setup_hermes.py` doesn't auto-discover the AI Gateway from the workspace ID.

Summary
Fix for #20. Hermes Agent currently 404s on every workspace because `setup_hermes.py` configures the OpenAI-style endpoint as `{DATABRICKS_HOST}/serving-endpoints` (no API namespace). opus-4-7 isn't served at that path, only on the AI Gateway. Other CLIs work because they target namespaced paths (`/anthropic`, `/openai`, `/gemini`).

This PR routes Hermes through `DATABRICKS_GATEWAY_HOST` and auto-constructs the gateway URL from `DATABRICKS_HOST` so the same code works across workspaces and clouds.

Changes
`utils.py` (+33/-3):
- `get_gateway_host()` derives workspace ID from `DATABRICKS_HOST` on Azure (`adb-{ws}.{region}.azuredatabricks.net`) when `DATABRICKS_WORKSPACE_ID` isn't set.
- Builds the cloud-specific gateway URL: `{ws}.0.ai-gateway.azuredatabricks.net` on Azure, existing `{ws}.ai-gateway.cloud.databricks.com` on AWS.
- Adds two helpers (`_derive_workspace_id_from_host`, `_build_gateway_candidate`) and threads `DATABRICKS_HOST` into resolution.

`app.yaml` (+7):
- Sets `HERMES_MODEL` (`databricks-claude-opus-4-6`, which is what most gateways serve via OpenAI-style routing), `HERMES_FALLBACK_MODEL`, and `ENABLE_HERMES` explicitly. Mirrors `app.yaml.template`.

Why opus-4-6 not opus-4-7 for Hermes
Hermes uses the OpenAI-style API format. The Databricks AI Gateway typically maps opus-4-6 onto the `/openai/{model}/chat/completions` path; opus-4-7 is served via `/anthropic/v1/messages` (the path Claude Code uses, not Hermes). So opus-4-6 is the correct default for Hermes specifically. `ANTHROPIC_MODEL` (Claude Code) stays on opus-4-7, since that path serves it correctly.

Out of scope
- The `ANTHROPIC_MODEL` opus-4-7 → opus-4-6 flip from my local branch: `ANTHROPIC_MODEL` is correctly served via `/anthropic` on most gateways.

Test plan
- Run `hermes chat`, send a prompt, confirm a response.
- `get_gateway_host()` returns the right URL on both an Azure workspace (`adb-...`) and an AWS workspace (`dbc-...`).

Closes #20

This pull request and its description were written by Isaac.
Test Evidence (verified on the live deployment 2026-05-06)
Captured from the user running `hermes chat` on https://coding-agents-7405613340366915.15.azure.databricksapps.com:

The endpoint URL has no namespace: Hermes hits `/serving-endpoints` directly, which doesn't expose opus-4-7. Other CLIs (Claude, Codex, Gemini, OpenCode) work because they hit namespaced paths (`/serving-endpoints/anthropic`, `.../openai`, `.../gemini`). The fix routes Hermes through the AI Gateway and auto-constructs the gateway URL from `DATABRICKS_HOST`.

Direct probe shows the underlying model serving works fine when the URL is right:

So this is purely a URL-routing fix in `setup_hermes.py` + `utils.py`'s gateway resolver. Workspace permissions, PAT validity, and model availability all check out independently.
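
The PR describes the resolver's probe-then-cache step as unchanged but does not show it. As a rough illustration of that general pattern, and not the code in `utils.py`, the sketch below probes a candidate gateway URL once and remembers the outcome. The function name, the module-level cache, and the injected `probe` callable are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical module-level cache: candidate URL -> resolved URL, or None if the probe failed.
_gateway_cache: Dict[str, Optional[str]] = {}


def resolve_with_probe(candidate: str, probe: Callable[[str], bool]) -> Optional[str]:
    """Probe a candidate gateway URL at most once and cache the result.

    `probe` is injected so the sketch stays free of network calls; a real
    probe would issue an HTTP request and report whether it succeeded.
    """
    if candidate not in _gateway_cache:
        _gateway_cache[candidate] = candidate if probe(candidate) else None
    return _gateway_cache[candidate]


# Usage with a fake probe that records how often it is called.
calls = []


def fake_probe(url: str) -> bool:
    calls.append(url)
    return url.endswith("ai-gateway.azuredatabricks.net")


url = "https://7405611674437990.0.ai-gateway.azuredatabricks.net"
first = resolve_with_probe(url, fake_probe)
second = resolve_with_probe(url, fake_probe)  # served from the cache, no second probe
```

Caching the negative result as well means an unreachable candidate is not re-probed on every call, which may or may not match the behaviour of the real resolver.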