Symptom
When the user runs Hermes (hermes chat) on a deployed CODA app, the call 404s on the primary model and 403s on the fallback. Captured trace:
⚠️ API call failed (attempt 1/3): NotFoundError [HTTP 404]
🔌 Provider: custom Model: databricks-claude-opus-4-7
🌐 Endpoint: https://adb-<workspace>.azuredatabricks.net/serving-endpoints
📝 Error: HTTP 404: The given endpoint does not exist
📋 Details: {'error_code': 'ENDPOINT_NOT_FOUND'}
🔄 Switching to fallback: databricks-claude-opus-4-6 via custom
⚠️ API call failed (attempt 1/3): PermissionDeniedError [HTTP 403]
📝 Error: HTTP 403: Invalid access token
Root cause
setup_hermes.py configures the OpenAI-style endpoint as {DATABRICKS_HOST}/serving-endpoints with no API namespace. The Databricks per-workspace direct-serving path doesn't expose opus-4-7 at that URL — only the AI Gateway does. The other CLIs (Claude, Codex, OpenCode, Gemini) work because they target namespaced paths (/serving-endpoints/anthropic, .../openai, .../gemini).
Two-part fix needed:
- Route Hermes through the AI Gateway by setting DATABRICKS_GATEWAY_HOST so the OpenAI-style URL becomes {gateway}/openai instead of {workspace}/serving-endpoints.
- Auto-construct the gateway host from DATABRICKS_HOST rather than hardcoding it. The previous incarnation hardcoded a specific Azure gateway URL, which is fine for one workspace and broken on every other one.
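As a rough illustration of the first part of the fix, the base-URL selection could look like the sketch below. The function name resolve_openai_base_url and its env parameter are hypothetical, and the real logic in setup_hermes.py may be structured differently; only the two URL shapes are taken from the description above.

```python
import os


def resolve_openai_base_url(env=None):
    """Pick the OpenAI-style base URL for Hermes (hypothetical helper).

    Prefers the AI Gateway when DATABRICKS_GATEWAY_HOST is set, otherwise
    falls back to the workspace's direct serving-endpoints path.
    """
    env = os.environ if env is None else env

    gateway = env.get("DATABRICKS_GATEWAY_HOST", "").rstrip("/")
    if gateway:
        # Namespaced gateway path: {gateway}/openai
        return f"{gateway}/openai"

    # Un-namespaced direct path: {workspace}/serving-endpoints
    workspace = env.get("DATABRICKS_HOST", "").rstrip("/")
    return f"{workspace}/serving-endpoints"
```

Passing a plain dict instead of relying on os.environ keeps the helper easy to check in isolation, for example resolve_openai_base_url({"DATABRICKS_HOST": "https://ws.example"}).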
Branch
fix/hermes-gateway-url is open (3 commits, +40 / -3 across app.yaml + utils.py):
- Adds HERMES_MODEL / HERMES_FALLBACK_MODEL / ENABLE_HERMES to app.yaml (mirrors app.yaml.template).
- Routes Hermes through DATABRICKS_GATEWAY_HOST. get_gateway_host() derives the workspace ID from DATABRICKS_HOST on Azure (adb-{ws}.{region}.azuredatabricks.net), builds the cloud-specific URL ({ws}.0.ai-gateway.azuredatabricks.net on Azure, existing {ws}.ai-gateway.cloud.databricks.com on AWS), probes for reachability, and falls back to direct serving endpoints if the gateway is unreachable. Two new helpers: _derive_workspace_id_from_host and _build_gateway_candidate.
Out of scope
- The 403 on fallback (above) is a separate workspace-config issue: even with the URL fix, the gateway needs an ACL granting the workspace's PAT access. Mentioned here for completeness; not addressed by this PR.
- The ANTHROPIC_MODEL flip from opus-4-7 to opus-4-6 (which I included in my local branch) is workspace-specific: Claude Code uses /anthropic routing, where opus-4-7 IS served on most gateways. Excluded from this PR; opt in per-workspace via env var if needed.
Test plan
- Run hermes chat, send a prompt, confirm a response.
- If the gateway is unreachable, get_gateway_host() falls back to direct serving endpoints; Hermes will still 404 (expected, since the bug is that opus-4-7 isn't on direct), so HERMES_MODEL should be set to a model that IS on direct.
- Confirm databricks-claude-opus-4-6 is reachable on both AWS + Azure workspaces via the constructed gateway URL.