Hermes Agent fails with 404 on AI Gateway — opus-4-7 not on /serving-endpoints #20

@dgokeeffe

Symptom

When the user runs Hermes (hermes chat) on a deployed CODA app, the call 404s on the primary model and 403s on the fallback. Captured trace:

⚠️  API call failed (attempt 1/3): NotFoundError [HTTP 404]
   🔌 Provider: custom  Model: databricks-claude-opus-4-7
   🌐 Endpoint: https://adb-<workspace>.azuredatabricks.net/serving-endpoints
   📝 Error: HTTP 404: The given endpoint does not exist
   📋 Details: {'error_code': 'ENDPOINT_NOT_FOUND'}
🔄 Switching to fallback: databricks-claude-opus-4-6 via custom
⚠️  API call failed (attempt 1/3): PermissionDeniedError [HTTP 403]
   📝 Error: HTTP 403: Invalid access token

Root cause

setup_hermes.py configures the OpenAI-style endpoint as {DATABRICKS_HOST}/serving-endpoints with no API namespace. The Databricks per-workspace direct-serving path doesn't expose opus-4-7 at that URL — only the AI Gateway does. The other CLIs (Claude, Codex, OpenCode, Gemini) work because they target namespaced paths (/serving-endpoints/anthropic, .../openai, .../gemini).

Two-part fix needed:

  1. Route Hermes through the AI Gateway by setting DATABRICKS_GATEWAY_HOST so the OpenAI-style URL becomes {gateway}/openai instead of {workspace}/serving-endpoints.
  2. Auto-construct the gateway host from DATABRICKS_HOST rather than hardcoding it. The previous incarnation hardcoded a specific Azure gateway URL — fine for one workspace, broken on every other one.
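The routing change in step 1 can be sketched roughly as below. This is an illustration under stated assumptions, not the code in setup_hermes.py: the helper names `hermes_base_url` and `_with_scheme` are hypothetical, and only the two URL shapes ({gateway}/openai and {workspace}/serving-endpoints) and the two environment variable names come from the description above.

```python
import os


def _with_scheme(host):
    """Prefix https:// when the host was given without a scheme (hypothetical helper)."""
    return host if host.startswith(("http://", "https://")) else f"https://{host}"


def hermes_base_url(env=None):
    """Pick the OpenAI-style base URL for Hermes (hypothetical helper).

    Prefers the AI Gateway when DATABRICKS_GATEWAY_HOST is set, otherwise
    falls back to the workspace's direct serving-endpoints path.
    """
    env = os.environ if env is None else env

    gateway = env.get("DATABRICKS_GATEWAY_HOST", "").strip().rstrip("/")
    if gateway:
        # Gateway route: namespaced OpenAI-compatible path.
        return f"{_with_scheme(gateway)}/openai"

    workspace = env.get("DATABRICKS_HOST", "").strip().rstrip("/")
    if not workspace:
        raise ValueError("DATABRICKS_HOST is not set")
    # Direct route: the un-namespaced path that returned the 404 for opus-4-7.
    return f"{_with_scheme(workspace)}/serving-endpoints"
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real process environment.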

Branch

fix/hermes-gateway-url is open (3 commits, +40 / -3 across app.yaml + utils.py):

  • Adds HERMES_MODEL / HERMES_FALLBACK_MODEL / ENABLE_HERMES to app.yaml (mirrors app.yaml.template).
  • Routes Hermes through DATABRICKS_GATEWAY_HOST.
  • get_gateway_host() derives workspace ID from DATABRICKS_HOST on Azure (adb-{ws}.{region}.azuredatabricks.net), builds the cloud-specific URL ({ws}.0.ai-gateway.azuredatabricks.net on Azure, existing {ws}.ai-gateway.cloud.databricks.com on AWS), probes for reachability, and falls back to direct serving endpoints if the gateway is unreachable. Two new helpers: _derive_workspace_id_from_host and _build_gateway_candidate.

Out of scope

  • The 403 on fallback (above) is a separate workspace-config issue: even with the URL fix, the gateway needs an ACL granting the workspace's PAT access. Mentioned here for completeness; not addressed by this PR.
  • ANTHROPIC_MODEL flip from opus-4-7 to opus-4-6 (which I included in my local branch) is workspace-specific — Claude Code uses /anthropic routing where opus-4-7 IS served on most gateways. Excluded from this PR; opt in per-workspace via env var if needed.

Test plan

  • Deploy to a workspace with the AI Gateway available; run hermes chat, send a prompt, confirm a response.
  • Deploy to a workspace WITHOUT the gateway (or block it via firewall) and confirm get_gateway_host() falls back to direct serving endpoints. Hermes will still 404 in that case (expected, since opus-4-7 isn't on direct), so set HERMES_MODEL to a model that IS on direct.
  • Confirm databricks-claude-opus-4-6 is reachable on both AWS and Azure workspaces via the constructed gateway URL.
