
fix(claude): pick models from workspace's serving endpoints (GDS-aware) #27

Open
dgokeeffe wants to merge 1 commit into main from fix/model-discovery-claude

Conversation

@dgokeeffe
Collaborator

Priority

P0 — Claude Code's Opus tier is broken on every non-US-geo workspace. Workspaces in AU / EU / etc. serve databricks-claude-opus-4-6 but not opus-4-7; setup_claude.py hardcodes the latter, so selecting Opus 404s. App-side workaround for the gateway-side issue tracked at #8.


Summary

Fix for #26. The setup scripts hardcoded model names that don't survive Geo Designated Services (GDS) restrictions. This PR teaches setup_claude.py to query the workspace's /api/2.0/serving-endpoints, treat that list as the GDS-respecting oracle for what is actually served in the workspace, and pick a working model in each tier.
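For reviewers who want a feel for the discovery step without opening the diff, here is a minimal sketch of how such a query could look. It is not the code in this PR. The response shape (an `endpoints` list whose items carry `name` and `state.ready == "READY"`) is an assumption about the serving-endpoints list API, `ready_endpoint_names` is a hypothetical helper introduced only for illustration, and the use of `urllib` is a choice made to keep the sketch dependency-free.

```python
import json
import urllib.request


def ready_endpoint_names(payload: dict) -> set[str]:
    """Extract the names of READY endpoints from a list-endpoints response body.

    Assumed shape: {"endpoints": [{"name": ..., "state": {"ready": "READY"}}]}.
    """
    names = set()
    for endpoint in payload.get("endpoints") or []:
        state = endpoint.get("state") or {}
        if state.get("ready") == "READY" and endpoint.get("name"):
            names.add(endpoint["name"])
    return names


def discover_serving_endpoints(host: str, token: str, timeout: float = 5.0) -> set[str]:
    """Return READY endpoint names, or an empty set on any failure."""
    try:
        request = urllib.request.Request(
            f"{host.rstrip('/')}/api/2.0/serving-endpoints",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return ready_endpoint_names(json.load(response))
    except (OSError, ValueError, AttributeError, TypeError):
        # OSError covers URLError/HTTPError and timeouts; ValueError covers a
        # malformed URL or invalid JSON; the rest cover an unexpected body shape.
        return set()
```

Returning an empty set instead of raising is what lets the caller treat "discovery failed" and "nothing discovered" the same way and keep its existing defaults.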

Changes

utils.py (+52, no logic changes elsewhere):

  • discover_serving_endpoints(host, token, timeout=5.0) -> set[str] — returns READY endpoint names. Empty set on any failure (preserves caller's fallback behaviour).
  • pick_in_geo_model(preferred, available, fallback) -> str — first preferred entry in available; else fallback.
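The selection rule in the second bullet is small enough to sketch in full. This is an illustration of the described semantics (first preferred entry that is served, else the fallback), not a copy of the code in utils.py. The type annotations and the full `databricks-claude-opus-4-7` endpoint name are assumptions.

```python
from collections.abc import Collection, Sequence


def pick_in_geo_model(
    preferred: Sequence[str], available: Collection[str], fallback: str
) -> str:
    """Return the first entry of `preferred` that is in `available`, else `fallback`."""
    for name in preferred:
        if name in available:
            return name
    return fallback


# Endpoint names taken from the AU smoke test in this PR.
au_served = {
    "databricks-claude-haiku-4-5",
    "databricks-claude-opus-4-6",
    "databricks-claude-sonnet-4-5",
    "databricks-claude-sonnet-4-6",
}
opus_chain = ["databricks-claude-opus-4-7", "databricks-claude-opus-4-6"]

print(pick_in_geo_model(opus_chain, au_served, "databricks-claude-opus-4-7"))
# -> databricks-claude-opus-4-6
print(pick_in_geo_model(opus_chain, set(), "databricks-claude-opus-4-7"))
# -> databricks-claude-opus-4-7 (nothing discovered, so the fallback wins)
```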

setup_claude.py (+38/-5):

  • Calls discovery once after host+token are known.
  • Picks ANTHROPIC_MODEL, ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL from the discovered list, walking a per-tier priority chain.
  • Logs when an env-set request is substituted.
  • Falls back to original env defaults if discovery returns empty (network failure, auth error, etc.) — behaviour matches main in failure modes.
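The four bullets above can be sketched roughly as follows. Everything here is an assumption for illustration: `resolve_models`, `TIER_CHAINS`, the exact chain contents, the rule that an env-set request is tried first, and the log wording are not taken from the diff. Only the environment variable names and the endpoint names come from this PR.

```python
import logging

logger = logging.getLogger("setup_claude")

OPUS_CHAIN = ["databricks-claude-opus-4-7", "databricks-claude-opus-4-6"]

# Hypothetical per-tier priority chains, newest model first.
TIER_CHAINS = {
    "ANTHROPIC_MODEL": OPUS_CHAIN,
    "ANTHROPIC_DEFAULT_OPUS_MODEL": OPUS_CHAIN,
    "ANTHROPIC_DEFAULT_SONNET_MODEL": [
        "databricks-claude-sonnet-4-6",
        "databricks-claude-sonnet-4-5",
    ],
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": ["databricks-claude-haiku-4-5"],
}


def resolve_models(env_defaults: dict[str, str], available: set[str]) -> dict[str, str]:
    """Map each model variable to a served model, keeping defaults if discovery was empty."""
    if not available:
        # Discovery failed or returned nothing: behave exactly like main.
        return dict(env_defaults)

    resolved = {}
    for var, requested in env_defaults.items():
        # Honour the requested model if it is served, then walk the tier chain.
        chain = [requested, *TIER_CHAINS.get(var, [])]
        chosen = next((name for name in chain if name in available), requested)
        if chosen != requested:
            logger.warning("%s: %s is not served here; using %s", var, requested, chosen)
        resolved[var] = chosen
    return resolved
```

Keeping the empty-discovery case as an early return makes the "matches main in failure modes" property easy to verify by reading one branch.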

Test Evidence (verified on the live deployment 2026-05-06)

Smoke test against daveok (AU geo workspace):

$ python3 -c "from utils import discover_serving_endpoints, pick_in_geo_model; \
              avail = discover_serving_endpoints('https://adb-7405613340366915.15...', '$PAT'); \
              print(sorted(avail))"
['databricks-claude-haiku-4-5',
 'databricks-claude-opus-4-6',
 'databricks-claude-sonnet-4-5',
 'databricks-claude-sonnet-4-6',
 'databricks-gpt-oss-120b',
 'databricks-gpt-oss-20b',
 'databricks-qwen3-embedding-0-6b']

# pick_in_geo_model picks correctly:
Active model:  databricks-claude-opus-4-6   (env asked for opus-4-7, not served)
Opus tier:     databricks-claude-opus-4-6   (preferred opus-4-7 unavailable)
Sonnet tier:   databricks-claude-sonnet-4-6 (preferred, served)
Haiku tier:    databricks-claude-haiku-4-5  (only option, served)

Before: /model shows opus-4-7 → 404 on every Opus call. After: opus-4-6 used as Opus default → works.

Test plan

  • Deploy on a workspace WITH opus-4-7 (US geo) — confirm opus-4-7 still chosen (preferred winner).
  • Deploy on a workspace WITHOUT opus-4-7 (AU/EU) — confirm opus-4-6 substituted, log line emitted.
  • Disable network briefly during setup (or revoke the PAT temporarily) — confirm fallback to env-set defaults so setup doesn't fail outright.

Out of scope

  • setup_codex.py, setup_hermes.py, setup_gemini.py, setup_opencode.py need the same treatment. Will follow as separate single-agent PRs once this helper lands and the semantics are reviewed. Single-agent scope here keeps the helper introduction reviewable on its own.
  • Doesn't fix #8 (Test app against Geo Designated Services: AI Gateway Beta surfaces x-geo FMs), i.e. the gateway-Beta picker showing x-geo models. That's gateway-side. This is the app-side workaround so users in non-US geos get a working Claude Code today.

Closes #26

This pull request and its description were written by Isaac.

`setup_claude.py` hardcoded ANTHROPIC_DEFAULT_OPUS_MODEL=opus-4-7
regardless of what the workspace actually serves. On workspaces in
geos that don't have opus-4-7 (e.g. AU's adb-7405613340366915.15
serves only opus-4-6 / sonnet-4-6 / sonnet-4-5 / haiku-4-5), every
opus-tier call fails with ENDPOINT_NOT_FOUND.

Adds `utils.discover_serving_endpoints()` to query the workspace's
`/api/2.0/serving-endpoints` and return the READY model names.
Workspace direct-serving endpoints reflect Databricks Geo Designated
Services policy — using this list as the validation oracle gets
GDS compliance for free, no policy parsing needed.

`setup_claude.py` now picks each tier (opus / sonnet / haiku) by
walking a priority chain against the discovered list; falls back to
the original env-set default if discovery fails (e.g. workspace
unreachable at startup) so behaviour matches main when discovery
isn't available. Logs the substitution when it happens.

Verified against live daveok (AU geo, no opus-4-7):
  Active model: databricks-claude-opus-4-6 (was opus-4-7)
  Opus tier:    databricks-claude-opus-4-6
  Sonnet tier:  databricks-claude-sonnet-4-6
  Haiku tier:   databricks-claude-haiku-4-5

setup_codex / setup_hermes / setup_gemini have the same hardcoding
problem; filed as follow-up so this PR stays single-agent surgical.

Co-authored-by: Isaac
@dgokeeffe
Collaborator Author

@datasciencemonkey: flagging for priority review. P0: every non-US-geo workspace breaks Claude Code's Opus tier without this. The helper is small (utils.py +52 lines) and safe (failure ⇒ fallback to existing env defaults, behaviour matches main). The smoke test against the live AU workspace is in the PR body.

