fix(sglang): authenticate engine control-plane and router calls#2068

Open
EazyReal wants to merge 1 commit into THUDM:main from EazyReal:upstream-pr/sglang-control-plane-auth

@EazyReal

Contributor

Problem. slime cannot drive SGLang servers or sgl-routers that enforce --api-key. Only the post-launch health wait sends an Authorization header — and when no key is configured it sends the literal string Bearer None on every poll. Every other control-plane call is unauthenticated: _make_request (weight updates, memory release/resume), health_generate, flush_cache, get_weight_version, pause/continue generation, start/stop profile, external-engine /server_info discovery, the /abort_request + /v1/loads abort loop (#2056), and all router worker registration/listing/removal calls.

Before.

  • With --sglang-api-key set and the server enforcing it: the server launches (the health wait carries the key), then the first control call — e.g. flush_cache before weight sync — gets 401; flush_cache retries for 60 s and dies with TimeoutError("Timeout while flushing cache."), hiding the cause.
  • With a router enforcing its --api-key: worker registration gets 401 and bring-up fails.
  • With no key configured: every health poll still sends Authorization: Bearer None, which any intermediary that validates the header rejects.
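The `Bearer None` symptom in the last bullet is what happens when an unset key is formatted straight into the header value. The snippet below is a hypothetical reconstruction of that failure mode, not slime's actual code; `naive_auth_headers` is an illustrative name.

```python
from typing import Dict, Optional


def naive_auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    # Hypothetical reconstruction, not slime's code: the key is interpolated
    # without checking whether it was ever configured.
    return {"Authorization": f"Bearer {api_key}"}


# With no key configured, Python renders None as the text "None".
print(naive_auth_headers(None))  # {'Authorization': 'Bearer None'}
```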

After.

  • --sglang-api-key is attached as bearer auth to every request slime makes to its engines, including the /v1/loads re-abort path and external-engine discovery.
  • --router-api-key (the existing RouterArgs CLI passthrough; verified present in the pinned sgl-router 0.3.2 wheel) is attached to router worker registration, listing, and removal — including all four shutdown() paths. The worker's key is included in the /workers registration payload so the router can authenticate the traffic it forwards to that worker.
  • When no key is configured, no Authorization header is sent at all — requests are byte-identical to before (bearer_auth_headers(None) returns None, the requests/httpx default).
  • flush_cache fails fast with HTTPError on 401/403 instead of spinning for the full 60 s timeout (intentional behavior change: an auth misconfiguration never recovers by retrying).
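The helper contract and the fail-fast behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: only `bearer_auth_headers(None)` returning `None` is taken from the description. The dict shape for a set key, the `AuthRejected` exception (the PR raises `HTTPError`), and the injected `get_status`, `sleep`, and `clock` parameters are hypothetical and exist only so the sketch runs without a server.

```python
import time
from typing import Callable, Dict, Optional

Headers = Optional[Dict[str, str]]


def bearer_auth_headers(api_key: Optional[str]) -> Headers:
    # No key configured: return None so the HTTP client sends no
    # Authorization header at all.
    if api_key is None:
        return None
    # Assumed shape for a configured key (standard bearer scheme).
    return {"Authorization": f"Bearer {api_key}"}


class AuthRejected(RuntimeError):
    """Hypothetical stand-in for the HTTPError raised on 401/403."""


def flush_cache(
    get_status: Callable[[Headers], int],  # hypothetical injected transport
    api_key: Optional[str] = None,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    headers = bearer_auth_headers(api_key)
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = get_status(headers)
        if status == 200:
            return
        if status in (401, 403):
            # An auth misconfiguration never recovers by retrying,
            # so surface it immediately instead of waiting for the timeout.
            raise AuthRejected(f"flush_cache rejected with HTTP {status}")
        sleep(interval_s)
    raise TimeoutError("Timeout while flushing cache.")
```

Other non-200 statuses still retry until the deadline, so transient failures keep the previous behavior and only 401/403 short-circuit.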

Why this fix. The root cause is that auth was a property of one call site instead of a property of the engine/router client. This change makes the key part of engine state (server_api_key, captured once from the resolved server args) and adds a single bearer_auth_headers helper in slime/utils/http_utils.py threaded through every call boundary (_server_auth_headers() / _router_auth_headers() on the engine, api_key parameters on the async abort helpers and external discovery), rather than patching individual endpoints — so future endpoints inherit auth instead of re-introducing the gap.
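To illustrate the design point, here is a rough sketch of auth as client state rather than per-call-site logic. The names `server_api_key`, `_server_auth_headers`, `_router_auth_headers`, and `bearer_auth_headers` come from the description above. The `EngineClient` class, the injected `send` transport, the `_server_call` helper, the HTTP methods used, and the `api_key` field name in the `/workers` payload are assumptions made for the sketch.

```python
from typing import Any, Callable, Dict, Optional

Headers = Optional[Dict[str, str]]
# Hypothetical transport signature: send(method, url, json_payload, headers)
Send = Callable[[str, str, Optional[Dict[str, Any]], Headers], Any]


def bearer_auth_headers(api_key: Optional[str]) -> Headers:
    return None if api_key is None else {"Authorization": f"Bearer {api_key}"}


class EngineClient:
    """Hypothetical stand-in for the engine wrapper, not slime's class."""

    def __init__(self, server_url: str, router_url: str, send: Send,
                 server_api_key: Optional[str] = None,
                 router_api_key: Optional[str] = None) -> None:
        self.server_url = server_url
        self.router_url = router_url
        self._send = send
        # Keys are captured once and live on the client.
        self.server_api_key = server_api_key
        self.router_api_key = router_api_key

    def _server_auth_headers(self) -> Headers:
        return bearer_auth_headers(self.server_api_key)

    def _router_auth_headers(self) -> Headers:
        return bearer_auth_headers(self.router_api_key)

    def _server_call(self, method: str, path: str,
                     payload: Optional[Dict[str, Any]] = None) -> Any:
        # Single choke point: any endpoint added later inherits auth.
        return self._send(method, f"{self.server_url}{path}", payload,
                          self._server_auth_headers())

    def flush_cache(self) -> Any:
        return self._server_call("GET", "/flush_cache")

    def pause_generation(self) -> Any:
        return self._server_call("POST", "/pause_generation", {})

    def register_worker(self) -> Any:
        payload: Dict[str, Any] = {"url": self.server_url}
        if self.server_api_key is not None:
            # Assumed field name; lets the router authenticate to the worker.
            payload["api_key"] = self.server_api_key
        return self._send("POST", f"{self.router_url}/workers", payload,
                          self._router_auth_headers())
```

Because every engine call goes through `_server_call`, forgetting the header on a new endpoint would require bypassing the helper deliberately, which is the property the paragraph above argues for.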

Out of scope. Data-plane /generate traffic is not changed by this PR. The router already authenticates to workers with the key supplied at registration; authenticating slime→router data-plane requests against a router that enforces its own key is left to a follow-up.

Tests. tests/test_sglang_control_plane_auth.py (CPU-only, registered in the cpu-unittest matrix; stubs the sglang module imports and monkeypatches HTTP). 7 of its 9 tests fail against the previous code; the other 2 pin the "no key ⇒ no Authorization header" invariant. tests/test_external_sglang_engines.py is extended for the get_server_info api_key threading.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>