fix(sglang): authenticate engine control-plane and router calls #2068
Open
EazyReal wants to merge 1 commit into
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
**Problem.** slime cannot drive SGLang servers or sgl-routers that enforce `--api-key`. Only the post-launch health wait sends an `Authorization` header, and when no key is configured it sends the literal string `Bearer None` on every poll. Every other control-plane call is unauthenticated: `_make_request` (weight updates, memory release/resume), `health_generate`, `flush_cache`, `get_weight_version`, pause/continue generation, start/stop profile, external-engine `/server_info` discovery, the `/abort_request` + `/v1/loads` abort loop (#2056), and all router worker registration/listing/removal calls.

**Before.**

- `--sglang-api-key` set and the server enforcing it: the server launches (the health wait carries the key), then the first control call (e.g. `flush_cache` before weight sync) gets 401; `flush_cache` retries for 60 s and dies with `TimeoutError("Timeout while flushing cache.")`, hiding the cause.
- Router enforcing `--api-key`: worker registration gets 401 and bring-up fails.
- No key configured: the health wait sends `Authorization: Bearer None`, which any intermediary that validates the header rejects.

**After.**

- `--sglang-api-key` is attached as bearer auth to every request slime makes to its engines, including the `/v1/loads` re-abort path and external-engine discovery.
- `--router-api-key` (the existing RouterArgs CLI passthrough; verified present in the pinned sgl-router 0.3.2 wheel) is attached to router worker registration, listing, and removal, including all four `shutdown()` paths. The worker's key is included in the `/workers` registration payload so the router can authenticate the traffic it forwards to that worker.
- With no key configured, no `Authorization` header is sent at all: requests are byte-identical to before (`bearer_auth_headers(None)` returns `None`, the `requests`/`httpx` default).
- `flush_cache` fails fast with `HTTPError` on 401/403 instead of spinning for the full 60 s timeout (intentional behavior change: an auth misconfiguration never recovers by retrying).

**Why this fix.** The root cause is that auth was a property of one call site instead of a property of the engine/router client. This change makes the key part of engine state (`server_api_key`, captured once from the resolved server args) and adds a single `bearer_auth_headers` helper in `slime/utils/http_utils.py`, threaded through every call boundary (`_server_auth_headers()` / `_router_auth_headers()` on the engine, `api_key` parameters on the async abort helpers and external discovery). Individual endpoints are not patched one by one, so future endpoints inherit auth instead of re-introducing the gap.

**Out of scope.** Data-plane `/generate` traffic: the router authenticates to workers with the key supplied at registration; authenticating slime→router data-plane traffic against a router that enforces its own key is a follow-up.

**Tests.** `tests/test_sglang_control_plane_auth.py` (CPU-only, registered in the cpu-unittest matrix; stubs the sglang module imports and monkeypatches HTTP). 7 of its 9 tests fail against the previous code; the other 2 pin the "no key ⇒ no `Authorization` header" invariant. `tests/test_external_sglang_engines.py` is extended for the `get_server_info` `api_key` threading.

🤖 Generated with Claude Code
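For reviewers, the two behaviors described above (no header when no key is set, and failing fast on 401/403) can be illustrated with a minimal sketch. This is not the code in the diff: only the name `bearer_auth_headers` and its `None` → `None` contract come from the description. `AuthError`, `flush_cache_sketch`, the injected `get` callable that returns a status code, and the `interval_s` parameter are hypothetical stand-ins chosen so the example runs without a server.

```python
import time
from typing import Callable, Dict, Optional


def bearer_auth_headers(api_key: Optional[str]) -> Optional[Dict[str, str]]:
    """Return bearer-auth headers, or None when no key is configured.

    Returning None (rather than an empty dict or "Bearer None") means the
    HTTP client is called exactly as it would be with no headers argument.
    """
    if api_key is None:
        return None
    return {"Authorization": f"Bearer {api_key}"}


class AuthError(RuntimeError):
    """Hypothetical stand-in for the HTTPError raised on 401/403."""


def flush_cache_sketch(
    get: Callable[..., int],  # assumption: returns the HTTP status code
    base_url: str,
    api_key: Optional[str] = None,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> None:
    """Poll /flush_cache until it succeeds, but never retry an auth failure."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get(f"{base_url}/flush_cache", headers=bearer_auth_headers(api_key))
        if status == 200:
            return
        if status in (401, 403):
            # A wrong or missing key does not fix itself; surface it now.
            raise AuthError(f"flush_cache rejected with HTTP {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Timeout while flushing cache.")
        time.sleep(interval_s)
```

With a fake `get` that always returns 401, the sketch raises after a single request instead of polling until the timeout, which is the behavior change the description calls out.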