fix(agent): reuse a pooled SGLang client across turns and retry once on pre-connect connector errors by EazyReal · Pull Request #2069 · THUDM/slime

EazyReal · 2026-06-12T08:02:30Z

Problem. call_sglang_generate opens a fresh aiohttp.ClientSession (and TCP connector) for every /generate call — one per agent turn — plus another short-lived session per /abort_request. Under concurrent multi-turn agent rollouts this churns sockets (accumulating TIME_WAIT connections) and intermittently surfaces as aiohttp.ClientConnectorError, which fails the whole turn — and with it the trajectory — even though the request never reached SGLang.

Before. Each turn: new session → new TCP connection → request → teardown. A transient connect-time failure (ephemeral-port/socket churn, brief router unavailability) raises ClientConnectorError with no recovery: a 200-turn rollout batch dies on a single failed connect() even though no request was ever in flight.

After. Each adapter app owns one pooled ClientSession, registered via aiohttp's cleanup_ctx in BaseAdapter.__init__ and closed on app shutdown, so turns reuse warm keep-alive connections instead of reconnecting per call. A ClientConnectorError is retried exactly once with the same rid: connector errors are raised before any request bytes are written, so the server never saw the rid and the retry cannot double-generate. Errors after the request may have reached the server (disconnects, timeouts, HTTP >= 400) are NOT retried — they still go through the existing abort-by-rid path, now over the pooled client.

Why this fix. The root cause is per-call connection churn, so the fix pools at the adapter-app boundary where the lifecycle hook already exists (cleanup_ctx), rather than papering over symptoms with a blanket retry. The retry is restricted to the only failure class that is provably idempotent (pre-connect), preserving the invariant that any request that may have reached SGLang is aborted by rid, never re-issued. Request body, headers, sampling params, rid semantics, and the call_sglang_generate signature are unchanged; both provider adapters (openai/anthropic) get the pooled client for free through BaseAdapter. The connector uses limit=0 to match the previous (unbounded, per-call) behavior — concurrency stays governed by the rollout scheduler.

Tests (tests/test_agent_adapters.py, CPU-only, already in CI):

pooled client identity is stable across turns;
a pre-connect connector error is retried exactly once and reuses the same rid;
a mid-flight failure (ServerDisconnectedError) is never retried and aborts by rid;
two consecutive connector errors propagate after exactly one retry, then abort.

Each test was mutation-checked: regenerating the rid on retry, bypassing the pooled client, widening the retry to aiohttp.ClientError, or removing the retry bound each fails at least one test.

🤖 Generated with Claude Code

… pre-connect errors call_sglang_generate opens a fresh aiohttp.ClientSession (and TCP connection) for every /generate turn, plus another short-lived session per /abort_request. Under concurrent multi-turn agent rollouts this churns sockets and intermittently surfaces as aiohttp.ClientConnectorError, failing the trajectory even though the request never reached SGLang. - Register one pooled aiohttp.ClientSession per adapter app via an aiohttp cleanup_ctx (sglang_client_context), built over a shared TCPConnector (limit=0, ttl_dns_cache=300, keepalive_timeout=60) and closed on app shutdown. BaseAdapter.__init__ appends it to app.cleanup_ctx, so every consumer that runs the app through a runner gets it for free. - Retry the /generate POST exactly once on aiohttp.ClientConnectorError, reusing the SAME rid. A connector error is raised before any request bytes are written, so the server never saw the rid and the retry cannot double-generate. - Errors after the request may have reached the server (disconnects, timeouts, HTTP errors) are NOT retried; they still flow through the existing abort-by-rid path, now over the pooled client. Tests pin the invariants: stable client identity across turns; retry happens exactly once and reuses the rid; mid-flight failures are never retried and abort by rid; a second consecutive connector error propagates after exactly one retry. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

EazyReal mentioned this pull request Jun 12, 2026

Mid-flight /generate retry can double-generate: _post retries any exception and there is no rid to abort the orphaned request radixark/miles#1334

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): reuse a pooled SGLang client across turns and retry once on pre-connect connector errors#2069

fix(agent): reuse a pooled SGLang client across turns and retry once on pre-connect connector errors#2069
EazyReal wants to merge 1 commit into
THUDM:mainfrom
EazyReal:upstream-pr/agent-sglang-client-reuse

EazyReal commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

EazyReal commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant