Skip to content

fix(agent): reuse a pooled SGLang client across turns and retry once on pre-connect connector errors#2069

Open
EazyReal wants to merge 1 commit into
THUDM:mainfrom
EazyReal:upstream-pr/agent-sglang-client-reuse
Open

fix(agent): reuse a pooled SGLang client across turns and retry once on pre-connect connector errors#2069
EazyReal wants to merge 1 commit into
THUDM:mainfrom
EazyReal:upstream-pr/agent-sglang-client-reuse

Conversation

@EazyReal

Copy link
Copy Markdown
Contributor

Problem. call_sglang_generate opens a fresh aiohttp.ClientSession (and TCP connector) for every /generate call — one per agent turn — plus another short-lived session per /abort_request. Under concurrent multi-turn agent rollouts this churns sockets (accumulating TIME_WAIT connections) and intermittently surfaces as aiohttp.ClientConnectorError, which fails the whole turn — and with it the trajectory — even though the request never reached SGLang.

Before. Each turn: new session → new TCP connection → request → teardown. A transient connect-time failure (ephemeral-port/socket churn, brief router unavailability) raises ClientConnectorError with no recovery: a 200-turn rollout batch dies on a single failed connect() even though no request was ever in flight.

After. Each adapter app owns one pooled ClientSession, registered via aiohttp's cleanup_ctx in BaseAdapter.__init__ and closed on app shutdown, so turns reuse warm keep-alive connections instead of reconnecting per call. A ClientConnectorError is retried exactly once with the same rid: connector errors are raised before any request bytes are written, so the server never saw the rid and the retry cannot double-generate. Errors after the request may have reached the server (disconnects, timeouts, HTTP >= 400) are NOT retried — they still go through the existing abort-by-rid path, now over the pooled client.

Why this fix. The root cause is per-call connection churn, so the fix pools at the adapter-app boundary where the lifecycle hook already exists (cleanup_ctx), rather than papering over symptoms with a blanket retry. The retry is restricted to the only failure class that is provably idempotent (pre-connect), preserving the invariant that any request that may have reached SGLang is aborted by rid, never re-issued. Request body, headers, sampling params, rid semantics, and the call_sglang_generate signature are unchanged; both provider adapters (openai/anthropic) get the pooled client for free through BaseAdapter. The connector uses limit=0 to match the previous (unbounded, per-call) behavior — concurrency stays governed by the rollout scheduler.

Tests (tests/test_agent_adapters.py, CPU-only, already in CI):

  • pooled client identity is stable across turns;
  • a pre-connect connector error is retried exactly once and reuses the same rid;
  • a mid-flight failure (ServerDisconnectedError) is never retried and aborts by rid;
  • two consecutive connector errors propagate after exactly one retry, then abort.

Each test was mutation-checked: regenerating the rid on retry, bypassing the pooled client, widening the retry to aiohttp.ClientError, or removing the retry bound each fails at least one test.

🤖 Generated with Claude Code

… pre-connect errors

call_sglang_generate opens a fresh aiohttp.ClientSession (and TCP
connection) for every /generate turn, plus another short-lived session
per /abort_request. Under concurrent multi-turn agent rollouts this
churns sockets and intermittently surfaces as
aiohttp.ClientConnectorError, failing the trajectory even though the
request never reached SGLang.

- Register one pooled aiohttp.ClientSession per adapter app via an
  aiohttp cleanup_ctx (sglang_client_context), built over a shared
  TCPConnector (limit=0, ttl_dns_cache=300, keepalive_timeout=60) and
  closed on app shutdown. BaseAdapter.__init__ appends it to
  app.cleanup_ctx, so every consumer that runs the app through a runner
  gets it for free.
- Retry the /generate POST exactly once on aiohttp.ClientConnectorError,
  reusing the SAME rid. A connector error is raised before any request
  bytes are written, so the server never saw the rid and the retry
  cannot double-generate.
- Errors after the request may have reached the server (disconnects,
  timeouts, HTTP errors) are NOT retried; they still flow through the
  existing abort-by-rid path, now over the pooled client.

Tests pin the invariants: stable client identity across turns; retry
happens exactly once and reuses the rid; mid-flight failures are never
retried and abort by rid; a second consecutive connector error
propagates after exactly one retry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant