fix(server): associate ctx-scoped elicitation/sampling with the originating request (SEP-2260)#2228
fix(server): associate ctx-scoped elicitation/sampling with the originating request (SEP-2260)#2228felixweinberger wants to merge 14 commits into
Conversation
🦋 Changeset detectedLatest commit: 21aefbe The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@modelcontextprotocol/client
@modelcontextprotocol/codemod
@modelcontextprotocol/server
@modelcontextprotocol/server-legacy
@modelcontextprotocol/express
@modelcontextprotocol/fastify
@modelcontextprotocol/hono
@modelcontextprotocol/node
commit: |
87f0810 to
285a0c0
Compare
d618283 to
f2f21b8
Compare
|
Review follow-up: added 12 unit tests in |
285a0c0 to
ae4bde3
Compare
f2f21b8 to
b678f44
Compare
|
Updated per maintainer review: dropped the verbose section comment (tests are self-explanatory); eliminated the new non-null assertions by starting the keepalive inside the stream's |
ae4bde3 to
b0ce6ce
Compare
b678f44 to
a109146
Compare
b0ce6ce to
2c5acb2
Compare
a109146 to
8cfaca3
Compare
8cfaca3 to
54ebc75
Compare
8c7ecc1 to
37150a7
Compare
…(SEP-2260) Adds three requirements covering server-request association on the 2025 path: nested server-to-client requests ride the originating request's SSE response stream, the originating stream stays alive during nested waits when transport keepalive is enabled, and ping is exempt from the association rule. Test-first: the nested-on-originating-stream and keepalive-during-nested cells are red until the corresponding server changes land.
…riginating request ctx.mcpReq.elicitInput() and ctx.mcpReq.requestSampling() now pass relatedRequestId for the request being handled, matching ctx.mcpReq.send() semantics. On streamable HTTP these requests are routed onto the originating request's SSE response stream instead of the standalone GET stream, and they work even when no GET stream is open. Explicit handler-supplied options still override.
Adds a keepAliveInterval option to WebStandardStreamableHTTPServerTransport. When set, the transport periodically writes an SSE comment frame on every open SSE stream (POST response streams and the standalone GET stream) so idle connections are not torn down by intermediaries, e.g. while a request handler waits on a nested elicitation. Disabled by default.
…ge cases Adds unit coverage for the two SEP-2260 server changes: - server.test.ts: ctx-scoped elicitInput/requestSampling pass the handled request's id as relatedRequestId at the transport.send level, explicit handler options still override, other options keep the implicit association, and the top-level Server.elicitInput()/createMessage() remain unassociated. - streamableHttp.test.ts: keepalive is off by default (no timer scheduled), comment frames are written at the configured interval, timers are cleared when a response stream completes and when the transport closes mid-stream, concurrent streams keep independent timers, and non-positive intervals are treated as disabled. The non-positive interval cases exposed a gap: keepAliveInterval <= 0 was passed straight to setInterval (which clamps to 1ms, flooding the stream). startKeepAlive now treats non-positive values as disabled.
Avoids new non-null assertions on the stream controller and makes startKeepAlive always return a stop function (no-op when disabled), so cleanup sites call it unconditionally. Also drops a redundant comment in buildContext.
37150a7 to
e9e6a0f
Compare
54ebc75 to
d3618a9
Compare
…live - Server->client requests related to a response stream that cannot carry SSE (JSON-response mode, or the stream is gone) are delivered on the standalone SSE stream instead of being silently dropped, matching their routing before they carried a relatedRequestId. - Keepalive intervals that are not finite positive numbers (NaN, Infinity) are treated as disabled instead of flooding streams with 1ms timers. - The keepalive armed for an event-replay stream is stopped when replay fails, instead of leaking the timer. - startKeepAlive JSDoc matches its actual contract (always returns a stop function) and the disabled branch reuses the noKeepAlive constant.
|
@claude review |
…elpers
SEP-2260 requires ctx-scoped server->client requests to be associated with
the originating request - that is the purpose of the ctx helpers vs the
top-level Server methods - so the association must not be overridable.
Spreading options before relatedRequestId also fixes the gotcha where an
explicit relatedRequestId: undefined would silently clobber the default.
Matches ctx.mcpReq.send(), which already does
{ ...options, relatedRequestId: request.id }.
|
One more change from maintainer review (b979896): the ctx-scoped helpers now spread Rationale: SEP-2260 requires ctx-scoped server→client requests to be associated with the originating request; that's the entire purpose of the ctx helpers vs the top-level The "explicit override wins" unit test is inverted accordingly: it now asserts the association wins over a handler-supplied |
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🟡
packages/server/src/server/streamableHttp.ts:791-802— Sibling of the replayEvents() keepalive leak (which the existing comment incorrectly declares safe for this path): in handlePostRequest's SSE branch the keepalive timer is armed synchronously in the stream's start callback (line ~799), before the awaited writePrimingEvent() → eventStore.storeEvent() call (line ~840) — if storeEvent rejects, the catch returns a 400 without calling cleanup()/stopKeepAlive(), leaving a running interval enqueueing ': keepalive' chunks into an orphaned, never-read stream until the transport closes. Apply the same fix as the replayEvents site (start the keepalive after writePrimingEvent resolves, or call stopKeepAlive() in the catch) so a fix to replayEvents alone doesn't leave this path behind.Extended reasoning...
The bug. In
handlePostRequest's SSE branch, theReadableStreamconstructor'sstartcallback runs synchronously, sostopKeepAlive = this.startKeepAlive(controller, encoder)(packages/server/src/server/streamableHttp.ts:799) arms thesetIntervalimmediately at stream construction. Only afterwards does the code reachawait this.writePrimingEvent(streamController!, encoder, streamId, clientProtocolVersion)(line ~840), which callsawait this._eventStore.storeEvent(streamId, …)when aneventStoreis configured and the client protocol version is ≥ 2025-11-25.writePrimingEventhas no internal try/catch, so a backing-store rejection (DB/Redis hiccup, store unavailable) propagates to the function-levelcatchat the end ofhandlePostRequest(line ~868), which only callsonerrorand returns a 400-32700 Parse errorresponse — it never callscleanup()/stopKeepAlive(), and theResponsewrapping the readable is never constructed, so the stream is orphaned with the timer still running.Why nothing else stops the timer. The two normal teardown paths are both unreachable on this error path: (1)
onmessageis never invoked for the request (the handler loop sits after the failed await), so no response ever flows throughsend()and the all-responses-ready →stream.cleanup()path never fires; (2) the readable is never returned to the client, so itscancelcallback can't fire either. The keepalive's own self-cleanup only triggers whencontroller.enqueue()throws — but enqueueing into an unread, still-readable stream does not throw (it just buffers past the high-water mark), so thecatchinsidestartKeepAlivenever runs. The interval keeps firing and the chunk queue keeps growing.Step-by-step proof.
- Configure the transport with both
eventStoreandkeepAliveInterval: 1000— exactly the resumable, long-human-in-the-loop deployment this PR targets. - A client POSTs a
tools/call(protocol version 2025-11-25). The SSE branch constructs theReadableStream;startruns synchronously and arms the keepalive interval. - The
_streamMapping/_requestToStreamMappingentries are registered, thenwritePrimingEventruns andeventStore.storeEvent()rejects (transient backing-store failure). - Control jumps to the catch at line ~868:
onerrorfires and a 400-32700response is returned.stopKeepAlive()is never called. - Every 1000 ms the interval enqueues
': keepalive\n\n'into the discarded stream's queue, indefinitely. Each subsequent failed POST adds another running timer plus its own growing buffer. They are only reclaimed whentransport.close()(orcloseSSEStream(requestId)) iterates_streamMappingand invokescleanup()— which on a long-lived stateful session may be hours or days away.
Why this isn't already covered. The existing inline comment about the keepalive leak covers only the
replayEvents()site and explicitly asserts that "the same pattern in handleGetRequest/handlePostRequest is fine because those paths have no awaited fallible call between arming the timer and registering the cleanup." That claim is wrong forhandlePostRequest: the awaited, falliblewritePrimingEventsits between the cleanup registration and any code path that would ever invoke that cleanup, and the catch never invokes it. Flagging this site prevents a partial fix that only patchesreplayEvents. The new keepalive unit tests cover stream completion and transport close but not the priming-event failure path, so nothing in CI catches it.Impact and scope. The keepalive timer is new in this PR, so the running-timer + growing-queue leak on this error path is PR-introduced (only the stale mapping entries were pre-existing). Mitigations keep it minor: unlike the
replayEventscase, the_streamMappingentry (whosecleanup()callsstopKeepAlive()) was registered before the await, sotransport.close()can eventually reclaim it; the trigger needseventStore+keepAliveInterval+ astoreEventrejection; and the per-failure cost is one timer plus ~13 bytes per tick. It's a bounded resource leak rather than an unbounded one.How to fix. Same one-liner family as the
replayEventsremediation: either move thestartKeepAlive()call (or defer arming the timer) until afterwritePrimingEventresolves, or callstopKeepAlive()/the registeredcleanup()forstreamIdinhandlePostRequest's catch before returning the 400. Whichever shape thereplayEventsfix takes, applying it to this sibling site at the same time keeps the keepalive lifecycle consistent across all three stream-creation paths. - Configure the transport with both
There was a problem hiding this comment.
I didn't find any new issues in this revision and the earlier findings (JSON-mode routing regression, keepalive leak on failed replay, NaN guard, docs, JSDoc, override precedence) all look addressed by the follow-up commits. That said, this changes server→client request routing in the streamable HTTP transport and adds a new public transport option, so it should still get a maintainer's review (and a changeset) before merging.
Extended reasoning...
Overview
This PR makes ctx-scoped elicitInput()/requestSampling() carry the originating request's id as relatedRequestId (now non-overridable, matching ctx.mcpReq.send()), adds a fallback in WebStandardStreamableHTTPServerTransport.send() so related server→client requests fall back to the standalone GET stream when the related response stream cannot carry SSE, and introduces an opt-in keepAliveInterval SSE keepalive. It adds substantial unit coverage (association precedence, keepalive timer lifecycle, replay-failure cleanup) plus three new e2e requirements/scenarios and a docs/server.md mention.
Security risks
No direct security exposure: no auth, crypto, or input-trust changes. The main risks are availability-shaped — keepalive timer leaks or floods (both now guarded and tested) and silent message drops from routing changes (the JSON-mode regression I flagged earlier was fixed with a fallback path and a regression test).
Level of scrutiny
This touches the core message-routing path of the production streamable HTTP transport and changes default behavior for nested server→client requests, and it adds a new public API surface (keepAliveInterval, whose shape the author explicitly leaves open for review). Per the repo's conventions (burden of proof on addition, design review first), that warrants maintainer judgment rather than bot approval, even though the implementation now looks correct to me.
Other factors
All findings from my previous review rounds were addressed in 270945d, cd284ca, 94125e7, and b979896, each with accompanying tests; the latest commit also resolves the spread-order inconsistency by making the association non-overridable, consistent with the protocol-layer ctx helpers. The remaining open item is the missing changeset (already flagged by changeset-bot) — this PR changes published-package behavior and adds an option, so it likely needs one. The bug-hunting system found no new issues on this revision.
…s response stream Server-to-client requests must ride the originating request's response stream (SEP-2260). When that stream cannot carry SSE (JSON response mode, or the stream has closed), throw a clear error instead of falling back to the standalone GET stream. The standalone stream is not a valid fallback: clients are not required to open one, so the previous behavior could hang forever while also delivering the request unassociated.
A mutation-based audit showed four of the five ctx-association unit tests only catch regressions that the e2e association scenarios already catch over real streamable HTTP. Keep the override-resistance test, which is the sole guard for handler-supplied relatedRequestId being ignored (no e2e scenario passes an explicit relatedRequestId).
# Conflicts: # packages/server/test/server/server.test.ts
Implements the 2025-lineage improvements from SEP-2260 (server-request association): ctx-scoped elicitation/sampling requests now ride the originating request's response stream, and the streamable HTTP transport gains an opt-in SSE keepalive for long human-in-the-loop waits. Adds the 4 e2e requirements that pin these behaviors.
Motivation and Context
SEP-2260 (Final) requires server→client requests to be associated with the originating client request. Tracking issue: #2186.
Scope note from spec analysis: SEP-2260's MUST-strength rules never landed in the published 2025-11-25 spec pages (which remain SHOULD-strength: server→client messages "SHOULD relate to the originating client request"), and in the current draft spec they are enforced structurally by MRTR (SEP-2322) rather than as standalone rules. This PR therefore ships what is meaningful on the 2025 lineage:
ctx.mcpReq.elicitInput()/requestSampling()were sent withoutrelatedRequestId, so on streamable HTTP they were routed to the standalone GET stream instead of the originating POST response stream (and silently undeliverable when no GET stream was open). They now carry the handled request's id, matchingctx.mcpReq.send()semantics.keepAliveIntervaloption): per SEP-2260's timeout considerations, transports SHOULD use SSE keepalives so connections survive long nested waits. Default off — no behavior change unless configured.The remaining SEP-2260 rules (client rejects unsolicited server→client requests; no API for unsolicited pushes) assert draft-era semantics and land with the MRTR implementation (#2187), where they hold by construction.
How Has This Been Tested?
4 new e2e requirements with scenarios (
protocol:assoc:*), all passing:nested-on-originating-stream— nested elicit/sample arrives on the originating POST SSE streamkeepalive-during-nested— keepalive comments observed during a nested waitping-exempt— server-initiated ping remains valid outside any request contextenableJsonResponsedocumentedFull e2e matrix: 33 files, 1,237 cells = 1,145 passed + 92 expected-fail, 0 unexpected failures; no pre-existing cell changed state. Server unit tests 68/68. Typecheck/lint clean.
Breaking Changes
None intended. Behavior change (improvement): ctx-scoped elicitation/sampling requests now arrive on the originating request's response stream instead of the standalone GET stream — this is the spec-recommended routing and also makes these requests work when no GET stream is open. Handler-supplied options still override. Caveat: under
enableJsonResponsemode, nested ctx-scoped requests are not deliverable (same pre-existing limitation asctx.mcpReq.send()).ctx-scoped elicitation/sampling in
enableJsonResponsemode (or after the response stream closed) now fails with a clear error instead of being silently dropped. Elicitation fundamentally requires a streaming response. The standalone GET stream is never used as a fallback for request-associated messages.Types of changes
Checklist
Additional context
keepAliveInterval(name, ms unit, opt-in default) is open for review feedback.