fix(client): allow transport restart after close() #1828
Base: main
Commits: db137d9, fde9748, 043e452, 8e0b918, 57d2cbf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| --- | ||
| '@modelcontextprotocol/client': patch | ||
| --- | ||
|
|
||
| Allow `StreamableHTTPClientTransport` and `SSEClientTransport` to restart after `close()`. `close()` now clears `_abortController` (previously aborted but not unset, blocking the start guard) and `_sessionId` (previously leaked into post-restart requests, causing 404s). |
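To make the changeset note concrete, here is a minimal sketch of the restart behaviour it describes. `RestartableTransport`, `adoptSession`, and `currentSessionId` are hypothetical names used only for illustration; this is not the SDK's class, and the real transports carry considerably more state.

```typescript
// Hypothetical minimal transport illustrating restart after close().
// It is NOT the SDK's implementation, only the shape of the guard and the cleanup.
class RestartableTransport {
    private abortController?: AbortController;
    private sessionId?: string;

    start(): void {
        // Refuse to start only while a live (non-aborted) lifecycle exists.
        if (this.abortController && !this.abortController.signal.aborted) {
            throw new Error('already started');
        }
        this.abortController = new AbortController();
    }

    // Stand-in for the server assigning a session ID during initialization.
    adoptSession(id: string): void {
        this.sessionId = id;
    }

    get currentSessionId(): string | undefined {
        return this.sessionId;
    }

    close(): void {
        this.abortController?.abort();
        // Unset the controller so the start guard no longer blocks a restart.
        this.abortController = undefined;
        // Drop the session ID so it cannot leak into post-restart requests.
        this.sessionId = undefined;
    }
}

const restartable = new RestartableTransport();
restartable.start();
restartable.adoptSession('session-1');
restartable.close();
restartable.start(); // second lifecycle on the same instance
```

In this sketch the second `start()` succeeds because `close()` left no live controller behind, and the new lifecycle begins without the old session ID.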
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -230,16 +230,21 @@ | |
| }); | ||
| } | ||
|
|
||
| private async _startOrAuthSse(options: StartSSEOptions, isAuthRetry = false): Promise<void> { | ||
| const { resumptionToken } = options; | ||
|
|
||
| // Capture the signal active when this call started so a close()/start() during the awaits | ||
| // below doesn't bind this stale GET to the restarted transport's controller. | ||
| const signal = this._abortController?.signal; | ||
|
|
||
| try { | ||
| // Try to open an initial SSE stream with GET to listen for server messages | ||
| // This is optional according to the spec - server may not support it | ||
| const headers = await this._commonHeaders(); | ||
| if (signal?.aborted) return; | ||
| const userAccept = headers.get('accept'); | ||
| const types = [...(userAccept?.split(',').map(s => s.trim().toLowerCase()) ?? []), 'text/event-stream']; | ||
| headers.set('accept', [...new Set(types)].join(', ')); | ||
|
Check failure on line 247 in packages/client/src/client/streamableHttp.ts
|
||
|
Comment on lines 233 to 247

🔴 The 401 auth-retry path in

Extended reasoning...

What the bug is and how it manifests
At the top of `_startOrAuthSse` (line 238), the PR correctly captures `const signal = this._abortController?.signal` to prevent a `close()`+`start()` during async work from binding a stale GET to the new lifecycle. There is an existing `if (signal?.aborted) return;` check after `await this._commonHeaders()` (line 246). However, the 401 auth-retry path (lines 270-277) contains its own async suspension, `await this._authProvider.onUnauthorized({...})`, without any corresponding guard before the recursive call.

The specific code path that triggers it
The relevant code is:

Why existing code does not prevent it
The `signal` variable captured at line 238 is S1 (the old lifecycle). After `onUnauthorized()` resolves, `signal` (S1) is already aborted. But the code never inspects `signal` at this point; it proceeds directly to the recursive call. The recursive invocation then runs `const signal = this._abortController?.signal` at its own line 238, which now captures S2 (the new, non-aborted controller). The subsequent `if (signal?.aborted) return` check sees S2 (not aborted) and proceeds to open the GET fetch with S2 as its signal, fully binding the ghost stream to the new lifecycle.

Step-by-step proof

Impact
The ghost stream is associated with the new lifecycle's controller (S2), so it is cleaned up on the next `close()`. However, it carries the old lifecycle's options, including a potentially stale `resumptionToken` / `Last-Event-ID`, to the server, which may replay stale events into the new session or confuse server-side session state. This is precisely the ghost SSE pattern the PR's signal-capture design was meant to prevent, but the auth-retry path was missed.

How to fix it
Add `if (signal?.aborted) return;` immediately before `return this._startOrAuthSse(options, true)`:

This mirrors the existing guard after `_commonHeaders()` and correctly uses the already-captured S1, which is aborted at this point, preventing the recursive call from re-reading the replaced controller.
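The race described in this comment can be reproduced in isolation. The sketch below is an assumption-laden model, not the SDK code: `Lifecycle`, `openStream`, `demo`, and the `guardBeforeRetry` flag are hypothetical names, and the 401 response is simulated by always taking the re-auth branch on the first call.

```typescript
// Hypothetical stand-in for the transport's per-lifecycle abort controller.
class Lifecycle {
    controller: AbortController | undefined = new AbortController();

    // Models close() followed by start(): abort the old controller, install a new one.
    restart(): void {
        this.controller?.abort();
        this.controller = new AbortController();
    }
}

// Records which variants actually went on to "open" a stream.
const opened: string[] = [];

async function openStream(
    lifecycle: Lifecycle,
    reauthorize: () => Promise<void>,
    guardBeforeRetry: boolean,
    isAuthRetry = false
): Promise<void> {
    // Capture the signal of the lifecycle that was active when this call began.
    const signal = lifecycle.controller?.signal;

    if (!isAuthRetry) {
        // Simulated 401: run the async re-auth step, during which a restart may happen.
        await reauthorize();
        // The proposed guard: check the captured (old) signal before recursing.
        if (guardBeforeRetry && signal?.aborted) return;
        return openStream(lifecycle, reauthorize, guardBeforeRetry, true);
    }

    // On the retry, `signal` is whatever controller is current now.
    if (signal?.aborted) return;
    opened.push(guardBeforeRetry ? 'guarded' : 'unguarded');
}

async function demo(guardBeforeRetry: boolean): Promise<void> {
    const lifecycle = new Lifecycle();
    // The restart happens while re-auth is pending.
    await openStream(lifecycle, async () => lifecycle.restart(), guardBeforeRetry);
}
```

Running `demo(false)` lets the retry capture the replacement controller and open a stream for the old call, while `demo(true)` returns early because the originally captured signal is already aborted.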
||
|
|
||
| // Include Last-Event-ID header for resumable streams if provided | ||
| if (resumptionToken) { | ||
|
|
@@ -250,7 +255,7 @@ | |
| ...this._requestInit, | ||
| method: 'GET', | ||
| headers, | ||
| signal: this._abortController?.signal | ||
| signal | ||
| }); | ||
|
|
||
| if (!response.ok) { | ||
|
|
@@ -341,11 +346,18 @@ | |
| // Calculate next delay based on current attempt count | ||
| const delay = this._getNextReconnectionDelay(attemptCount); | ||
|
|
||
| // Capture the signal active when this reconnection was scheduled. close() + start() | ||
| // replaces this._abortController, so re-reading it later would see the new session's | ||
| // controller and allow a stale reconnect to fire into the restarted transport. | ||
| const signal = this._abortController?.signal; | ||
|
|
||
| const reconnect = (): void => { | ||
| this._cancelReconnection = undefined; | ||
| if (this._abortController?.signal.aborted) return; | ||
| if (signal?.aborted) return; | ||
| this._startOrAuthSse(options).catch(error => { | ||
| if (signal?.aborted) return; | ||
| this.onerror?.(new Error(`Failed to reconnect SSE stream: ${error instanceof Error ? error.message : String(error)}`)); | ||
| if (signal?.aborted) return; | ||
| try { | ||
| this._scheduleReconnection(options, attemptCount + 1); | ||
claude[bot] marked this conversation as resolved.
|
||
| } catch (scheduleError) { | ||
|
|
@@ -369,6 +381,9 @@ | |
| } | ||
| const { onresumptiontoken, replayMessageId } = options; | ||
|
|
||
| // Capture the signal this stream is bound to so we don't reconnect into a restarted transport. | ||
| const signal = this._abortController?.signal; | ||
|
|
||
| let lastEventId: string | undefined; | ||
| // Track whether we've received a priming event (event with ID) | ||
| // Per spec, server SHOULD send a priming event with ID before closing | ||
|
|
@@ -436,7 +451,7 @@ | |
| // BUT don't reconnect if we already received a response - the request is complete | ||
| const canResume = isReconnectable || hasPrimingEvent; | ||
| const needsReconnect = canResume && !receivedResponse; | ||
| if (needsReconnect && this._abortController && !this._abortController.signal.aborted) { | ||
| if (needsReconnect && signal && !signal.aborted) { | ||
| this._scheduleReconnection( | ||
| { | ||
| resumptionToken: lastEventId, | ||
|
|
@@ -455,7 +470,7 @@ | |
| // BUT don't reconnect if we already received a response - the request is complete | ||
| const canResume = isReconnectable || hasPrimingEvent; | ||
| const needsReconnect = canResume && !receivedResponse; | ||
| if (needsReconnect && this._abortController && !this._abortController.signal.aborted) { | ||
| if (needsReconnect && signal && !signal.aborted) { | ||
| // Use the exponential backoff reconnection strategy | ||
| try { | ||
| this._scheduleReconnection( | ||
|
|
@@ -476,7 +491,7 @@ | |
| } | ||
|
|
||
| async start() { | ||
| if (this._abortController) { | ||
| if (this._abortController && !this._abortController.signal.aborted) { | ||
| throw new Error( | ||
| 'StreamableHTTPClientTransport already started! If using Client class, note that connect() calls start() automatically.' | ||
| ); | ||
claude[bot] marked this conversation as resolved.
|
||
|
|
@@ -511,6 +526,9 @@ | |
| } finally { | ||
| this._cancelReconnection = undefined; | ||
| this._abortController?.abort(); | ||
| this._sessionId = undefined; | ||
| this._lastUpscopingHeader = undefined; | ||
| this._serverRetryMs = undefined; | ||
| this.onclose?.(); | ||
| } | ||
| } | ||
claude[bot] marked this conversation as resolved.
Comment on lines 526 to 534

🔴 Extended reasoning...

The PR establishes a clear pattern: all per-lifecycle state must be reset in `close()` so the transport can be safely reused after restart. It correctly adds `this._sessionId = undefined`, `this._lastUpscopingHeader = undefined`, and `this._serverRetryMs = undefined` to the `finally` block. However, `_resourceMetadataUrl` (initialized to `undefined` in the constructor at line ~194) and `_scope` (line ~195) are omitted from this cleanup, breaking the same invariant.

Where these fields are set: `_resourceMetadataUrl` and `_scope` are set during 401 handling in `_startOrAuthSse()` (lines 260-261) and `_send()` (lines 580-581). `_scope` is also updated in the 403/insufficient_scope path (lines 620, 624). The 403 path is especially subtle: `_resourceMetadataUrl` is only updated conditionally (`if (resourceMetadataUrl) { this._resourceMetadataUrl = resourceMetadataUrl; }`), so if a new-lifecycle 403 response omits the `resource_metadata` parameter, the stale lifecycle-1 URL persists.

Where these fields are consumed: Both are passed directly to `auth()` in `finishAuth()` (lines 508-509: `resourceMetadataUrl: this._resourceMetadataUrl, scope: this._scope`) and in the 403 upscoping path (lines 631-632). These values drive token endpoint discovery; using the wrong URL means the client contacts the wrong OAuth authorization server.

Step-by-step failure scenario:
1. Lifecycle 1: the server returns 401 with a `WWW-Authenticate` header containing `resource_metadata=https://old-server/resource`, setting `_resourceMetadataUrl`.
2. `close()` is called; `_resourceMetadataUrl` is not cleared.
3. The OAuth server config changes to a new resource metadata URL.
4. `start()` is called (newly enabled by this PR).
5. The new lifecycle triggers a 401 without a `www-authenticate` header, which is valid per spec when the server assumes the client already has the metadata. The guard at line 258 (`if response.headers.has('www-authenticate')`) is false, so `_resourceMetadataUrl` is not refreshed.
6. `finishAuth()` or `onUnauthorized` calls `auth()` with the stale `_resourceMetadataUrl`, performing token endpoint discovery against the wrong server.

Why existing code does not prevent this: The 401 path only updates `_resourceMetadataUrl`/`_scope` when the `www-authenticate` header is present. A spec-compliant server may omit this header on subsequent 401s after the client has already received the metadata. Before this PR, `start()` after `close()` threw "already started", so users always created a new transport instance that re-initialized both fields to `undefined` in the constructor. This PR enables same-instance restart without the corresponding field resets, the same class of omission as `_sessionId`, `_lastUpscopingHeader`, and `_serverRetryMs` already fixed by this PR.

Fix: Add `this._resourceMetadataUrl = undefined` and `this._scope = undefined` to the `finally` block in `close()`, alongside the existing `this._sessionId = undefined`, following the exact same pattern the PR already uses for the other per-lifecycle fields.
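The stale-value scenario in this comment can be modelled in a few lines. This is a sketch under assumptions: `AuthDiscoveryState`, `handleUnauthorized`, and `urlSeenAfterRestart` are hypothetical names, and the header parsing is reduced to "a URL was parsed or it was not". Whether resetting on `close()` is actually desirable is contested in the thread; the author's reply states that these values need to survive `close()`.

```typescript
// Hypothetical model of OAuth discovery state that is only refreshed conditionally.
class AuthDiscoveryState {
    resourceMetadataUrl?: string;

    // Mirrors the conditional update quoted in the review: only overwrite
    // when the 401 response actually yielded a resource metadata URL.
    handleUnauthorized(parsedUrl?: string): void {
        if (parsedUrl) this.resourceMetadataUrl = parsedUrl;
    }

    // `resetOnClose` toggles the reviewer's proposed reset so both behaviours can be compared.
    close(resetOnClose: boolean): void {
        if (resetOnClose) this.resourceMetadataUrl = undefined;
    }
}

function urlSeenAfterRestart(resetOnClose: boolean): string | undefined {
    const state = new AuthDiscoveryState();
    // Lifecycle 1: the 401 carried a resource metadata URL.
    state.handleUnauthorized('https://old-server/resource');
    state.close(resetOnClose);
    // Lifecycle 2: the 401 arrived without the header, so nothing is parsed.
    state.handleUnauthorized(undefined);
    return state.resourceMetadataUrl;
}
```

Without the reset, the second lifecycle still sees the URL recorded during the first; with it, the value is `undefined` until a later response supplies one.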
Contributor
Author
These need to survive `close()`. The OAuth re-auth flow this PR enables is
||
|
|
||
🔴 SSEClientTransport.close() now clears _eventSource (enabling restart), but omits _resourceMetadataUrl and _scope — after close()+start(), if the new lifecycle's 401 arrives without a WWW-Authenticate header (valid per RFC 9110), stale values from the prior lifecycle are passed directly to auth() in finishAuth(), targeting the wrong OAuth server. Fix: add this._resourceMetadataUrl = undefined and this._scope = undefined to close().
Extended reasoning...
What the bug is and how it manifests
SSEClientTransport declares _resourceMetadataUrl (line 71) and _scope (line 72) as instance fields. The PR's change to close() correctly adds this._eventSource = undefined (enabling restart via the start() guard at line 219) and this._abortController = undefined, but does NOT add this._resourceMetadataUrl = undefined or this._scope = undefined. These fields carry OAuth discovery state from one lifecycle and silently contaminate the next.
The specific code path that triggers it
_resourceMetadataUrl and _scope are set in two places: (1) inside _startOrAuth()'s EventSource fetch interceptor at lines 138–139, guarded by if (response.headers.has('www-authenticate')); and (2) inside _send() at lines 276–277, same guard. Both assignments only fire when the 401 response includes a WWW-Authenticate header. The stale values are consumed in finishAuth() at lines 233–234 where they are passed directly to auth() for OAuth token endpoint discovery.
Why existing code does not prevent it
The 401 handling in both _startOrAuth() and _send() only refreshes _resourceMetadataUrl and _scope when www-authenticate is present. Per RFC 9110 §11.6.1 and the MCP OAuth spec, a server may omit the WWW-Authenticate header on subsequent 401 responses once the client has previously received the resource metadata — the server may assume the client already knows where to go. In that case, the conditional assignments are skipped and the stale values from the prior lifecycle persist unchecked.
Impact
Before this PR, start() after close() always threw 'SSEClientTransport already started!' because the start() guard (if (this._eventSource)) saw the non-null _eventSource that close() did not clear. Users were forced to instantiate a new transport, which re-initializes _resourceMetadataUrl = undefined and _scope = undefined in the constructor (lines 87–88). This PR adds this._eventSource = undefined to close(), making same-instance restart possible — but without the corresponding field resets. The primary scenario this PR targets (start → 401 → close → OAuth → start) is exactly the path that triggers the stale-state bug.
Step-by-step proof
How to fix it
Add the two missing resets to close():
This mirrors the exact pattern the PR already uses for _sessionId, _lastUpscopingHeader, and _serverRetryMs in StreamableHTTPClientTransport.close(), and follows what the constructor already does at lines 87–88.
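As an illustration of the reviewer's suggestion for the SSE transport, the sketch below shows a `start()` guard keyed on the event source together with a `close()` that clears the source, the controller, and the two OAuth fields. `SseLikeTransport` and `ClosableSource` are hypothetical stand-ins, not the SDK's types, and the sketch takes no position on the author's objection that these values should survive `close()`.

```typescript
// Hypothetical stand-in for an EventSource; only close() matters for this sketch.
interface ClosableSource {
    close(): void;
}

class SseLikeTransport {
    private eventSource?: ClosableSource;
    private abortController?: AbortController;
    resourceMetadataUrl?: string;
    scope?: string;

    start(): void {
        // The guard described in the review: a leftover event source blocks start().
        if (this.eventSource) throw new Error('already started');
        this.abortController = new AbortController();
        this.eventSource = { close: () => {} };
    }

    close(): void {
        this.abortController?.abort();
        this.eventSource?.close();
        // Clearing these two is what makes a same-instance restart possible.
        this.abortController = undefined;
        this.eventSource = undefined;
        // The two additional resets the reviewer proposes.
        this.resourceMetadataUrl = undefined;
        this.scope = undefined;
    }
}

const sseTransport = new SseLikeTransport();
sseTransport.start();
sseTransport.resourceMetadataUrl = 'https://old-server/resource';
sseTransport.scope = 'read';
sseTransport.close();
sseTransport.start(); // restart on the same instance
```

After the restart in this sketch, both OAuth fields are empty, so a 401 without a `WWW-Authenticate` header cannot fall back on values from the earlier lifecycle.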