Python: Fix MCP client infinite loop when server does not support optional ping method#5634
Python: Fix MCP client infinite loop when server does not support optional ping method#5634giles17 wants to merge 2 commits intomicrosoft:mainfrom
Conversation
When an MCP server does not implement the optional ping method (per the MCP spec), _ensure_connected() would catch the McpError and trigger a reconnection. With load_tools=True, reconnection calls load_tools() which calls _ensure_connected() again, creating an infinite loop. Fix: Detect the JSON-RPC -32601 (Method not found) error code in the McpError response from send_ping(). When this specific error is returned, treat the connection as valid since the server is clearly responsive — it just doesn't implement the optional ping method. Fixes microsoft#4940 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
giles17
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 90%
✓ Correctness
The change correctly handles the case where an MCP server does not support the optional ping method by catching McpError with code -32601 and treating it as a valid connection. The exception handling order is correct (McpError before Exception), the ErrorData.code attribute access is valid, and the tests cover the key scenarios. No correctness issues found.
✓ Security Reliability
The change correctly handles the case where an MCP server does not support the optional ping method (JSON-RPC -32601 'Method not found'). The exception handling order is correct (McpError before Exception), the attribute access pattern (mcp_exc.error.code) is verified against the McpError class definition, and the logic is sound — any response (even an error) from the server proves the connection is alive. No security or reliability issues found.
✓ Test Coverage
The new tests adequately cover the main scenarios for the McpError handling in
_ensure_connected: method-not-found treated as connected, non-method-not-found triggering reconnection, and the integration test verifying no infinite loop. One edge case is missing: reconnection failure after a non-method-not-found McpError (the duplicated reconnect logic in theexcept McpErrorbranch at lines 1057-1063 is not tested for the failure path, unlike the identical logic in the genericexcept Exceptionbranch which is tested bytest_ensure_connected_wraps_reconnect_failure).
✗ Design Approach
The change fixes the immediate reconnect loop for servers that do not implement the optional
pingRPC, but it still treats that unsupported optional method as the steady-state health-check path. Because_ensure_connected()continues to callsend_ping()every time and only special-cases-32601after the exception is raised, ping-less servers will remain on an exception-driven path during every paginatedload_tools()/load_prompts()refresh instead of learning once that ping is unsupported or using a required request as the liveness signal.
Flagged Issues
-
_ensure_connected()still unconditionally callssend_ping()as its liveness probe even thoughpingis optional. Becauseload_prompts()andload_tools()call_ensure_connected()inside pagination loops, a server withoutpingwill raiseMcpError(code=-32601)on every page load indefinitely. The better approach is to detect once that ping is unsupported and skip future probes for that session, or use a non-optional MCP operation as the connectivity check.
Automated review by giles17's agents
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 91%
✓ Correctness
The change correctly handles MCP servers that don't support the optional ping method by catching McpError with JSON-RPC error code -32601 and treating the connection as valid. The exception handling order is correct, the .error.code access is consistent with existing patterns in the codebase (lines 1140, 1196), and the tests cover the key scenarios. No correctness issues found.
✓ Security Reliability
This change gracefully handles MCP servers that don't support the optional ping method by catching McpError with JSON-RPC code -32601 (Method not found) and treating the connection as valid. The implementation is consistent with existing patterns in the codebase (e.g., accessing mcp_exc.error.message at lines 1140 and 1196), the error code comparison is a safe integer equality check, and the test coverage is thorough. No security or reliability issues identified.
✓ Test Coverage
The PR adds proper handling for MCP servers that don't implement the optional ping method, preventing unnecessary reconnection attempts and potential infinite loops. Three new tests cover the happy paths well: (1) METHOD_NOT_FOUND treated as connected, (2) other McpError codes trigger reconnection, and (3) load_tools doesn't loop. However, the reconnection failure path within the new McpError handler (lines 1059-1063) is untested — only the success case is covered.
✗ Design Approach
The recursion fix for servers that omit
pingaddresses the reported symptom, but the underlying approach is still too narrow:_ensure_connected()continues to use an optional RPC method as a liveness probe and now hardcodes one JSON-RPC error code while still treating every otherMcpErroras a dead connection. Elsewhere in this module, transport closure is handled viaClosedResourceErrorand ordinaryMcpErrors are surfaced as request failures, so this change bakes in a mismatched connection model rather than fixing it at the right layer.
Flagged Issues
-
python/packages/core/agent_framework/_mcp.py:1049-1058still conflates protocol-levelMcpErrors with transport failure. In the same module, normal RPC errors are propagated (_mcp.py:1139-1141) and reconnects are reserved for actual closed transports (_mcp.py:176-1183). A healthier design would reconnect only on closed-session/transport signals, or remove the preflight ping entirely and let the real request retry onClosedResourceError.
Automated review by giles17's agents
There was a problem hiding this comment.
Pull request overview
This PR updates the Python core MCP client connection handling so MCP servers that omit the optional ping method no longer get treated as disconnected during tool/prompt loading. It fits into the agent_framework MCP lifecycle code in python/packages/core, where connection validation drives tool discovery and prompt discovery.
Changes:
- Adjust
_ensure_connected()to treat JSON-RPC-32601(“method not found”) fromsend_ping()as a live connection. - Keep existing reconnect behavior for other ping failures.
- Add unit tests covering method-not-found handling, non-
-32601MCP errors, and the non-loopingload_tools()scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
python/packages/core/agent_framework/_mcp.py |
Updates MCP connection validation and reconnect behavior around send_ping(). |
python/packages/core/tests/core/test_mcp.py |
Adds regression tests for unsupported ping handling and reconnect behavior. |
- Cache _ping_supported flag to avoid repeated failing round-trips on every pagination page for servers that don't implement ping - Add _reconnecting guard to prevent re-entrant reconnection loops through connect() -> load_tools() -> _ensure_connected() - Treat all McpError responses from ping as 'transport alive' (not just -32601), consistent with how call_tool() handles McpError - Reset _ping_supported on reconnect to re-probe new sessions - Inline METHOD_NOT_FOUND constant per maintainer feedback - Expand test coverage: caching, recursion guard, reconnect-failure wrapping, and non-32601 McpError no longer triggering reconnect Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
MCP servers are not required to implement the
pingmethod (it is optional per the MCP spec). When connecting to such a server,_ensure_connectedtreated ping failure as a dead connection, triggering reconnection which re-invokedload_tools/load_prompts, creating an infiniteconnect → load_tools → _ensure_connected → ping fails → connectloop.Fixes #4940
Description
The root cause was that
_ensure_connecteddid not distinguish between a genuinely dead connection and a server that simply doesn't implement the optional ping method. The fix catchesMcpErrorwith the JSON-RPC "Method not found" error code (-32601) fromsend_pingand treats it as a valid, live connection—skipping reconnection. Any otherMcpErrorcode still triggers the existing reconnection logic. Tests verify that ping-unsupported servers no longer cause infinite loops, that-32601is treated as connected, and that other MCP errors still reconnect normally.Contribution Checklist