Skip to content

Python: Fix MCP client infinite loop when server does not support optional ping method#5634

Draft
giles17 wants to merge 2 commits intomicrosoft:mainfrom
giles17:agent/fix-4940-1
Draft

Python: Fix MCP client infinite loop when server does not support optional ping method#5634
giles17 wants to merge 2 commits intomicrosoft:mainfrom
giles17:agent/fix-4940-1

Conversation

@giles17
Copy link
Copy Markdown
Contributor

@giles17 giles17 commented May 4, 2026

Motivation and Context

MCP servers are not required to implement the ping method (it is optional per the MCP spec). When connecting to such a server, _ensure_connected treated ping failure as a dead connection, triggering reconnection which re-invoked load_tools/load_prompts, creating an infinite connect → load_tools → _ensure_connected → ping fails → connect loop.

Fixes #4940

Description

The root cause was that _ensure_connected did not distinguish between a genuinely dead connection and a server that simply doesn't implement the optional ping method. The fix catches McpError with the JSON-RPC "Method not found" error code (-32601) from send_ping and treats it as a valid, live connection—skipping reconnection. Any other McpError code still triggers the existing reconnection logic. Tests verify that ping-unsupported servers no longer cause infinite loops, that -32601 is treated as connected, and that other MCP errors still reconnect normally.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by giles17's agent

When an MCP server does not implement the optional ping method (per the
MCP spec), _ensure_connected() would catch the McpError and trigger a
reconnection. With load_tools=True, reconnection calls load_tools() which
calls _ensure_connected() again, creating an infinite loop.

Fix: Detect the JSON-RPC -32601 (Method not found) error code in the
McpError response from send_ping(). When this specific error is returned,
treat the connection as valid since the server is clearly responsive —
it just doesn't implement the optional ping method.

Fixes microsoft#4940

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 4, 2026 23:10
@giles17 giles17 self-assigned this May 4, 2026
@moonbox3 moonbox3 added the python label May 4, 2026
Copy link
Copy Markdown
Contributor Author

@giles17 giles17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 90%

✓ Correctness

The change correctly handles the case where an MCP server does not support the optional ping method by catching McpError with code -32601 and treating it as a valid connection. The exception handling order is correct (McpError before Exception), the ErrorData.code attribute access is valid, and the tests cover the key scenarios. No correctness issues found.

✓ Security Reliability

The change correctly handles the case where an MCP server does not support the optional ping method (JSON-RPC -32601 'Method not found'). The exception handling order is correct (McpError before Exception), the attribute access pattern (mcp_exc.error.code) is verified against the McpError class definition, and the logic is sound — any response (even an error) from the server proves the connection is alive. No security or reliability issues found.

✓ Test Coverage

The new tests adequately cover the main scenarios for the McpError handling in _ensure_connected: method-not-found treated as connected, non-method-not-found triggering reconnection, and the integration test verifying no infinite loop. One edge case is missing: reconnection failure after a non-method-not-found McpError (the duplicated reconnect logic in the except McpError branch at lines 1057-1063 is not tested for the failure path, unlike the identical logic in the generic except Exception branch which is tested by test_ensure_connected_wraps_reconnect_failure).

✗ Design Approach

The change fixes the immediate reconnect loop for servers that do not implement the optional ping RPC, but it still treats that unsupported optional method as the steady-state health-check path. Because _ensure_connected() continues to call send_ping() every time and only special-cases -32601 after the exception is raised, ping-less servers will remain on an exception-driven path during every paginated load_tools() / load_prompts() refresh instead of learning once that ping is unsupported or using a required request as the liveness signal.

Flagged Issues

  • _ensure_connected() still unconditionally calls send_ping() as its liveness probe even though ping is optional. Because load_prompts() and load_tools() call _ensure_connected() inside pagination loops, a server without ping will raise McpError(code=-32601) on every page load indefinitely. The better approach is to detect once that ping is unsupported and skip future probes for that session, or use a non-optional MCP operation as the connectivity check.

Automated review by giles17's agents

Comment thread python/packages/core/agent_framework/_mcp.py
Comment thread python/packages/core/tests/core/test_mcp.py Outdated
@moonbox3
Copy link
Copy Markdown
Contributor

moonbox3 commented May 4, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/core/agent_framework
   _mcp.py6263095%153, 296, 356–357, 486, 548, 561, 563–566, 585–586, 599–602, 604–605, 609, 673–675, 1034, 1169, 1221–1222, 1225, 1243, 1696
TOTAL33642391288% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
6569 30 💤 0 ❌ 0 🔥 1m 47s ⏱️

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 91%

✓ Correctness

The change correctly handles MCP servers that don't support the optional ping method by catching McpError with JSON-RPC error code -32601 and treating the connection as valid. The exception handling order is correct, the .error.code access is consistent with existing patterns in the codebase (lines 1140, 1196), and the tests cover the key scenarios. No correctness issues found.

✓ Security Reliability

This change gracefully handles MCP servers that don't support the optional ping method by catching McpError with JSON-RPC code -32601 (Method not found) and treating the connection as valid. The implementation is consistent with existing patterns in the codebase (e.g., accessing mcp_exc.error.message at lines 1140 and 1196), the error code comparison is a safe integer equality check, and the test coverage is thorough. No security or reliability issues identified.

✓ Test Coverage

The PR adds proper handling for MCP servers that don't implement the optional ping method, preventing unnecessary reconnection attempts and potential infinite loops. Three new tests cover the happy paths well: (1) METHOD_NOT_FOUND treated as connected, (2) other McpError codes trigger reconnection, and (3) load_tools doesn't loop. However, the reconnection failure path within the new McpError handler (lines 1059-1063) is untested — only the success case is covered.

✗ Design Approach

The recursion fix for servers that omit ping addresses the reported symptom, but the underlying approach is still too narrow: _ensure_connected() continues to use an optional RPC method as a liveness probe and now hardcodes one JSON-RPC error code while still treating every other McpError as a dead connection. Elsewhere in this module, transport closure is handled via ClosedResourceError and ordinary McpErrors are surfaced as request failures, so this change bakes in a mismatched connection model rather than fixing it at the right layer.

Flagged Issues

  • python/packages/core/agent_framework/_mcp.py:1049-1058 still conflates protocol-level McpErrors with transport failure. In the same module, normal RPC errors are propagated (_mcp.py:1139-1141) and reconnects are reserved for actual closed transports (_mcp.py:176-1183). A healthier design would reconnect only on closed-session/transport signals, or remove the preflight ping entirely and let the real request retry on ClosedResourceError.

Automated review by giles17's agents

Comment thread python/packages/core/agent_framework/_mcp.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Python core MCP client connection handling so MCP servers that omit the optional ping method no longer get treated as disconnected during tool/prompt loading. It fits into the agent_framework MCP lifecycle code in python/packages/core, where connection validation drives tool discovery and prompt discovery.

Changes:

  • Adjust _ensure_connected() to treat JSON-RPC -32601 (“method not found”) from send_ping() as a live connection.
  • Keep existing reconnect behavior for other ping failures.
  • Add unit tests covering method-not-found handling, non--32601 MCP errors, and the non-looping load_tools() scenario.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
python/packages/core/agent_framework/_mcp.py Updates MCP connection validation and reconnect behavior around send_ping().
python/packages/core/tests/core/test_mcp.py Adds regression tests for unsupported ping handling and reconnect behavior.

Comment thread python/packages/core/agent_framework/_mcp.py Outdated
Comment thread python/packages/core/tests/core/test_mcp.py
Comment thread python/packages/core/agent_framework/_mcp.py Outdated
Comment thread python/packages/core/agent_framework/_mcp.py Outdated
- Cache _ping_supported flag to avoid repeated failing round-trips
  on every pagination page for servers that don't implement ping
- Add _reconnecting guard to prevent re-entrant reconnection loops
  through connect() -> load_tools() -> _ensure_connected()
- Treat all McpError responses from ping as 'transport alive' (not
  just -32601), consistent with how call_tool() handles McpError
- Reset _ping_supported on reconnect to re-probe new sessions
- Inline METHOD_NOT_FOUND constant per maintainer feedback
- Expand test coverage: caching, recursion guard, reconnect-failure
  wrapping, and non-32601 McpError no longer triggering reconnect

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giles17 giles17 marked this pull request as draft May 6, 2026 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: MCP client connection goes into infinite loop if MCP server doesn't support "ping" method

4 participants