Conversation
- Bug 1 (P0): _get_playwright now checks if the creating thread is still alive before returning the cached Playwright instance. When the creating thread has exited (e.g. after a LangGraph ToolNode executor shuts down), the stale greenlet-bound instance is discarded and a fresh one is created in the current thread. Concurrent callers in the same executor (creating thread still alive) continue to share one connection. - Bug 2 (P1): Change browser_navigate default wait_until from 'load' to 'domcontentloaded'. Sites where external sub-resources are unreachable from the sandbox never fire the 'load' event, causing a permanent 90s timeout. 'domcontentloaded' fires reliably once the DOM is ready. - Bug 3 (P2): Detect 'cannot switch to a different thread' errors in _run_in_sandbox's catch-all Exception handler and reset the cached Playwright instance + sandbox so the next call can recover cleanly. Also: - _reset_playwright now also clears _playwright_thread - Updated all test fixtures to set _playwright_thread = None - Fixed concurrent caching test to use a hold barrier keeping threads alive - Added TestBrowserToolSetThreadAwareness (dead-thread recreation, live- thread reuse) and TestBrowserToolSetGreenletErrorHandling test classes Type check: passed (297 source files, no issues) Co-authored-by: OhYee <13498329+OhYee@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes BrowserToolSet Playwright caching failures caused by greenlet thread-affinity when LangGraph runs tool calls on short-lived worker threads, and improves navigation robustness.
Changes:
- Add thread-liveness tracking to invalidate/recreate cached Playwright connections.
- Change
browser_navigatedefaultwait_untilfrom"load"to"domcontentloaded"to reduce sandbox timeouts. - Add recovery logic for greenlet thread-binding errors and extend unit tests around caching/reset behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
agentrun/integration/builtin/sandbox.py |
Adds _playwright_thread tracking, greenlet-error recovery path, and updates browser_navigate default wait condition. |
tests/unittests/integration/test_browser_toolset_error_handling.py |
Updates fixtures for _playwright_thread and adds new tests for thread awareness / greenlet error handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except Exception as e: | ||
| logger.debug("Unexpected error in browser sandbox: %s", e) | ||
| error_msg = str(e) | ||
| if "cannot switch to" in error_msg: | ||
| logger.debug( |
There was a problem hiding this comment.
The greenlet recovery path detects thread-binding errors via a substring check ("cannot switch to"). This is brittle (message text can change between greenlet/Playwright versions) and may miss the real exception type. Prefer handling the concrete exception type (e.g., greenlet.error) in a dedicated except block (guarded by a safe import) rather than relying on message parsing.
There was a problem hiding this comment.
Fixed in ac0252e. Added a GreenletError import with ImportError fallback (same pattern as PlaywrightError at the top of the file), then replaced the brittle string check with a dedicated except GreenletError block placed before the generic except Exception. The sandbox is now preserved on greenlet errors (only Playwright is reset and retried) since the failure is client-side thread affinity, not a sandbox crash.
| """测试并发调用 _get_playwright 只创建一个连接,不会泄漏 | ||
|
|
||
| 所有工作线程在同一 executor 内并发运行(即创建线程仍存活), | ||
| 应复用同一连接,不会触发重建。 | ||
| """ | ||
| start_barrier = threading.Barrier(5) | ||
| # Keep all threads alive until every thread has obtained playwright, | ||
| # simulating concurrent workers within the same executor context. | ||
| hold_barrier = threading.Barrier(5) | ||
| results: list = [] | ||
|
|
||
| def worker(): | ||
| barrier.wait() | ||
| start_barrier.wait() | ||
| p = toolset._get_playwright(mock_sandbox) | ||
| results.append(p) | ||
| hold_barrier.wait() # stay alive so is_alive() == True for peers |
There was a problem hiding this comment.
This concurrency test asserts that _get_playwright returns the same cached Playwright object to multiple OS threads. With Playwright Sync API, the underlying greenlet is thread-affine, so sharing a single instance across threads will still crash when those threads actually call methods on it. If the intended fix is thread-safe caching, the test should reflect that (e.g., per-thread caching) rather than validating cross-thread sharing.
There was a problem hiding this comment.
Already applied in ac0252e. The test was renamed to test_concurrent_get_playwright_each_thread_gets_own_connection and now validates that each OS thread receives its own Playwright connection rather than asserting cross-thread sharing (which is unsafe with the Sync API's greenlet binding).
| def test_get_playwright_live_thread_not_recreated( | ||
| self, toolset, mock_sandbox | ||
| ): | ||
| """测试创建线程仍存活时不重建连接(并发安全) | ||
|
|
||
| 即使在不同线程中调用,只要创建线程仍然存活,就应复用同一连接。 | ||
| 这模拟同一 executor 内并发工具调用的场景。 | ||
| """ | ||
| results: list = [] | ||
|
|
||
| # Create connection in main thread first | ||
| toolset._get_playwright(mock_sandbox) | ||
| # The creating thread (main test thread) is still alive | ||
|
|
||
| # Another thread should reuse the same connection | ||
| def worker(): | ||
| p = toolset._get_playwright(mock_sandbox) | ||
| results.append(p) | ||
|
|
||
| t = threading.Thread(target=worker) | ||
| t.start() | ||
| t.join() | ||
|
|
||
| assert len(results) == 1 | ||
| assert results[0] is toolset._playwright_sync | ||
| mock_sandbox.sync_playwright.assert_called_once() | ||
|
|
There was a problem hiding this comment.
test_get_playwright_live_thread_not_recreated expects that a Playwright instance created on the main thread can be reused from a different live thread. For Playwright Sync API, this is exactly the unsafe case that triggers greenlet.error: cannot switch to a different thread. The desired behavior should be either (a) recreate a thread-local Playwright instance for the worker thread, or (b) prevent calling into Playwright from a different thread; the test should be updated accordingly.
There was a problem hiding this comment.
Already applied in ac0252e. The test was replaced by test_get_playwright_different_live_thread_recreates_connection, which validates that any different OS thread triggers a new connection — even when the creating thread is still alive — since Playwright Sync API greenlets are never safe to use from a different thread.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: OhYee <oyohyee@oyohyee.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: OhYee <oyohyee@oyohyee.com>
…read - Add GreenletError import with ImportError fallback (same pattern as PlaywrightError) - Replace brittle 'cannot switch to' substring check with dedicated except GreenletError block placed before the generic except Exception, so the correct exception type is matched regardless of message wording - Keep existing sandbox on greenlet error (failure is client-side thread affinity, not a sandbox crash); only reset Playwright and retry - _get_playwright already checks current_thread is not creator_thread (any different thread triggers recreation, not just dead threads) - Update concurrent caching test: each thread now gets its own connection (Playwright Sync API is thread-affine, cross-thread sharing is unsafe) - Replace test_get_playwright_live_thread_not_recreated with test_get_playwright_different_live_thread_recreates_connection to validate the correct per-thread isolation behavior - Update greenlet error tests to use real greenlet.error, verify sandbox is preserved, and check retry behavior Type check: passed (297 source files, no issues) Co-authored-by: OhYee <13498329+OhYee@users.noreply.github.com>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
…t toolsets Removed the comprehensive LangChain invoke methods integration test file which was testing various invocation patterns (astream_events, astream, stream, invoke, ainvoke) with both streaming and non-streaming scenarios. Also renamed TestToolSet to SampleToolSet across multiple integration test files to avoid confusion with actual test classes. The removed test file contained extensive setup for mocking OpenAI protocol services and comprehensive test cases for different LangChain/LangGraph invocation methods including streaming protocols and tool call scenarios. The toolset renaming affects AgentScope, CrewAI, Google ADK, LangChain, LangGraph, and PydanticAI integration tests to maintain consistency and prevent naming conflicts. feat(browser-toolset): add thread awareness and error handling for Playwright connections Added comprehensive thread awareness to BrowserToolSet's Playwright connection management. The implementation now tracks which thread created the Playwright connection and properly recreates it when the creating thread exits, preventing greenlet binding errors in multi-threaded environments like LangGraph ToolNodes. Also added proper error handling for 'cannot switch to a different thread' exceptions, automatically resetting both Playwright connections and sandbox instances when such errors occur to prevent cascading failures. The changes include new test coverage for concurrent access, thread lifecycle management, and error recovery scenarios to ensure robust operation in production environments. test(browser-toolset): enhance error handling tests with thread awareness Expanded the browser toolset error handling tests to cover thread-aware Playwright connection management. Added comprehensive test cases for concurrent access patterns, thread lifecycle scenarios, and specific error handling for greenlet binding issues that occur when switching between threads in LangGraph environments. The new tests verify proper connection reuse within the same thread, correct recreation when the creating thread exits, and appropriate error handling for cross-thread Playwright operations. chore(test): remove duplicate LangChain invoke methods test file Removed the duplicate test file that appears to be an older version of the LangChain invoke methods tests, maintaining only the comprehensive version that covers all invocation patterns and streaming scenarios. test: 移除 LangChain 调用方法集成测试并重命名测试工具集 移除了全面的 LangChain 调用方法集成测试文件,该文件测试了各种调用模式(astream_events、astream、stream、invoke、ainvoke)以及流式和非流式场景。同时在多个集成测试文件中将 TestToolSet 重命名为 SampleToolSet,以避免与实际测试类混淆。 被删除的测试文件包含大量用于模拟 OpenAI 协议服务的设置,以及针对不同 LangChain/LangGraph 调用方法的全面测试用例,包括流式协议和工具调用场景。 工具集重命名影响了 AgentScope、CrewAI、Google ADK、LangChain、LangGraph 和 PydanticAI 集成测试,以保持一致性并防止命名冲突。 feat(browser-toolset): 为 Playwright 连接添加线程感知和错误处理功能 为 BrowserToolSet 的 Playwright 连接管理添加了全面的线程感知功能。实现现在跟踪哪个线程创建了 Playwright 连接,并在线程退出时正确重新创建它,防止在多线程环境(如 LangGraph ToolNodes)中的 greenlet 绑定错误。 还为 'cannot switch to a different thread' 异常添加了适当的错误处理,在发生此类错误时自动重置 Playwright 连接和沙箱实例,以防止级联故障。 更改包括对并发访问、线程生命周期管理和错误恢复场景的新测试覆盖,以确保在生产环境中稳健运行。 test(browser-toolset): 使用线程感知增强错误处理测试 扩展浏览器工具集错误处理测试以涵盖线程感知的 Playwright 连接管理。添加了对并发访问模式、线程生命周期场景的全面测试用例,以及针对在 LangGraph 环境中跨线程切换时发生的 greenlet 绑定问题的具体错误处理。 新测试验证同一线程内的正确连接复用、创建线程退出时的正确重建,以及跨线程 Playwright 操作的适当错误处理。 chore(test): 删除重复的 LangChain 调用方法测试文件 删除似乎是 LangChain 调用方法测试旧版本的重复测试文件,只保留涵盖所有调用模式和流式场景的完整版本。 Change-Id: I4d963b9b041bf81cc2f5f20d3c2f7cd47abe6bbb Signed-off-by: OhYee <oyohyee@oyohyee.com>
The Playwright connection cache introduced in v0.0.22 assumed all tool calls run on the same OS thread. LangGraph
ToolNodecreates a newContextThreadPoolExecutorper invocation, so each tool call lands on a different short-lived worker thread — the cached greenlet is bound to the dead thread, making every call after the first fail with"cannot switch to a different thread (which happens to have exited)".Changes
Bug 1 (P0) — Per-thread isolation in
_get_playwrightThe Playwright Sync API greenlet is bound to the OS thread that created it and is never safe to use from any other thread, even if the creating thread is still alive.
_get_playwrightnow stores the creating thread asself._playwright_threadand recreates the connection whenever the calling thread differs from the creator — regardless of whether the creator is alive or dead.Bug 2 (P1) —
browser_navigatedefaultwait_untilChanged from
"load"→"domcontentloaded". Sites with external sub-resources unreachable from the sandbox never fireload, causing systematic 90 s timeouts.domcontentloadedfires reliably once the DOM is ready.Bug 3 (P2) — Greenlet error recovery in
_run_in_sandboxAdded a dedicated
except GreenletErrorblock (with safeImportErrorfallback, the same pattern used forPlaywrightError) placed before the genericexcept Exception. This replaces the previous brittle substring check ("cannot switch to") with a proper exception type match that is robust across greenlet/Playwright version changes.On a
greenlet.error, only the Playwright connection is reset — the sandbox itself is preserved (the failure is client-side thread affinity, not a sandbox crash) and the callback is retried once with the same sandbox instance._reset_playwrightNow also clears
self._playwright_thread = Noneon reset.Original prompt
This section details on the original issue you should resolve
<issue_title>[Bug] BrowserToolSet v0.0.22 的 Playwright 缓存在跨线程调用时导致 greenlet 永久失效</issue_title>
<issue_description>## 概述
agentrun-sdk v0.0.22 的 PR Serverless-Devs/agentrun-sdk-python#57(修复 Issue Serverless-Devs/agentrun-sdk-python#55)引入了 Playwright 连接缓存机制 (
_get_playwright+self._playwright_sync)。该缓存隐含假设所有工具调用在同一个 OS 线程上执行,但 LangGraphToolNode每次调用时创建新的ContextThreadPoolExecutor(线程池),导致连续的浏览器工具调用实际运行在不同的短生命周期工作线程上。结果:首次浏览器工具调用成功后,缓存的 Playwright 实例的 greenlet 绑定到已终止的工作线程上,所有后续调用必然失败,返回
"cannot switch to a different thread (which happens to have exited)"。版本影响:v0.0.21 不受此问题影响(每次创建新 Playwright 实例),v0.0.22 和 v0.0.23 均受影响。
Bug 1(P0 根因):
_get_playwright缓存不感知调用线程变化根因
PR #57 将每个工具方法从"每次创建新 Playwright"改为"复用缓存实例":
_get_playwright在首次调用时创建BrowserPlaywrightSync实例并缓存到self._playwright_sync。Playwright 内部的sync_playwright().start()会创建一个MainGreenlet,该 greenlet 绑定到创建它的 OS 线程(Python greenlet 库的约束:greenlet 只能在创建它的线程上切换)。当外部调用者(如 LangGraph ToolNode)在不同线程上发起后续工具调用时,
_get_playwright返回缓存实例,但其内部 greenlet 仍绑定在已不存在的旧线程上,导致greenlet.error。LangGraph ToolNode 的线程模型
LangGraph
ToolNode._func()每次调用都创建新的ContextThreadPoolExecutor,工具在线程池工作线程中执行:get_executor_for_config(来自langchain_core.runnables.config)每次返回新ContextThreadPoolExecutor:已确认 LangGraph 最新版 v1.0.10 的
ToolNode仍使用完全相同的模式,从 v1.0.6 → v1.0.10 的变更集中在 CLI、文档、依赖更新等方面,线程模型未变。升级 LangGraph 无法解决此问题。冲突时序
v0.0.21 为什么不触发
在 v0.0.21 中,每次工具调用都使用
with sb.sync_playwright() as p:创建和销毁独立的 Playwright 实例。即使 ToolNode 每次在不同线程上执行,每次调用都会创建新的 greenlet 绑定到当前线程,不依赖前一个线程。v0.0.21 用户报告的"不稳定失败"是另一个独立问题(Issue #55 描述的 CDP 瞬态错误 + 不区分错误类型导致不必要的沙箱重建),并非 greenlet 线程绑定问题。
复现步骤
BrowserToolSet并通过to_langchain()转为 LangChain 工具create_agent构建 LangGraph Agentbrowser_navigate→ 成功最小复现代码