Skip to content

fix(BrowserToolSet): Playwright greenlet thread-binding crash on cross-thread cache reuse#67

Merged
Sodawyx merged 6 commits intomainfrom
copilot/fix-playwright-cache-issue
Mar 10, 2026
Merged

fix(BrowserToolSet): Playwright greenlet thread-binding crash on cross-thread cache reuse#67
Sodawyx merged 6 commits intomainfrom
copilot/fix-playwright-cache-issue

Conversation

Copy link
Contributor

Copilot AI commented Mar 10, 2026

The Playwright connection cache introduced in v0.0.22 assumed all tool calls run on the same OS thread. LangGraph ToolNode creates a new ContextThreadPoolExecutor per invocation, so each tool call lands on a different short-lived worker thread — the cached greenlet is bound to the dead thread, making every call after the first fail with "cannot switch to a different thread (which happens to have exited)".

Changes

Bug 1 (P0) — Per-thread isolation in _get_playwright

The Playwright Sync API greenlet is bound to the OS thread that created it and is never safe to use from any other thread, even if the creating thread is still alive. _get_playwright now stores the creating thread as self._playwright_thread and recreates the connection whenever the calling thread differs from the creator — regardless of whether the creator is alive or dead.

# Before: always returned cached instance regardless of thread state
if self._playwright_sync is not None:
    return self._playwright_sync

# After: invalidate cache for any different calling thread
if self._playwright_sync is not None and self._playwright_thread is not None:
    current_thread = threading.current_thread()
    creator_thread = self._playwright_thread
    if not creator_thread.is_alive() or current_thread is not creator_thread:
        self._reset_playwright()  # greenlet binding is invalid, must recreate

Bug 2 (P1) — browser_navigate default wait_until

Changed from "load""domcontentloaded". Sites with external sub-resources unreachable from the sandbox never fire load, causing systematic 90 s timeouts. domcontentloaded fires reliably once the DOM is ready.

Bug 3 (P2) — Greenlet error recovery in _run_in_sandbox

Added a dedicated except GreenletError block (with safe ImportError fallback, the same pattern used for PlaywrightError) placed before the generic except Exception. This replaces the previous brittle substring check ("cannot switch to") with a proper exception type match that is robust across greenlet/Playwright version changes.

On a greenlet.error, only the Playwright connection is reset — the sandbox itself is preserved (the failure is client-side thread affinity, not a sandbox crash) and the callback is retried once with the same sandbox instance.

_reset_playwright

Now also clears self._playwright_thread = None on reset.

Original prompt

This section details on the original issue you should resolve

<issue_title>[Bug] BrowserToolSet v0.0.22 的 Playwright 缓存在跨线程调用时导致 greenlet 永久失效</issue_title>
<issue_description>## 概述

agentrun-sdk v0.0.22 的 PR Serverless-Devs/agentrun-sdk-python#57(修复 Issue Serverless-Devs/agentrun-sdk-python#55)引入了 Playwright 连接缓存机制 (_get_playwright + self._playwright_sync)。该缓存隐含假设所有工具调用在同一个 OS 线程上执行,但 LangGraph ToolNode 每次调用时创建新的 ContextThreadPoolExecutor(线程池),导致连续的浏览器工具调用实际运行在不同的短生命周期工作线程上。

结果:首次浏览器工具调用成功后,缓存的 Playwright 实例的 greenlet 绑定到已终止的工作线程上,所有后续调用必然失败,返回 "cannot switch to a different thread (which happens to have exited)"

版本影响:v0.0.21 不受此问题影响(每次创建新 Playwright 实例),v0.0.22 和 v0.0.23 均受影响。

环境
agentrun-sdk 0.0.22
Python 3.12
LangGraph 1.0.6(已确认升级到 v1.0.10 无法解决)
LangChain 1.2.4
Playwright 1.57.0
greenlet 3.3.0
OS macOS (darwin 23.6.0)

Bug 1(P0 根因):_get_playwright 缓存不感知调用线程变化

根因

PR #57 将每个工具方法从"每次创建新 Playwright"改为"复用缓存实例":

# v0.0.21 — 每次创建 & 销毁
def browser_navigate(self, url, ...):
    def inner(sb):
        with sb.sync_playwright() as p:   # ← 每次新建
            response = p.goto(url, ...)
            return {...}
    return self._run_in_sandbox(inner)

# v0.0.22 — 缓存复用
def browser_navigate(self, url, ...):
    def inner(sb):
        p = self._get_playwright(sb)      # ← 首次创建,后续复用
        response = p.goto(url, ...)
        return {...}
    return self._run_in_sandbox(inner)

_get_playwright 在首次调用时创建 BrowserPlaywrightSync 实例并缓存到 self._playwright_sync。Playwright 内部的 sync_playwright().start() 会创建一个 MainGreenlet该 greenlet 绑定到创建它的 OS 线程(Python greenlet 库的约束:greenlet 只能在创建它的线程上切换)。

当外部调用者(如 LangGraph ToolNode)在不同线程上发起后续工具调用时,_get_playwright 返回缓存实例,但其内部 greenlet 仍绑定在已不存在的旧线程上,导致 greenlet.error

LangGraph ToolNode 的线程模型

LangGraph ToolNode._func() 每次调用都创建新的 ContextThreadPoolExecutor,工具在线程池工作线程中执行:

# langgraph/prebuilt/tool_node.py
def _func(self, input, config, runtime):
    # ...
    with get_executor_for_config(config) as executor:  # 每次新建 executor
        outputs = list(
            executor.map(self._run_one, tool_calls, ...)
        )
    # executor.__exit__() → shutdown(wait=True) → 工作线程终止

get_executor_for_config(来自 langchain_core.runnables.config)每次返回新 ContextThreadPoolExecutor

@contextmanager
def get_executor_for_config(config):
    with ContextThreadPoolExecutor(max_workers=config.get("max_concurrency")) as executor:
        yield executor

已确认 LangGraph 最新版 v1.0.10 的 ToolNode 仍使用完全相同的模式,从 v1.0.6 → v1.0.10 的变更集中在 CLI、文档、依赖更新等方面,线程模型未变。升级 LangGraph 无法解决此问题。

冲突时序

LangGraph Agent 执行流程:

 ToolNode._func() [调用 Serverless-Devs/agentrun-sdk-python#1: browser_navigate]
    └─ ContextThreadPoolExecutor-1 创建
        └─ Thread-B 执行 browser_navigate
            └─ _get_playwright(sb)
                └─ 首次创建 → sync_playwright().start()
                    └─ MainGreenlet 创建,绑定 Thread-B ✓
                └─ 缓存到 self._playwright_sync
            └─ p.goto("baidu.com") → 成功 ✅
    └─ executor.__exit__() → Thread-B 终止 💀

 LLM 调用 (~2s)

 ToolNode._func() [调用 Serverless-Devs/agentrun-sdk-python#2: browser_snapshot]
    └─ ContextThreadPoolExecutor-2 创建
        └─ Thread-C 执行 browser_snapshot
            └─ _get_playwright(sb)
                └─ self._playwright_sync 不为 None → 返回缓存实例
            └─ p.evaluate(...)
                └─ dispatcher_fiber.switch()
                    └─ 目标 greenlet 绑定在 Thread-B,但 Thread-B 已退出
                    └─ 💥 "cannot switch to a different thread (which happens to have exited)"

v0.0.21 为什么不触发

在 v0.0.21 中,每次工具调用都使用 with sb.sync_playwright() as p: 创建和销毁独立的 Playwright 实例。即使 ToolNode 每次在不同线程上执行,每次调用都会创建新的 greenlet 绑定到当前线程,不依赖前一个线程。

v0.0.21 用户报告的"不稳定失败"是另一个独立问题(Issue #55 描述的 CDP 瞬态错误 + 不区分错误类型导致不必要的沙箱重建),并非 greenlet 线程绑定问题。

复现步骤

  1. 创建 BrowserToolSet 并通过 to_langchain() 转为 LangChain 工具
  2. 通过 create_agent 构建 LangGraph Agent
  3. 给 Agent 发送需要使用浏览器的指令(如"打开 baidu.com 并获取快照")
  4. Agent 首次调用 browser_navigate成功
  5. Agent 第二次调用任何浏览器工具 → 必然失败

最小复现代码

from agentrun.integration.builtin.sandbox import BrowserToolSet
from langchain_core.runnables.config import ContextThreadPoolExecutor

toolset = BrowserToolSet(template_name="<your-template>", config=None)
tools = toolset.to_langchain()
tools_map = {t.name: t for t in tools}

# 模拟 ToolNode:每次工具调用使用新的 ContextThreadPoolExecutor
# Step 1: 第一个 executor(工具在 Thread-B 上执行)
with ContextThreadPoolExecutor(max_workers=1) as executor:
    r = list(executor.map(
        lambda _: tools_map["browser_navigate"].invoke({"url": "https://www.baidu.com"}),
        [None]
    ))[0]
    print(r)  # {'url': 'https://www.baidu.com', 'succ...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes Serverless-Devs/agentrun-sdk-python#66

<!-- START COPILOT CODING AGENT TIPS -->
---

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs.

- Bug 1 (P0): _get_playwright now checks if the creating thread is still
  alive before returning the cached Playwright instance. When the creating
  thread has exited (e.g. after a LangGraph ToolNode executor shuts down),
  the stale greenlet-bound instance is discarded and a fresh one is created
  in the current thread. Concurrent callers in the same executor (creating
  thread still alive) continue to share one connection.

- Bug 2 (P1): Change browser_navigate default wait_until from 'load' to
  'domcontentloaded'. Sites where external sub-resources are unreachable
  from the sandbox never fire the 'load' event, causing a permanent 90s
  timeout. 'domcontentloaded' fires reliably once the DOM is ready.

- Bug 3 (P2): Detect 'cannot switch to a different thread' errors in
  _run_in_sandbox's catch-all Exception handler and reset the cached
  Playwright instance + sandbox so the next call can recover cleanly.

Also:
- _reset_playwright now also clears _playwright_thread
- Updated all test fixtures to set _playwright_thread = None
- Fixed concurrent caching test to use a hold barrier keeping threads alive
- Added TestBrowserToolSetThreadAwareness (dead-thread recreation, live-
  thread reuse) and TestBrowserToolSetGreenletErrorHandling test classes

Type check: passed (297 source files, no issues)

Co-authored-by: OhYee <13498329+OhYee@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix Playwright cache issue with cross-thread calls fix(BrowserToolSet): Playwright greenlet thread-binding crash on cross-thread cache reuse Mar 10, 2026
@OhYee OhYee requested a review from Copilot March 10, 2026 01:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes BrowserToolSet Playwright caching failures caused by greenlet thread-affinity when LangGraph runs tool calls on short-lived worker threads, and improves navigation robustness.

Changes:

  • Add thread-liveness tracking to invalidate/recreate cached Playwright connections.
  • Change browser_navigate default wait_until from "load" to "domcontentloaded" to reduce sandbox timeouts.
  • Add recovery logic for greenlet thread-binding errors and extend unit tests around caching/reset behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
agentrun/integration/builtin/sandbox.py Adds _playwright_thread tracking, greenlet-error recovery path, and updates browser_navigate default wait condition.
tests/unittests/integration/test_browser_toolset_error_handling.py Updates fixtures for _playwright_thread and adds new tests for thread awareness / greenlet error handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +832 to +835
except Exception as e:
logger.debug("Unexpected error in browser sandbox: %s", e)
error_msg = str(e)
if "cannot switch to" in error_msg:
logger.debug(
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The greenlet recovery path detects thread-binding errors via a substring check ("cannot switch to"). This is brittle (message text can change between greenlet/Playwright versions) and may miss the real exception type. Prefer handling the concrete exception type (e.g., greenlet.error) in a dedicated except block (guarded by a safe import) rather than relying on message parsing.

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ac0252e. Added a GreenletError import with ImportError fallback (same pattern as PlaywrightError at the top of the file), then replaced the brittle string check with a dedicated except GreenletError block placed before the generic except Exception. The sandbox is now preserved on greenlet errors (only Playwright is reset and retried) since the failure is client-side thread affinity, not a sandbox crash.

Comment on lines +260 to +275
"""测试并发调用 _get_playwright 只创建一个连接,不会泄漏

所有工作线程在同一 executor 内并发运行(即创建线程仍存活),
应复用同一连接,不会触发重建。
"""
start_barrier = threading.Barrier(5)
# Keep all threads alive until every thread has obtained playwright,
# simulating concurrent workers within the same executor context.
hold_barrier = threading.Barrier(5)
results: list = []

def worker():
barrier.wait()
start_barrier.wait()
p = toolset._get_playwright(mock_sandbox)
results.append(p)
hold_barrier.wait() # stay alive so is_alive() == True for peers
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concurrency test asserts that _get_playwright returns the same cached Playwright object to multiple OS threads. With Playwright Sync API, the underlying greenlet is thread-affine, so sharing a single instance across threads will still crash when those threads actually call methods on it. If the intended fix is thread-safe caching, the test should reflect that (e.g., per-thread caching) rather than validating cross-thread sharing.

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already applied in ac0252e. The test was renamed to test_concurrent_get_playwright_each_thread_gets_own_connection and now validates that each OS thread receives its own Playwright connection rather than asserting cross-thread sharing (which is unsafe with the Sync API's greenlet binding).

Comment on lines +395 to +421
def test_get_playwright_live_thread_not_recreated(
self, toolset, mock_sandbox
):
"""测试创建线程仍存活时不重建连接(并发安全)

即使在不同线程中调用,只要创建线程仍然存活,就应复用同一连接。
这模拟同一 executor 内并发工具调用的场景。
"""
results: list = []

# Create connection in main thread first
toolset._get_playwright(mock_sandbox)
# The creating thread (main test thread) is still alive

# Another thread should reuse the same connection
def worker():
p = toolset._get_playwright(mock_sandbox)
results.append(p)

t = threading.Thread(target=worker)
t.start()
t.join()

assert len(results) == 1
assert results[0] is toolset._playwright_sync
mock_sandbox.sync_playwright.assert_called_once()

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_get_playwright_live_thread_not_recreated expects that a Playwright instance created on the main thread can be reused from a different live thread. For Playwright Sync API, this is exactly the unsafe case that triggers greenlet.error: cannot switch to a different thread. The desired behavior should be either (a) recreate a thread-local Playwright instance for the worker thread, or (b) prevent calling into Playwright from a different thread; the test should be updated accordingly.

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already applied in ac0252e. The test was replaced by test_get_playwright_different_live_thread_recreates_connection, which validates that any different OS thread triggers a new connection — even when the creating thread is still alive — since Playwright Sync API greenlets are never safe to use from a different thread.

OhYee and others added 2 commits March 10, 2026 11:25
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
…read

- Add GreenletError import with ImportError fallback (same pattern as PlaywrightError)
- Replace brittle 'cannot switch to' substring check with dedicated
  except GreenletError block placed before the generic except Exception,
  so the correct exception type is matched regardless of message wording
- Keep existing sandbox on greenlet error (failure is client-side thread
  affinity, not a sandbox crash); only reset Playwright and retry
- _get_playwright already checks current_thread is not creator_thread
  (any different thread triggers recreation, not just dead threads)
- Update concurrent caching test: each thread now gets its own connection
  (Playwright Sync API is thread-affine, cross-thread sharing is unsafe)
- Replace test_get_playwright_live_thread_not_recreated with
  test_get_playwright_different_live_thread_recreates_connection to
  validate the correct per-thread isolation behavior
- Update greenlet error tests to use real greenlet.error, verify sandbox
  is preserved, and check retry behavior

Type check: passed (297 source files, no issues)

Co-authored-by: OhYee <13498329+OhYee@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 10, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • agentrun.cn-hangzhou.aliyuncs.com
    • Triggering command: /home/REDACTED/work/agentrun-sdk-python/agentrun-sdk-python/.venv/bin/pytest pytest tests/unittests/ -q (dns block)
  • api.scarf.sh
    • Triggering command: /home/REDACTED/work/agentrun-sdk-python/agentrun-sdk-python/.venv/bin/pytest pytest tests/unittests/ -q (dns block)
  • telemetry.crewai.com
    • Triggering command: /home/REDACTED/work/agentrun-sdk-python/agentrun-sdk-python/.venv/bin/pytest pytest tests/unittests/ -q (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

…t toolsets

Removed the comprehensive LangChain invoke methods integration test file which was testing various invocation patterns (astream_events, astream, stream, invoke, ainvoke) with both streaming and non-streaming scenarios. Also renamed TestToolSet to SampleToolSet across multiple integration test files to avoid confusion with actual test classes.

The removed test file contained extensive setup for mocking OpenAI protocol services and comprehensive test cases for different LangChain/LangGraph invocation methods including streaming protocols and tool call scenarios.

The toolset renaming affects AgentScope, CrewAI, Google ADK, LangChain, LangGraph, and PydanticAI integration tests to maintain consistency and prevent naming conflicts.

feat(browser-toolset): add thread awareness and error handling for Playwright connections

Added comprehensive thread awareness to BrowserToolSet's Playwright connection management. The implementation now tracks which thread created the Playwright connection and properly recreates it when the creating thread exits, preventing greenlet binding errors in multi-threaded environments like LangGraph ToolNodes.

Also added proper error handling for 'cannot switch to a different thread' exceptions, automatically resetting both Playwright connections and sandbox instances when such errors occur to prevent cascading failures.

The changes include new test coverage for concurrent access, thread lifecycle management, and error recovery scenarios to ensure robust operation in production environments.

test(browser-toolset): enhance error handling tests with thread awareness

Expanded the browser toolset error handling tests to cover thread-aware Playwright connection management. Added comprehensive test cases for concurrent access patterns, thread lifecycle scenarios, and specific error handling for greenlet binding issues that occur when switching between threads in LangGraph environments.

The new tests verify proper connection reuse within the same thread, correct recreation when the creating thread exits, and appropriate error handling for cross-thread Playwright operations.

chore(test): remove duplicate LangChain invoke methods test file

Removed the duplicate test file that appears to be an older version of the LangChain invoke methods tests, maintaining only the comprehensive version that covers all invocation patterns and streaming scenarios.

test: 移除 LangChain 调用方法集成测试并重命名测试工具集

移除了全面的 LangChain 调用方法集成测试文件,该文件测试了各种调用模式(astream_events、astream、stream、invoke、ainvoke)以及流式和非流式场景。同时在多个集成测试文件中将 TestToolSet 重命名为 SampleToolSet,以避免与实际测试类混淆。

被删除的测试文件包含大量用于模拟 OpenAI 协议服务的设置,以及针对不同 LangChain/LangGraph 调用方法的全面测试用例,包括流式协议和工具调用场景。

工具集重命名影响了 AgentScope、CrewAI、Google ADK、LangChain、LangGraph 和 PydanticAI 集成测试,以保持一致性并防止命名冲突。

feat(browser-toolset): 为 Playwright 连接添加线程感知和错误处理功能

为 BrowserToolSet 的 Playwright 连接管理添加了全面的线程感知功能。实现现在跟踪哪个线程创建了 Playwright 连接,并在线程退出时正确重新创建它,防止在多线程环境(如 LangGraph ToolNodes)中的 greenlet 绑定错误。

还为 'cannot switch to a different thread' 异常添加了适当的错误处理,在发生此类错误时自动重置 Playwright 连接和沙箱实例,以防止级联故障。

更改包括对并发访问、线程生命周期管理和错误恢复场景的新测试覆盖,以确保在生产环境中稳健运行。

test(browser-toolset): 使用线程感知增强错误处理测试

扩展浏览器工具集错误处理测试以涵盖线程感知的 Playwright 连接管理。添加了对并发访问模式、线程生命周期场景的全面测试用例,以及针对在 LangGraph 环境中跨线程切换时发生的 greenlet 绑定问题的具体错误处理。

新测试验证同一线程内的正确连接复用、创建线程退出时的正确重建,以及跨线程 Playwright 操作的适当错误处理。

chore(test): 删除重复的 LangChain 调用方法测试文件

删除似乎是 LangChain 调用方法测试旧版本的重复测试文件,只保留涵盖所有调用模式和流式场景的完整版本。

Change-Id: I4d963b9b041bf81cc2f5f20d3c2f7cd47abe6bbb
Signed-off-by: OhYee <oyohyee@oyohyee.com>
@OhYee OhYee marked this pull request as ready for review March 10, 2026 08:09
@OhYee OhYee requested a review from Sodawyx March 10, 2026 08:09
@Sodawyx Sodawyx merged commit 5dd5e41 into main Mar 10, 2026
2 checks passed
@Sodawyx Sodawyx deleted the copilot/fix-playwright-cache-issue branch March 10, 2026 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants