Fix pdb / breakpoint() hang in workflow code (#1104) #2

Closed
elidlocke wants to merge 1 commit into main from pdb-hang-repro

@elidlocke
Owner

@elidlocke elidlocke commented May 22, 2026

What was changed

When debug_mode=True on the Worker (or TEMPORAL_DEBUG=1), breakpoint() inside workflow code now opens an interactive pdb prompt — including from a sandboxed workflow run under pytest. Four pieces:

  • Inline dispatch on the main thread. Activations run on the asyncio main thread (scheduled via loop.call_soon to avoid nesting inside the dispatch task's __step()), so pdb's input() reaches the TTY.
  • Targeted sandbox relaxation. breakpoint is removed from the sandbox's invalid builtins so the call can reach the worker hook. Nothing else is relaxed.
  • Custom Pdb subclass. Drops into pdb at the workflow's own frame (not our indirection), suspends sandbox checks for the duration of each REPL interaction, and overrides q / Ctrl-D to continue the workflow instead of failing it with BdbQuit.
  • Defensive sys.breakpointhook. Calling breakpoint() from a workflow worker thread without debug_mode raises a clear RuntimeError instead of silently hanging.
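
The third piece can be illustrated with a minimal sketch of a `pdb.Pdb` subclass whose quit commands resume execution instead of raising `BdbQuit`. This is not the PR's implementation: the class name `ContinueOnQuitPdb` and the helper `run_with_scripted_debugger` are hypothetical, and the sketch leaves out the frame selection and sandbox suspension described above.

```python
import io
import pdb


class ContinueOnQuitPdb(pdb.Pdb):
    """Hypothetical Pdb subclass: q / exit / Ctrl-D resume the program."""

    def do_quit(self, arg):
        # Delegate to `continue` so the debugged code keeps running
        # instead of being aborted with BdbQuit.
        return self.do_continue(arg)

    # The base class binds these aliases to its own do_quit, so rebind them.
    do_q = do_quit
    do_exit = do_quit

    def do_EOF(self, arg):
        # Ctrl-D (end of input) is treated the same way as q.
        self.message("")
        return self.do_continue(arg)


def run_with_scripted_debugger(commands: str) -> int:
    """Run a tiny function under the debugger, feeding it scripted input."""
    debugger = ContinueOnQuitPdb(
        stdin=io.StringIO(commands),  # scripted commands instead of a TTY
        stdout=io.StringIO(),         # swallow the prompt output
        nosigint=True,                # leave the SIGINT handler alone
        readrc=False,                 # ignore any ~/.pdbrc
    )
    x = 1
    debugger.set_trace()
    x += 1  # still executes after "q" because quit now means continue
    return x
```

With the stock `pdb.Pdb`, the same `q` would raise `BdbQuit` out of `run_with_scripted_debugger`; here the function runs to completion.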

When debug_mode is not set, the worker's dispatch and sandbox config are unchanged. The defensive hook replaces a silent hang with a clear error — strictly an improvement, not a change to working code.
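
A defensive hook of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the PR's code: it treats "any thread other than the main thread" as a stand-in for "workflow worker thread", and the names `defensive_breakpointhook` and `call_in_thread` are hypothetical.

```python
import sys
import threading


def defensive_breakpointhook(*args, **kwargs):
    # Assumption: "not the main thread" approximates "workflow worker thread".
    if threading.current_thread() is not threading.main_thread():
        raise RuntimeError(
            "breakpoint() was called off the main thread; "
            "enable debug_mode to debug workflow code"
        )
    # On the main thread, fall back to Python's default hook.
    return sys.__breakpointhook__(*args, **kwargs)


def call_in_thread(fn):
    """Run fn on a fresh thread and return the exception it raised, if any."""
    caught = []

    def runner():
        try:
            fn()
        except Exception as err:
            caught.append(err)

    thread = threading.Thread(target=runner)
    thread.start()
    thread.join()
    return caught[0] if caught else None
```

Installing it would be a matter of `sys.breakpointhook = defensive_breakpointhook`; calling `call_in_thread(defensive_breakpointhook)` returns the `RuntimeError` rather than blocking on input.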

Why?

breakpoint() and pdb.set_trace() inside workflow code silently hang today. Three overlapping issues:

  1. Activations run on a ThreadPoolExecutor thread, so pdb's input() can't read the controlling TTY.
  2. The sandbox flags breakpoint as non-deterministic, so the call doesn't reach the debugger.
  3. pdb's cmdloop touches more sandbox-restricted internals at runtime (e.g. readline.get_completer) — relaxing the builtin alone isn't enough.
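
The thread-placement point in item 1 can be observed with the standard library alone. The sketch below is only an illustration (the function names are hypothetical and nothing here uses the Temporal SDK): it compares the thread that runs a function inline with the thread that runs it when submitted to a `ThreadPoolExecutor`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def current_thread_id() -> int:
    # Identifier of whichever thread executes this call.
    return threading.get_ident()


def thread_placement():
    """Return (inline_on_caller_thread, pooled_on_caller_thread)."""
    caller = threading.get_ident()
    inline = current_thread_id()  # runs on the caller's thread
    with ThreadPoolExecutor(max_workers=1) as pool:
        pooled = pool.submit(current_thread_id).result()  # runs on a pool thread
    return inline == caller, pooled == caller
```

Inline calls stay on the caller's thread, while executor-submitted calls land on a separate worker thread.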

Direct synchronous activation from the dispatch coroutine doesn't work on Python 3.14:

RuntimeError: Cannot enter into task <workflow run task>
  while another task <_handle_activation> is being executed.

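The dispatch task is mid-__step() when workflow.activate tries to step the workflow's own task, and 3.14 refuses to enter one task while another is executing. Scheduling the activation with loop.call_soon and then awaiting a future suspends the dispatch task first, so the activation runs as a plain event-loop callback with no task executing.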
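
The scheduling pattern can be sketched with plain asyncio. This is a simplified illustration, not the worker's code: `activate` is a hypothetical stand-in for the synchronous activation, and it only reports which task (if any) is executing when it runs.

```python
import asyncio


def activate():
    # Hypothetical stand-in for the synchronous workflow activation:
    # report the task that is currently executing, if any.
    return asyncio.current_task()


async def dispatch():
    loop = asyncio.get_running_loop()

    # Called inline, activate() runs inside the dispatch task's own step.
    inline_task = activate()

    # Scheduled with call_soon, activate() runs as a bare loop callback
    # once the dispatch task has suspended on the future below.
    future = loop.create_future()

    def run_activation():
        try:
            future.set_result(activate())
        except Exception as err:
            future.set_exception(err)

    loop.call_soon(run_activation)
    deferred_task = await future

    return inline_task is not None, deferred_task is None


result = asyncio.run(dispatch())
```

The inline call sees the dispatch task as the current task, whereas the deferred call sees none, which is the condition under which stepping another task is permitted.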

Complements temporalio#1249 (sandbox passthrough for IDE debuggers). Independent change, different debugger.

Checklist

  1. Closes temporalio/sdk-python#1104 (Setting debug_mode in a Worker still doesn't allow the user of breakpoints)

  2. How was this tested:

  • tests/worker/test_breakpoint_hang.py: five tests covering thread placement (in both modes), breakpoint() in a sandboxed workflow landing at the user's frame with locals visible, q/Ctrl-D continuing cleanly, and the defensive hook raising. 5/5 pass on Python 3.13 and 3.14.
  • Manual: drop breakpoint() into any workflow's run() body, run via pytest -s (or a standalone python script), confirm the (Pdb) prompt opens at the user's frame with locals in scope.
  3. Any docs updates needed?
  • Yes. Adds a "Debugging Workflows with breakpoint() / pdb" subsection to the README under Workflow Sandbox, with a runnable example and the workflow-task-timeout caveat.

@elidlocke elidlocke self-assigned this May 22, 2026
@elidlocke
Owner Author

Closing to force base-SHA recompute after syncing fork's main

@elidlocke elidlocke closed this May 22, 2026
@elidlocke elidlocke reopened this May 22, 2026
@elidlocke elidlocke changed the title from "[CI verification] Fix pdb breakpoint() hang in workflow code (#1104)" to "Fix pdb / breakpoint() hang in workflow code (#1104)" May 22, 2026
@elidlocke elidlocke force-pushed the pdb-hang-repro branch 7 times, most recently from 93c2129 to eea3c60 May 29, 2026 15:04
When debug_mode=True (or TEMPORAL_DEBUG=1), breakpoint() inside workflow
code now opens an interactive pdb prompt -- including from a sandboxed
workflow run under pytest. Four pieces:

- Inline dispatch on the asyncio main thread (via loop.call_soon to
  avoid nesting inside the dispatch task's __step() and tripping
  Python 3.14's task-entry validation).
- breakpoint removed from the sandbox's invalid builtins so the call
  reaches the worker hook. Nothing else is relaxed.
- A Pdb subclass that lands at the workflow's own frame, suspends
  sandbox checks during each REPL interaction, and overrides q/Ctrl-D
  to continue the workflow instead of failing it with BdbQuit.
- A defensive sys.breakpointhook that raises a clear RuntimeError when
  breakpoint() is called from a workflow worker thread without
  debug_mode, replacing the previous silent hang.

When debug_mode is not set, the worker's dispatch and sandbox config
are unchanged.

Adds a README subsection on debugging workflows and five tests at
tests/worker/test_breakpoint_hang.py. Verified on Python 3.13 and 3.14.

Closes temporalio#1104.
@elidlocke
Owner Author

Superseded by upstream temporalio#1568.

@elidlocke elidlocke closed this Jun 1, 2026