Fix pdb / breakpoint() hang in workflow code #1568

Open
elidlocke wants to merge 1 commit into temporalio:main from elidlocke:pdb-hang-repro

@elidlocke

What was changed

When debug_mode=True on the Worker (or TEMPORAL_DEBUG=1), breakpoint() inside workflow code now opens an interactive pdb prompt — including from a sandboxed workflow run under pytest. Four pieces:

  • Inline dispatch on the main thread. Activations run on the asyncio main thread (scheduled via loop.call_soon to avoid nesting inside the dispatch task's __step()), so pdb's input() reaches the TTY.
  • Targeted sandbox relaxation. breakpoint is removed from the sandbox's invalid builtins so the call can reach the worker hook. Nothing else is relaxed.
  • Custom Pdb subclass. Drops into pdb at the workflow's own frame (not our indirection), suspends sandbox checks for the duration of each REPL interaction, and overrides q / Ctrl-D to continue the workflow instead of failing it with BdbQuit.
  • Defensive sys.breakpointhook. Calling breakpoint() from a workflow worker thread without debug_mode raises a clear RuntimeError instead of silently hanging.
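The quit-handling part of the Pdb subclass can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: the class name ContinueOnQuitPdb and the helper run_with_scripted_input are hypothetical, and the sketch omits the frame selection and sandbox suspension described above. It only shows how overriding pdb's quit commands makes q and end-of-input resume execution instead of raising BdbQuit.

```python
import io
import pdb


class ContinueOnQuitPdb(pdb.Pdb):
    """Hypothetical subclass: quitting the prompt resumes the program."""

    def do_quit(self, arg):
        # Stock pdb calls set_quit() here, which later raises BdbQuit.
        # Delegating to do_continue lets the traced code keep running.
        return self.do_continue(arg)

    # pdb defines these aliases on the base class, so they must be
    # re-pointed at the override explicitly.
    do_q = do_exit = do_quit

    def do_EOF(self, arg):
        # Ctrl-D (end of input) is treated the same way as q.
        return self.do_continue(arg)


def run_with_scripted_input(commands: str) -> int:
    """Run a tiny function under the debugger, feeding it scripted input."""
    debugger = ContinueOnQuitPdb(
        stdin=io.StringIO(commands),  # scripted input instead of a TTY
        stdout=io.StringIO(),         # swallow the prompt output
        readrc=False,                 # ignore any local .pdbrc
        nosigint=True,                # do not touch the SIGINT handler
    )
    total = 41
    debugger.set_trace()  # the prompt opens at the next line of this frame
    total += 1
    return total
```

Calling run_with_scripted_input("q\n") or run_with_scripted_input("") lets the function run to completion; with an unmodified pdb.Pdb the same input would end in BdbQuit.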

When debug_mode is not set, the worker's dispatch and sandbox config are unchanged. The defensive hook replaces a silent hang with a clear error — strictly an improvement, not a change to working code.
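A defensive hook of this kind might look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: the factory make_guarded_breakpointhook and the "workflow-" thread-name prefix are hypothetical stand-ins for however the worker actually identifies its workflow threads. Only sys.breakpointhook and sys.__breakpointhook__ are real standard-library names.

```python
import sys
import threading

# Hypothetical marker for threads that run workflow code.
WORKFLOW_THREAD_PREFIX = "workflow-"


def make_guarded_breakpointhook(debug_mode: bool):
    """Build a breakpoint hook that fails fast instead of hanging."""
    original_hook = sys.__breakpointhook__  # the interpreter's default hook

    def guarded_breakpointhook(*args, **kwargs):
        thread_name = threading.current_thread().name
        if thread_name.startswith(WORKFLOW_THREAD_PREFIX) and not debug_mode:
            # Without debug_mode the prompt would never become usable,
            # so raise a clear error rather than block forever.
            raise RuntimeError(
                "breakpoint() was called from a workflow worker thread; "
                "enable debug_mode to use a debugger in workflow code"
            )
        return original_hook(*args, **kwargs)

    return guarded_breakpointhook


def install(debug_mode: bool) -> None:
    """Install the guarded hook process-wide."""
    sys.breakpointhook = make_guarded_breakpointhook(debug_mode)
```

Because breakpoint() always goes through sys.breakpointhook, installing the hook once is enough to cover every call site.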

Why?

breakpoint() and pdb.set_trace() inside workflow code silently hang today. Three overlapping issues:

  1. Activations run on a ThreadPoolExecutor thread, so pdb's input() can't read the controlling TTY.
  2. The sandbox flags breakpoint as non-deterministic, so the call doesn't reach the debugger.
  3. pdb's cmdloop touches more sandbox-restricted internals at runtime (e.g. readline.get_completer) — relaxing the builtin alone isn't enough.
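The thread placement in item 1 can be observed with plain asyncio. The sketch below does not use the Temporal SDK; the function names are illustrative. It only shows that work submitted to a ThreadPoolExecutor runs on a different thread from the one driving the event loop, while an inline call stays on the loop's thread.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def current_thread_id() -> int:
    """Return the identifier of the thread executing this call."""
    return threading.get_ident()


async def compare_placement() -> dict:
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()

    # Executor dispatch: the callable runs on a pool thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool_thread = await loop.run_in_executor(pool, current_thread_id)

    # Inline dispatch: the callable runs on the event loop's own thread.
    inline_thread = current_thread_id()

    return {
        "executor_on_loop_thread": pool_thread == loop_thread,
        "inline_on_loop_thread": inline_thread == loop_thread,
    }
```

Running asyncio.run(compare_placement()) from a script's main thread reports that only the inline call shares the loop's thread.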

Direct synchronous activation from the dispatch coroutine doesn't work on Python 3.14:

RuntimeError: Cannot enter into task <workflow run task>
  while another task <_handle_activation> is being executed.

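The dispatch task is mid-__step() when workflow.activate tries to step the workflow's own task, and Python 3.14 refuses to enter one task while another is executing. Scheduling the activation with loop.call_soon and then awaiting a future suspends the dispatch task first, so the activation runs from a plain event-loop callback rather than inside a running task.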
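The call_soon-then-await pattern can be sketched in isolation. The names dispatch_inline and report_current_task are hypothetical, and the sketch stands in for the worker's real dispatch code rather than reproducing it. It shows that a callback scheduled with loop.call_soon runs on the loop with no task current, whereas a direct call from the coroutine runs inside the dispatch task.

```python
import asyncio


def report_current_task():
    """Stand-in for the activation: report which task, if any, is running."""
    return asyncio.current_task()


async def dispatch_inline(activate):
    """Run `activate` as a loop callback and wait for its result."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def run_activation():
        # Executed by the event loop between task steps, so no task
        # is current while the activation runs.
        try:
            done.set_result(activate())
        except Exception as exc:
            done.set_exception(exc)

    loop.call_soon(run_activation)
    # Awaiting suspends the calling task before the callback runs.
    return await done


async def compare_dispatch():
    direct = report_current_task()                   # inside the calling task
    deferred = await dispatch_inline(report_current_task)
    return direct is not None, deferred is None
```

Exceptions raised by the activation are forwarded through the future, so the awaiting coroutine still sees them.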

Complements #1249 (sandbox passthrough for IDE debuggers). Independent change, different debugger.

Checklist

  1. Closes #1104 ("Setting debug_mode in a Worker still doesn't allow the user of breakpoints")

  2. How was this tested:

  • tests/worker/test_breakpoint_hang.py — five tests covering thread placement (both modes), breakpoint in a sandboxed workflow lands at the user's frame with locals visible, q/Ctrl-D continues cleanly, defensive hook raises. 5/5 pass on Python 3.13 and 3.14.
  • Manual: drop breakpoint() into any workflow's run() body, run via pytest -s (or a standalone python script), confirm the (Pdb) prompt opens at the user's frame with locals in scope.
  3. Any docs updates needed?
  • Yes. Adds a "Debugging Workflows with breakpoint() / pdb" subsection to the README under Workflow Sandbox, with a runnable example and the workflow-task-timeout caveat.

When debug_mode=True (or TEMPORAL_DEBUG=1), breakpoint() inside workflow
code now opens an interactive pdb prompt -- including from a sandboxed
workflow run under pytest. Four pieces:

- Inline dispatch on the asyncio main thread (via loop.call_soon to
  avoid nesting inside the dispatch task's __step() and tripping
  Python 3.14's task-entry validation).
- breakpoint removed from the sandbox's invalid builtins so the call
  reaches the worker hook. Nothing else is relaxed.
- A Pdb subclass that lands at the workflow's own frame, suspends
  sandbox checks during each REPL interaction, and overrides q/Ctrl-D
  to continue the workflow instead of failing it with BdbQuit.
- A defensive sys.breakpointhook that raises a clear RuntimeError when
  breakpoint() is called from a workflow worker thread without
  debug_mode, replacing the previous silent hang.

When debug_mode is not set, the worker's dispatch and sandbox config
are unchanged.

Adds a README subsection on debugging workflows and five tests at
tests/worker/test_breakpoint_hang.py. Verified on Python 3.13 and 3.14.

Closes temporalio#1104.
sys.breakpointhook = _temporal_workflow_breakpoint_hook


def _relax_sandbox_for_debugger(workflow_runner: WorkflowRunner) -> WorkflowRunner:
Contributor

There's sufficient logic involved here that I think it would be a good idea to move this to a new file. _debugger.py perhaps?

os.environ, etc.) aren't blocked either — without permanently dropping
sandbox checks for the rest of workflow execution.
"""
from temporalio.worker.workflow_sandbox._runner import SandboxedWorkflowRunner
Contributor

Avoid inline imports unless there is a good reason. I think pdb has sufficient justification to do so, but probably not this internal import.
