Tracing crashes with UnicodeEncodeError on surrogate escapes in hook arguments/results #681

@RonnyPfannschmidt

Problem

When pluggy tracing is enabled (e.g. via pytest's --debug flag), hook arguments or return values containing lone surrogate code points (like \ud800) cause a UnicodeEncodeError crash. The cause is that _format_message in _tracing.py formats values with str(), so the resulting string still contains the raw surrogates, and strict codecs such as UTF-8 refuse to encode them when the trace is written out.
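The failure mode can be reproduced without pluggy at all. The snippet below is a minimal sketch that only uses built-ins; the variable names are illustrative and not taken from pluggy. It shows that the str() form of a lone surrogate cannot be encoded as UTF-8, while the repr() form is plain ASCII.

```python
value = "\ud800"  # a lone surrogate, as in the report

# str() passes the surrogate through unchanged, so a strict UTF-8
# encode (which is what writing to a UTF-8 stream does) raises.
try:
    str(value).encode("utf-8")
    str_is_encodable = True
except UnicodeEncodeError:
    str_is_encodable = False

# repr() escapes the surrogate, leaving only ASCII characters.
safe = repr(value)
encoded = safe.encode("utf-8")

print(str_is_encodable)
print(safe)
```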

Reported originally as pytest-dev/pytest#13750.

Analysis

_format_message currently formats two kinds of data identically using str():

  1. Structural labels passed as positional args — e.g. "finish", "-->", hook names — which are always safe ASCII strings
  2. Python values — hook kwargs values (rendered via the extra dict) and hook return values — which may contain arbitrary data including surrogates

The fix needs to apply repr() only to the value positions, not to structural labels.
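To illustrate the distinction, here is a small sketch that formats the same trace arguments three ways. The `labels` and `value` names are assumptions made for the example, not identifiers from pluggy; only the " ".join(map(str, ...)) pattern comes from the issue.

```python
labels = ("finish", "pytest_runtest_call", "-->")  # structural, plain ASCII
value = "\ud800"                                   # arbitrary hook data

# Current behaviour: str() everywhere. The surrogate survives unescaped.
plain = " ".join(map(str, [*labels, value]))

# Selective: str() for labels, repr() only for the value position.
selective = " ".join([*map(str, labels), repr(value)])

# Blanket repr(): safe, but every label is quoted as well.
blanket = " ".join(map(repr, [*labels, value]))

for text in (selective, blanket):
    text.encode("utf-8")  # neither raises

print(selective)
print(blanket)
```

Only the selective form is both encodable and free of quoting noise around the labels.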

Intended Solution

  1. In _tracing.py line 38 — use {value!r} for the extra/kwargs dict values:

    lines.append(f"{indent}    {name}: {value!r}\n")

    This makes kwargs values in trace output show their type (e.g. 'lfplugin' instead of lfplugin, PosixPath('/foo') instead of /foo) and safely escapes surrogates.

  2. In _manager.py line 506 — use repr() on the hook result before passing it as a trace arg:

    hooktrace("finish", hook_name, "-->", repr(outcome.get_result()))

    This ensures the result value is safely formatted, while "finish", hook_name, and "-->" remain plain str()-formatted (via _format_message's existing map(str, args)).

  3. Do NOT change the content = " ".join(map(str, args)) line in _format_message. Keeping str() there preserves readable structural output without quoting labels. The caller (_manager.py) is responsible for pre-formatting any unsafe values with repr().

This avoids the double-repr problem and keeps trace output readable:

  finish pytest_runtest_call --> '\ud800' [hook]
      config: <Config object at 0x...>
      plugin_name: 'lfplugin'

Rather than the over-quoted version that blanket repr() in _format_message would produce:

  'finish' 'pytest_runtest_call' '-->' "'\\ud800'" [hook]
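The proposed split of responsibilities can be sketched end to end as follows. This is a simplified stand-in written for this issue, not pluggy's actual source: `format_message_sketch` and its `indent_level` parameter are hypothetical, and only the map(str, args) join, the {value!r} line and the caller-side repr() are taken from the steps above.

```python
def format_message_sketch(tags, args, indent_level=0):
    """Simplified stand-in for _format_message with the proposed fix."""
    args = tuple(args)
    # A trailing dict carries the hook kwargs ("extra").
    if args and isinstance(args[-1], dict):
        extra, args = args[-1], args[:-1]
    else:
        extra = {}

    # Step 3: positional args keep str(), so labels stay unquoted.
    content = " ".join(map(str, args))
    indent = "  " * indent_level
    lines = [f"{indent}{content} [{':'.join(tags)}]\n"]

    # Step 1: kwargs values are rendered with repr().
    for name, value in extra.items():
        lines.append(f"{indent}    {name}: {value!r}\n")
    return "".join(lines)


result = "\ud800"  # hook result containing a lone surrogate

# Step 2: the caller pre-formats the unsafe result with repr().
message = format_message_sketch(
    ("hook",),
    ("finish", "pytest_runtest_call", "-->", repr(result),
     {"plugin_name": "lfplugin"}),
)

message.encode("utf-8")  # does not raise
print(message, end="")
```

Because the result reaches the formatter as an ordinary, already-escaped string, str() leaves it untouched and no second layer of quoting is added.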
