Problem
When pluggy tracing is enabled (e.g. via pytest's --debug flag), hook arguments or return values containing lone surrogate code points (like \ud800) crash with a UnicodeEncodeError. The cause is that _format_message in _tracing.py formats values with str(), and the resulting string, which contains literal surrogates, cannot be encoded when written to most output targets.
Reported originally as pytest-dev/pytest#13750.
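The failure mode can be reproduced without pluggy. The snippet below is a minimal sketch, not pluggy's code: can_encode is a hypothetical helper, and UTF-8 stands in for whatever encoding the trace output target uses.

```python
# A lone surrogate, as produced e.g. by surrogateescape decoding.
value = "\ud800"


def can_encode(text: str, encoding: str = "utf-8") -> bool:
    """Return True if `text` can be encoded, i.e. written to such a stream."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# str() passes the surrogate through unchanged, so encoding fails.
str_is_safe = can_encode(str(value))

# repr() escapes it to the ASCII text '\ud800', which encodes fine.
repr_is_safe = can_encode(repr(value))

print(str_is_safe, repr_is_safe, repr(value))
```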
Analysis
_format_message currently formats two kinds of data identically using str():
- Structural labels passed as positional args (e.g. "finish", "-->", hook names), which are always safe ASCII strings
- Python values, i.e. hook kwargs values (rendered via the extra dict) and hook return values, which may contain arbitrary data including surrogates
The fix needs to apply repr() only to the value positions, not to structural labels.
Intended Solution
- In _tracing.py line 38, use {value!r} for the extra/kwargs dict values:
lines.append(f"{indent} {name}: {value!r}\n")
This makes kwargs values in trace output show their type (e.g. 'lfplugin' instead of lfplugin, PosixPath('/foo') instead of /foo) and safely escapes surrogates.
- In _manager.py line 506, use repr() on the hook result before passing it as a trace arg:
hooktrace("finish", hook_name, "-->", repr(outcome.get_result()))
This ensures the result value is safely formatted, while "finish", hook_name, and "-->" remain plain str()-formatted (via _format_message's existing map(str, args)).
- Do NOT change the content = " ".join(map(str, args)) line in _format_message. Keeping str() there preserves readable structural output without quoting labels. The caller (_manager.py) is responsible for pre-formatting any unsafe values with repr().
This avoids the double-repr problem and keeps trace output readable:
finish pytest_runtest_call --> '\ud800' [hook]
config: <Config object at 0x...>
plugin_name: 'lfplugin'
Rather than the over-quoted version that blanket repr() in _format_message would produce:
'finish' 'pytest_runtest_call' '-->' "'\\ud800'" [hook]
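The difference between the two approaches can be checked with a simplified stand-in for the formatter. This is a sketch under stated assumptions, not pluggy's actual _format_message: the function names, the tag parameter, and the indentation are hypothetical, and only the two formatting expressions quoted above are taken from the proposal.

```python
from pathlib import PurePosixPath


def format_message(args, extra, tag="hook"):
    """Targeted variant: str() for positional args, repr() for extra values."""
    content = " ".join(map(str, args))  # labels stay unquoted
    lines = [f"{content} [{tag}]\n"]
    for name, value in extra.items():
        lines.append(f"    {name}: {value!r}\n")  # values are escaped
    return "".join(lines)


def format_content_blanket(args, tag="hook"):
    """Rejected variant: repr() applied to every positional arg."""
    content = " ".join(map(repr, args))
    return f"{content} [{tag}]\n"


result = "\ud800"  # hook result containing a lone surrogate
# The caller pre-formats the unsafe value, as proposed for _manager.py.
args = ("finish", "pytest_runtest_call", "-->", repr(result))
extra = {"plugin_name": "lfplugin", "path": PurePosixPath("/foo")}

targeted = format_message(args, extra)
blanket = format_content_blanket(args)

# Both outputs are plain ASCII, so writing them cannot raise UnicodeEncodeError.
targeted.encode("ascii")
blanket.encode("ascii")

print(targeted)
print(blanket)
```

Both variants are safe to write, so the choice comes down to readability: the blanket variant quotes every label and escapes the already-escaped result a second time.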
References
UnicodeEncodeError in pluggy with surrogate escape in parametrization and --debug (pytest-dev/pytest#13750)