Use streaming in all assertion comparisons consumers #14523

Open
Pierre-Sassoulas wants to merge 7 commits into pytest-dev:main from Pierre-Sassoulas:stream-comparisons

@Pierre-Sassoulas Pierre-Sassoulas commented May 26, 2026

Follow-up to #14521. If we push the generator concept for comparators out to every consumer of a comparator, we can avoid computing big diffs that are going to be truncated anyway, which ultimately makes pytest faster (and may also make the output with -vvv more fluid).

I stopped short of skipping the hidden lines entirely, because the truncation footer could then no longer report an exact hidden-line count. It would change from "...Full output truncated (499992 lines hidden), use '-vv' to show" to something like "...Full output truncated, use '-vv' to show", but on a 500k-element list/dict/set it makes pytest 2x faster.

Ways to claim that speedup:

  • drop the count entirely, so the footer becomes ``...Full output truncated, use '-vv' to show``
  • gate the count on -v.
  • "we have at least 3 lines more" (the same regardless of whether the diff is 12 or 12 million lines).
    Fishing for opinions here :)
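
To make the trade-off behind these options concrete, here is a minimal sketch. It is not pytest's code: `huge_diff`, `first_lines_with_count`, and `first_lines_without_count` are hypothetical names, and the 8-line limit is passed in by hand. It counts how many lines a lazy comparator is asked to produce under each footer style.

```python
from collections.abc import Iterator
from itertools import islice


def huge_diff(n: int, produced: list[int]) -> Iterator[str]:
    """Hypothetical lazy comparator; records how many lines it generated."""
    for i in range(n):
        produced[0] += 1
        yield f"- item {i}"


def first_lines_with_count(lines: Iterator[str], limit: int) -> list[str]:
    """Exact footer: keep `limit` lines, then drain the rest only to count it."""
    kept = list(islice(lines, limit))
    hidden = sum(1 for _ in lines)
    if hidden:
        kept.append(
            f"...Full output truncated ({hidden} lines hidden), use '-vv' to show"
        )
    return kept


def first_lines_without_count(lines: Iterator[str], limit: int) -> list[str]:
    """Count-free footer: pull one extra line to detect overflow, then stop."""
    kept = list(islice(lines, limit))
    if next(lines, None) is not None:
        kept.append("...Full output truncated, use '-vv' to show")
    return kept


counted, uncounted = [0], [0]
with_count = first_lines_with_count(huge_diff(500_000, counted), limit=8)
without_count = first_lines_without_count(huge_diff(500_000, uncounted), limit=8)
print(counted[0], uncounted[0])  # 500000 9
```

In this toy model the exact count forces all 500,000 lines to be generated, while the count-free footer stops after 9. The real speedup depends on how expensive each line is to produce.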

Following Ronny's review comment on pytest-dev#13762, switch the set comparison
helpers in ``_compare_set.py`` to return ``Iterator[str]`` so the
composition is direct: ``_set_one_sided_diff`` ``yield``s, and the
other helpers ``yield from`` it. This avoids the manual
``explanation = []; .append/.extend`` boilerplate.

The "equal sets" branch of ``_compare_gt_set`` / ``_compare_lt_set``
used to peek at the diff for emptiness; replace that with a direct
``left == right`` check so the generator form stays idiomatic.
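
The generator composition described above could look roughly like the following sketch. The function names drop the leading underscore and the signatures are simplified (no highlighter or verbosity arguments), so treat them as illustrative stand-ins and not as the code in ``_compare_set.py``.

```python
from collections.abc import Iterator, Set
from typing import Any


def set_one_sided_diff(label: str, a: Set[Any], b: Set[Any]) -> Iterator[str]:
    """Yield the items of `a` that are missing from `b`, with a header line."""
    extra = a - b
    if extra:
        yield f"Extra items in the {label} set:"
        # Sorting by repr is only here to make the sketch deterministic.
        for item in sorted(extra, key=repr):
            yield repr(item)


def compare_eq_set(left: Set[Any], right: Set[Any]) -> Iterator[str]:
    """Compose the two one-sided diffs without building intermediate lists."""
    yield from set_one_sided_diff("left", left, right)
    yield from set_one_sided_diff("right", right, left)


def compare_gt_set(left: Set[Any], right: Set[Any]) -> Iterator[str]:
    """Explain a failed `left > right`; equality is checked up front."""
    if left == right:
        yield "Both sets are equal"
        return
    yield from set_one_sided_diff("right", right, left)


print(list(compare_gt_set({1}, {1, 2})))
# ['Extra items in the right set:', '2']
```

Checking ``left == right`` first means the helper never has to consume its own diff just to learn whether it is empty.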

``SET_COMPARISON_FUNCTIONS`` and ``_compare_eq_set`` now return
``Iterable[str]`` / ``Iterator[str]``; the consumers in
``_compare_eq_any`` materialise with ``list(...)``.

Drop the ``list(...)`` wraps around each per-type comparator call in
the match dispatch and ``yield from`` instead. ``_compare_eq_any``
becomes an ``Iterator[str]`` that yields nothing when no specialised
explanation applies (replaces the previous ``list[str] | None``
sentinel).

The two callers consume it differently:

* ``util.assertrepr_compare`` does
  ``list(_compare_eq_any(...))`` before its empty/summary check.
* ``_compare_eq_cls`` iterates the generator directly via
  ``for line in _compare_eq_any(...)``.

No behavior change yet — this is the stepping stone for letting the
truncator upstream consume the iterator lazily so huge diffs don't
materialise just to be thrown away.
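
As an illustration of the "empty iterator instead of ``None``" convention and the two consumption styles, here is a small sketch. `compare_eq_any`, `explain`, and `explain_field` are simplified, hypothetical stand-ins, and the string comparison is only a placeholder for the real per-type dispatch.

```python
from __future__ import annotations

from collections.abc import Iterator


def compare_eq_any(left: object, right: object) -> Iterator[str]:
    """Yield explanation lines, or nothing when no specialised comparator applies."""
    if isinstance(left, str) and isinstance(right, str):
        yield f"- {right}"
        yield f"+ {left}"
    # Falling off the end yields nothing: this replaces the old `None` sentinel.


def explain(left: object, right: object) -> list[str] | None:
    """Caller that materialises before its empty/summary check."""
    explanation = list(compare_eq_any(left, right))
    if not explanation:
        return None
    return [f"{left!r} == {right!r}", *explanation]


def explain_field(name: str, left: object, right: object) -> list[str]:
    """Caller that iterates the generator directly, indenting nested lines."""
    lines = [f"Drill down into differing attribute {name}:"]
    for line in compare_eq_any(left, right):
        lines.append(f"  {line}")
    return lines


print(explain("a", "b"))  # ["'a' == 'b'", '- b', '+ a']
print(explain(1, 2))      # None
```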

Turn ``assertrepr_compare`` into a generator. The first line yielded is
the summary; subsequent lines are the explanation produced by
``_compare_eq_any``. Yields nothing when no specialised explanation
applies — the consumer maps an empty iterator to ``None``.

The ``pytest_assertrepr_compare`` hook impl in ``assertion/__init__``
materialises the iterator and returns ``list[str] | None`` so the
public hook contract is unchanged. A follow-up commit replaces the
``list(...)`` call with a streaming truncator so an enormous diff
doesn't have to be built in full just to be discarded.
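
One possible shape for "summary first, nothing at all when there is no explanation" is to peek at the first explanation line before yielding the summary. This is only a sketch under that assumption; `compare`, `assertrepr_compare_sketch`, and `hook_impl` are hypothetical names, and the PR may achieve the same result differently.

```python
from __future__ import annotations

from collections.abc import Iterator


def compare(left: object, right: object) -> Iterator[str]:
    """Hypothetical comparator: explains lists only, yields nothing otherwise."""
    if isinstance(left, list) and isinstance(right, list):
        for i, (a, b) in enumerate(zip(left, right)):
            if a != b:
                yield f"At index {i} diff: {a!r} != {b!r}"


def assertrepr_compare_sketch(op: str, left: object, right: object) -> Iterator[str]:
    """Yield the summary, then the explanation; yield nothing if there is none."""
    explanation = compare(left, right)
    first = next(explanation, None)
    if first is None:
        return  # no specialised explanation: the generator stays empty
    yield f"{left!r} {op} {right!r}"
    yield first
    yield from explanation


def hook_impl(op: str, left: object, right: object) -> list[str] | None:
    """Materialise so the list-or-None hook contract is unchanged."""
    return list(assertrepr_compare_sketch(op, left, right)) or None


print(hook_impl("==", [1, 2], [1, 3]))
# ['[1, 2] == [1, 3]', 'At index 1 diff: 2 != 3']
```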

Behaviour change: previously, if an exception was raised while
building the explanation (e.g. a faulty ``__repr__``), the partial
output was discarded and only the failure notice was returned. The
generator can't unyield lines it has already produced, so the new
form preserves the partial output and appends the failure notice
after it. This is arguably more useful — the reader sees what was
being compared at the point the comparison failed.

``test_list_bad_repr`` is updated to assert that the failure notice
appears at the end of the explanation instead of replacing the body.
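
The difference between the two behaviours can be reproduced in a few lines. In this sketch `BadRepr`, `explain_items`, `old_style`, and `new_style` are hypothetical, and the wording of the failure notice is made up for the example.

```python
from collections.abc import Iterator


class BadRepr:
    def __repr__(self) -> str:
        raise RuntimeError("boom")


def explain_items(items: list[object]) -> Iterator[str]:
    """Hypothetical comparator body: one line per item, using repr()."""
    for item in items:
        yield repr(item)


def old_style(items: list[object]) -> list[str]:
    """List-based: the partial list is lost when the exception propagates."""
    try:
        return list(explain_items(items))
    except Exception as exc:
        return [f"(explanation failed: {exc!r})"]


def new_style(items: list[object]) -> Iterator[str]:
    """Generator-based: lines already yielded stay; the notice comes last."""
    try:
        yield from explain_items(items)
    except Exception as exc:
        yield f"(explanation failed: {exc!r})"


items = [1, 2, BadRepr()]
print(old_style(items))        # ["(explanation failed: RuntimeError('boom'))"]
print(list(new_style(items)))  # ['1', '2', "(explanation failed: RuntimeError('boom'))"]
```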
…ions

The existing ``truncate_if_required`` takes a ``list[str]`` — it can
only trim *after* the full explanation has been built. Add a streaming
counterpart that takes an ``Iterable[str]`` and stops pulling lines as
soon as the truncation threshold is reached, so a huge comparison
doesn't have to materialise its entire output just to be discarded.

The remaining lines are still iterated past the cap (without storing)
so the truncation footer can report the exact hidden-line count, and
``_truncate_explanation`` gains an ``extra_hidden`` argument to fold
that count into the message.
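
A rough sketch of the "store up to the cap, count the rest" idea follows. It assumes a line-only limit (the real truncation also considers a character limit), and `truncate_stored` and `stream_truncate` are hypothetical stand-ins for ``_truncate_explanation`` and the new streaming helper.

```python
from collections.abc import Iterable


def truncate_stored(
    lines: list[str], max_lines: int, extra_hidden: int = 0
) -> list[str]:
    """List-based truncation; `extra_hidden` adds lines that were never stored."""
    hidden = max(len(lines) - max_lines, 0) + extra_hidden
    if hidden == 0:
        return lines
    footer = f"...Full output truncated ({hidden} lines hidden), use '-vv' to show"
    return [*lines[:max_lines], footer]


def stream_truncate(lines: Iterable[str], max_lines: int) -> list[str]:
    """Keep at most `max_lines` lines in memory; only count the remainder."""
    stored: list[str] = []
    extra_hidden = 0
    for line in lines:
        if len(stored) < max_lines:
            stored.append(line)
        else:
            extra_hidden += 1  # iterated past the cap, but not stored
    return truncate_stored(stored, max_lines, extra_hidden)


result = stream_truncate((f"line {i}" for i in range(20)), max_lines=8)
print(result[-1])
# ...Full output truncated (12 lines hidden), use '-vv' to show
```

Memory stays bounded by the cap here, but every line is still generated, which is the cost the exact count carries.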

``_get_truncation_parameters`` is also refactored to take a ``Config``
directly (it never used anything else from ``Item``), so the new
streaming helper can be called from places that don't have an item
handy.

The new helper isn't wired up yet — that's the next commit.

Wire the built-in ``pytest_assertrepr_compare`` hook to return the
iterator produced by ``util.assertrepr_compare`` directly, and update
``callbinrepr`` to consume it through ``materialize_with_truncation``.
The result: a comparison that would produce millions of explanation
lines stops at the truncation threshold (default 8 lines / 640 chars)
without materialising the rest, only counting the remaining lines so
the truncation footer still reports the exact hidden-line count.

The ``callbinrepr`` dispatcher's ``materialize_with_truncation`` call
accepts both lists (returned by third-party plugins implementing the
hook) and iterators (returned by the built-in impl), so the change is
transparent to plugin authors.
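
Why lists and iterators can share one code path is easiest to see in a sketch: calling ``iter()`` on either gives something that can be consumed lazily. `materialize` and `first_explanation` below are hypothetical simplifications of ``materialize_with_truncation`` and the ``callbinrepr`` loop, using a line-only limit.

```python
from __future__ import annotations

from collections.abc import Iterable
from itertools import islice


def materialize(expl: Iterable[str], max_lines: int = 8) -> list[str]:
    """Accept a list or a generator; keep `max_lines`, count the remainder."""
    it = iter(expl)  # works the same for lists and for generators
    kept = list(islice(it, max_lines))
    hidden = sum(1 for _ in it)
    if hidden:
        kept.append(
            f"...Full output truncated ({hidden} lines hidden), use '-vv' to show"
        )
    return kept


def first_explanation(
    hook_results: Iterable[Iterable[str] | None],
) -> list[str] | None:
    """Return the first non-empty explanation among the hook results."""
    for new_expl in hook_results:
        if new_expl is None:
            continue  # this hook implementation had nothing to say
        lines = materialize(new_expl)
        if lines:
            return lines
    return None


plugin_list = ["custom explanation"]           # as a third-party plugin might return
builtin_iter = (f"line {i}" for i in range(10))  # as a built-in generator might
print(first_explanation([plugin_list]))            # ['custom explanation']
print(len(first_explanation([None, builtin_iter])))  # 9
```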

``callop`` in ``test_assertion`` now materialises the iterator so
tests keep comparing against literal lists.

@Pierre-Sassoulas Pierre-Sassoulas added the labels "type: performance" (performance or memory problem/improvement) and "type: refactoring" (internal improvements to the code) May 26, 2026
@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as draft May 26, 2026 10:17

* Drop ``truncate.truncate_if_required`` — all callers migrated to
  ``materialize_with_truncation`` and the function had no remaining
  users.

* Add ``TestMaterializeWithTruncation`` covering:
    - iterator within limits returns all lines
    - iterator past limits is bounded and contains a truncation marker
    - sized and unsized inputs produce equivalent shapes
    - truncation is skipped at ``-vv``
    - the lines that survive truncation start with the original input

  Assertions check behaviour (the presence of a "truncated" marker,
  the length being bounded, the first lines being preserved), never
  the literal footer wording — so the tests survive a future decision
  to drop the ``(N lines hidden)`` count from the message.

* Add ``test_plugin_hook_returning_none_is_skipped`` to cover the
  ``if new_expl is None: continue`` branch in ``callbinrepr``.

* Add ``test_exception_before_first_yield_emits_summary_and_notice``
  to cover the ``summary_yielded is False`` arm of
  ``assertrepr_compare``'s exception handler — when the comparator
  raises before yielding anything, the summary is still produced so
  the reader sees what was compared.
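
A behaviour-only check in the spirit of the tests listed above might look like this sketch. `truncating_stub` is a hypothetical stand-in for ``materialize_with_truncation`` (its real signature is not shown here), and the checks deliberately avoid the literal footer wording.

```python
from collections.abc import Callable, Iterable
from itertools import islice

MAX_LINES = 8  # assumed cap for this sketch


def truncating_stub(lines: Iterable[str], verbose: int = 0) -> list[str]:
    """Hypothetical truncator: no truncation at -vv, otherwise cap the output."""
    it = iter(lines)
    if verbose >= 2:
        return list(it)
    kept = list(islice(it, MAX_LINES))
    if next(it, None) is not None:
        kept.append("...Full output truncated, use '-vv' to show")
    return kept


def check_truncation_behaviour(truncate: Callable[..., list[str]]) -> bool:
    """Assert on shape and markers only, never on the exact footer text."""
    source = [f"line {i}" for i in range(100)]
    for lines in (source, iter(source)):  # sized and unsized inputs
        result = truncate(lines)
        assert len(result) < len(source)                     # bounded
        assert any("truncated" in line for line in result)   # marker present
        assert result[:3] == source[:3]                      # start preserved
    assert truncate(iter(source), verbose=2) == source       # skipped at -vv
    return True


print(check_truncation_behaviour(truncating_stub))  # True
```

Because nothing depends on a ``(N lines hidden)`` substring, the same checks pass whether or not the footer keeps its count.
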

@psf-chronographer psf-chronographer (bot) added the label "bot:chronographer:provided" ((automation) changelog entry is part of PR) May 26, 2026
@Pierre-Sassoulas Pierre-Sassoulas marked this pull request as ready for review May 26, 2026 11:39

@nicoddemus nicoddemus left a comment

Great work!

@Pierre-Sassoulas

Thank you! Do you have an opinion about the next step? (3 options in the PR description)
