Skip to content

fix: track untracked memory allocations in spill-capable operators#1

Draft
wirybeaver wants to merge 1 commit into
mainfrom
spill-memory-tracker
Draft

fix: track untracked memory allocations in spill-capable operators#1
wirybeaver wants to merge 1 commit into
mainfrom
spill-memory-tracker

Conversation

@wirybeaver

Copy link
Copy Markdown
Owner

Which issue does this PR close?

Rationale for this change

DataFusion's spill-capable operators (NestedLoopJoin, Aggregation, External Sort) allocate more memory than they reserve through MemoryPool. This causes actual memory usage to exceed the declared pool budget, leading to OOM kills in production. This PR closes the gap between actual allocator usage and MemoryPool-reported usage by adding MemoryReservation tracking at each untracked allocation site.

What changes are included in this PR?

SpillManager centralized IPC buffer accounting:

  • SpillManager::new() requires a caller-owned MemoryReservation (production paths must pass one)
  • Write-side: append_batch tracks IPC buffer overhead with balanced grow/shrink
  • Read-side: read_spill_as_stream accepts optional reservation with capacity pre-reservation before spawn_buffered; unbuffered reads pre-reserve with try_grow before each blocking decode
  • ReservationGuard RAII utility (try_grow_guard/grow_guard) for error-safe accounting

NestedLoopJoin probe accounting:

  • probe_reservation field for probe-phase intermediates
  • RAII guards for Cartesian indices, filter take, output take (documented make-progress exceptions)
  • push_output_batch with best-effort try_grow and error-path cleanup

Aggregation emit accounting:

  • transient_reservation tracks emitted arrays until yielded downstream
  • estimated_emit_size() on GroupValues trait (6 implementations)

External sort:

  • Sort merge single-file path transfers merge reservation via take() with grow-to-at-least semantics

Headroom: HEADROOM_FACTOR reduced from 8.0 to 6.0.

Are these changes tested?

Yes:

  • 4 ReservationGuard unit tests (auto_shrinks, release_prevents_shrink, grow_guard, error_path)
  • 3 SpillManager write accounting tests (balance, exhausted pool)
  • 4 SpillManager read accounting tests (unbuffered tracking, buffered capacity, transferred reservation reuse, exhausted pool controlled failure)
  • 4 NLJ reservation balance tests (inner/left/right/full joins)
  • 2 aggregation reservation balance tests (primitive keys, GroupValuesRows Duration+Utf8)
  • 1 sort spill reservation balance test
  • All existing NLJ (46), spill (81), and sort (3) tests pass

Are there any user-facing changes?

  • SpillManager::new() now requires a MemoryReservation parameter (API change for external callers constructing SpillManager directly)
  • read_spill_as_stream and read_spill_as_stream_unbuffered accept an additional optional MemoryReservation parameter

Add memory reservation tracking for previously unaccounted allocations
in NestedLoopJoin, GroupValuesRows emit, External Sort merge, and
SpillManager IPC buffers to close the gap between actual allocator
usage and MemoryPool-reported usage, preventing OOM kills.

Key changes:

SpillManager centralized accounting:
- SpillManager::new() requires caller-owned MemoryReservation
- Write-side: append_batch tracks IPC buffer overhead (grow/shrink)
- Read-side: read_spill_as_stream accepts optional reservation with
  capacity pre-reservation before spawn_buffered; unbuffered reads
  pre-reserve one batch slot with try_grow before each blocking decode
- ReservationGuard RAII utility (try_grow_guard/grow_guard)

NestedLoopJoin probe accounting:
- probe_reservation field for tracking probe-phase intermediates
- RAII guards for Cartesian indices, filter take, output take
- push_output_batch with best-effort try_grow and error-path cleanup
- 4 reservation balance tests (inner/left/right/full joins)

Aggregation emit accounting:
- transient_reservation tracks emitted arrays until yielded downstream
- estimated_emit_size() on GroupValues trait (6 implementations)
- 2 reservation balance tests (primitive + GroupValuesRows path)

External sort:
- Sort merge single-file path transfers merge reservation via take()
- Grow-to-at-least semantics for transferred reservation reuse
- 1 sort spill reservation balance test

HEADROOM_FACTOR reduced from 8.0 to 6.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-execution v53.1.0 (current)
       Built [  32.906s] (current)
     Parsing datafusion-execution v53.1.0 (current)
      Parsed [   0.028s] (current)
    Building datafusion-execution v53.1.0 (baseline)
       Built [  30.116s] (baseline)
     Parsing datafusion-execution v53.1.0 (baseline)
      Parsed [   0.027s] (baseline)
    Checking datafusion-execution v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.332s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  65.346s] datafusion-execution
    Building datafusion-physical-plan v53.1.0 (current)
       Built [  35.932s] (current)
     Parsing datafusion-physical-plan v53.1.0 (current)
      Parsed [   0.137s] (current)
    Building datafusion-physical-plan v53.1.0 (baseline)
       Built [  36.225s] (baseline)
     Parsing datafusion-physical-plan v53.1.0 (baseline)
      Parsed [   0.136s] (baseline)
    Checking datafusion-physical-plan v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.903s] 223 checks: 221 pass, 2 fail, 0 warn, 30 skip

--- failure method_parameter_count_changed: pub method parameter count changed ---

Description:
A publicly-visible method now takes a different number of parameters, not counting the receiver (self) parameter.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/method_parameter_count_changed.ron

Failed in:
  datafusion_physical_plan::SpillManager::new takes 3 parameters in /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/f2a3ea8915d1487951ca324a1842efe82b7ead62/datafusion/physical-plan/src/spill/spill_manager.rs:51, but now takes 4 parameters in /home/runner/work/datafusion/datafusion/datafusion/physical-plan/src/spill/spill_manager.rs:71
  datafusion_physical_plan::SpillManager::read_spill_as_stream takes 2 parameters in /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/f2a3ea8915d1487951ca324a1842efe82b7ead62/datafusion/physical-plan/src/spill/spill_manager.rs:179, but now takes 3 parameters in /home/runner/work/datafusion/datafusion/datafusion/physical-plan/src/spill/spill_manager.rs:238
  datafusion_physical_plan::SpillManager::read_spill_as_stream_unbuffered takes 2 parameters in /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/f2a3ea8915d1487951ca324a1842efe82b7ead62/datafusion/physical-plan/src/spill/spill_manager.rs:194, but now takes 3 parameters in /home/runner/work/datafusion/datafusion/datafusion/physical-plan/src/spill/spill_manager.rs:276

--- failure trait_method_added: pub trait method added ---

Description:
A non-sealed public trait added a new method without a default implementation, which breaks downstream implementations of the trait
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-item-no-default
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/trait_method_added.ron

Failed in:
  trait method datafusion_physical_plan::aggregates::group_values::GroupValues::estimated_emit_size in file /home/runner/work/datafusion/datafusion/datafusion/physical-plan/src/aggregates/group_values/mod.rs:120

     Summary semver requires new major version: 2 major and 0 minor checks failed
    Finished [  75.236s] datafusion-physical-plan
    Building datafusion-sqllogictest v53.1.0 (current)
       Built [ 174.549s] (current)
     Parsing datafusion-sqllogictest v53.1.0 (current)
      Parsed [   0.027s] (current)
    Building datafusion-sqllogictest v53.1.0 (baseline)
       Built [ 173.281s] (baseline)
     Parsing datafusion-sqllogictest v53.1.0 (baseline)
      Parsed [   0.028s] (baseline)
    Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.133s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 351.178s] datafusion-sqllogictest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant