Repartition drops data when spilling #20683

@hareshkh

Describe the bug

In non-preserve-order repartitioning mode, all input partition tasks share clones of the same SpillPoolWriter for each output partition. SpillPoolWriter used #[derive(Clone)], but its Drop implementation unconditionally set writer_dropped = true and finalized the current spill file, no matter how many clones were still alive. As a result, when the first input task finishes and its clone is dropped, the SpillPoolReader sees writer_dropped = true on an empty queue and returns EOF, silently discarding every batch subsequently written by the still-running input tasks.
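The failure mode can be modeled in a few lines. The sketch below is a simplified stand-in, not the real spill code: `Shared`, `Writer`, `Reader`, `Poll`, and `simulate_early_drop` are hypothetical names, an in-memory queue of integers stands in for spilled batches, and only the `writer_dropped` flag mirrors the description above.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// State shared by every writer clone and the reader (simplified model).
#[derive(Default)]
struct Shared {
    queue: VecDeque<u32>, // stands in for spilled batches
    writer_dropped: bool,
}

/// Hypothetical stand-in for the writer: one clone per input task.
#[derive(Clone)]
struct Writer {
    shared: Arc<Mutex<Shared>>,
}

impl Writer {
    fn push(&self, batch: u32) {
        self.shared.lock().unwrap().queue.push_back(batch);
    }
}

impl Drop for Writer {
    fn drop(&mut self) {
        // The bug: every clone signals "no more data", not only the last one.
        self.shared.lock().unwrap().writer_dropped = true;
    }
}

/// Hypothetical stand-in for the reader side.
struct Reader {
    shared: Arc<Mutex<Shared>>,
}

#[derive(Debug, PartialEq)]
enum Poll {
    Batch(u32),
    Pending,
    Eof,
}

impl Reader {
    fn poll(&self) -> Poll {
        let mut s = self.shared.lock().unwrap();
        match s.queue.pop_front() {
            Some(b) => Poll::Batch(b),
            // Empty queue plus the flag is treated as end of stream.
            None if s.writer_dropped => Poll::Eof,
            None => Poll::Pending,
        }
    }
}

/// Two input tasks share clones; the first one finishes early.
/// Returns what the reader observed and how many batches were stranded.
fn simulate_early_drop() -> (Poll, usize) {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let reader = Reader { shared: Arc::clone(&shared) };
    let task_a = Writer { shared: Arc::clone(&shared) };
    let task_b = task_a.clone();

    drop(task_a); // first input task finishes
    let seen = reader.poll(); // empty queue + writer_dropped
    task_b.push(1); // the still-running task keeps writing
    task_b.push(2);
    let stranded = shared.lock().unwrap().queue.len();
    (seen, stranded)
}
```

In this model, `simulate_early_drop()` reports that the reader has already reached end of stream while two batches remain in the queue, which corresponds to the silent data loss described above.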

This bug requires three conditions to trigger:

  1. Non-preserve-order repartitioning (so spill writers are cloned across input tasks)
  2. Memory pressure causing batches to spill to disk
  3. Input tasks finishing at different times (the common case with varying partition sizes)
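Condition 1 is what makes Drop run more than once per output partition. One possible way to make the end-of-stream signal clone-aware is to move it into a guard that all clones share through an Arc, so that it fires only when the last clone is dropped. This is a minimal sketch under that assumption and not necessarily how the project fixes the issue; `Shared`, `WriterGuard`, `Writer`, `new_writer`, and `flag_after_each_drop` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Flag the reader would consult before reporting EOF (simplified model).
struct Shared {
    writer_dropped: AtomicBool,
}

/// Its Drop runs exactly once, when the last writer clone goes away.
struct WriterGuard {
    shared: Arc<Shared>,
}

impl Drop for WriterGuard {
    fn drop(&mut self) {
        self.shared.writer_dropped.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical clone-safe writer: deriving Clone only bumps the Arc count,
/// and Writer itself has no Drop implementation.
#[derive(Clone)]
struct Writer {
    _guard: Arc<WriterGuard>,
}

fn new_writer(shared: &Arc<Shared>) -> Writer {
    Writer {
        _guard: Arc::new(WriterGuard { shared: Arc::clone(shared) }),
    }
}

/// Returns the flag value after the first and after the last clone is dropped.
fn flag_after_each_drop() -> (bool, bool) {
    let shared = Arc::new(Shared { writer_dropped: AtomicBool::new(false) });
    let task_a = new_writer(&shared);
    let task_b = task_a.clone();

    drop(task_a); // first input task finishes; task_b is still alive
    let after_first = shared.writer_dropped.load(Ordering::SeqCst);
    drop(task_b); // last input task finishes
    let after_last = shared.writer_dropped.load(Ordering::SeqCst);
    (after_first, after_last)
}
```

With this layout the flag stays unset while any input task still holds a clone, so an empty queue alone would no longer be read as end of stream.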

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Labels

bug
