
[VL][FOLLOWUP] Reduce Velox hash shuffle partition buffer memory by evicting large partitions after split #12156

Open
wankunde wants to merge 1 commit into apache:main from wankunde:hash_shuffle_writer2


@wankunde
Contributor

What changes are proposed in this pull request?

Bug fix for VeloxHashShuffleWriter::doSplit.

How does this issue happen:

  • After splitState_ is set to kInit, evictPartitionBuffers() is called to evict large partitions.
  • evictPartitionBuffers(pid) calls assembleBuffers, which slices / copies / serializes buffers and may allocate from the Velox memory pool.
  • The allocation can trigger memory arbitration, which calls back into reclaimFixedSize.
  • Because evictState_ is still kEvictable and evictPartitionBuffersAfterSpill() is true, the arbitrator enters evictPartitionBuffersMinSize, which may select the same pid we are currently evicting.
  • The inner eviction resets partitionBuffers_[*][pid], partitionBinaryAddrs_[*][pid], and partitionBufferBase_[pid] while the outer call is still mid-assembly.
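
The re-entrancy described above can be reproduced in a toy model. This is only a sketch: `ToyWriter`, `reclaim`, `evict`, `onAllocate`, and `runReentrantEviction` are hypothetical names invented for illustration and are not the Gluten or Velox API. The callback stands in for "allocation triggers arbitration, which calls back into the writer".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Toy stand-in for the shuffle writer. All names are hypothetical.
struct ToyWriter {
  std::map<int, std::vector<int>> buffers;  // pid -> buffered rows
  std::function<void()> onAllocate;         // models the arbitration callback
  bool evictable = true;                    // models evictState_ == kEvictable
  int inFlightPid = -1;                     // pid currently being assembled
  int reentrantResets = 0;                  // resets that hit the in-flight pid

  // Models the reclaim path choosing a partition and resetting its buffers.
  void reclaim(int pid) {
    if (!evictable) {
      return;
    }
    if (pid == inFlightPid) {
      ++reentrantResets;  // inner eviction picked the pid the outer call is assembling
    }
    buffers[pid].clear();
  }

  // Models evictPartitionBuffers(pid): assembly may allocate, and the
  // allocation may call back into reclaim() before assembly finishes.
  std::size_t evict(int pid) {
    assert(inFlightPid == -1);  // evict() itself is not re-entered in this toy
    inFlightPid = pid;
    if (onAllocate) {
      onAllocate();  // allocation -> arbitration -> reclaim
    }
    std::size_t assembled = buffers[pid].size();  // rows still available to assemble
    buffers[pid].clear();
    inFlightPid = -1;
    return assembled;
  }
};

// Evicts pid 0 while the callback reclaims the same pid.
// Returns the number of rows the outer eviction could still assemble.
inline std::size_t runReentrantEviction(int& resets) {
  ToyWriter w;
  w.buffers[0] = {1, 2, 3};
  w.onAllocate = [&w] { w.reclaim(0); };
  std::size_t assembled = w.evict(0);
  resets = w.reentrantResets;
  return assembled;
}
```

In this model the outer eviction starts with three rows for pid 0 but finds none left after the callback runs, which mirrors the inner eviction resetting state while the outer call is mid-assembly.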

How to fix this issue:

  • Call evictPartitionBuffers() before setting splitState_ to kInit.
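
Why the ordering matters can be sketched with another toy model. It rests on an assumption inferred from the description rather than confirmed from the source: that the reclaim path only evicts partition buffers once the state has reached kInit. `ToyShuffleWriter`, `ToySplitState`, `finishSplitBefore`, and `finishSplitAfter` are hypothetical names, not the real `VeloxHashShuffleWriter` members.

```cpp
// Hypothetical two-state model; the real writer has more states.
enum class ToySplitState { kSplit, kInit };

struct ToyShuffleWriter {
  ToySplitState state = ToySplitState::kSplit;
  bool evicting = false;  // true while the outer eviction is in progress
  int reentries = 0;      // times reclaim ran an eviction during the outer one

  // ASSUMPTION: reclaim only evicts partition buffers when state is kInit.
  void reclaim() {
    if (state != ToySplitState::kInit) {
      return;
    }
    if (evicting) {
      ++reentries;
    }
  }

  // Stands in for evictPartitionBuffers(): the call to reclaim() models
  // allocation -> arbitration -> callback into the writer.
  void evictLargePartitions() {
    evicting = true;
    reclaim();
    evicting = false;
  }

  // Ordering before the patch: state is set first, then eviction runs.
  void finishSplitBefore() {
    state = ToySplitState::kInit;
    evictLargePartitions();
  }

  // Ordering after the patch: eviction runs first, then state is set.
  void finishSplitAfter() {
    evictLargePartitions();
    state = ToySplitState::kInit;
  }
};

inline int reentriesBeforePatch() {
  ToyShuffleWriter w;
  w.finishSplitBefore();
  return w.reentries;
}

inline int reentriesAfterPatch() {
  ToyShuffleWriter w;
  w.finishSplitAfter();
  return w.reentries;
}
```

Under that assumption, the old ordering lets the callback re-enter eviction once, while the new ordering finishes the eviction before the reclaim path becomes eligible.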

How was this patch tested?

Covered by existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

NO

@github-actions github-actions Bot added the VELOX label May 27, 2026

2 participants