
Avoid concatenating record batches in joins #23032

Open

maxburke wants to merge 2 commits into apache:main from urbanlogiq:join-avoid-concat

@maxburke (Contributor)

Which issue does this PR close?

Closes issue #23031

Rationale for this change

We run into two problems when operating on datasets with approximately 60 million rows:

  1. First, we get OOM-killed on machines with 64 GB of memory or less.
  2. Second, on machines with more than 64 GB, we overflow the string array offsets while concatenating record batches in the core of the join. A `Utf8` string array stores `i32` offsets, so the combined string data of one array cannot exceed `i32::MAX` bytes (about 2 GiB).
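To make the second failure concrete: each batch's string column can fit within the `i32` offset limit on its own while their concatenation does not. The sketch below is plain Rust with no Arrow dependency; `total_offset_i32` is a hypothetical helper that models the final offset a concatenated `Utf8` column would need, assuming the per-batch string data sizes (in bytes) are known.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not an Arrow API): given the number of string bytes
/// in each batch's column, return the final i32 offset that a single
/// concatenated Utf8-style array would need, or None if it does not fit.
fn total_offset_i32(batch_value_bytes: &[u64]) -> Option<i32> {
    // Sum all batches' string bytes, bailing out on u64 overflow.
    let total: u64 = batch_value_bytes
        .iter()
        .try_fold(0u64, |acc, &bytes| acc.checked_add(bytes))?;
    // Utf8 offsets are i32, so the last offset must be <= i32::MAX.
    i32::try_from(total).ok()
}
```

For example, two batches holding 1.2 billion bytes of string data each fit individually, but concatenated they would need a final offset of 2.4 billion, which is past `i32::MAX` (2,147,483,647).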

What changes are included in this PR?

This PR removes record batch concatenation from three join operators: hash join, nested loop join, and piecewise merge join.
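One way to drop the concatenation step is to keep the build-side batches as they are and address rows by (batch, row) pairs instead of a single global row index. The sketch below illustrates that idea in plain Rust; `BatchIndex` and `gather` are hypothetical names, batches are modeled as `Vec<&str>` columns, and this is not necessarily how the PR implements it.

```rust
/// Hypothetical sketch: address rows across several build-side batches
/// without concatenating them. `starts[i]` is the global index of the
/// first row of batch i.
struct BatchIndex {
    starts: Vec<usize>,
    total_rows: usize,
}

impl BatchIndex {
    fn new(batch_lens: &[usize]) -> Self {
        let mut starts = Vec::with_capacity(batch_lens.len());
        let mut total_rows = 0;
        for &len in batch_lens {
            starts.push(total_rows);
            total_rows += len;
        }
        BatchIndex { starts, total_rows }
    }

    /// Map a global row index to (batch index, row within that batch).
    fn locate(&self, global_row: usize) -> Option<(usize, usize)> {
        if global_row >= self.total_rows {
            return None;
        }
        // Last batch whose start is <= global_row; empty batches share a
        // start with their successor, so they are skipped automatically.
        let batch = self.starts.partition_point(|&s| s <= global_row) - 1;
        Some((batch, global_row - self.starts[batch]))
    }
}

/// Copy only the requested (e.g. matched) rows into the output, reading
/// each one directly from its original batch.
fn gather<'a>(batches: &[Vec<&'a str>], index: &BatchIndex, rows: &[usize]) -> Vec<&'a str> {
    rows.iter()
        .map(|&r| {
            let (b, i) = index.locate(r).expect("row index out of range");
            batches[b][i]
        })
        .collect()
}
```

Only the matched rows are copied into the output, so the full build side is never duplicated into one large buffer. Arrow's `interleave` kernel accepts indices in the same `(array index, row index)` shape; the PR description does not say which mechanism it uses.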

Are these changes tested?

Yes

Are there any user-facing changes?

I sure hope not! (no)

maxburke changed the title from "Join avoid concat" to "Avoid concatenating record batches in joins" on Jun 19, 2026.