fix: preserve no-filter SMJ matches across pending outer batches #23049
neilconway wants to merge 2 commits into
895944d to
421e249
Thank you for catching this and for the clear explanation. I believe the fix is correct.
viirya left a comment
Nice fix — I traced the state machine and the bug is exactly as described.
The single added line restores symmetry with the filtered path, which already does the same thing at the matching point in process_filtered_match_loop (with the comment "Clear stale batch before polling"). Without self.outer_batch = None; here, after poll_next_outer_batch returns Pending and poll_join re-enters from the top, the if self.outer_batch.is_none() guard is skipped because the stale already-emitted batch is still set. resume_boundary then compares saved_keys against the old batch's keys, applies the boundary state to the wrong batch, and the continuation rows of the matched key group in the real next batch get treated as unmatched. Clearing it first forces the top-level loop to load the actual next batch before resume_boundary runs.
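To make the control flow above easier to follow, here is a minimal toy model of it in plain Rust. It is a sketch under stated assumptions, not the DataFusion implementation: `ToyStream`, `mark_group`, `load`, `run`, and the `clear_before_poll` switch are hypothetical names, keys are plain integers, and a `Pending` poll is scripted as a `None` entry in a queue. Only the shape of the guard, the resume step, and the poll after emitting is meant to mirror the description.

```rust
use std::cmp::Ordering;
use std::collections::VecDeque;

/// Toy semi-join stream over integer keys. All names are hypothetical.
struct ToyStream {
    /// Scripted outer input: `Some(batch)` is ready, `None` is one `Pending`.
    outer_input: VecDeque<Option<Vec<i32>>>,
    inner: Vec<i32>,               // sorted inner keys
    inner_pos: usize,              // inner cursor
    outer_batch: Option<Vec<i32>>, // batch currently held by the stream
    outer_row: usize,              // next unprocessed row of `outer_batch`
    matched: Vec<bool>,            // one match bit per row of `outer_batch`
    emitted: bool,                 // whether `outer_batch` was already emitted
    saved_key: Option<i32>,        // matched key that ran to the end of a batch
    clear_before_poll: bool,       // the one-line fix, modelled as a switch
    output: Vec<i32>,              // keys of emitted (matched) outer rows
}

impl ToyStream {
    fn load(&mut self, batch: Vec<i32>) {
        self.matched = vec![false; batch.len()];
        self.outer_row = 0;
        self.emitted = false;
        self.outer_batch = Some(batch);
    }

    /// Marks the rows at `outer_row` that carry `key`. If the group reaches
    /// the end of the batch, remembers the key for the next batch.
    fn mark_group(&mut self, batch: &[i32], key: i32) {
        let start = self.outer_row;
        while self.outer_row < batch.len() && batch[self.outer_row] == key {
            self.matched[self.outer_row] = true;
            self.outer_row += 1;
        }
        if self.outer_row > start && self.outer_row == batch.len() {
            self.saved_key = Some(key);
        }
    }

    /// Returns `true` when the join is finished and `false` on `Pending`.
    fn poll_join(&mut self) -> bool {
        loop {
            // Top-level guard: a new batch is loaded only if none is held.
            if self.outer_batch.is_none() {
                match self.outer_input.pop_front() {
                    None => return true,
                    Some(None) => return false,
                    Some(Some(batch)) => self.load(batch),
                }
            }
            let batch = self.outer_batch.clone().expect("batch is loaded");
            // Resume step: consumes the saved state against whatever batch is held.
            if let Some(key) = self.saved_key.take() {
                self.mark_group(&batch, key);
            }
            while self.outer_row < batch.len() {
                let key = batch[self.outer_row];
                match self.inner.get(self.inner_pos).map(|ik| key.cmp(ik)) {
                    None | Some(Ordering::Less) => self.outer_row += 1, // unmatched row
                    Some(Ordering::Greater) => self.inner_pos += 1,
                    Some(Ordering::Equal) => {
                        // Advance the inner cursor past the key, then mark outer rows.
                        while self.inner.get(self.inner_pos) == Some(&key) {
                            self.inner_pos += 1;
                        }
                        self.mark_group(&batch, key);
                    }
                }
            }
            if !self.emitted {
                let hits = batch.iter().zip(&self.matched).filter(|(_, m)| **m).map(|(k, _)| *k);
                self.output.extend(hits);
                self.emitted = true;
            }
            if self.clear_before_poll {
                self.outer_batch = None; // clear stale batch before polling
            }
            match self.outer_input.pop_front() {
                None => return true,
                Some(None) => return false, // caller re-enters from the top
                Some(Some(batch)) => self.load(batch),
            }
        }
    }
}

/// Key 1 spans two outer batches, with one `Pending` poll between them.
fn run(clear_before_poll: bool) -> Vec<i32> {
    let mut stream = ToyStream {
        outer_input: VecDeque::from(vec![Some(vec![1]), None, Some(vec![1, 2])]),
        inner: vec![1, 2],
        inner_pos: 0,
        outer_batch: None,
        outer_row: 0,
        matched: Vec::new(),
        emitted: false,
        saved_key: None,
        clear_before_poll,
        output: Vec::new(),
    };
    while !stream.poll_join() {} // re-enter after each Pending
    stream.output
}

fn main() {
    println!("without clearing: {:?}", run(false)); // [1, 2]
    println!("with clearing:    {:?}", run(true)); // [1, 1, 2]
}
```

In this toy, leaving the stale batch in place means the saved key is consumed against a batch with no rows left, so the continuation row with key 1 in the second batch is later compared against inner key 2 and dropped. Clearing the batch first lets the top-level guard load the second batch before the saved key is applied.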
I verified this locally:
- The new no_filter_boundary_pending_with_unmatched_prefix test passes with the fix.
- Reverting just the one-line fix makes it fail with LeftSemi left: [1], right: [1, 2] (the continuation row is dropped), so the test is a genuine regression guard, not a tautology.
- The full sort_merge_join suite still passes (62 tests).
One optional nit: consider adding the same one-line comment the filtered path has (// Clear stale batch before polling) so the parallel between the two paths is obvious to future readers.
LGTM.
Which issue does this PR close?
fuzz_cases::join_fuzz::test_left_anti_join_1k #23048
Rationale for this change
When the no-filter bitwise sort-merge join path finds a matching key,
it advances the inner cursor past that key before marking all matching
outer rows. If the outer key group continues into the next outer batch
and polling that batch returns Pending, poll_join resumes from its
top-level state with the inner cursor already past the matched key.
On resume, the stream still retained the already-emitted outer batch.
That stale batch caused the pending boundary state to be applied to the
wrong batch and then discarded. When the actual next outer batch was
loaded later, rows continuing the matched key could compare as Less
than the current inner key and be incorrectly treated as unmatched.
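The comparison at the heart of this can be illustrated in isolation. The snippet below is a small sketch with a hypothetical helper, `compare_after_match`, over integer keys; it is not code from the join operator. It shows only that once the inner cursor has moved past a matched key, an outer row that continues that key orders before the current inner key.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: after the inner cursor has advanced past every row
/// equal to `matched_key`, compare the next outer key with the inner key the
/// cursor now points at. Returns `None` if the inner side is exhausted.
fn compare_after_match(
    inner_keys: &[i32],
    matched_key: i32,
    next_outer_key: i32,
) -> Option<Ordering> {
    // Index of the first inner key beyond the matched group (keys are sorted).
    let inner_pos = inner_keys.iter().position(|&k| k > matched_key)?;
    Some(next_outer_key.cmp(&inner_keys[inner_pos]))
}

fn main() {
    // Inner keys [1, 3]: key 1 matched in the previous outer batch, and the
    // next outer batch starts with another row for key 1.
    println!("{:?}", compare_after_match(&[1, 3], 1, 1)); // prints Some(Less)
}
```

A plain merge step reads `Less` as "this outer row has no inner match", which is why the continuation rows need the saved boundary state to be applied to the correct batch.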
Fix this by clearing outer_batch after emitting the fully consumed batch
and before polling for the next outer batch. This makes the resumed
top-level state load the actual next outer batch before applying
resume_boundary, matching the filtered code path.
What changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?
No.