Skip to content

CaseWhen uses forward pass with a remaining mask#6804

Open
palaska wants to merge 33 commits intodevelopfrom
bp/case-when
Open

CaseWhen uses forward pass with a remaining mask#6804
palaska wants to merge 33 commits intodevelopfrom
bp/case-when

Conversation

@palaska
Copy link
Contributor

@palaska palaska commented Mar 5, 2026

Each condition is ANDed with remaining (unmatched rows), producing disjoint branch masks by construction. This enables early exit when all rows are claimed and skips evaluating THEN expressions for branches with no remaining matches.

Matched branches are merged in a single builder pass instead of N intermediate arrays.

Inspired by Datafusion's CASE WHEN optimization blog post

Future work

  • Hash table lookup — detecting WHEN col = const THEN const patterns and compiling them into a vectorized dictionary lookup (I think this needs to live in simplify?)
  • Column projection — narrowing the input batch to only columns referenced by a branch's condition before evaluating it (not sure if we need this)

Also fixes a nullability bug in zip_impl_with_builder where AllOr::All/AllOr::None short-circuits bypassed the builder's declared return type, producing a wrong dtype when if_true and if_false had different nullability.

Benches
Before
Screenshot 2026-03-05 at 01 38 14
After
Screenshot 2026-03-05 at 01 33 24

Signed-off-by: Baris Palaska <barispalaska@gmail.com>
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 5, 2026

Merging this PR will improve performance by 59.97%

⚡ 9 improved benchmarks
✅ 991 untouched benchmarks
🆕 5 new benchmarks
⏩ 1466 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
🆕 Simulation case_when_fragmented[10000] N/A 4.2 ms N/A
Simulation case_when_nary_10_conditions[1000] 414.3 µs 363 µs +14.12%
Simulation case_when_nary_10_conditions[10000] 1,060.9 µs 754.9 µs +40.52%
🆕 Simulation case_when_fragmented[1000] N/A 460.6 µs N/A
Simulation case_when_nary_3_conditions[100000] 2.9 ms 2.2 ms +29.09%
Simulation case_when_nary_3_conditions[10000] 436.6 µs 373 µs +17.05%
Simulation case_when_nary_10_conditions[100000] 7.3 ms 4.6 ms +59.97%
Simulation case_when_all_false[1000] 134.1 µs 110.4 µs +21.45%
🆕 Simulation case_when_nary_early_dominant[100000] N/A 2.3 ms N/A
Simulation case_when_nary_equality_lookup[100000] 4.1 ms 3.2 ms +29.68%
Simulation case_when_nary_equality_lookup[10000] 631.3 µs 523.7 µs +20.54%
🆕 Simulation case_when_nary_early_dominant[1000] N/A 183.5 µs N/A
🆕 Simulation case_when_nary_early_dominant[10000] N/A 374.9 µs N/A
Simulation bitwise_not_vortex_buffer_mut[128] 530.3 ns 471.9 ns +12.36%

Comparing bp/case-when (50f3ed8) with develop (aff52ad)2

Open in CodSpeed

Footnotes

  1. 1466 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on develop (6ff018b) during the generation of this report, so aff52ad was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@palaska palaska added the changelog/performance A performance improvement label Mar 5, 2026
@joseph-isaacs
Copy link
Contributor

Cool

@joseph-isaacs
Copy link
Contributor

mind if we get this on in first: #6806

palaska added 2 commits March 5, 2026 17:24
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
palaska added 4 commits March 5, 2026 19:09
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
@palaska palaska requested a review from joseph-isaacs March 5, 2026 19:23
palaska and others added 7 commits March 5, 2026 19:32
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Comment on lines +297 to +306
for (start, end, branch_idx) in &events {
if builder.len() < *start {
builder.extend_from_array(&else_value.slice(builder.len()..*start)?);
}
builder.extend_from_array(&branches[*branch_idx].1.slice(*start..*end)?);
}
if builder.len() < row_count {
builder.extend_from_array(&else_value.slice(builder.len()..row_count)?);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be very expensive if the slices are small

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on my benchmark sweep (again 😄 ), 4 as the average length of the runs seems to be a good cutoff point to choose between strategies. I know you wanted this optimization to live in zip but as we talked disjointness is not guaranteed there so I feel like it's fine to have it here?

It'd be great if you can double check if my merge_row_by_row implementation is optimal.

palaska added 4 commits March 9, 2026 12:28
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
@palaska palaska requested a review from joseph-isaacs March 9, 2026 17:37

/// Walks rows with a span cursor, emitting one `scalar_at` per row.
/// Zero per-run allocations; preferred for fragmented masks (avg run < [`SLICE_CROSSOVER_RUN_LEN`]).
fn merge_row_by_row(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this benchmarked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but only for primitive arrays (avg length of 4 is calibrated using primitive arrays). I guess the cost will differ based on the underlying encodings of the arrays. Probably a lower cutoff will work better when scalar_at is more expensive. Not sure what the best move here is.

}

Ok(result)
merge_case_branches(branches, else_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really feels like we want an expr that is n-way merge, but that is future work I think

}
}
}
spans.sort_unstable_by_key(|&(start, ..)| start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works since each range is globally disjoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yepp

Comment on lines +321 to +324
for row in 0..row_count {
while cursor < spans.len() && spans[cursor].1 <= row {
cursor += 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very inefficient? Didn't you just sort this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do something else other than append_scalar(branch_arr.scalar_at(...)) once per row? I guess you are saying inefficient because of the inner while loop which just increments a counter. I refactored it so it's better structured now but it's still does n scalar_at calls

palaska added 13 commits March 10, 2026 13:52
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
branch_arrays: &[ArrayRef],
else_value: &ArrayRef,
spans: &[(usize, usize, usize)],
row_count: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else_value.len()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/performance A performance improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants