IN LIST: add direct-probe hash filter for large primitive lists#23015

Draft
geoffreyclaude wants to merge 9 commits into apache:main from geoffreyclaude:perf/in_list_direct_probe_filter

geoffreyclaude commented Jun 18, 2026


Which issue does this PR close?

Rationale for this change

#23014 handles tiny primitive IN lists by comparing each input value against every constant. That stops being a good tradeoff once the list gets larger, because the per-row cost grows linearly with the number of constants.
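For reference, the per-constant comparison can be sketched as a const-generic chain that ORs every comparison together instead of exiting early. This is a minimal sketch, not code from either PR: `contains_branchless` and `filter_branchless` are hypothetical names, and whether the loop is fully unrolled or vectorized is left to the optimizer.

```rust
/// Hypothetical sketch: test `value` against `N` constants without an early exit.
/// `N` is a compile-time constant, so the loop has a fixed trip count that the
/// optimizer is free to unroll.
fn contains_branchless<const N: usize>(constants: &[i32; N], value: i32) -> bool {
    let mut found = false;
    for &c in constants {
        // `|=` evaluates every comparison; there is no short-circuit branch.
        found |= c == value;
    }
    found
}

/// Apply the membership check to every value in a column.
fn filter_branchless<const N: usize>(constants: &[i32; N], values: &[i32]) -> Vec<bool> {
    values
        .iter()
        .map(|&v| contains_branchless(constants, v))
        .collect()
}
```

Every row pays for all `N` comparisons, which is why this approach only wins while the list is small.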

For larger primitive lists, this PR uses a purpose-built lookup table. The mental model is:

  1. Precompute a table from the constants in x IN (...).
  2. For each input value, compute a cheap table slot from the value.
  3. Check that slot; on a collision, move forward to the next slot until the value or an empty slot is found.

This is still a hash-table-style lookup, but it is simpler than the generic fallback: primitive values are fixed-width and can be stored directly in the table, so probing a candidate does not need the generic Arrow comparator path.
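The three steps above can be sketched as a small open-addressing set over `i32`. This is a hypothetical illustration, not the `DirectProbeFilter` from this PR: the type name `DirectProbeSet`, the separate `occupied` flags (a real implementation might use a sentinel instead), and the multiplicative (Fibonacci) hash are all assumptions. Capacity is sized for the 25% maximum load factor mentioned in the commit messages, rounded up to a power of two so that wrapping around is a cheap bit mask.

```rust
/// Hypothetical open-addressing set with linear probing for i32 constants.
struct DirectProbeSet {
    slots: Vec<i32>,
    occupied: Vec<bool>,
    mask: usize,
}

impl DirectProbeSet {
    fn new(constants: &[i32]) -> Self {
        // Load factor <= 25%: capacity >= 4 * len, rounded up to a power of two.
        let capacity = (constants.len().max(1) * 4).next_power_of_two();
        let mut set = Self {
            slots: vec![0; capacity],
            occupied: vec![false; capacity],
            mask: capacity - 1,
        };
        for &c in constants {
            set.insert(c);
        }
        set
    }

    /// Cheap slot computation: multiply by an odd 64-bit constant, keep high bits.
    fn slot(&self, value: i32) -> usize {
        let h = (value as u32 as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        (h >> 32) as usize & self.mask
    }

    fn insert(&mut self, value: i32) {
        let mut i = self.slot(value);
        loop {
            if !self.occupied[i] {
                self.slots[i] = value;
                self.occupied[i] = true;
                return;
            }
            if self.slots[i] == value {
                return; // duplicate constant, already present
            }
            i = (i + 1) & self.mask; // linear probing: try the next slot
        }
    }

    fn contains(&self, value: i32) -> bool {
        let mut i = self.slot(value);
        // Stops at the first empty slot; a match must appear before it.
        while self.occupied[i] {
            if self.slots[i] == value {
                return true;
            }
            i = (i + 1) & self.mask;
        }
        false
    }
}
```

Because the load factor never exceeds 25%, most slots stay empty, so the probe loop in `contains` always reaches an empty slot and terminates, and typical probe sequences are short.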

The earlier bitmap and branchless filters remain in place for the cases where they are cheaper.

What changes are included in this PR?

  • Adds DirectProbeFilter, a compact open-addressing lookup table with linear probing.
  • Routes primitive IN lists to direct probing once the list size exceeds the branchless thresholds.
  • Supports zero-copy same-width reinterpretation for compatible primitive types.
  • Avoids extra temporary value copies when building the table.
  • Keeps slice and null handling on the raw-buffer fast path.
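Same-width reinterpretation amounts to viewing the same buffer through a different primitive type of identical size and alignment. A minimal sketch, assuming the caller already holds a plain Rust slice; the helper names `i32_as_u32` and `f32_as_u32` are hypothetical, and the PR may instead operate on Arrow buffers directly.

```rust
use std::mem::{align_of, size_of};

// Compile-time checks that the reinterpretations below are layout-compatible.
const _: () = assert!(size_of::<i32>() == size_of::<u32>() && align_of::<i32>() == align_of::<u32>());
const _: () = assert!(size_of::<f32>() == size_of::<u32>() && align_of::<f32>() == align_of::<u32>());

/// View signed 32-bit values as their unsigned bit patterns, without copying.
fn i32_as_u32(values: &[i32]) -> &[u32] {
    // SAFETY: i32 and u32 have identical size and alignment (checked above), and
    // every bit pattern is valid for both, so the same memory is a valid &[u32].
    unsafe { std::slice::from_raw_parts(values.as_ptr().cast::<u32>(), values.len()) }
}

/// View f32 values as their raw IEEE 754 bit patterns, without copying.
fn f32_as_u32(values: &[f32]) -> &[u32] {
    // SAFETY: same reasoning as above; f32 and u32 share size and alignment,
    // and any f32 bit pattern is a valid u32.
    unsafe { std::slice::from_raw_parts(values.as_ptr().cast::<u32>(), values.len()) }
}
```

Note that comparing floats by bit pattern is not the same as IEEE 754 `==`: two NaNs with identical bits match, while `0.0` and `-0.0` do not. A bit-level view only reproduces the intended float equality if that difference is accounted for.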

Are these changes tested?

Yes.

  • cargo fmt --all --check
  • cargo test -p datafusion-physical-expr direct_probe --lib
  • cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is an internal performance optimization only.

Local benchmark snapshot

Benchmark command:

cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- --save-baseline <name>

Method: compare adjacent saved baselines using raw Criterion sample minima (min(time / iters)). Lower is better; changes within +/-5% are treated as noise.
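The comparison method above reduces to a few lines of arithmetic. A minimal sketch, assuming the per-sample total times and iteration counts have already been loaded from the saved Criterion baselines; the function names are hypothetical and not part of the benchmark harness.

```rust
/// Fastest per-iteration time across samples: min(time / iters).
fn sample_min(times: &[f64], iters: &[f64]) -> f64 {
    times
        .iter()
        .zip(iters)
        .map(|(t, n)| t / n)
        .fold(f64::INFINITY, f64::min)
}

/// Percent change and speedup from `before` to `after` (lower time is better).
fn compare(before: f64, after: f64) -> (f64, f64) {
    let change_pct = (after - before) / before * 100.0;
    let speedup = before / after;
    (change_pct, speedup)
}

/// Classify a change against the +/-5% noise band.
fn classify(change_pct: f64) -> &'static str {
    if change_pct < -5.0 {
        "faster"
    } else if change_pct > 5.0 {
        "slower"
    } else {
        "within noise"
    }
}
```

For the first benchmark row, `compare(18.83, 7.91)` gives a change of about -58.0% and a speedup of about 2.38x, which `classify` labels "faster".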

Compared baselines: #23014 -> #23015

Relevant scope: large primitive-list rows.

Summary: 13 relevant rows, 13 faster, 0 slower, 0 within +/-5%.

| Benchmark | Before | After | Change |
|---|---|---|---|
| f32/large_list/list=64/match=0% | 18.83 us | 7.91 us | -58.0% (2.38x faster) |
| f32/large_list/list=64/match=50% | 33.00 us | 10.27 us | -68.9% (3.21x faster) |
| nulls/primitive/i32/large_list/list=64/match=50%/nulls=20% | 25.79 us | 11.26 us | -56.3% (2.29x faster) |
| primitive/i32/large_list/list=256/match=0% | 17.80 us | 7.99 us | -55.1% (2.23x faster) |
| primitive/i32/large_list/list=256/match=50% | 27.31 us | 10.25 us | -62.5% (2.67x faster) |
| primitive/i32/large_list/list=64/match=0% | 18.15 us | 8.05 us | -55.7% (2.26x faster) |
| primitive/i32/large_list/list=64/match=50% | 27.85 us | 10.25 us | -63.2% (2.72x faster) |
| primitive/i64/large_list/list=128/match=0% | 18.93 us | 8.00 us | -57.7% (2.37x faster) |
| primitive/i64/large_list/list=128/match=50% | 24.24 us | 10.08 us | -58.4% (2.41x faster) |
| primitive/i64/large_list/list=32/match=0% | 19.82 us | 8.51 us | -57.1% (2.33x faster) |
| primitive/i64/large_list/list=32/match=50% | 26.01 us | 11.31 us | -56.5% (2.30x faster) |
| timestamp_ns/large_list/list=32/match=0% | 19.38 us | 8.49 us | -56.2% (2.28x faster) |
| timestamp_ns/large_list/list=32/match=50% | 45.82 us | 10.03 us | -78.1% (4.57x faster) |

github-actions bot added the physical-expr (Changes to the physical-expr crates) label Jun 18, 2026
geoffreyclaude force-pushed the perf/in_list_direct_probe_filter branch 2 times, most recently from 12ca843 to 0111ce5 June 18, 2026 09:12
geoffreyclaude changed the title from "Implement Direct Probe (Hash) Filter for large primitive lists" to "IN LIST: add direct-probe hash filter for large primitive lists" Jun 18, 2026
geoffreyclaude force-pushed the perf/in_list_direct_probe_filter branch 2 times, most recently from 92f2c37 to a109166 June 19, 2026 05:35
Replaces HashSet<u8> with a 32-byte stack-allocated bitmap. Provides O(1) membership testing via bit-shifting, significantly reducing memory overhead and improving cache locality. Triggers for UInt8 arrays.
Implements an 8 KB heap-allocated bitmap for UInt16. Maintains O(1) performance while handling the larger value space. Triggers for UInt16 arrays.
Introduces zero-copy buffer reinterpretation to allow signed integers and other 1- or 2-byte primitive types (e.g. Float16) to use the high-performance bitmap filters. Triggers for all types with 1-byte or 2-byte width.
Adds a const-generic unrolled comparison chain that avoids CPU branching. Outperforms hash lookups for very small lists. Triggers for primitives when the list size is <= 32 (4-byte types), <= 16 (8-byte types), or <= 4 (16-byte types).
Implements a fast hash table using open addressing with linear probing and a 25% load factor. Replaces the legacy HashSet for primitives, reducing indirection. Triggers for primitives when list size exceeds branchless thresholds.
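The bitmap filter and the routing rules in these commit messages can be sketched as follows. This is a hypothetical illustration rather than the PR's code: `U8Bitmap`, `Strategy`, and `choose_strategy` are made-up names, and `choose_strategy` only encodes the thresholds quoted above (1- and 2-byte types use a bitmap; 4-, 8-, and 16-byte types use the branchless chain up to 32, 16, and 4 constants respectively, then direct probing). The UInt16 bitmap is the same idea scaled to 65,536 bits, i.e. 1,024 `u64` words or 8 KB.

```rust
/// 256-bit membership set for u8 values: four u64 words, 32 bytes, no heap allocation.
struct U8Bitmap {
    words: [u64; 4],
}

impl U8Bitmap {
    fn from_values(constants: impl IntoIterator<Item = u8>) -> Self {
        let mut words = [0u64; 4];
        for c in constants {
            // The top 2 bits select the word; the low 6 bits select the bit in it.
            words[(c >> 6) as usize] |= 1u64 << (c & 63);
        }
        Self { words }
    }

    fn contains(&self, value: u8) -> bool {
        (self.words[(value >> 6) as usize] >> (value & 63)) & 1 == 1
    }
}

/// Signed bytes reuse the u8 bitmap: `as u8` keeps the bit pattern of an i8.
fn i8_contains(bitmap: &U8Bitmap, value: i8) -> bool {
    bitmap.contains(value as u8)
}

#[derive(Debug, PartialEq)]
enum Strategy {
    Bitmap,
    Branchless,
    DirectProbe,
}

/// Pick a filter from the value width (in bytes) and the number of constants.
fn choose_strategy(width_bytes: usize, list_len: usize) -> Strategy {
    match width_bytes {
        1 | 2 => Strategy::Bitmap,
        4 if list_len <= 32 => Strategy::Branchless,
        8 if list_len <= 16 => Strategy::Branchless,
        16 if list_len <= 4 => Strategy::Branchless,
        _ => Strategy::DirectProbe,
    }
}
```

Under these rules, the `i32` `list=64` and `i64` `list=32` benchmark rows above both land on direct probing.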
geoffreyclaude force-pushed the perf/in_list_direct_probe_filter branch from a109166 to 2e20173 June 19, 2026 05:55
github-actions commented

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

     Cloning apache/main
error: running 'cargo update' on crate 'datafusion-physical-expr' failed with output:
-----
    Updating crates.io index
error: failed to get `arrow-cast` as a dependency of package `arrow v59.0.0`
    ... which satisfies dependency `arrow = "^59.0.0"` of package `datafusion-physical-expr v54.0.0 (/home/runner/work/datafusion/datafusion/datafusion/physical-expr)`
    ... which satisfies path dependency `datafusion-physical-expr` of package `placeholder v0.0.0 (/home/runner/work/datafusion/datafusion/target/semver-checks/local-datafusion_physical_expr-54_0_0-x86_64_unknown_linux_gnu-803dd5f4795bc6a9)`

Caused by:
  failed to load source for dependency `arrow-cast`

Caused by:
  unable to update registry `crates-io`

Caused by:
  download of ar/ro/arrow-cast failed

Caused by:
  curl failed

Caused by:
  [16] Error in the HTTP2 framing layer

-----
error: failed to update dependencies for crate datafusion-physical-expr v54.0.0
note: this is unlikely to be a bug in cargo-semver-checks,
      and is probably an issue with the crate's Cargo.toml
note: the following command can be used to reproduce the compilation error:
      cargo new --lib example &&
          cd example &&
          echo '[workspace]' >> Cargo.toml &&
          cargo add --path /home/runner/work/datafusion/datafusion/datafusion/physical-expr --features proto,recursive_protection &&
          cargo update

error: aborting due to failure to run 'cargo update' for crate datafusion-physical-expr v54.0.0

github-actions bot added the auto detected api change (Auto detected API change) label Jun 19, 2026