IN LIST: add UInt8 bitmap filter #23011

Draft: geoffreyclaude wants to merge 4 commits into apache:main from geoffreyclaude:perf/in_list_bitmap_u8_filter

@geoffreyclaude geoffreyclaude commented Jun 18, 2026

Which issue does this PR close?

Rationale for this change

IN LIST evaluates expressions like x IN (1, 3, 7). The list on the right is fixed, so DataFusion can precompute a small lookup structure once and then reuse it for every input row.

For UInt8, there are only 256 possible values: 0 through 255. That means the lookup can be a tiny checklist with one bit per possible value:

  • If the list contains 3, set bit 3.
  • If the list contains 7, set bit 7.
  • To check whether an input value is present, read that one bit.

So instead of hashing each input value or comparing it against the list, membership becomes one indexed bit test. The bitmap is only 32 bytes, because 256 bits = 32 bytes.
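As a rough illustration of the checklist idea above, the following sketch packs the 256 bits into four `u64` words. The type name `U8Bitmap` and its methods are hypothetical and chosen for this example; they are not the PR's `UInt8BitmapFilter`, whose internal layout may differ.

```rust
/// Hypothetical illustration type; not the PR's `UInt8BitmapFilter`.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct U8Bitmap {
    // 4 words * 64 bits = 256 bits = 32 bytes, one bit per possible u8 value.
    words: [u64; 4],
}

impl U8Bitmap {
    /// Build the bitmap once from the non-null constants in the IN list.
    pub fn from_values(values: &[u8]) -> Self {
        let mut words = [0u64; 4];
        for &v in values {
            // The top 2 bits of `v` select the word, the low 6 bits select the bit.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    /// Membership is a single indexed bit test: no hashing, no list scan.
    #[inline]
    pub fn contains(&self, v: u8) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```

With `U8Bitmap::from_values(&[1, 3, 7])`, `contains(3)` reads bit 3 of the first word and returns `true`, while `contains(2)` returns `false`. Because a `u8` shifted right by 6 is always in `0..=3`, the array index cannot go out of bounds.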

This PR adds the first specialized primitive path in the stack as a concrete UInt8 filter. The UInt16 version is added in #23012. The shared bitmap abstraction is introduced in #23035, only after both concrete implementations are visible.

What changes are included in this PR?

  • Adds UInt8BitmapFilter, a 32-byte bitmap built from the non-null constants in the IN list.
  • Routes UInt8 constant-list filtering to that bitmap path.
  • Keeps the same SQL null behavior as the generic path for both IN and NOT IN.
  • Moves shared dictionary-needle handling into static_filter.rs, so specialized filters can reuse it consistently.
  • Adds focused tests for UInt8 null handling and dictionary-encoded needles.
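For reference, the SQL null behavior mentioned in the list above follows standard three-valued logic. The sketch below spells out those rules for a single row, using `Option<bool>` with `None` standing for SQL NULL. The function `in_list_u8` is a hypothetical helper written for this illustration and uses a plain list scan for clarity; it is not code from the PR.

```rust
/// Hypothetical single-row reference for `value [NOT] IN (list)`.
/// `None` represents SQL NULL in both the inputs and the result.
pub fn in_list_u8(value: Option<u8>, list: &[Option<u8>], negated: bool) -> Option<bool> {
    // A NULL input value always yields NULL.
    let v = value?;
    let list_has_null = list.iter().any(|c| c.is_none());
    // Only non-null constants can match.
    let found = list.iter().flatten().any(|&c| c == v);
    match (found, list_has_null) {
        // A definite match: IN is true, NOT IN is false.
        (true, _) => Some(!negated),
        // No match, but a NULL in the list could have matched: result is NULL.
        (false, true) => None,
        // No match and no NULLs: IN is false, NOT IN is true.
        (false, false) => Some(negated),
    }
}
```

A specialized filter only changes how `found` is computed; the surrounding null rules stay the same. This is why the bitmap is built from the non-null constants only, with the presence of a NULL in the list tracked separately.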

Are these changes tested?

Yes.

  • cargo fmt --all
  • cargo test -p datafusion-physical-expr bitmap_filter_u8 --lib
  • cargo test -p datafusion-physical-expr in_list_int_types --lib
  • cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is an internal performance optimization only.

Local benchmark snapshot

Benchmark command:

cargo bench -p datafusion-physical-expr --profile release-nonlto --bench in_list_strategy -- --save-baseline <name>

Method: compare adjacent saved baselines using raw Criterion sample minima (min(time / iters)). Lower is better; changes within +/-5% are treated as noise. These numbers were not rerun after splitting the bitmap abstraction into #23035.

Compared baselines: #21927 -> #23011

Relevant scope: UInt8 narrow-integer rows.

Summary: 5 relevant rows, 5 faster, 0 slower, 0 within +/-5%.

| Benchmark | Before | After | Change |
|---|---|---|---|
| narrow_integer/u8/list=16/match=0% | 20.39 us | 3.94 us | -80.7% (5.18x faster) |
| narrow_integer/u8/list=16/match=50% | 38.38 us | 3.98 us | -89.6% (9.65x faster) |
| narrow_integer/u8/list=4/match=0% | 18.18 us | 3.93 us | -78.4% (4.62x faster) |
| narrow_integer/u8/list=4/match=50% | 34.63 us | 3.96 us | -88.6% (8.75x faster) |
| nulls/narrow_integer/u8/list=16/match=50%/nulls=20% | 37.12 us | 4.16 us | -88.8% (8.93x faster) |
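The comparison method described above can be sketched as follows: take the minimum per-iteration time over a baseline's samples, then derive the percentage change and speedup from the two minima. The function names and the `(total_time, iters)` sample shape are assumptions made for this illustration; this is not the script that produced the numbers above.

```rust
/// Minimum of `time / iters` over a set of samples.
/// Each sample is assumed to be `(total_time, iteration_count)`.
pub fn min_per_iter(samples: &[(f64, u64)]) -> Option<f64> {
    samples
        .iter()
        .map(|&(time, iters)| time / iters as f64)
        .fold(None, |best, t| Some(best.map_or(t, |b: f64| b.min(t))))
}

/// Relative change in percent; negative means the new baseline is faster.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// How many times faster the new baseline is.
pub fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

/// Changes within +/-5% are treated as noise.
pub fn is_noise(before: f64, after: f64) -> bool {
    percent_change(before, after).abs() <= 5.0
}
```

For the first row, `percent_change(20.39, 3.94)` is about -80.7 and `speedup(20.39, 3.94)` is about 5.18, in line with the reported values. Recomputing from the rounded Before and After values may differ from the reported figures in the last digit for some rows, presumably because those were derived from unrounded timings.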

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label Jun 18, 2026
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_bitmap_u8_filter branch from b865b12 to b910c6a Compare June 18, 2026 07:55
@geoffreyclaude geoffreyclaude changed the title Implement Bitmap Filter for UInt8 (Stack-based) IN LIST: add UInt8 bitmap filter Jun 18, 2026
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_bitmap_u8_filter branch 2 times, most recently from 80597b1 to 2f19956 Compare June 19, 2026 05:35
Replaces HashSet<u8> with a 32-byte stack-allocated bitmap. Provides O(1) membership testing via bit-shifting, significantly reducing memory overhead and improving cache locality. Triggers for UInt8 arrays.
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_bitmap_u8_filter branch from 2f19956 to 5351b95 Compare June 19, 2026 05:55