
IN LIST: add UInt16 bitmap filter #23012

Draft
geoffreyclaude wants to merge 5 commits into apache:main from geoffreyclaude:perf/in_list_bitmap_u16_filter

@geoffreyclaude geoffreyclaude commented Jun 18, 2026

Contributor

Which issue does this PR close?

Rationale for this change

#23011 uses a bitmap checklist for UInt8, where there are 256 possible values. UInt16 is the same idea with a larger value range: 0 through 65,535.

That is still small enough to represent directly. A UInt16 bitmap needs one bit for each possible value:

  • 65,536 possible values
  • 65,536 bits total
  • 8 KB of memory (65,536 bits / 8 = 8,192 bytes)

Then a lookup is still simple: use the input value as the bit position and check whether that bit is set. For example, if the list contains 42, bit 42 is set, and every input row with value 42 can be recognized with one bit test.
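The idea above can be sketched in a few lines of Rust. This is only an illustration of the technique, not the code in this PR: the type name `U16Bitmap` and its methods are hypothetical, and the real `UInt16BitmapFilter` operates on Arrow arrays rather than on single values.

```rust
/// Hypothetical sketch of a 65,536-bit membership bitmap for u16 values.
/// Not the PR's actual `UInt16BitmapFilter`.
struct U16Bitmap {
    // 65,536 bits = 1,024 words of 64 bits = 8,192 bytes, boxed so it lives on the heap.
    words: Box<[u64; 1024]>,
}

impl U16Bitmap {
    /// Build the bitmap by setting one bit per list value.
    fn from_values(values: &[u16]) -> Self {
        let mut words = Box::new([0u64; 1024]);
        for &v in values {
            // Upper 10 bits select the word, lower 6 bits select the bit in that word.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    /// Membership test: a single word load plus a shift and mask.
    fn contains(&self, v: u16) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```

With a list of `[0, 42, 65535]`, `contains(42)` is true and `contains(43)` is false; both boundary values of the UInt16 range map to valid bit positions without any range check.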

This PR keeps the scope narrow: it adds the unsigned 2-byte bitmap path as a concrete UInt16 filter. #23035 then unifies the UInt8 and UInt16 implementations, and #23013 uses that shared shape for signed same-width reinterpretation.

What changes are included in this PR?

  • Adds UInt16BitmapFilter, backed by a heap-allocated 65,536-bit bitmap.
  • Routes UInt16 constant-list filtering to that bitmap path.
  • Keeps the same IN / NOT IN null behavior as the generic path.
  • Adds focused coverage for UInt16 boundary values, nulls, and NOT IN.
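For readers unfamiliar with the null behavior mentioned above, the following scalar sketch shows standard SQL three-valued semantics for IN / NOT IN. It assumes the generic path follows those semantics; the function name `in_list_result` and its parameters are hypothetical, and the PR itself works on whole arrays with validity information rather than on `Option` values.

```rust
/// Hypothetical scalar illustration of SQL IN / NOT IN null semantics.
/// `None` stands for SQL NULL; `is_member` is any membership test (e.g. a bitmap lookup).
fn in_list_result(
    value: Option<u16>,
    is_member: impl Fn(u16) -> bool,
    list_has_null: bool,
    negated: bool,
) -> Option<bool> {
    // A NULL input always yields NULL, for both IN and NOT IN.
    let v = value?;
    if is_member(v) {
        // A definite match: true for IN, false for NOT IN.
        Some(!negated)
    } else if list_has_null {
        // No match, but the list holds a NULL, so the result is unknown.
        None
    } else {
        // A definite non-match: false for IN, true for NOT IN.
        Some(negated)
    }
}
```

For example, with a list containing only 42, `in_list_result(Some(7), |v| v == 42, false, true)` evaluates `7 NOT IN (42)` and returns `Some(true)`, while adding a NULL to the list turns the same call into `None`.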

Are these changes tested?

Yes.

  • cargo fmt --all
  • cargo test -p datafusion-physical-expr bitmap_filter_u16 --lib
  • cargo test -p datafusion-physical-expr in_list_int_types --lib
  • cargo test -p datafusion-physical-expr test_in_list_from_array_type_combinations --lib
  • cargo test -p datafusion-physical-expr test_in_list_dictionary_types --lib
  • cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is an internal performance optimization only.

Benchmark note

No local in_list_strategy numbers are included for this PR because the benchmark harness does not currently include a direct UInt16 case. The available i16 rows measure the signed reinterpretation path added in #23013 after the bitmap unification in #23035, not this PR's unsigned UInt16 bitmap filter.

@github-actions github-actions Bot added the physical-expr label (Changes to the physical-expr crates) Jun 18, 2026
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_bitmap_u16_filter branch 2 times, most recently from 55f3836 to 81ec379 Compare June 18, 2026 08:40
@geoffreyclaude geoffreyclaude changed the title from "Extend Bitmap Filter to UInt16 (Heap-based)" to "IN LIST: add UInt16 bitmap filter" Jun 18, 2026
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_bitmap_u16_filter branch 2 times, most recently from 7043d4b to 2dbce01 Compare June 19, 2026 05:35
Replaces HashSet<u8> with a 32-byte stack-allocated bitmap. Provides O(1) membership testing via bit-shifting, significantly reducing memory overhead and improving cache locality. Triggers for UInt8 arrays.
Implements an 8 KB heap-allocated bitmap for UInt16. Maintains O(1) performance while handling the larger value space. Triggers for UInt16 arrays.
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_bitmap_u16_filter branch from 2dbce01 to 5514d78 Compare June 19, 2026 05:55