Further improve performance of IN list evaluation

## Summary

`IN LIST` evaluates expressions like:

```sql
x IN (1, 3, 7)
```

The list on the right is constant, so DataFusion can build a lookup structure once and reuse it for every input row. This matters for dynamic filter pushdown, where these checks can run millions of times during large scans.

The goal of this stack is to pick a lookup strategy that matches the data type and list size, instead of sending every common case through the same generic comparator-based path.

The SQL behavior should stay the same, including `IN`, `NOT IN`, dictionaries, slices, and null handling. These are internal performance optimizations.

## Current Stack

Review order follows this list:

- #21927: clean up and optimize the generic static-filter fallback
- #23011: use a tiny bitmap checklist for `UInt8`
- #23012: extend bitmap lookup to `UInt16`
- #23035: unify the `UInt8` and `UInt16` bitmap filter implementations
- #23013: let compatible signed 1-byte and 2-byte values reuse those bitmaps without copying
- #23014: use direct comparisons for very small primitive lists
- #23015: use a compact direct-probe lookup table for larger primitive lists
- #23016: use length/prefix rejection for `Utf8View` and `BinaryView`
- #23018: let supported `FixedSizeBinary` widths reuse the primitive fast paths (now stacked directly on #23016)

## How The Optimizations Work

The stack is a sequence of small lookup improvements. Each PR keeps the same result semantics, but makes a common membership check cheaper.

### Bitmap Lookup

For small integer domains, the `IN` list can be represented as a bitmap.

For `UInt8`, there are only 256 possible values. If the list contains `42`, set bit `42`. To check an input row with value `42`, read bit `42`.

That turns membership into one indexed bit test.

### Reusing The Same Bytes

Some Arrow types have different meanings but the same physical width. For example, `Int8` and `UInt8` are both one byte wide.

The bitmap only cares about the exact bits, so compatible values can reuse the same lookup storage without copying the array. This is the “zero-copy reinterpretation” part of the stack.

### Tiny Lists

For very small primitive lists, comparing directly can be cheaper than hashing.

For example:

```text
x IN (10, 20, 30)
```

can be checked like:

```text
x == 10 OR x == 20 OR x == 30
```

That is the branchless small-list path.

### Larger Primitive Lists

Once primitive lists get larger, direct comparison is no longer the best fit. The stack then switches to a compact purpose-built lookup table for fixed-width primitive values.

This is still hash-table-style lookup, but it is simpler than the generic fallback because primitive values can be stored and probed directly.

### Strings And Binary Views

Strings and binary values can be expensive to compare byte-for-byte. The string/view PRs first ask a cheaper question:

```text
Could this value possibly match anything in the IN list?
```

Length and prefix checks can reject many non-matches quickly. If a value still looks like a possible match, long values are verified with exact byte comparison before returning true.

### Fixed-Size Binary

`FixedSizeBinary(1)`, `(2)`, `(4)`, `(8)`, and `(16)` have fixed-width byte layouts that match the primitive-width optimizations above.

Those values can reuse the bitmap, branchless, and direct-probe paths internally. Other fixed-size binary widths stay on the generic fallback.

## Expected Impact

Reference benchmarks from #19390 show the largest wins in areas where the old path did much more work than necessary:

| Area | Why it can be faster |
|------|----------------------|
| `UInt8` / `UInt16` | one bitmap bit test per input value |
| Small primitive lists | direct comparisons avoid hash-table overhead |
| Larger primitive lists | compact fixed-width lookup avoids the generic comparator path |
| Date, timestamp, float, and signed integer paths | same-width values can reuse primitive fast paths |
| `Utf8View` / `BinaryView` | length and prefix checks reject many non-matches cheaply |
| `FixedSizeBinary` supported widths | fixed-width byte values can reuse primitive lookup machinery |

The detailed benchmark numbers live in #19390 and have been refreshed in the dedicated PRs as review changes the stack. #23017 was closed because its local benchmark snapshot did not justify the added regular `Utf8` / `LargeUtf8` path.

## Implementation Plan

- [ ] Land #21927 to clean up and optimize the generic static-filter fallback
- [ ] Land #23011 to add bitmap filters for 1-byte values
- [ ] Land #23012 to extend bitmap filters to 2-byte values
- [ ] Land #23035 to unify the bitmap filter implementations
- [ ] Land #23013 to add zero-copy reinterpretation for compatible fixed-width values
- [ ] Land #23014 to add branchless filters for small primitive lists
- [ ] Land #23015 to add direct-probe hash filters for larger primitive lists
- [ ] Land #23016 to add optimized filters for `Utf8View` / `BinaryView`
- [ ] Land #23018 to add `FixedSizeBinary` reinterpretation for supported widths
- [x] Drop #23017 after local benchmarks showed regressions for regular `Utf8` / `LargeUtf8`

## Additional Context

- Follow-up to #18824
- Original reference implementation: #19390
- Thresholds are benchmark-driven and may be adjusted during review




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Further improve performance of IN list evaluation #19241

Summary

Current Stack

How The Optimizations Work

Bitmap Lookup

Reusing The Same Bytes

Tiny Lists

Larger Primitive Lists

Strings And Binary Views

Fixed-Size Binary

Expected Impact

Implementation Plan

Additional Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Area	Why it can be faster
`UInt8` / `UInt16`	one bitmap bit test per input value
Small primitive lists	direct comparisons avoid hash-table overhead
Larger primitive lists	compact fixed-width lookup avoids the generic comparator path
Date, timestamp, float, and signed integer paths	same-width values can reuse primitive fast paths
`Utf8View` / `BinaryView`	length and prefix checks reject many non-matches cheaply
`FixedSizeBinary` supported widths	fixed-width byte values can reuse primitive lookup machinery

Further improve performance of IN list evaluation #19241

Description

Summary

Current Stack

How The Optimizations Work

Bitmap Lookup

Reusing The Same Bytes

Tiny Lists

Larger Primitive Lists

Strings And Binary Views

Fixed-Size Binary

Expected Impact

Implementation Plan

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions