Skip to content

Further improve performance of IN list evaluation #19241

@geoffreyclaude

Description

@geoffreyclaude

Summary

IN LIST evaluates expressions like:

x IN (1, 3, 7)

The list on the right is constant, so DataFusion can build a lookup structure once and reuse it for every input row. This matters for dynamic filter pushdown, where these checks can run millions of times during large scans.

The goal of this stack is to pick a lookup strategy that matches the data type and list size, instead of sending every common case through the same generic comparator-based path.

The SQL behavior should stay the same, including IN, NOT IN, dictionaries, slices, and null handling. These are internal performance optimizations.

Current Stack

Review order follows this list:

How The Optimizations Work

The stack is a sequence of small lookup improvements. Each PR keeps the same result semantics, but makes a common membership check cheaper.

Bitmap Lookup

For small integer domains, the IN list can be represented as a bitmap.

For UInt8, there are only 256 possible values. If the list contains 42, set bit 42. To check an input row with value 42, read bit 42.

That turns membership into one indexed bit test.

Reusing The Same Bytes

Some Arrow types have different meanings but the same physical width. For example, Int8 and UInt8 are both one byte wide.

The bitmap only cares about the exact bits, so compatible values can reuse the same lookup storage without copying the array. This is the “zero-copy reinterpretation” part of the stack.

Tiny Lists

For very small primitive lists, comparing directly can be cheaper than hashing.

For example:

x IN (10, 20, 30)

can be checked like:

x == 10 OR x == 20 OR x == 30

That is the branchless small-list path.

Larger Primitive Lists

Once primitive lists get larger, direct comparison is no longer the best fit. The stack then switches to a compact purpose-built lookup table for fixed-width primitive values.

This is still hash-table-style lookup, but it is simpler than the generic fallback because primitive values can be stored and probed directly.

Strings And Binary Views

Strings and binary values can be expensive to compare byte-for-byte. The string/view PRs first ask a cheaper question:

Could this value possibly match anything in the IN list?

Length and prefix checks can reject many non-matches quickly. If a value still looks like a possible match, long values are verified with exact byte comparison before returning true.

Fixed-Size Binary

FixedSizeBinary(1), (2), (4), (8), and (16) have fixed-width byte layouts that match the primitive-width optimizations above.

Those values can reuse the bitmap, branchless, and direct-probe paths internally. Other fixed-size binary widths stay on the generic fallback.

Expected Impact

Reference benchmarks from #19390 show the largest wins in areas where the old path did much more work than necessary:

Area Why it can be faster
UInt8 / UInt16 one bitmap bit test per input value
Small primitive lists direct comparisons avoid hash-table overhead
Larger primitive lists compact fixed-width lookup avoids the generic comparator path
Date, timestamp, float, and signed integer paths same-width values can reuse primitive fast paths
Utf8View / BinaryView length and prefix checks reject many non-matches cheaply
FixedSizeBinary supported widths fixed-width byte values can reuse primitive lookup machinery

The detailed benchmark numbers live in #19390 and have been refreshed in the dedicated PRs as review changes the stack. #23017 was closed because its local benchmark snapshot did not justify the added regular Utf8 / LargeUtf8 path.

Implementation Plan

Additional Context

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions