Add list_length scalar function #8495
Merging this PR will improve performance by 18.65%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | slice_empty_vortex | 339.4 ns | 397.8 ns | -14.66% |
| ⚡ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 27.2 µs | 15.8 µs | +71.66% |
| ⚡ | Simulation | copy_nullable[65536] | 1.4 ms | 1 ms | +32.02% |
| ⚡ | Simulation | copy_non_nullable[65536] | 1,089.2 µs | 908.5 µs | +19.89% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 215.3 ns | 186.1 ns | +15.67% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 275.6 ns | 246.4 ns | +11.84% |
| ⚡ | Simulation | eq_i64_constant | 318.6 µs | 288.7 µs | +10.34% |
| 🆕 | Simulation | list_length_large | N/A | 10 ms | N/A |
| 🆕 | Simulation | list_length_medium | N/A | 142.5 µs | N/A |
| 🆕 | Simulation | list_length_small | N/A | 57.7 µs | N/A |
| 🆕 | Simulation | listview_length_large | N/A | 6 ms | N/A |
| 🆕 | Simulation | listview_length_medium | N/A | 97.2 µs | N/A |
| 🆕 | Simulation | listview_length_small | N/A | 37.8 µs | N/A |
Comparing mk/list-length (8ac3a23) with develop (3451cb0)
Footnotes
- 4 benchmarks were skipped, so the baseline results were used instead.
0a2f1f1 to 1ed27e1
list_length scalar function
    fn return_dtype(&self, _options: &Self::Options, arg_dtypes: &[DType]) -> VortexResult<DType> {
        match &arg_dtypes[0] {
            DType::List(_, nullable) => Ok(DType::Primitive(PType::U64, *nullable)),
            other => vortex_bail!("list_length() requires List, got {other}"),
May as well support FixedList as well, then implement reduce to collapse it into the constant
Implemented reduce for nonnullable fsl, delegated nullable to execute since we can't easily get validity (talked offline)
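The split described in this thread can be sketched in isolation. The `DType` enum below is a simplified, hypothetical stand-in (it is not the real Vortex `DType`), and `reduce_to_constant` is an illustrative name rather than the actual `reduce` hook. The sketch only assumes what the comments above say: the result is `U64` with the list's nullability, and only a non-nullable fixed-size list can collapse to a constant without looking at validity.

```rust
// Hypothetical, simplified stand-in for the real dtype; not the Vortex API.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    U64 { nullable: bool },
    List { nullable: bool },
    FixedSizeList { size: u32, nullable: bool },
    Utf8,
}

/// Result dtype of the length function: always U64, inheriting the
/// nullability of the list argument.
fn return_dtype(arg: &DType) -> Result<DType, String> {
    match arg {
        DType::List { nullable } | DType::FixedSizeList { nullable, .. } => {
            Ok(DType::U64 { nullable: *nullable })
        }
        other => Err(format!("list_length() requires a list type, got {other:?}")),
    }
}

/// Reduce step (illustrative name): a non-nullable fixed-size list has the
/// same length in every row, so the whole result is a constant. A nullable
/// one still needs validity, so it is left for execution (returns None).
fn reduce_to_constant(arg: &DType) -> Option<u64> {
    match arg {
        DType::FixedSizeList { size, nullable: false } => Some(u64::from(*size)),
        _ => None,
    }
}
```

For a non-nullable fixed-size list of size 3, `reduce_to_constant` yields `Some(3)`; the nullable variant and plain lists yield `None` and fall through to execution.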
    struct AnyList;

    impl Matcher for AnyList {
        type Match<'a> = ();
You should define an enum `AnyListView { List(...), FixedList(...) }`, then you can just match on it above in the `execute_until`
do we want to execute FixedList? We can just get the size from the dtype
You're not executing the FixedList itself, you're basically saying, run execution one step at a time until it matches one of these encodings.
So there may be some scalar function that happens to return a FixedList, then you will terminate and have access to it
Discussed offline -- execute_until may need refactor to return Match instead of discarding it
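A toy model may help picture the suggestion in this thread. Everything here is hypothetical: `Array`, `try_match`, `execute_step`, and `lengths_after_execution` are made-up names, and the real `Matcher` and `execute_until` signatures are not shown in this PR excerpt. The sketch assumes only the idea stated above: step execution one layer at a time until an encoding matches, then use the matched value instead of discarding it.

```rust
// Hypothetical array model: either a concrete list encoding or a lazy
// wrapper that needs one more execution step. Not the Vortex types.
#[derive(Debug, Clone, PartialEq)]
enum Array {
    List { offsets: Vec<u64> },
    FixedSizeList { size: u64, len: usize },
    Lazy(Box<Array>),
}

/// What a matcher could hand back on success, borrowing from the array.
#[derive(Debug, PartialEq)]
enum AnyListView<'a> {
    List { offsets: &'a [u64] },
    FixedList { size: u64, len: usize },
}

/// Returns Some(..) only when the array is already in a list encoding.
fn try_match(array: &Array) -> Option<AnyListView<'_>> {
    match array {
        Array::List { offsets } => Some(AnyListView::List { offsets: offsets.as_slice() }),
        Array::FixedSizeList { size, len } => {
            Some(AnyListView::FixedList { size: *size, len: *len })
        }
        Array::Lazy(_) => None,
    }
}

/// One execution step: unwrap a single lazy layer.
fn execute_step(array: Array) -> Array {
    match array {
        Array::Lazy(inner) => *inner,
        done => done,
    }
}

/// Step until the matcher succeeds, then compute lengths from the match.
fn lengths_after_execution(mut array: Array) -> Vec<u64> {
    loop {
        if let Some(m) = try_match(&array) {
            return match m {
                AnyListView::List { offsets } => {
                    offsets.windows(2).map(|w| w[1] - w[0]).collect()
                }
                AnyListView::FixedList { size, len } => vec![size; len],
            };
        }
        array = execute_step(array);
    }
}
```

In this model the fixed-size list is never "executed"; the loop simply stops as soon as either encoding appears, which is the distinction the reviewer draws above.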
Computes the number of elements in each list from the offsets/sizes only (never reading element values), returning a U64 array; a null list yields a null length. Registered as a built-in scalar function (vortex.list.length) alongside list_contains.

Signed-off-by: Matt Katz <mhkatz97@gmail.com>
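As a rough illustration of computing lengths from offsets or sizes alone, the sketch below uses plain slices rather than Vortex arrays. The function names and the `&[bool]` validity representation are assumptions made for the example; the real implementation is not shown in this excerpt. It assumes offsets are non-decreasing with one more entry than there are lists.

```rust
/// Offsets-based layout: `offsets` has n + 1 entries and list i spans
/// offsets[i]..offsets[i + 1]. Element values are never touched.
/// A null list (validity false) yields a null (None) length.
fn list_lengths(offsets: &[u64], validity: &[bool]) -> Vec<Option<u64>> {
    offsets
        .windows(2)
        .zip(validity)
        .map(|(w, &valid)| valid.then(|| w[1] - w[0]))
        .collect()
}

/// Sizes-based layout (list-view style): each list stores its size
/// directly, so the length is a straight copy masked by validity.
fn listview_lengths(sizes: &[u64], validity: &[bool]) -> Vec<Option<u64>> {
    sizes
        .iter()
        .zip(validity)
        .map(|(&size, &valid)| valid.then_some(size))
        .collect()
}
```

With offsets `[0, 2, 2, 5]` and validity `[true, false, true]`, `list_lengths` returns `[Some(2), None, Some(3)]`.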
Pushes DuckDB's list-length scalar function into the Vortex scan as the `list_length` expression, so lengths are computed from list offsets/sizes without materializing element values.

Pushdowns supported:

- **Projection** (`SELECT len(list)` / `length(list)` / `array_length(list)`)
- **Filter** (`WHERE array_length(list) >= k`, also `len`/`length`)

Each maps to `cast(list_length(col), i64)`: DuckDB's `len`/`array_length` return `BIGINT` while `list_length` returns `u64`.

`len`/`length` are overloaded with strings/bits, so the filter path needs the argument type to disambiguate. Added a small FFI accessor `duckdb_vx_expr_get_return_type` plus `ExpressionRef::return_type()`, and gate `len`/`length`/`array_length` on the bound child being `LIST`/`ARRAY`.

Does not currently support `array_length(expr, dim)`.

Stacked on #8495.

Signed-off-by: Matt Katz <mhkatz97@gmail.com>
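The gating rule above can be sketched without any DuckDB or Vortex bindings. `DuckType`, `Expr`, and `push_down_length` are hypothetical names invented for this example; only the function names `len`/`length`/`array_length`, the `LIST`/`ARRAY` gate, and the `cast(list_length(col), i64)` shape come from the description.

```rust
// Hypothetical logical types of the bound argument; not DuckDB's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DuckType {
    List,
    Array,
    Varchar,
    Bit,
}

// Hypothetical expression tree for the pushed-down form.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    ListLength(Box<Expr>),
    CastI64(Box<Expr>),
}

/// Returns the pushed-down expression, or None when the call must stay in
/// DuckDB (unknown function, or `len`/`length` bound to a string or bit).
fn push_down_length(func: &str, arg_type: DuckType, column: &str) -> Option<Expr> {
    let is_length_fn = matches!(func, "len" | "length" | "array_length");
    let is_list_like = matches!(arg_type, DuckType::List | DuckType::Array);
    if is_length_fn && is_list_like {
        // list_length yields u64; the cast restores DuckDB's BIGINT result.
        let length = Expr::ListLength(Box::new(Expr::Column(column.to_string())));
        Some(Expr::CastI64(Box::new(length)))
    } else {
        None
    }
}
```

Checking the argument type first is what keeps `length('abc')` on a `VARCHAR` column from being rewritten into a list expression.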
Adds a `list_length` scalar function returning the number of elements in each list of a List-like array.

- Supports `List`, `ListView`, and `FixedSizeList` arrays.
- Returns a `U64` array; a null list yields a null length.
- Registered as a built-in scalar function (`vortex.list.length`) alongside `list_contains`, and exposed via the `list_length(expr)` expression constructor.