Add list_length scalar function#8495

Merged
mhk197 merged 6 commits into develop from mk/list-length
Jun 26, 2026

@mhk197 mhk197 commented Jun 18, 2026

Contributor

Adds a list_length scalar function returning the number of elements in each list of a List-like array.

  • Computed purely from the list's offsets/sizes — it never reads elements. Different paths for List, ListView, and FixedSizeList arrays.
  • Returns a U64 array; a null list yields a null length.
  • Registered as a built-in (vortex.list.length) alongside list_contains, and exposed via the list_length(expr) expression constructor.
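
A minimal sketch of the three paths described above, written against plain slices rather than the real Vortex array types. The function names, the `&[bool]` validity representation, and the `Vec<Option<u64>>` result are illustrative assumptions, not the PR's API. The point is only that each path reads offsets, sizes, or the fixed size, and never the elements.

```rust
/// List: `offsets` holds one more entry than there are lists, so the
/// length of list i is offsets[i + 1] - offsets[i].
/// Assumes offsets are non-decreasing.
fn list_lengths(offsets: &[u64], validity: &[bool]) -> Vec<Option<u64>> {
    offsets
        .windows(2)
        .zip(validity)
        .map(|(w, &valid)| valid.then(|| w[1] - w[0]))
        .collect()
}

/// ListView: each list stores its own size, so the sizes are the lengths.
fn listview_lengths(sizes: &[u64], validity: &[bool]) -> Vec<Option<u64>> {
    sizes
        .iter()
        .zip(validity)
        .map(|(&size, &valid)| valid.then_some(size))
        .collect()
}

/// FixedSizeList: every valid list has the same length.
fn fixed_size_list_lengths(list_size: u64, validity: &[bool]) -> Vec<Option<u64>> {
    validity
        .iter()
        .map(|&valid| valid.then_some(list_size))
        .collect()
}
```

In all three functions a null list (a `false` validity entry) produces `None`, mirroring "a null list yields a null length".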

@mhk197 mhk197 requested a review from a team June 18, 2026 16:11
@mhk197 mhk197 marked this pull request as draft June 18, 2026 16:19
codspeed-hq Bot commented Jun 18, 2026

Merging this PR will improve performance by 18.65%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 6 improved benchmarks
❌ 1 regressed benchmark
✅ 1582 untouched benchmarks
🆕 6 new benchmarks
⏩ 4 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| | Simulation | slice_empty_vortex | 339.4 ns | 397.8 ns | -14.66% |
| | Simulation | chunked_bool_canonical_into[(1000, 10)] | 27.2 µs | 15.8 µs | +71.66% |
| | Simulation | copy_nullable[65536] | 1.4 ms | 1 ms | +32.02% |
| | Simulation | copy_non_nullable[65536] | 1,089.2 µs | 908.5 µs | +19.89% |
| | Simulation | bitwise_not_vortex_buffer_mut[128] | 215.3 ns | 186.1 ns | +15.67% |
| | Simulation | bitwise_not_vortex_buffer_mut[1024] | 275.6 ns | 246.4 ns | +11.84% |
| | Simulation | eq_i64_constant | 318.6 µs | 288.7 µs | +10.34% |
| 🆕 | Simulation | list_length_large | N/A | 10 ms | N/A |
| 🆕 | Simulation | list_length_medium | N/A | 142.5 µs | N/A |
| 🆕 | Simulation | list_length_small | N/A | 57.7 µs | N/A |
| 🆕 | Simulation | listview_length_large | N/A | 6 ms | N/A |
| 🆕 | Simulation | listview_length_medium | N/A | 97.2 µs | N/A |
| 🆕 | Simulation | listview_length_small | N/A | 37.8 µs | N/A |
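
For readers checking the numbers: the Efficiency values above appear consistent with BASE / HEAD − 1, up to rounding of the displayed timings. This is an inference from the rows shown, not something taken from CodSpeed's documentation, and the function name below is hypothetical.

```rust
/// Relative change implied by two timings in the same unit.
/// Positive means HEAD is faster than BASE.
/// Assumed formula: (BASE / HEAD - 1) * 100.
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}
```

For example, `efficiency_pct(1089.2, 908.5)` is about 19.89, matching the copy_non_nullable[65536] row, and a slower HEAD such as slice_empty_vortex gives a negative value.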

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing mk/list-length (8ac3a23) with develop (3451cb0)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@mhk197 mhk197 force-pushed the mk/list-length branch 2 times, most recently from 0a2f1f1 to 1ed27e1 Compare June 18, 2026 17:19
@mhk197 mhk197 added the changelog/feature A new feature label Jun 18, 2026
@mhk197 mhk197 marked this pull request as ready for review June 18, 2026 20:46
@mhk197 mhk197 changed the title Add list_length scalar function Add list_length scalar function Jun 18, 2026
@mhk197 mhk197 requested review from AdamGS and gatesn June 18, 2026 20:47
Comment thread vortex-array/benches/list_length.rs
fn return_dtype(&self, _options: &Self::Options, arg_dtypes: &[DType]) -> VortexResult<DType> {
    match &arg_dtypes[0] {
        DType::List(_, nullable) => Ok(DType::Primitive(PType::U64, *nullable)),
        other => vortex_bail!("list_length() requires List, got {other}"),

Contributor

May as well support FixedSizeList too, then implement reduce to collapse it into a constant.

Contributor Author

Implemented reduce for non-nullable FixedSizeList (FSL) and delegated nullable FSL to execute, since we can't easily get validity (talked offline).
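
A rough model of the split described in this thread, using toy types rather than Vortex's real `DType` or reduce machinery. `ToyDType`, `Reduced`, and both function names are hypothetical. The sketch assumes a reduce step may return "no reduction" to defer to execution.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Nullability {
    NonNullable,
    Nullable,
}

/// Toy stand-in for a dtype; only the parts relevant here are modelled.
#[derive(Debug, Clone, PartialEq)]
enum ToyDType {
    List(Nullability),
    FixedSizeList(u32, Nullability),
    Other,
}

/// Toy stand-in for a reduced result: a constant array of `len` copies of `value`.
#[derive(Debug, PartialEq)]
enum Reduced {
    Constant { value: u64, len: usize },
}

/// The result is always u64; its nullability follows the list's nullability.
fn return_nullability(dtype: &ToyDType) -> Result<Nullability, String> {
    match dtype {
        ToyDType::List(n) | ToyDType::FixedSizeList(_, n) => Ok(*n),
        other => Err(format!("list_length() requires a list type, got {other:?}")),
    }
}

/// Non-nullable FSL collapses to a constant taken from the dtype alone.
/// Everything else (including nullable FSL) returns None and is left to execute.
fn reduce_list_length(dtype: &ToyDType, len: usize) -> Option<Reduced> {
    match dtype {
        ToyDType::FixedSizeList(size, Nullability::NonNullable) => Some(Reduced::Constant {
            value: u64::from(*size),
            len,
        }),
        _ => None,
    }
}
```

The nullable case cannot become a plain constant because null lists must produce null lengths, which requires the validity that the comment says is not easily available at reduce time.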

Comment thread vortex-array/src/scalar_fn/fns/list_length.rs Outdated
@mhk197 mhk197 requested a review from gatesn June 19, 2026 18:52
struct AnyList;

impl Matcher for AnyList {
    type Match<'a> = ();

Contributor

You should define an enum AnyListView { List(...), FixedList(...) }, then you can just match on it above in the execute_until.

Contributor Author

Do we want to execute FixedList? We can just get the size from the dtype.

Contributor

You're not executing the FixedList itself, you're basically saying, run execution one step at a time until it matches one of these encodings.

So there may be some scalar function that happens to return a FixedList, then you will terminate and have access to it

Contributor Author

Discussed offline -- execute_until may need a refactor to return the Match instead of discarding it.
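
A toy illustration of the idea in this thread: step execution one level at a time until the array matches one of a set of list encodings, and return the match instead of dropping it. All types and functions here (`ToyArray`, `AnyListMatch`, `try_match`, this `execute_until`) are hypothetical simplifications and do not reflect Vortex's actual `Matcher` trait or `execute_until` signature.

```rust
/// Toy array: `Lazy` stands for any encoding that needs one more execution step.
#[derive(Debug, Clone, PartialEq)]
enum ToyArray {
    Lazy(Box<ToyArray>),
    List { offsets: Vec<u64> },
    FixedSizeList { list_size: u64, len: usize },
    Primitive(Vec<u64>),
}

/// The match result records which list encoding was found.
#[derive(Debug, PartialEq)]
enum AnyListMatch {
    List { offsets: Vec<u64> },
    FixedSizeList { list_size: u64, len: usize },
}

fn try_match(array: &ToyArray) -> Option<AnyListMatch> {
    match array {
        ToyArray::List { offsets } => Some(AnyListMatch::List { offsets: offsets.clone() }),
        ToyArray::FixedSizeList { list_size, len } => Some(AnyListMatch::FixedSizeList {
            list_size: *list_size,
            len: *len,
        }),
        _ => None,
    }
}

/// Run one step at a time until the matcher fires, then return the match.
fn execute_until(mut array: ToyArray) -> Result<AnyListMatch, String> {
    loop {
        if let Some(found) = try_match(&array) {
            return Ok(found);
        }
        array = match array {
            ToyArray::Lazy(inner) => *inner, // one execution step
            other => return Err(format!("cannot make progress on {other:?}")),
        };
    }
}

/// With the match in hand, the caller can branch on the encoding directly.
fn lengths(found: &AnyListMatch) -> Vec<u64> {
    match found {
        AnyListMatch::List { offsets } => offsets.windows(2).map(|w| w[1] - w[0]).collect(),
        AnyListMatch::FixedSizeList { list_size, len } => vec![*list_size; *len],
    }
}
```

Returning the enum means the caller does not have to re-inspect the array after execution stops, which is the refactor hinted at above.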

mhk197 added 5 commits June 26, 2026 10:51
Computes the number of elements in each list from the offsets/sizes only (never reading element values), returning a U64 array; a null list yields a null length. Registered as a built-in scalar function (vortex.list.length) alongside list_contains.

Signed-off-by: Matt Katz <mhkatz97@gmail.com>
@mhk197 mhk197 enabled auto-merge (squash) June 26, 2026 18:43
@mhk197 mhk197 merged commit 3be9427 into develop Jun 26, 2026
98 of 101 checks passed
@mhk197 mhk197 deleted the mk/list-length branch June 26, 2026 18:51
mhk197 added a commit that referenced this pull request Jun 26, 2026
Pushes DuckDB's list-length scalar function into the Vortex scan as the
`list_length` expression, so lengths are computed from list
offsets/sizes without materializing element values.

Pushdowns supported:
- **Projection** (`SELECT len(list)` / `length(list)` /
`array_length(list)`)
- **Filter** (`WHERE array_length(list) >= k`, also `len`/`length`)

Each maps to `cast(list_length(col), i64)` — DuckDB's
`len`/`array_length` return `BIGINT` while `list_length` returns `u64`.

`len`/`length` are overloaded with strings/bits, so the filter path
needs the argument type to disambiguate. Added a small FFI accessor
`duckdb_vx_expr_get_return_type` plus `ExpressionRef::return_type()`,
and gate `len`/`length`/`array_length` on the bound child being
`LIST`/`ARRAY`.

Does not currently support `array_length(expr, dim)`.

Stacked on #8495.
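
The gating and mapping described in this commit message can be sketched roughly as follows. This is a toy model only: `ToyDuckType`, `ToyExpr`, and `push_down_len` are hypothetical names, and the real implementation goes through the DuckDB FFI (`duckdb_vx_expr_get_return_type`, `ExpressionRef::return_type()`) rather than an enum like this.

```rust
/// Toy stand-in for the bound argument's DuckDB logical type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyDuckType {
    List,
    Array,
    Varchar,
    Bit,
    BigInt,
}

/// Toy stand-in for a Vortex expression tree.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    ListLength(Box<ToyExpr>),
    CastI64(Box<ToyExpr>),
}

/// Returns the pushed-down expression, or None when the call must stay in DuckDB.
fn push_down_len(func: &str, args: &[(&str, ToyDuckType)]) -> Option<ToyExpr> {
    if !matches!(func, "len" | "length" | "array_length") {
        return None;
    }
    // The two-argument form array_length(expr, dim) is not supported.
    if args.len() != 1 {
        return None;
    }
    let (column, arg_type) = args[0];
    // len/length are overloaded for strings and bits, so gate on the argument type.
    if !matches!(arg_type, ToyDuckType::List | ToyDuckType::Array) {
        return None;
    }
    // list_length yields u64 while DuckDB expects BIGINT, hence the cast to i64.
    let length = ToyExpr::ListLength(Box::new(ToyExpr::Column(column.to_string())));
    Some(ToyExpr::CastI64(Box::new(length)))
}
```

A call such as `push_down_len("len", &[("name", ToyDuckType::Varchar)])` returns `None`, so string length keeps its DuckDB semantics.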

---------

Signed-off-by: Matt Katz <mhkatz97@gmail.com>