
Ji/execute iter more #6852

Closed
joseph-isaacs wants to merge 14 commits into develop from ji/execute-iter-more

Conversation

@joseph-isaacs
Contributor

Summary

Closes: #000

Testing

gatesn and others added 13 commits March 2, 2026 12:17
Adds a method to replace a single child of an array by index,
building on the existing with_children infrastructure. This is
needed by the upcoming iterative execution scheduler which replaces
children one at a time as they are executed.
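
As a rough illustration of the idea, the sketch below replaces one child by index by delegating to a full-child-list rebuild. The `Node` type, the `String` error, and both method bodies are hypothetical stand-ins written for this example; they are not the actual Vortex types or signatures.

```rust
use std::sync::Arc;

/// Toy stand-in for an array node (assumption: not the real Vortex array type).
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub name: String,
    pub children: Vec<Arc<Node>>,
}

impl Node {
    /// Convenience constructor for a node without children.
    pub fn leaf(name: &str) -> Arc<Node> {
        Arc::new(Node { name: name.to_string(), children: Vec::new() })
    }

    /// Rebuild the node with a complete replacement child list.
    /// Hypothetical analogue of `with_children`; rejects a wrong child count.
    pub fn with_children(&self, children: Vec<Arc<Node>>) -> Result<Arc<Node>, String> {
        if children.len() != self.children.len() {
            return Err(format!(
                "expected {} children, got {}",
                self.children.len(),
                children.len()
            ));
        }
        Ok(Arc::new(Node { name: self.name.clone(), children }))
    }

    /// Replace a single child by index, reusing `with_children` for the rebuild.
    pub fn with_child(&self, idx: usize, child: Arc<Node>) -> Result<Arc<Node>, String> {
        if idx >= self.children.len() {
            return Err(format!(
                "child index {} out of bounds ({} children)",
                idx,
                self.children.len()
            ));
        }
        // Cloning the Vec only bumps reference counts; the children are shared.
        let mut children = self.children.clone();
        children[idx] = child;
        self.with_children(children)
    }
}
```

The original node is left untouched, so a scheduler could keep the old parent around until the new one has been built.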

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds the ExecutionStep enum (ExecuteChild, ColumnarizeChild, Done)
that encodings will return from VTable::execute instead of ArrayRef.
This is infrastructure for the upcoming iterative execution scheduler.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…duler

Changes the VTable::execute signature to return ExecutionStep instead of
ArrayRef, and replaces the Executable for Columnar implementation with an
iterative work-stack scheduler.

The ExecutionStep enum has three variants:
- ExecuteChild(i): ask the scheduler to execute child i to columnar
- ColumnarizeChild(i): same as ExecuteChild(i), but skips cross-step optimization
- Done(result): execution complete

The new scheduler in Executable for Columnar uses an explicit stack
instead of recursion, and runs reduce/reduce_parent rules between
steps via the existing optimizer infrastructure.
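
A minimal sketch of the work-stack idea follows, under heavy simplification. `Array`, `Op`, `Step`, and `execute_to_columnar` are toy names invented for this example, not the Vortex definitions, and the toy has no optimizer, so it treats `ExecuteChild` and `ColumnarizeChild` identically and runs no reduce rules between steps.

```rust
// Toy model only: every type and function here is a hypothetical stand-in.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Add(i64),
    Negate,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Values(Vec<i64>),                      // already columnar
    Encoded { op: Op, child: Box<Array> }, // needs its child executed first
}

pub enum Step {
    ExecuteChild(usize),
    ColumnarizeChild(usize),
    Done(Array),
}

impl Array {
    fn is_columnar(&self) -> bool {
        matches!(self, Array::Values(_))
    }
    fn child(&self, idx: usize) -> &Array {
        match (self, idx) {
            (Array::Encoded { child, .. }, 0) => &**child,
            _ => panic!("no child at index {}", idx),
        }
    }
    fn with_child(self, idx: usize, new_child: Array) -> Array {
        match (self, idx) {
            (Array::Encoded { op, .. }, 0) => Array::Encoded { op, child: Box::new(new_child) },
            _ => panic!("no child at index {}", idx),
        }
    }
    /// One execution step: finish if the child is ready, otherwise ask for it.
    fn execute(&self) -> Step {
        match self {
            Array::Values(_) => Step::Done(self.clone()),
            Array::Encoded { op, child } => match (&**child, op) {
                (Array::Values(v), Op::Add(k)) => {
                    Step::Done(Array::Values(v.iter().map(|x| x + k).collect()))
                }
                (Array::Values(v), Op::Negate) => {
                    Step::Done(Array::Values(v.iter().map(|x| -x).collect()))
                }
                (_, Op::Add(_)) => Step::ExecuteChild(0),
                (_, Op::Negate) => Step::ColumnarizeChild(0),
            },
        }
    }
}

/// Drive execution with an explicit stack of (parent, child index) frames
/// instead of recursing into children.
pub fn execute_to_columnar(root: Array) -> Array {
    let mut stack: Vec<(Array, usize)> = Vec::new();
    let mut current = root;
    loop {
        if current.is_columnar() {
            match stack.pop() {
                None => return current,
                // Put the finished child back into its waiting parent.
                Some((parent, idx)) => {
                    current = parent.with_child(idx, current);
                    continue;
                }
            }
        }
        match current.execute() {
            Step::Done(result) => current = result,
            Step::ExecuteChild(idx) | Step::ColumnarizeChild(idx) => {
                let child = current.child(idx).clone();
                stack.push((current, idx));
                current = child;
            }
        }
    }
}
```

The point of the shape is that the scheduling loop itself never recurses: pending parents wait on the `Vec` rather than on the call stack.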

All encoding implementations are mechanically wrapped in
ExecutionStep::Done(...) to preserve existing behavior. Individual
encodings will be migrated to use ExecuteChild/ColumnarizeChild in
follow-up PRs.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the iterative execution scheduler into a general-purpose
`execute_until<M: Matcher>` method on `dyn Array`. The scheduler
terminates when the root array matches `M`, while each child
can specify its own termination condition via a `DonePredicate`
carried in `ExecutionStep::ExecuteChild`.

`ExecutionStep` now provides constructor methods:
- `execute_child::<M>(idx)` — request child execution until M matches
- `done(result)` — signal completion

Both `Executable for Columnar` and `Executable for Canonical` are
simplified to thin wrappers over `execute_until` with `AnyColumnar`
and `AnyCanonical` matchers respectively.
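
The sketch below shows how a matcher-parameterised scheduler of this kind might be shaped. It is a toy model: the `Array` enum, the `Matcher` trait signature, the `DonePredicate` alias (a plain function pointer here), and the `Step` type are all assumptions made for illustration and are not taken from the Vortex source.

```rust
// Toy model only: simplified stand-ins, not the actual Vortex definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Values(Vec<i64>),                             // canonical
    Constant { value: i64, len: usize },          // columnar but not canonical
    Offset { reference: i64, child: Box<Array> }, // encoded, single child
}

pub trait Matcher {
    fn matches(array: &Array) -> bool;
}
pub struct AnyCanonical;
pub struct AnyColumnar;
impl Matcher for AnyCanonical {
    fn matches(array: &Array) -> bool {
        matches!(array, Array::Values(_))
    }
}
impl Matcher for AnyColumnar {
    fn matches(array: &Array) -> bool {
        matches!(array, Array::Values(_) | Array::Constant { .. })
    }
}

/// Termination condition carried alongside a child request.
pub type DonePredicate = fn(&Array) -> bool;

pub enum Step {
    ExecuteChild(usize, DonePredicate),
    Done(Array),
}
impl Step {
    pub fn execute_child<M: Matcher>(idx: usize) -> Step {
        Step::ExecuteChild(idx, M::matches)
    }
    pub fn done(result: Array) -> Step {
        Step::Done(result)
    }
}

impl Array {
    fn child(&self, idx: usize) -> &Array {
        match (self, idx) {
            (Array::Offset { child, .. }, 0) => &**child,
            _ => panic!("no child at index {}", idx),
        }
    }
    fn with_child(self, idx: usize, new_child: Array) -> Array {
        match (self, idx) {
            (Array::Offset { reference, .. }, 0) => {
                Array::Offset { reference, child: Box::new(new_child) }
            }
            _ => panic!("no child at index {}", idx),
        }
    }
    fn execute(&self) -> Step {
        match self {
            Array::Values(_) => Step::done(self.clone()),
            Array::Constant { value, len } => Step::done(Array::Values(vec![*value; *len])),
            Array::Offset { reference, child } => match &**child {
                Array::Values(v) => {
                    Step::done(Array::Values(v.iter().map(|x| x + reference).collect()))
                }
                Array::Constant { value, len } => {
                    Step::done(Array::Constant { value: value + reference, len: *len })
                }
                Array::Offset { .. } => Step::execute_child::<AnyColumnar>(0),
            },
        }
    }
}

/// Run steps until the root matches `M`; each child runs until its own predicate holds.
pub fn execute_until<M: Matcher>(root: Array) -> Array {
    let mut stack: Vec<(Array, usize, DonePredicate)> = Vec::new();
    let mut current = root;
    let mut done: DonePredicate = M::matches;
    loop {
        if done(&current) {
            match stack.pop() {
                None => return current,
                Some((parent, idx, parent_done)) => {
                    current = parent.with_child(idx, current);
                    done = parent_done;
                    continue;
                }
            }
        }
        match current.execute() {
            Step::Done(result) => current = result,
            Step::ExecuteChild(idx, child_done) => {
                let child = current.child(idx).clone();
                stack.push((current, idx, done));
                current = child;
                done = child_done;
            }
        }
    }
}
```

In this toy, calling `execute_until::<AnyColumnar>` on a nested `Offset` over a `Constant` stops at a `Constant`, while `execute_until::<AnyCanonical>` continues until the result is expanded into `Values`.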

Signed-off-by: Nick Gates <nick@vortex.dev>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace recursive child.execute() calls with ExecutionStep returns in
Slice, Filter, Masked, FoR, and ZigZag vtables. Each now checks if its
child is already in the needed form (canonical/primitive/constant) and
returns Done directly, or returns ExecuteChild(0)/ColumnarizeChild(0)
to let the scheduler handle child execution iteratively.

Also handles ConstantArray children explicitly to prevent infinite loops
in the scheduler (since constants are already columnar and won't be
re-executed). FoR decompress is split into try_fused_decompress and
apply_reference for reuse without recursive execution.
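
To make the child-form check concrete, here is a hedged sketch of a single encoding's step function for a ZigZag-style decode. The `Child`, `Decoded`, and `Step` types and the `execute_zigzag` function are hypothetical and much simpler than the real vtables; only the ZigZag decode formula itself is standard.

```rust
// Toy model only: these types are illustrative assumptions, not Vortex APIs.
#[derive(Debug, Clone, PartialEq)]
pub enum Child {
    Primitive(Vec<u64>),
    Constant { value: u64, len: usize },
    Encoded, // placeholder for any child that still needs execution
}

#[derive(Debug, PartialEq)]
pub enum Decoded {
    Primitive(Vec<i64>),
    Constant { value: i64, len: usize },
}

#[derive(Debug, PartialEq)]
pub enum Step {
    ExecuteChild(usize),
    Done(Decoded),
}

/// Standard ZigZag decode: 0, 1, 2, 3, 4 map to 0, -1, 1, -2, 2.
pub fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}

/// One execution step for the toy ZigZag encoding.
pub fn execute_zigzag(child: &Child) -> Step {
    match child {
        // Child already in the needed form: decode and finish.
        Child::Primitive(values) => Step::Done(Decoded::Primitive(
            values.iter().map(|&n| zigzag_decode(n)).collect(),
        )),
        // Constants are handled here directly. A scheduler that considers a
        // constant child already columnar would not execute it further, so
        // requesting ExecuteChild(0) for it again could never make progress.
        Child::Constant { value, len } => Step::Done(Decoded::Constant {
            value: zigzag_decode(*value),
            len: *len,
        }),
        // Anything else: hand the child back to the scheduler.
        Child::Encoded => Step::ExecuteChild(0),
    }
}
```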

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert Dict, ALP-RD, DecimalByteParts, and Zstd VTable::execute
implementations to return ExecutionStep instead of recursively calling
execute on children. Each encoding checks if children are already in
the expected form (Primitive/Canonical/Constant) before proceeding,
returning ExecuteChild(n) to let the scheduler handle child execution.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ZstdVTable and ZstdBuffersVTable no longer recursively call
.execute() after decompression. Instead they return Done with the
decompressed intermediate, letting the scheduler re-enter execution
on the result naturally.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… 4d)

Convert the last two VTable::execute implementations that had recursive
execute calls. SharedVTable now checks whether the current array (cached or
source) is already columnar and otherwise returns ExecuteChild(0).
ZstdBuffersVTable decompresses and returns Done(inner_array), letting
the scheduler handle further execution of the decompressed result.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cute-iter-more

# Conflicts:
#	encodings/sequence/src/array.rs
#	encodings/zstd/public-api.lock
#	encodings/zstd/src/array.rs
#	encodings/zstd/src/zstd_buffers.rs
#	vortex-array/public-api.lock
#	vortex-array/src/arrays/bool/vtable/mod.rs
#	vortex-array/src/arrays/constant/vtable/mod.rs
#	vortex-array/src/arrays/decimal/vtable/mod.rs
#	vortex-array/src/arrays/extension/vtable/mod.rs
#	vortex-array/src/arrays/fixed_size_list/vtable/mod.rs
#	vortex-array/src/arrays/listview/vtable/mod.rs
#	vortex-array/src/arrays/null/mod.rs
#	vortex-array/src/arrays/primitive/vtable/mod.rs
#	vortex-array/src/arrays/struct_/vtable/mod.rs
#	vortex-array/src/arrays/varbinview/vtable/mod.rs
#	vortex-array/src/canonical.rs
#	vortex-array/src/executor.rs
#	vortex-array/src/vtable/dyn_.rs
#	vortex-array/src/vtable/mod.rs
…cute-iter-more

# Conflicts:
#	vortex-array/public-api.lock
#	vortex-array/src/arrays/dict/vtable/mod.rs
#	vortex-array/src/arrays/filter/vtable.rs
#	vortex-array/src/arrays/masked/vtable/mod.rs
#	vortex-array/src/arrays/shared/vtable.rs
#	vortex-array/src/arrays/slice/vtable.rs
@codspeed-hq

codspeed-hq bot commented Mar 9, 2026

Merging this PR will degrade performance by 49.88%

❌ 11 regressed benchmarks
✅ 989 untouched benchmarks
⏩ 1466 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | take_map[(0.1, 0.05)] | 777.7 µs | 1,153.2 µs | -32.56% |
| Simulation | take_map[(0.1, 0.1)] | 933.8 µs | 1,119.6 µs | -16.59% |
| Simulation | take_map[(0.1, 0.5)] | 2.1 ms | 2.7 ms | -21.6% |
| Simulation | take_map[(0.1, 1.0)] | 3.7 ms | 7.4 ms | -49.88% |
| Simulation | decompress[alp_for_bp_f64] | 30.3 ms | 37.9 ms | -19.96% |
| Simulation | decompress[datetime_for_bp] | 41.7 ms | 50.1 ms | -16.8% |
| Simulation | alp_rd_decompress_f64 | 27.4 ms | 35.2 ms | -21.95% |
| Simulation | decompress_rd[f32, 100000] | 1.9 ms | 2.4 ms | -21.98% |
| Simulation | decompress_rd[f32, 10000] | 250 µs | 305.8 µs | -18.24% |
| Simulation | decompress_rd[f64, 10000] | 247.4 µs | 357.9 µs | -30.89% |
| Simulation | decompress_rd[f64, 100000] | 2.3 ms | 3.3 ms | -31.64% |
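
The Efficiency column appears consistent with `BASE / HEAD - 1` expressed as a percentage. That is an inference from the figures in this report, not a statement of CodSpeed's documented method, and the function name below is made up for the example. Because the displayed timings are rounded, recomputing from them will not always reproduce the last digits of the reported value.

```rust
/// Relative change in percent; negative means HEAD is slower than BASE.
/// Assumed formula, inferred from the reported figures: (base / head - 1) * 100.
pub fn relative_change_percent(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}
```

For example, 777.7 µs against 1,153.2 µs gives roughly -32.56%, matching the first row, while 3.7 ms against 7.4 ms gives -50% from the rounded timings versus the reported -49.88%.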

Comparing ji/execute-iter-more (a34d0d8) with develop (e477fa5)


Footnotes

  1. 1466 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

2 participants