Skip to content

Iterative Execution 4a #6755

Draft
gatesn wants to merge 7 commits into develop from ngates/execution-4a

@gatesn
Contributor

@gatesn gatesn commented Mar 2, 2026

No description provided.

gatesn and others added 3 commits March 2, 2026 12:17
Adds a method to replace a single child of an array by index,
building on the existing with_children infrastructure. This is
needed by the upcoming iterative execution scheduler which replaces
children one at a time as they are executed.
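A minimal sketch of how a replace-one-child method can sit on top of `with_children`. The commit message does not name the method, so `with_child`, the simplified `Array` trait, the `String` error type, the `render` helper and the toy `Leaf`/`Node` types are all hypothetical stand-ins, not Vortex's actual API.

```rust
use std::sync::Arc;

// Hypothetical, simplified model of an array tree; not Vortex's real API.
pub type ArrayRef = Arc<dyn Array>;

pub trait Array {
    fn children(&self) -> Vec<ArrayRef>;

    /// Rebuild this array from a complete replacement list of children.
    fn with_children(&self, children: Vec<ArrayRef>) -> Result<ArrayRef, String>;

    /// Hypothetical helper for inspection: render the tree as a string.
    fn render(&self) -> String;

    /// Replace only the child at `idx` by delegating to `with_children`.
    fn with_child(&self, idx: usize, child: ArrayRef) -> Result<ArrayRef, String> {
        let mut children = self.children();
        if idx >= children.len() {
            return Err(format!("child index {idx} out of bounds for {} children", children.len()));
        }
        children[idx] = child;
        self.with_children(children)
    }
}

/// Toy leaf array holding a single value and no children.
pub struct Leaf(pub i64);

impl Array for Leaf {
    fn children(&self) -> Vec<ArrayRef> {
        Vec::new()
    }
    fn with_children(&self, children: Vec<ArrayRef>) -> Result<ArrayRef, String> {
        if !children.is_empty() {
            return Err("a leaf has no children".to_string());
        }
        let rebuilt: ArrayRef = Arc::new(Leaf(self.0));
        Ok(rebuilt)
    }
    fn render(&self) -> String {
        self.0.to_string()
    }
}

/// Toy interior array whose children are arbitrary arrays.
pub struct Node(pub Vec<ArrayRef>);

impl Array for Node {
    fn children(&self) -> Vec<ArrayRef> {
        self.0.clone()
    }
    fn with_children(&self, children: Vec<ArrayRef>) -> Result<ArrayRef, String> {
        let rebuilt: ArrayRef = Arc::new(Node(children));
        Ok(rebuilt)
    }
    fn render(&self) -> String {
        let parts: Vec<String> = self.0.iter().map(|c| c.render()).collect();
        format!("[{}]", parts.join(","))
    }
}
```

In this sketch only the replaced slot changes; the sibling children are shared through cheap `Arc` clones, and the original array is left untouched.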

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds the ExecutionStep enum (ExecuteChild, ColumnarizeChild, Done)
that encodings will return from VTable::execute instead of ArrayRef.
This is infrastructure for the upcoming iterative execution scheduler.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…duler

Changes the VTable::execute signature to return ExecutionStep instead of
ArrayRef, and replaces the Executable for Columnar implementation with an
iterative work-stack scheduler.

The ExecutionStep enum has three variants:
- ExecuteChild(i): ask the scheduler to execute child i to columnar
- ColumnarizeChild(i): same but skip cross-step optimization
- Done(result): execution complete
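The three variants can be pictured roughly as below. `ArrayRef` is assumed here to be a plain `Arc<Vec<i64>>` for illustration, and the helper methods `requested_child` and `optimizes_between_steps` are hypothetical, added only to make the meaning of each variant explicit.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an executed (columnar) array.
pub type ArrayRef = Arc<Vec<i64>>;

/// Sketch of the value an encoding's `execute` hands back to the scheduler.
#[derive(Debug, Clone, PartialEq)]
pub enum ExecutionStep {
    /// Ask the scheduler to execute child `i` to columnar, with optimizer rules between steps.
    ExecuteChild(usize),
    /// Like `ExecuteChild(i)`, but cross-step optimization is skipped for this child.
    ColumnarizeChild(usize),
    /// Execution is complete; this is the result.
    Done(ArrayRef),
}

impl ExecutionStep {
    /// Which child, if any, this step asks the scheduler to execute next.
    pub fn requested_child(&self) -> Option<usize> {
        match self {
            ExecutionStep::ExecuteChild(i) | ExecutionStep::ColumnarizeChild(i) => Some(*i),
            ExecutionStep::Done(_) => None,
        }
    }

    /// Whether the scheduler should run optimizer rules around this child step.
    pub fn optimizes_between_steps(&self) -> bool {
        matches!(self, ExecutionStep::ExecuteChild(_))
    }
}
```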

The new scheduler in Executable for Columnar uses an explicit stack
instead of recursion, and runs reduce/reduce_parent rules between
steps via the existing optimizer infrastructure.
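A rough model of an explicit-stack scheduler, under heavy simplifying assumptions: a toy expression tree stands in for encoded arrays, `Done` always yields a flat vector, and a single hypothetical rewrite rule (`-(-x)` becomes `x`) stands in for the reduce/reduce_parent optimizer rules. None of the names below come from the PR.

```rust
// Toy model only: an expression tree stands in for encoded arrays.
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Columnar(Vec<i64>),
    Negate(Box<Array>),
    Add(Box<Array>, Box<Array>),
}

pub enum ExecutionStep {
    ExecuteChild(usize),
    Done(Vec<i64>),
}

fn as_columnar(a: &Array) -> Option<&Vec<i64>> {
    match a {
        Array::Columnar(v) => Some(v),
        _ => None,
    }
}

/// One `execute` call: either finish, or ask for a child to be made columnar first.
fn execute_step(a: &Array) -> ExecutionStep {
    match a {
        Array::Columnar(v) => ExecutionStep::Done(v.clone()),
        Array::Negate(c) => match as_columnar(c) {
            Some(v) => ExecutionStep::Done(v.iter().map(|x| -x).collect()),
            None => ExecutionStep::ExecuteChild(0),
        },
        Array::Add(l, r) => match (as_columnar(l), as_columnar(r)) {
            (Some(l), Some(r)) => ExecutionStep::Done(l.iter().zip(r).map(|(x, y)| x + y).collect()),
            (None, _) => ExecutionStep::ExecuteChild(0),
            (_, None) => ExecutionStep::ExecuteChild(1),
        },
    }
}

fn child(a: &Array, i: usize) -> Array {
    match (a, i) {
        (Array::Negate(c), 0) | (Array::Add(c, _), 0) | (Array::Add(_, c), 1) => (**c).clone(),
        _ => panic!("no child {i} on {a:?}"),
    }
}

fn with_child(a: Array, i: usize, new: Array) -> Array {
    match (a, i) {
        (Array::Negate(_), 0) => Array::Negate(Box::new(new)),
        (Array::Add(_, r), 0) => Array::Add(Box::new(new), r),
        (Array::Add(l, _), 1) => Array::Add(l, Box::new(new)),
        (other, _) => panic!("no child {i} on {other:?}"),
    }
}

/// Hypothetical rewrite rule applied between steps: -(-x) => x, until it no longer matches.
fn reduce(mut a: Array) -> Array {
    loop {
        match a {
            Array::Negate(inner) => match *inner {
                Array::Negate(x) => a = *x,
                other => return Array::Negate(Box::new(other)),
            },
            other => return other,
        }
    }
}

/// Explicit work stack instead of recursion. Each frame holds an array and, for
/// non-root frames, the child slot it fills in the parent frame directly below it.
pub fn execute_to_columnar(root: Array) -> Vec<i64> {
    let mut stack: Vec<(Array, Option<usize>)> = vec![(reduce(root), None)];
    while let Some((current, slot)) = stack.pop() {
        match execute_step(&current) {
            ExecutionStep::ExecuteChild(i) => {
                let next = reduce(child(&current, i));
                stack.push((current, slot));
                stack.push((next, Some(i)));
            }
            ExecutionStep::Done(values) => match stack.pop() {
                None => return values,
                Some((parent, parent_slot)) => {
                    let i = slot.expect("only the root frame has no parent slot");
                    // Splice the result into the parent, then re-run the rewrite rule.
                    let parent = reduce(with_child(parent, i, Array::Columnar(values)));
                    stack.push((parent, parent_slot));
                }
            },
        }
    }
    unreachable!("the root frame always finishes with Done")
}
```

Because pending parents wait on an explicit `Vec` rather than on the native call stack, deep nesting does not grow the Rust stack, and the point between popping a finished child and resuming its parent is a natural place to run rewrite rules.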

All encoding implementations are mechanically wrapped in
ExecutionStep::Done(...) to preserve existing behavior. Individual
encodings will be migrated to use ExecuteChild/ColumnarizeChild in
follow-up PRs.
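What the mechanical wrapping amounts to for a single encoding, as a hedged sketch: the decoding body is untouched and only the return value changes. `ForArray`, `ForVTable`, the simplified `VTable` trait and the `Arc<Vec<i64>>` stand-in for `ArrayRef` are hypothetical, not the real Vortex definitions.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; not Vortex's real types.
pub type ArrayRef = Arc<Vec<i64>>;

pub enum ExecutionStep {
    ExecuteChild(usize),
    ColumnarizeChild(usize),
    Done(ArrayRef),
}

pub trait VTable {
    type Array;
    // Before this change (sketch): fn execute(array: &Self::Array) -> ArrayRef;
    fn execute(array: &Self::Array) -> ExecutionStep;
}

/// Toy frame-of-reference-style array: stored deltas plus one reference value.
pub struct ForArray {
    pub reference: i64,
    pub deltas: Vec<i64>,
}

pub struct ForVTable;

impl VTable for ForVTable {
    type Array = ForArray;

    fn execute(array: &ForArray) -> ExecutionStep {
        // Unchanged decoding logic...
        let decoded: Vec<i64> = array.deltas.iter().map(|d| array.reference + d).collect();
        // ...wrapped in Done, so the scheduler sees the same result as before.
        ExecutionStep::Done(Arc::new(decoded))
    }
}
```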

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor Author

gatesn commented Mar 2, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@codspeed-hq

codspeed-hq bot commented Mar 2, 2026

Merging this PR will degrade performance by 35.16%

❌ 29 regressed benchmarks
✅ 925 untouched benchmarks
⏩ 1466 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_bool_canonical_into[(10, 1000)] | 1.4 ms | 1.8 ms | -23.05% |
| Simulation | chunked_opt_bool_canonical_into[(10, 1000)] | 1.5 ms | 2.3 ms | -35.16% |
| Simulation | chunked_opt_bool_canonical_into[(100, 100)] | 733.7 µs | 832.9 µs | -11.91% |
| Simulation | chunked_bool_canonical_into[(100, 100)] | 167.6 µs | 209.9 µs | -20.14% |
| Simulation | chunked_varbinview_canonical_into[(10, 1000)] | 2.7 ms | 3.1 ms | -14.93% |
| Simulation | chunked_opt_bool_into_canonical[(10, 1000)] | 1.7 ms | 2.5 ms | -31.7% |
| Simulation | chunked_opt_bool_into_canonical[(100, 100)] | 765.1 µs | 886.4 µs | -13.69% |
| Simulation | chunked_varbinview_into_canonical[(10, 1000)] | 2.9 ms | 3.4 ms | -13.84% |
| Simulation | decompress_alp[f32, (1000, 0.01, 0.25)] | 15.4 µs | 19.2 µs | -19.97% |
| Simulation | decompress_alp[f32, (1000, 0.1, 0.25)] | 15.3 µs | 19.1 µs | -20.19% |
| Simulation | decompress_alp[f32, (1000, 0.01, 0.95)] | 15.1 µs | 18.9 µs | -20.39% |
| Simulation | decompress_alp[f32, (1000, 0.1, 1.0)] | 16.1 µs | 18.2 µs | -11.43% |
| Simulation | decompress_alp[f32, (1000, 0.1, 0.95)] | 16.9 µs | 20.7 µs | -18.47% |
| Simulation | decompress_alp[f32, (1000, 0.01, 1.0)] | 15 µs | 17.1 µs | -12.14% |
| Simulation | decompress_alp[f32, (10000, 0.0, 0.25)] | 29.8 µs | 33.2 µs | -10.25% |
| Simulation | decompress_alp[f64, (1000, 0.0, 0.25)] | 14.1 µs | 17.5 µs | -19.39% |
| Simulation | decompress_alp[f32, (10000, 0.01, 0.25)] | 34.5 µs | 38.3 µs | -10.01% |
| Simulation | decompress_alp[f32, (10000, 0.0, 0.95)] | 30 µs | 33.4 µs | -10.12% |
| Simulation | decompress_alp[f32, (1000, 0.0, 1.0)] | 11.2 µs | 12.7 µs | -12.05% |
| Simulation | decompress_alp[f64, (1000, 0.0, 0.95)] | 14.1 µs | 17.5 µs | -19.39% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing ngates/execution-4a (e7e9518) with develop (8e2beb5)


Footnotes

  1. 1466 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@gatesn gatesn changed the title "feat: migrate Zstd decompressors to iterative execute" to "Iterative Execution 4a" Mar 2, 2026
@gatesn gatesn changed the base branch from ngates/execution-3 to graphite-base/6755 March 2, 2026 19:04
@gatesn gatesn force-pushed the ngates/execution-4a branch from 08d7d8e to 1711dab March 2, 2026 19:05
@gatesn gatesn changed the base branch from graphite-base/6755 to ngates/execution-3 March 2, 2026 19:05
@gatesn gatesn added the changelog/feature (A new feature) label Mar 2, 2026
Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gatesn gatesn changed the base branch from ngates/execution-3 to graphite-base/6755 March 2, 2026 19:19
@gatesn gatesn force-pushed the graphite-base/6755 branch from 4223d1f to 2b9697c March 2, 2026 19:19
@gatesn gatesn force-pushed the ngates/execution-4a branch from 1711dab to d5bf556 March 2, 2026 19:19
@gatesn gatesn changed the base branch from graphite-base/6755 to ngates/execution-3 March 2, 2026 19:19
@gatesn gatesn force-pushed the ngates/execution-4a branch 2 times, most recently from f9a335a to 70cd549 March 2, 2026 20:13
@gatesn gatesn force-pushed the ngates/execution-3 branch from 5cd1e99 to 6c8dcdc March 2, 2026 20:13
gatesn and others added 3 commits March 2, 2026 15:43
Move the iterative execution scheduler into a general-purpose
`execute_until<M: Matcher>` method on `dyn Array`. The scheduler
terminates when the root array matches `M`, while each child
can specify its own termination condition via a `DonePredicate`
carried in `ExecutionStep::ExecuteChild`.

`ExecutionStep` now provides constructor methods:
- `execute_child::<M>(idx)` — request child execution until M matches
- `done(result)` — signal completion

Both `Executable for Columnar` and `Executable for Canonical` are
simplified to thin wrappers over `execute_until` with `AnyColumnar`
and `AnyCanonical` matchers respectively.
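A toy model of `execute_until<M: Matcher>` with a per-child `DonePredicate`. It assumes that `DonePredicate` is a plain function pointer, that `AnyColumnar` accepts flat or constant arrays while `AnyCanonical` accepts only flat ones, and that a `Done` result is re-entered until its frame's predicate holds. The `Array` enum, `execute_step`, `child` and `with_child` are invented for illustration.

```rust
// Toy model: only the names execute_until, Matcher, DonePredicate, AnyColumnar,
// AnyCanonical, execute_child and done come from the commit message.
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Flat(Vec<i64>),          // canonical in this toy
    Constant(i64, usize),    // columnar but not canonical in this toy
    Offset(Box<Array>, i64), // encoded: adds a constant to each element of its child
}

impl Array {
    fn child(&self, i: usize) -> Array {
        match (self, i) {
            (Array::Offset(c, _), 0) => (**c).clone(),
            _ => panic!("no child {i}"),
        }
    }
    fn with_child(self, i: usize, new: Array) -> Array {
        match (self, i) {
            (Array::Offset(_, k), 0) => Array::Offset(Box::new(new), k),
            (other, _) => panic!("no child {i} on {other:?}"),
        }
    }
}

pub trait Matcher {
    fn matches(array: &Array) -> bool;
}
pub struct AnyColumnar;
pub struct AnyCanonical;
impl Matcher for AnyColumnar {
    fn matches(a: &Array) -> bool {
        matches!(a, Array::Flat(_) | Array::Constant(..))
    }
}
impl Matcher for AnyCanonical {
    fn matches(a: &Array) -> bool {
        matches!(a, Array::Flat(_))
    }
}

/// Type-erased termination condition carried by `ExecuteChild` (assumed to be a fn pointer).
pub type DonePredicate = fn(&Array) -> bool;

pub enum ExecutionStep {
    ExecuteChild(usize, DonePredicate),
    Done(Array),
}

impl ExecutionStep {
    pub fn execute_child<M: Matcher>(idx: usize) -> Self {
        ExecutionStep::ExecuteChild(idx, M::matches)
    }
    pub fn done(result: Array) -> Self {
        ExecutionStep::Done(result)
    }
}

fn execute_step(a: &Array) -> ExecutionStep {
    match a {
        Array::Flat(_) => ExecutionStep::done(a.clone()),
        Array::Constant(v, n) => ExecutionStep::done(Array::Flat(vec![*v; *n])),
        Array::Offset(c, k) => match &**c {
            Array::Flat(xs) => ExecutionStep::done(Array::Flat(xs.iter().map(|x| x + k).collect())),
            Array::Constant(v, n) => ExecutionStep::done(Array::Constant(v + k, *n)),
            _ => ExecutionStep::execute_child::<AnyColumnar>(0),
        },
    }
}

/// Execute `root` until it matches `M`; each child frame stops at its own predicate.
pub fn execute_until<M: Matcher>(root: Array) -> Array {
    let root_done: DonePredicate = M::matches;
    // Frames: (array, termination predicate, slot in the parent frame below).
    let mut stack = vec![(root, root_done, None::<usize>)];
    while let Some((current, is_done, slot)) = stack.pop() {
        if is_done(&current) {
            match stack.pop() {
                None => return current,
                Some((parent, parent_done, parent_slot)) => {
                    let i = slot.expect("only the root frame lacks a slot");
                    stack.push((parent.with_child(i, current), parent_done, parent_slot));
                }
            }
            continue;
        }
        match execute_step(&current) {
            ExecutionStep::ExecuteChild(i, child_done) => {
                let next = current.child(i);
                stack.push((current, is_done, slot));
                stack.push((next, child_done, Some(i)));
            }
            // Re-enter execution on the result under the same predicate.
            ExecutionStep::Done(result) => stack.push((result, is_done, slot)),
        }
    }
    unreachable!("the root frame returns before the stack empties")
}
```

In this model the same encoded input can stop at different points: `execute_until::<AnyColumnar>` may return a constant, while `execute_until::<AnyCanonical>` keeps stepping until it reaches a flat array, mirroring how the two `Executable` impls become thin wrappers.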

Signed-off-by: Nick Gates <nick@vortex.dev>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ZstdVTable and ZstdBuffersVTable no longer recursively call
.execute() after decompression. Instead they return Done with the
decompressed intermediate, letting the scheduler re-enter execution
on the result naturally.
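The shape of the Zstd change, sketched with hypothetical types: `execute` for the compressed array returns the decompressed intermediate via `Done` instead of executing it, and a driver loop keeps re-entering execution until the result is flat. `Compressed`, `Constant`, `execute_to_flat` and the step counter are illustrative only, and no real Zstd decompression happens here.

```rust
// Hypothetical toy types; no real Zstd decompression happens here.
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Flat(Vec<i64>),
    Constant(i64, usize),
    /// Stands in for a compressed array whose payload decodes to another array.
    Compressed(Box<Array>),
}

/// Only the variant this sketch needs.
pub enum ExecutionStep {
    Done(Array),
}

/// One execution step per encoding.
fn execute(array: &Array) -> ExecutionStep {
    match array {
        // Zstd-style: return the decompressed intermediate rather than
        // calling execute on it again from inside this function.
        Array::Compressed(payload) => ExecutionStep::Done((**payload).clone()),
        Array::Constant(v, n) => ExecutionStep::Done(Array::Flat(vec![*v; *n])),
        Array::Flat(_) => ExecutionStep::Done(array.clone()),
    }
}

/// Scheduler-style driver: re-enters execution on each result until it is flat.
pub fn execute_to_flat(mut array: Array) -> (Vec<i64>, usize) {
    let mut steps = 0;
    loop {
        if let Array::Flat(values) = &array {
            return (values.clone(), steps);
        }
        let ExecutionStep::Done(next) = execute(&array);
        array = next;
        steps += 1;
    }
}
```

Here a compressed constant takes two scheduler steps (decompress, then expand the constant), and each step is visible to the driver rather than hidden inside a nested call.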

Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gatesn gatesn force-pushed the ngates/execution-4a branch from 70cd549 to e7e9518 March 2, 2026 20:43
@gatesn gatesn force-pushed the ngates/execution-3 branch from 6c8dcdc to b5b1652 March 2, 2026 20:43
@gatesn gatesn closed this Mar 9, 2026
@joseph-isaacs joseph-isaacs reopened this Mar 9, 2026
@joseph-isaacs joseph-isaacs changed the base branch from ngates/execution-3 to develop March 9, 2026 15:35
@gatesn gatesn closed this Mar 10, 2026
@gatesn gatesn reopened this Mar 10, 2026

Labels

changelog/feature A new feature


2 participants