
feat[gpu]: slice support for dyn dispatch #6862

Merged: 0ax1 merged 3 commits into develop from ad/gpu-dyn-slice on Mar 10, 2026


0ax1 (Contributor) commented Mar 10, 2026

Slice offsets are applied either while constructing the dyn dispatch build plan or, for bitpacked arrays, inside the dyn dispatch kernel itself, because a sub-byte offset can't be expressed as pointer arithmetic on the device pointer, which only moves in whole bytes.
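To illustrate why bitpacking needs in-kernel handling, here is a minimal Rust sketch. It assumes a simple contiguous, LSB-first packing of fixed-width values (the actual bitpacked layout in this codebase may differ), and the names SplitOffset, split_bitpacked_offset, pack, and unpack_at are hypothetical, not part of the PR. The element offset is converted to a bit offset; the whole-byte part can be folded into the device pointer, while the leftover bit shift has to be applied by the unpacking code.

```rust
/// Hypothetical result of splitting a slice offset on bit-packed data.
#[derive(Debug, PartialEq, Eq)]
struct SplitOffset {
    /// Whole bytes: can be applied by advancing the device pointer.
    byte_offset: usize,
    /// Remaining bits (0..8): cannot be expressed by pointer arithmetic.
    bit_shift: u32,
}

/// Converts an element offset into a byte offset plus a sub-byte bit shift,
/// assuming `bit_width`-bit values packed contiguously, LSB first.
fn split_bitpacked_offset(elem_offset: usize, bit_width: u32) -> SplitOffset {
    let bit_offset = elem_offset * bit_width as usize;
    SplitOffset {
        byte_offset: bit_offset / 8,
        bit_shift: (bit_offset % 8) as u32,
    }
}

/// Packs values (each must fit in `bit_width` bits, at most 32) contiguously, LSB first.
fn pack(values: &[u32], bit_width: u32) -> Vec<u8> {
    let w = bit_width as usize;
    let mut out = vec![0u8; (values.len() * w + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        for b in 0..w {
            if (v >> b) & 1 == 1 {
                let bit = i * w + b;
                out[bit / 8] |= 1u8 << (bit % 8);
            }
        }
    }
    out
}

/// Reads element `i` of a sliced view. `bytes` already starts at
/// `byte_offset` (the pointer-arithmetic part); `bit_shift` is the
/// leftover that the unpacking code itself has to apply.
fn unpack_at(bytes: &[u8], bit_shift: u32, bit_width: u32, i: usize) -> u32 {
    let w = bit_width as usize;
    let start = bit_shift as usize + i * w;
    let mut value = 0u32;
    for b in 0..w {
        let bit = start + b;
        let set = (bytes[bit / 8] >> (bit % 8)) & 1;
        value |= (set as u32) << b;
    }
    value
}
```

For example, with 3-bit values a slice starting at element 3 begins 9 bits into the buffer: the pointer can advance by 1 byte, but a 1-bit shift remains for the kernel. Only when elem_offset * bit_width is a multiple of 8 is pointer arithmetic alone enough.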

Calling output_tile_len inside the final execute_stage loop adds no performance overhead compared to develop.
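The PR does not spell out what output_tile_len computes. Assuming it returns how many valid output elements the current tile should produce (only the last tile of a slice can be partial), a hypothetical host-side Rust sketch of that clamping inside a tiled loop looks like this; execute_tiles is an illustrative stand-in, not the PR's execute_stage.

```rust
/// Hypothetical stand-in for output_tile_len: how many valid output
/// elements tile `tile_idx` holds when `len` elements are split into
/// tiles of `tile_size`. Every tile is full except possibly the last.
fn output_tile_len(tile_idx: usize, tile_size: usize, len: usize) -> usize {
    let start = tile_idx * tile_size;
    tile_size.min(len.saturating_sub(start))
}

/// Hypothetical tiled stage loop: applies `op` tile by tile and writes
/// only the valid prefix of each tile.
fn execute_tiles(input: &[u32], output: &mut [u32], tile_size: usize, op: impl Fn(u32) -> u32) {
    assert!(tile_size > 0 && output.len() >= input.len());
    let num_tiles = (input.len() + tile_size - 1) / tile_size;
    for tile_idx in 0..num_tiles {
        let start = tile_idx * tile_size;
        let n = output_tile_len(tile_idx, tile_size, input.len());
        // Only the first `n` slots of this tile hold valid output.
        for (o, &x) in output[start..start + n].iter_mut().zip(&input[start..start + n]) {
            *o = op(x);
        }
    }
}
```

Because only the last tile differs from tile_size, a sliced length that is not a multiple of the tile size is handled without a separate tail loop.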

0ax1 added 2 commits March 10, 2026 14:50
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
0ax1 requested review from a10y and robert3005 March 10, 2026 14:52
0ax1 added the changelog/feature (A new feature) label Mar 10, 2026
codspeed-hq bot commented Mar 10, 2026

Merging this PR will improve performance by 12.36%

⚡ 1 improved benchmark
✅ 999 untouched benchmarks
⏩ 1466 skipped benchmarks (footnote 1)

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | bitwise_not_vortex_buffer_mut[128] | 530.3 ns | 471.9 ns | +12.36% |

Comparing ad/gpu-dyn-slice (cab2c97) with develop (aff52ad)


Footnotes

  1. 1466 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

0ax1 enabled auto-merge (squash) March 10, 2026 15:05
0ax1 disabled auto-merge March 10, 2026 15:10
0ax1 enabled auto-merge (squash) March 10, 2026 15:10
0ax1 merged commit 1bb7e68 into develop Mar 10, 2026 (52 checks passed)
0ax1 deleted the ad/gpu-dyn-slice branch March 10, 2026 15:41
a10y (Contributor) commented Mar 10, 2026

Why add a new method instead of fixing cuda_device_ptr()? We use cuda_device_ptr in a number of places where we probably mean offset_ptr().

a10y (Contributor) commented Mar 10, 2026

Oh I see, cuda_device_ptr becomes an alias for offset_ptr now.
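The alias described here can be sketched as follows. DeviceBuffer, its fields, and slice are hypothetical stand-ins (the device address is modelled as a plain u64); only the method names offset_ptr and cuda_device_ptr come from this thread. Because cuda_device_ptr forwards to offset_ptr, existing callers pick up the slice offset without changing call sites.

```rust
/// Hypothetical device-buffer handle: a base device address plus the byte
/// offset and length of the current view into the allocation.
#[derive(Clone, Copy, Debug)]
struct DeviceBuffer {
    base_ptr: u64, // device address of the allocation, modelled as an integer
    offset: u64,   // byte offset of this view into the allocation
    len: u64,      // length of this view in bytes
}

impl DeviceBuffer {
    /// Byte-aligned slicing only records a new offset; no device work is needed.
    fn slice(&self, start: u64, end: u64) -> DeviceBuffer {
        assert!(start <= end && end <= self.len, "slice out of bounds");
        DeviceBuffer {
            base_ptr: self.base_ptr,
            offset: self.offset + start,
            len: end - start,
        }
    }

    /// Address of the first byte of this view.
    fn offset_ptr(&self) -> u64 {
        self.base_ptr + self.offset
    }

    /// Alias: callers that used the raw pointer now get the offset-adjusted one.
    fn cuda_device_ptr(&self) -> u64 {
        self.offset_ptr()
    }
}
```

Nested slices accumulate their offsets, so a view of a view still resolves to the correct device address through either method.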


3 participants