zstdgpu: clamp forward bit-buffer over-read on degenerate tiny frames by dannychenmsft · Pull Request #111 · microsoft/DirectStorage

dannychenmsft · 2026-06-15T23:19:54Z

The forward bit-buffer Refill and Skip both (by design) fetch one dword ahead of the bytes actually consumed (see the ZSTDGPU_ASSERT(offset <= (bytesz>>2)-1) just above each fetch and the disabled #if 0 explanatory block). Release builds do not evaluate that assert, so on degenerate/tiny decodecorpus frames the running dword offset can advance one past the last dword of the compressed-input SRV.

Because the compressed-input SRV is bound as a (static) root descriptor with no bounds, that one-dword over-read is an out-of-bounds fetch that the driver faults (observed as a fast DEVICE_REMOVED / page-fault with VA=0 on both AMD/NVIDIA hardware and the WARP software adapter).

Clamp the read index of both fetches to the last valid dword. For all valid, in-bounds positions offset is already <= (bytesz>>2)-1, so the clamp is a no-op and decoded output is byte-identical; it only ever changes the unused tail bits on the degenerate path.
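The clamp described above can be sketched on the CPU side as follows. This is a minimal illustration, not the shader code from the PR: `ClampDwordIndex` and `FetchDwordClamped` are hypothetical names, and a `std::vector<uint32_t>` stands in for the compressed-input SRV. It assumes, as the description states, that the last valid dword index is `(bytesz >> 2) - 1`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): clamp a dword read index to the last
// valid dword of a buffer that is `byteSize` bytes long.
// For in-bounds offsets (offset <= (byteSize >> 2) - 1) the result equals offset,
// so the clamp is a no-op on every valid position.
inline uint32_t ClampDwordIndex(uint32_t offset, uint32_t byteSize)
{
    const uint32_t dwordCount = byteSize >> 2;
    // Guard the empty-buffer case so that dwordCount - 1 cannot wrap around.
    const uint32_t lastValid = (dwordCount > 0) ? (dwordCount - 1) : 0;
    return std::min(offset, lastValid);
}

// Hypothetical stand-in for the "fetch one dword ahead" read. The vector plays
// the role of the compressed-input buffer; an empty buffer yields 0.
inline uint32_t FetchDwordClamped(const std::vector<uint32_t>& src, uint32_t offset)
{
    if (src.empty())
    {
        return 0;
    }
    const uint32_t byteSize = static_cast<uint32_t>(src.size() * sizeof(uint32_t));
    return src[ClampDwordIndex(offset, byteSize)];
}
```

With a four-dword buffer, an offset of 4 (one past the end) re-reads dword 3 instead of going out of bounds, while offsets 0 through 3 are unchanged. Only the look-ahead bits differ on the over-read path, which is why the description expects byte-identical decoded output.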

The forward bit-buffer Refill and Skip both fetch one dword *ahead* of the bytes actually consumed (a documented design that relies on the unused tail bits never being read; see the ZSTDGPU_ASSERT(offset <= (bytesz>>2)-1) just above each fetch and the disabled #if 0 explanatory block). Release builds do not evaluate that assert, so on degenerate/tiny decodecorpus frames the running dword offset can advance one past the last dword of the compressed-input SRV.

Because the compressed-input SRV is bound as a (static) root descriptor with no bounds, that one-dword over-read is an out-of-bounds fetch that the driver faults (observed as a fast DEVICE_REMOVED / page-fault with VA=0 on both NVIDIA hardware and the WARP software adapter). GPU-based Validation confirms the fix collapses the resulting OOB cascade in the decode path (799 -> 28 reported OOBs).

Clamp the *read index* of both fetches to the last valid dword. For all valid, in-bounds positions offset is already <= (bytesz>>2)-1, so the clamp is a no-op and decoded output is byte-identical (T2 Quick corpus: 0 MISMATCH, 0 new DEVICE_REMOVED); it only ever changes the unused tail bits on the degenerate path.

Partial hardening: this removes a real, GBV-confirmed OOB over-read but does not by itself clear the tiny-frame device-removal (a separate downstream fault gated by the PrefixBlockSizes cumulative offsets remains under investigation).

Pre-existing in origin/development (the forward bit buffer is byte-identical across baseline and the optimization branches).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This PR adds support for minimal frame/block info constants so that all buffers and views are always created, irrespective of whether the decompression workload needs all of them. This is necessary because we switched to GPU-driven submission, so all buffers must be present (created) in order to submit barriers and create correct descriptors. This PR supersedes #112 and #111 and addresses the issues described in them.

pm4rtx · 2026-06-17T01:01:00Z

The issue this PR tries to address was fixed by #113

pm4rtx mentioned this pull request Jun 16, 2026

zstdgpu/fix: adds support for minimal values for frame/block info costants #113

Merged

pm4rtx closed this Jun 17, 2026

dannychenmsft wants to merge 1 commit into microsoft:development from dannychenmsft:zstdgpu-fix-bitbuffer-overread
