Vulkan: defer DebugValue/Declare when curScope is NULL #3845

Open
fengshanglantian wants to merge 1 commit into baldurk:v1.x from fengshanglantian:fix-spirv-debug-null-curscope
@fengshanglantian

Vulkan: defer DebugValue/Declare when curScope is NULL

Summary

The SPIR-V debugger setup pass has a NULL-deref hazard in
Debugger::RegisterOp when the SPIR-V stream interleaves
NonSemantic.Shader.DebugInfo.100 DebugValue / DebugDeclare
instructions between a block terminator and the next OpDebugScope.
This crashes Debug Pixel against any such shader. Add a one-line
NULL-check that routes the offending instruction to the existing
deferred-mapping fallback path.

Repro

  1. Compile a small HLSL pixel shader with debug info embedded:

    dxc -spirv -fspv-target-env=vulkan1.1 -fspv-debug=vulkan-with-source -Od ...
    

    (any recent dxc should do; 1.8.2403 reproduces, and older versions may as well)

  2. In a Vulkan capture, replace the pixel shader's resource with the
    SPV via ReplayController::ReplaceResource (e.g. through the
    Python Shell), then select the affected drawcall and click
    Debug Pixel.

  3. The replay process crashes:

    signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8
    #00 rdcspv::ScopeData::HasAncestor   spirv_debug.h:639
    #01 rdcspv::Debugger::RegisterOp     spirv_debug_setup.cpp:4696
    #02 rdcspv::Processor::Parse         spirv_processor.cpp:483
    #03 VulkanReplay::DebugPixel         vk_shaderdebug.cpp:6253
    #04 ReplayProxy::Proxied_DebugPixel  replay_proxy.cpp:1645
    

    The fault address 0x8 is the offset of the parent field within
    ScopeData:

    struct ScopeData {
        DebugScope type;        // offset 0
        ScopeData *parent;      // offset 8       <-- NULL+8
        ...
    };
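
The link between the fault address and the field offset can be checked with a small stand-alone sketch. `MockScopeData` and `FaultAddressForNullThis` below are hypothetical names, and the struct models only the two leading fields shown above (the real `ScopeData` has more members, and `DebugScope` is assumed here to be a 32-bit enum). On a typical 64-bit target the pointer is padded out to offset 8, which is the address a load of `parent` touches when `this` is NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the enum used as ScopeData's first field;
// assumed to be 32 bits wide for this illustration.
enum class DebugScope : uint32_t
{
  CompilationUnit,
  Function,
  Block,
};

// Hypothetical model of the first two fields of ScopeData only.
struct MockScopeData
{
  DebugScope type;          // offset 0, 4 bytes plus padding on 64-bit targets
  MockScopeData *parent;    // pointer-aligned, so offset 8 on 64-bit targets
};

// Address that reading `parent` would touch if `this` were NULL:
// base address 0 plus the offset of the field.
constexpr std::size_t FaultAddressForNullThis()
{
  return offsetof(MockScopeData, parent);
}

// Runtime check of the layout assumption on the current target.
inline void CheckLayout()
{
  assert(FaultAddressForNullThis() == alignof(MockScopeData *));
}
```

On a 32-bit target the same sketch would give offset 4, so the value 0x8 in the crash log is consistent with a 64-bit (arm64) replay process.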

Root cause

m_DebugInfo.curScope is explicitly set to NULL after every
block-terminator opcode (spirv_debug_setup.cpp:4866 in the same
RegisterOp):

if(leaveScope || it.opcode() == Op::Kill || it.opcode() == Op::Unreachable ||
   it.opcode() == Op::Branch || it.opcode() == Op::BranchConditional ||
   it.opcode() == Op::Switch || it.opcode() == Op::Return || it.opcode() == Op::ReturnValue)
{
    if(m_DebugInfo.curScope)
        m_DebugInfo.curScope->end = it.offs();

    m_DebugInfo.curScope = NULL;
    m_DebugInfo.curInline = NULL;
}

The ShaderDbg::Declare / ShaderDbg::Value case (at
spirv_debug_setup.cpp:4685) does:

const bool insideValidScope = m_DebugInfo.curScope->HasAncestor(varDeclScope);

…with no NULL check, so any DebugValue/DebugDeclare that lands in
that gap dereferences NULL.

Fix

The pre-existing comment on these lines already describes the desired
behaviour for the "no valid scope" case:

bit of a hack - only process declares/values for variables inside a
scope that is within that function. If we see a declare/value in
another function we defer it hoping that we will encounter a
scope later that's valid for it.

…and the existing else branch already implements the deferral via
m_DebugInfo.pendingMappings.push_back(...). So the fix is just to
guard the call:

const bool insideValidScope = m_DebugInfo.curScope &&
                              m_DebugInfo.curScope->HasAncestor(varDeclScope);

NULL curScope now flows into the existing pendingMappings defer
path instead of crashing.
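
As an illustration only, the guarded check and the deferral it falls through to can be modelled outside RenderDoc. Everything below is a simplified sketch: `MockScope`, `MockDebugInfo`, `RegisterValue` and the integer mapping ids are hypothetical stand-ins, and the `HasAncestor` walk is an assumed implementation, not the one in spirv_debug.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified scope node.
struct MockScope
{
  MockScope *parent = nullptr;

  // Walks the parent chain starting at this scope. It reads this->parent,
  // so calling it through a NULL pointer is what faults in the crash above.
  bool HasAncestor(const MockScope *ancestor) const
  {
    for(const MockScope *s = this; s != nullptr; s = s->parent)
    {
      if(s == ancestor)
        return true;
    }
    return false;
  }
};

// Hypothetical subset of the debugger's bookkeeping.
struct MockDebugInfo
{
  MockScope *curScope = nullptr;       // cleared after each block terminator
  std::vector<int> appliedMappings;    // processed immediately
  std::vector<int> pendingMappings;    // deferred until a valid scope appears
};

// Same shape as the guarded line in the fix: a NULL curScope short-circuits
// the && and yields false, so the mapping takes the existing deferral branch.
inline void RegisterValue(MockDebugInfo &info, const MockScope *varDeclScope, int mappingId)
{
  const bool insideValidScope =
      info.curScope && info.curScope->HasAncestor(varDeclScope);

  if(insideValidScope)
    info.appliedMappings.push_back(mappingId);
  else
    info.pendingMappings.push_back(mappingId);
}
```

With `curScope` pointing at a block nested in the declaring function, a mapping is applied straight away; once `curScope` has been reset to NULL, the next mapping lands in `pendingMappings` and no dereference happens.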

Tested

  • RenderDoc 1.45 Release build, Android arm64 remote replay against a
    Pixel 9 Pro XL (Android 16, Tensor G4 / Mali GPU). Same Vulkan
    capture + replacement SPV that reproduced the crash now reaches
    Break Mode in the Debug Pixel view; F10/F11 step over/into the HLSL
    source.
  • The fix is unrelated to platform / build config; the same code path
    exists on PC. Crash also reproduces on PC Release build with the
    same dxc-built SPV; the fix resolves it there too.

No existing test exercises this code path that I could find — the
debug instructions involved come from external shader compilers and
the project's existing Debugger tests don't drive
NonSemantic.Shader.DebugInfo.100 SPIR-V end-to-end. Happy to add a
small unit test if you can point at a good fixture pattern.

Scope

A one-line behaviour change with a surrounding comment update; no
API change, and no behaviour change for the non-NULL case.

curScope is explicitly cleared after every block-terminator opcode
(OpBranch/OpKill/OpReturn/...) further down in RegisterOp(). When the
SPIR-V stream interleaves NonSemantic.Shader.DebugInfo.100 DebugValue
or DebugDeclare in the gap between a block terminator and the next
OpDebugScope, the original code dereferenced NULL curScope -> SIGSEGV
with fault addr 0x8 (the parent field offset within ScopeData).

The else branch below already pushes the mapping into pendingMappings,
which is exactly the deferred path the comment above describes ('we
defer it hoping that we will encounter a scope later that's valid for
it'). Just guard the call.

Reproduced on RenderDoc 1.45 Release (Android arm64) after replacing
a Vulkan SM5 PS shader via ReplaceResource with a SPIR-V compiled by
dxc 1.8.2403 with -fspv-debug=vulkan-with-source, then clicking
Debug Pixel:

  signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8
  #00 rdcspv::ScopeData::HasAncestor   spirv_debug.h:639
  #01 rdcspv::Debugger::RegisterOp     spirv_debug_setup.cpp:4696
  #02 rdcspv::Processor::Parse         spirv_processor.cpp:483
  #03 VulkanReplay::DebugPixel         vk_shaderdebug.cpp:6253
  #04 ReplayProxy::Proxied_DebugPixel  replay_proxy.cpp:1645

@fengshanglantian force-pushed the fix-spirv-debug-null-curscope branch from e68f41a to 5347fd1 on June 1, 2026 07:25