[cDAC] Handle VirtualReadException in StackWalk_1.IsManaged()#124844
[cDAC] Handle VirtualReadException in StackWalk_1.IsManaged()#124844max-charlamb wants to merge 1 commit intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
There was a problem hiding this comment.
Pull request overview
This PR updates the cDAC stack walker to tolerate unreadable memory during managed-code detection, aligning behavior with the native DAC (treating read failures during code-range lookup as “not managed” so the walk can continue).
Changes:
- Wrap
ExecutionManager.GetCodeBlockHandleinStackWalk_1.IsManaged()with a narrowtry/catch (VirtualReadException). - On
VirtualReadException, returnfalse(unmanaged) and continue the stack walk rather than failing the entire walk.
| } | ||
| catch (VirtualReadException) | ||
| { | ||
| // If we fail to read memory while looking up the code block, the IP is not in managed code |
There was a problem hiding this comment.
If we fail to read memory while looking up the code block, the IP is not in managed code
How can this happen? I do not think we should ever fail to read the memory (unless the data are corrupted or missing - but then we cannot really tell whether the IP is in managed code or not).
I'm skeptical :) As a thought experiment, start with a valid dump, pick an arbitrary RangeSection that points to a managed frame on the stack and delete it from the dump contents. Now we've got a dump that will trigger memory read failure during the stack walk and by construction its masking managed code. Even if we assume there are zero bugs in our dump memory enumeration customers could have created dumps in other ways. I'd put my wager on the opposite conclusion - DAC was masking bugs. This looks like one (the conditional is flipped). runtime/src/coreclr/vm/codeman.h Line 1553 in 0f12574 |
noahfalk
left a comment
There was a problem hiding this comment.
I think we've likely got a memory enumeration bug and ignoring the read exception is only serving to mask it. If its really the case that by-design this memory shouldn't be in the dump it would be helpful to have some analysis of why that is.
Here's a PR to fix that bug: #124862 |
Problem
Three cDAC CI tests are failing on
windows_x64_Release:SOS.DualRuntimesSOS.GCTestsSOS.VarargPInvokeInteropMDThe
SOS.DualRuntimesfailure is caused by an unhandledVirtualReadException(CORDBG_E_READVIRTUAL_FAILURE,0x80131c49) during stack walking. When the stack walker encounters a frame whose instruction pointer is in unmanaged code (e.g.ntdll),IsManaged()callsExecutionManager.GetCodeBlockHandle(), which traverses the multi-levelRangeSectionMap. The map can find aRangeSectionFragmentfor the address range, but the fragment'sRangeSectionpointer may reference memory that is unreadable in the dump, causingVirtualReadExceptionto propagate out of the stack walk.The native DAC handles this scenario because
ExecutionManager::FindCodeRangehas aNOTHROWcontract andSUPPORTS_DAC— access violations during DAC memory reads are caught by SEH (DAC_ENTER/DAC_LEAVE) and translated to aNULLreturn. The cDAC has no equivalent protection, so the exception escapes.Fix
Wrap the
GetCodeBlockHandlecall inStackWalk_1.IsManaged()with atry/catchforVirtualReadException. When a memory read fails during the code block lookup, we treat the IP as not being in managed code (returnfalse) and allow the stack walk to continue. This matches the native DAC's behavior, whereFindCodeRangereturnsNULLon read failures.The catch is intentionally narrow — only
VirtualReadExceptionis caught, not all exceptions. A memory read failure is a definitive "this IP is not resolvable as managed code" answer, not a masked error.Validation
SOS.DualRuntimesfailure using an ad-hoc dump diagnostic tool that loads the CI dump files via ClrMD and drives the cDAC stack walker.