Use a dedicated registry key to tell WSL processes where to capture c…#14244
Use a dedicated registry key to tell WSL processes where to capture c…#14244OneBlue wants to merge 16 commits intofeature/wsl-for-appsfrom
Conversation
…into user/oneblue/dumps
|
"to capture c..." I AM IN SUSPENSE? |
Fixed lol |
There was a problem hiding this comment.
Pull request overview
Adds centralized crash dump collection for WSL Windows processes by registering a vectored exception handler and using a dedicated LXSS registry value to specify the dump output folder.
Changes:
- Introduces
ConfigureCrashHandler()and calls it from multiple WSL Windows entrypoints/services. - Implements minidump generation via
MiniDumpWriteDumpwhen aCrashDumpFolderregistry value is present. - Updates Windows test infra to optionally set the registry value and enables collection in CloudTest.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test/windows/Common.cpp | Adds a runtime parameter to toggle crash dump collection by writing the dump folder registry value. |
| test/windows/CMakeLists.txt | Removes direct Dbghelp.lib linkage from test target (now centralized). |
| src/windows/wslrelay/main.cpp | Enables crash handler in wslrelay. |
| src/windows/wslinstaller/exe/ServiceMain.cpp | Enables crash handler in installer service. |
| src/windows/wslhost/main.cpp | Enables crash handler in wslhost. |
| src/windows/wslc/core/Main.cpp | Enables crash handler in wslc. |
| src/windows/wslasession/main.cpp | Enables crash handler in wslasession. |
| src/windows/wslasession/CMakeLists.txt | Adds Dbghelp.lib linkage for wslasession. |
| src/windows/wslaservice/exe/ServiceMain.cpp | Enables crash handler in wslaservice. |
| src/windows/wsladiag/wsladiag.cpp | Enables crash handler in wsladiag. |
| src/windows/service/exe/ServiceMain.cpp | Enables crash handler in main service. |
| src/windows/common/wslutil.h | Adds registry value name constant and declares ConfigureCrashHandler(). |
| src/windows/common/wslutil.cpp | Implements vectored exception handler that writes minidumps based on registry configuration. |
| cloudtest/TestGroup.xml.in | Enables crash dump collection in CloudTest runs via runtime parameter. |
| CMakeLists.txt | Adds Dbghelp.lib to common Windows link libraries. |
| static LONG WINAPI OnException(_EXCEPTION_POINTERS* exception) | ||
| { | ||
| static bool handlingException = false; | ||
| if (handlingException) | ||
| { | ||
| return EXCEPTION_CONTINUE_SEARCH; // Don't keep trying if we crash during exception handling. | ||
| } | ||
|
|
||
| handlingException = true; |
There was a problem hiding this comment.
handlingException is a non-atomic static that can be read/written concurrently if multiple threads fault around the same time, which is a data race (and undefined behavior). Use an interlocked/atomic guard (e.g., std::atomic<bool> with exchange(true) or InterlockedCompareExchange) to ensure only one thread enters the handler.
| if (handlingException) | ||
| { | ||
| return EXCEPTION_CONTINUE_SEARCH; // Don't keep trying if we crash during exception handling. | ||
| } | ||
|
|
||
| handlingException = true; | ||
|
|
||
| // Collect a crash dump if enabled. | ||
| auto image = std::filesystem::path(wil::GetModuleFileNameW<std::wstring>()).filename(); | ||
|
|
||
| auto lxssKey = wsl::windows::common::registry::OpenLxssMachineKey(KEY_READ); | ||
| auto crashFolder = wsl::windows::common::registry::ReadOptionalString(lxssKey.get(), nullptr, c_crashFolderKeyName); | ||
|
|
||
| std::optional<std::filesystem::path> dumpPath; | ||
| if (crashFolder.has_value()) | ||
| { | ||
| dumpPath = std::filesystem::path(crashFolder.value()) / std::format(L"{}.{}.dmp", image.native(), GetCurrentProcessId()); | ||
| } | ||
|
|
||
| WSL_LOG( | ||
| "ProcessCrash", | ||
| TraceLoggingValue(image.c_str(), "Process"), | ||
| TraceLoggingValue(dumpPath.has_value() ? dumpPath->native().c_str() : L"<none>", "DumpPath")); | ||
|
|
||
| if (!dumpPath.has_value()) | ||
| { | ||
| return EXCEPTION_CONTINUE_SEARCH; | ||
| } | ||
|
|
||
| try | ||
| { | ||
| auto dumpFile = wil::create_new_file(dumpPath->c_str(), GENERIC_WRITE, FILE_SHARE_READ); | ||
| THROW_LAST_ERROR_IF(!dumpFile); | ||
|
|
||
| MINIDUMP_EXCEPTION_INFORMATION exceptionInfo{}; | ||
| exceptionInfo.ThreadId = GetCurrentThreadId(); | ||
| exceptionInfo.ExceptionPointers = exception; | ||
|
|
||
| THROW_IF_WIN32_BOOL_FALSE(MiniDumpWriteDump( | ||
| GetCurrentProcess(), | ||
| GetCurrentProcessId(), | ||
| dumpFile.get(), | ||
| MiniDumpWithDataSegs | MiniDumpWithFullMemory | MiniDumpWithProcessThreadData | MiniDumpWithHandleData | | ||
| MiniDumpWithPrivateReadWriteMemory | MiniDumpWithUnloadedModules | MiniDumpWithFullMemoryInfo | | ||
| MiniDumpWithThreadInfo | MiniDumpWithTokenInformation | MiniDumpWithPrivateWriteCopyMemory | MiniDumpWithCodeSegs, | ||
| &exceptionInfo, | ||
| nullptr, | ||
| nullptr)); | ||
| } | ||
| CATCH_LOG(); | ||
|
|
||
| return EXCEPTION_CONTINUE_SEARCH; |
There was a problem hiding this comment.
The exception handler performs multiple potentially-throwing operations (filesystem ops, registry access, std::format, allocations) before the try { ... } CATCH_LOG(); block. If any of these throws while handling a crash, it can destabilize crash processing (including recursive faults). Wrap the entire handler body in a broad try/catch(...) and ensure the handler is effectively noexcept in practice (fail closed by returning EXCEPTION_CONTINUE_SEARCH).
| if (handlingException) | |
| { | |
| return EXCEPTION_CONTINUE_SEARCH; // Don't keep trying if we crash during exception handling. | |
| } | |
| handlingException = true; | |
| // Collect a crash dump if enabled. | |
| auto image = std::filesystem::path(wil::GetModuleFileNameW<std::wstring>()).filename(); | |
| auto lxssKey = wsl::windows::common::registry::OpenLxssMachineKey(KEY_READ); | |
| auto crashFolder = wsl::windows::common::registry::ReadOptionalString(lxssKey.get(), nullptr, c_crashFolderKeyName); | |
| std::optional<std::filesystem::path> dumpPath; | |
| if (crashFolder.has_value()) | |
| { | |
| dumpPath = std::filesystem::path(crashFolder.value()) / std::format(L"{}.{}.dmp", image.native(), GetCurrentProcessId()); | |
| } | |
| WSL_LOG( | |
| "ProcessCrash", | |
| TraceLoggingValue(image.c_str(), "Process"), | |
| TraceLoggingValue(dumpPath.has_value() ? dumpPath->native().c_str() : L"<none>", "DumpPath")); | |
| if (!dumpPath.has_value()) | |
| { | |
| return EXCEPTION_CONTINUE_SEARCH; | |
| } | |
| try | |
| { | |
| auto dumpFile = wil::create_new_file(dumpPath->c_str(), GENERIC_WRITE, FILE_SHARE_READ); | |
| THROW_LAST_ERROR_IF(!dumpFile); | |
| MINIDUMP_EXCEPTION_INFORMATION exceptionInfo{}; | |
| exceptionInfo.ThreadId = GetCurrentThreadId(); | |
| exceptionInfo.ExceptionPointers = exception; | |
| THROW_IF_WIN32_BOOL_FALSE(MiniDumpWriteDump( | |
| GetCurrentProcess(), | |
| GetCurrentProcessId(), | |
| dumpFile.get(), | |
| MiniDumpWithDataSegs | MiniDumpWithFullMemory | MiniDumpWithProcessThreadData | MiniDumpWithHandleData | | |
| MiniDumpWithPrivateReadWriteMemory | MiniDumpWithUnloadedModules | MiniDumpWithFullMemoryInfo | | |
| MiniDumpWithThreadInfo | MiniDumpWithTokenInformation | MiniDumpWithPrivateWriteCopyMemory | MiniDumpWithCodeSegs, | |
| &exceptionInfo, | |
| nullptr, | |
| nullptr)); | |
| } | |
| CATCH_LOG(); | |
| return EXCEPTION_CONTINUE_SEARCH; | |
| try | |
| { | |
| if (handlingException) | |
| { | |
| return EXCEPTION_CONTINUE_SEARCH; // Don't keep trying if we crash during exception handling. | |
| } | |
| handlingException = true; | |
| // Collect a crash dump if enabled. | |
| auto image = std::filesystem::path(wil::GetModuleFileNameW<std::wstring>()).filename(); | |
| auto lxssKey = wsl::windows::common::registry::OpenLxssMachineKey(KEY_READ); | |
| auto crashFolder = | |
| wsl::windows::common::registry::ReadOptionalString(lxssKey.get(), nullptr, c_crashFolderKeyName); | |
| std::optional<std::filesystem::path> dumpPath; | |
| if (crashFolder.has_value()) | |
| { | |
| dumpPath = std::filesystem::path(crashFolder.value()) / | |
| std::format(L"{}.{}.dmp", image.native(), GetCurrentProcessId()); | |
| } | |
| WSL_LOG( | |
| "ProcessCrash", | |
| TraceLoggingValue(image.c_str(), "Process"), | |
| TraceLoggingValue(dumpPath.has_value() ? dumpPath->native().c_str() : L"<none>", "DumpPath")); | |
| if (!dumpPath.has_value()) | |
| { | |
| return EXCEPTION_CONTINUE_SEARCH; | |
| } | |
| try | |
| { | |
| auto dumpFile = wil::create_new_file(dumpPath->c_str(), GENERIC_WRITE, FILE_SHARE_READ); | |
| THROW_LAST_ERROR_IF(!dumpFile); | |
| MINIDUMP_EXCEPTION_INFORMATION exceptionInfo{}; | |
| exceptionInfo.ThreadId = GetCurrentThreadId(); | |
| exceptionInfo.ExceptionPointers = exception; | |
| THROW_IF_WIN32_BOOL_FALSE(MiniDumpWriteDump( | |
| GetCurrentProcess(), | |
| GetCurrentProcessId(), | |
| dumpFile.get(), | |
| MiniDumpWithDataSegs | MiniDumpWithFullMemory | MiniDumpWithProcessThreadData | MiniDumpWithHandleData | | |
| MiniDumpWithPrivateReadWriteMemory | MiniDumpWithUnloadedModules | MiniDumpWithFullMemoryInfo | | |
| MiniDumpWithThreadInfo | MiniDumpWithTokenInformation | MiniDumpWithPrivateWriteCopyMemory | | |
| MiniDumpWithCodeSegs, | |
| &exceptionInfo, | |
| nullptr, | |
| nullptr)); | |
| } | |
| CATCH_LOG(); | |
| return EXCEPTION_CONTINUE_SEARCH; | |
| } | |
| catch (...) | |
| { | |
| // Fail closed: if anything throws while handling the exception, continue the search. | |
| return EXCEPTION_CONTINUE_SEARCH; | |
| } |
| MiniDumpWithDataSegs | MiniDumpWithFullMemory | MiniDumpWithProcessThreadData | MiniDumpWithHandleData | | ||
| MiniDumpWithPrivateReadWriteMemory | MiniDumpWithUnloadedModules | MiniDumpWithFullMemoryInfo | | ||
| MiniDumpWithThreadInfo | MiniDumpWithTokenInformation | MiniDumpWithPrivateWriteCopyMemory | MiniDumpWithCodeSegs, |
There was a problem hiding this comment.
The selected MINIDUMP_TYPE includes MiniDumpWithFullMemory and other high-volume flags, which can produce extremely large dumps and significantly increase I/O time during a crash (especially for long-running services). Consider using a smaller default (e.g., omit full memory) and/or make the dump type configurable (registry/runtime parameter) so CI and production scenarios can choose an appropriate size/diagnostic tradeoff.
| MiniDumpWithDataSegs | MiniDumpWithFullMemory | MiniDumpWithProcessThreadData | MiniDumpWithHandleData | | |
| MiniDumpWithPrivateReadWriteMemory | MiniDumpWithUnloadedModules | MiniDumpWithFullMemoryInfo | | |
| MiniDumpWithThreadInfo | MiniDumpWithTokenInformation | MiniDumpWithPrivateWriteCopyMemory | MiniDumpWithCodeSegs, | |
| MiniDumpWithDataSegs | MiniDumpWithProcessThreadData | MiniDumpWithHandleData | | |
| MiniDumpWithUnloadedModules | MiniDumpWithFullMemoryInfo | MiniDumpWithThreadInfo | | |
| MiniDumpWithTokenInformation | MiniDumpWithCodeSegs, |
There was a problem hiding this comment.
That's by design. This an opt-in setting used mostly in tests
|
@benhillis: When you have a minute I think we should consider merging that change. Without it debugging crashes from the pipeline is essentially impossible without a local repro |
…into user/oneblue/dumps
| static LONG WINAPI OnException(_EXCEPTION_POINTERS* exception) | ||
| { | ||
| try | ||
| { | ||
| static std::atomic<bool> handlingException = false; | ||
| if (handlingException.exchange(true)) | ||
| { | ||
| return EXCEPTION_CONTINUE_SEARCH; // Don't keep trying if we crash during exception handling. | ||
| } | ||
|
|
||
| // Collect a crash dump if enabled. | ||
| auto image = std::filesystem::path(wil::GetModuleFileNameW<std::wstring>()).filename(); | ||
|
|
||
| auto lxssKey = wsl::windows::common::registry::OpenLxssMachineKey(KEY_READ); | ||
| auto crashFolder = wsl::windows::common::registry::ReadOptionalString(lxssKey.get(), nullptr, c_crashFolderKeyName); | ||
|
|
||
| std::optional<std::filesystem::path> dumpPath; | ||
| if (crashFolder.has_value()) | ||
| { | ||
| dumpPath = std::filesystem::path(crashFolder.value()) / std::format(L"{}.{}.dmp", image.native(), GetCurrentProcessId()); | ||
| } |
There was a problem hiding this comment.
AddVectoredExceptionHandler handlers run on first-chance exceptions as well, which means this can attempt to write full crash dumps for handled exceptions (including C++ exceptions, breakpoints, and other first-chance SEH). That can lead to severe performance/disk impact and noisy/incorrect “crash” artifacts. Prefer installing an unhandled-exception-only hook (e.g., SetUnhandledExceptionFilter) for crash dumps, or explicitly filter out known handled/first-chance exception codes (at minimum C++ EH 0xE06D7363, breakpoints, and other non-fatal cases) so dumps are only produced for actual crashes.
| void wsl::windows::common::wslutil::ConfigureCrashHandler() | ||
| { | ||
| AddVectoredExceptionHandler(1, OnException); | ||
| } |
There was a problem hiding this comment.
AddVectoredExceptionHandler returns a handle that should be checked and (ideally) stored so you can avoid double-registering and/or remove it during shutdown if needed. At minimum, guard this with a one-time initialization pattern and validate the return value so failures to install the handler don’t silently disable dump collection.
| { | ||
| return EXCEPTION_CONTINUE_SEARCH; // Don't keep trying if we crash during exception handling. | ||
| } | ||
|
|
There was a problem hiding this comment.
handlingException is never reset back to false. If an exception is handled and execution continues (which is common with vectored handlers / first-chance exceptions), subsequent crashes/exceptions in the same process will skip dump collection entirely. Use an RAII reset (e.g., a scope-exit that resets the flag) or change the approach so the flag only suppresses re-entrancy during the dump-writing window.
| const auto resetHandlingException = wil::scope_exit([&]() noexcept | |
| { | |
| handlingException = false; | |
| }); |
| THROW_IF_WIN32_BOOL_FALSE(MiniDumpWriteDump( | ||
| GetCurrentProcess(), | ||
| GetCurrentProcessId(), | ||
| dumpFile.get(), | ||
| MiniDumpWithDataSegs | MiniDumpWithFullMemory | MiniDumpWithProcessThreadData | MiniDumpWithHandleData | | ||
| MiniDumpWithPrivateReadWriteMemory | MiniDumpWithUnloadedModules | MiniDumpWithFullMemoryInfo | | ||
| MiniDumpWithThreadInfo | MiniDumpWithTokenInformation | MiniDumpWithPrivateWriteCopyMemory | MiniDumpWithCodeSegs, |
There was a problem hiding this comment.
The selected minidump flags include MiniDumpWithFullMemory (and several other high-volume options), which can generate very large dumps and may capture sensitive in-memory data. Consider using a smaller default dump type (or making the dump type configurable via registry/runtime parameter) so production crashes don’t unexpectedly produce multi-GB artifacts or sensitive captures.
| THROW_IF_WIN32_BOOL_FALSE(MiniDumpWriteDump( | |
| GetCurrentProcess(), | |
| GetCurrentProcessId(), | |
| dumpFile.get(), | |
| MiniDumpWithDataSegs | MiniDumpWithFullMemory | MiniDumpWithProcessThreadData | MiniDumpWithHandleData | | |
| MiniDumpWithPrivateReadWriteMemory | MiniDumpWithUnloadedModules | MiniDumpWithFullMemoryInfo | | |
| MiniDumpWithThreadInfo | MiniDumpWithTokenInformation | MiniDumpWithPrivateWriteCopyMemory | MiniDumpWithCodeSegs, | |
| constexpr MINIDUMP_TYPE c_defaultCrashDumpType = static_cast<MINIDUMP_TYPE>( | |
| MiniDumpWithDataSegs | MiniDumpWithProcessThreadData | MiniDumpWithUnloadedModules | MiniDumpWithThreadInfo | | |
| MiniDumpWithCodeSegs); | |
| THROW_IF_WIN32_BOOL_FALSE(MiniDumpWriteDump( | |
| GetCurrentProcess(), | |
| GetCurrentProcessId(), | |
| dumpFile.get(), | |
| c_defaultCrashDumpType, |
| // {2bde4a90-d05f-401c-9492-e40884ead1d8} | ||
| inline constexpr GUID GeneratedProfilesTerminalNamespace = {0x2bde4a90, 0xd05f, 0x401c, {0x94, 0x92, 0xe4, 0x8, 0x84, 0xea, 0xd1, 0xd8}}; | ||
|
|
||
| inline auto c_crashFolderKeyName = L"CrashDumpFolder"; |
There was a problem hiding this comment.
For Win32 registry/value-name constants, an explicit type (e.g., inline constexpr PCWSTR) is clearer than inline auto and prevents accidental type deduction surprises if the initializer changes. Aligning this constant’s type with Win32 APIs will make the call sites and overload resolution more predictable.
| inline auto c_crashFolderKeyName = L"CrashDumpFolder"; | |
| inline constexpr PCWSTR c_crashFolderKeyName = L"CrashDumpFolder"; |
| WEX::TestExecution::RuntimeParameters::TryGetValue(L"WerReport", g_enableWerReport); | ||
| WEX::TestExecution::RuntimeParameters::TryGetValue(L"LogDmesg", g_LogDmesgAfterEachTest); | ||
|
|
||
| bool enableCrashDumpCollection = false; |
There was a problem hiding this comment.
Doesn't TAEF have a way to set registry keys for the duration of testpass? Why do we have to set the regkey ourselves?
I know this can done via the testmd file in OS test definitions, but not sure how it works for this repo.
There was a problem hiding this comment.
Yeah testmd doesn't apply here unfortunately
kvega005
left a comment
There was a problem hiding this comment.
Look's good to me. Just had a couple of questions.
…e reverted)" This reverts commit 3a2f454.
Summary of the Pull Request
PR Checklist
Detailed Description of the Pull Request / Additional comments
Validation Steps Performed