Report in-flight tests and suggest --blame-crash when testhost crashes#15680
Report in-flight tests and suggest --blame-crash when testhost crashes#15680nohwnd wants to merge 2 commits intomicrosoft:mainfrom
Conversation
When testhost crashes, users see an unhelpful 'Test host process crashed'
message with no indication of which test was running or how to diagnose.
This adds lightweight real-time test tracking (always-on blame lite):
- New protocol messages TestCaseStarting/TestCaseFinished sent unbatched
from testhost to console (~280 bytes per test, negligible overhead)
- Console tracks in-flight tests in ConcurrentDictionary with inline-slot
optimization (no allocation for the common unique-Guid case)
- On crash, error message now includes running test names with elapsed time
- Always suggests --blame-crash for further diagnosis
Before:
Test host process crashed
After:
Test host process crashed : <crash details>
Tests that were executing when the crash occurred:
MyTest.CrashingMethod (running for 2s)
Consider using --blame-crash to collect a crash dump for further diagnosis.
Fixes microsoft#2952
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds real-time in-flight test tracking so that when the testhost crashes, the console can report which tests were running and suggest --blame-crash for dump collection.
Changes:
- Introduces
TestCaseStarting/TestCaseFinishedprotocol messages and payload contracts. - Wires test-start/test-finish callbacks from testhost execution pipeline to the request handler for unbatched reporting.
- Updates console-side sender to track in-flight tests and enrich abort/error output; adds/adjusts tests and public API baselines.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.TestPlatform.CommunicationUtilities.UnitTests/TestRequestSenderTests.cs | Loosens assertion to accommodate augmented crash messaging. |
| test/Microsoft.TestPlatform.Acceptance.IntegrationTests/ExecutionTests.cs | Adds crash-focused integration tests covering new stderr content. |
| src/Microsoft.TestPlatform.CrossPlatEngine/PublicAPI/PublicAPI.Unshipped.txt | Records new unshipped public APIs (currently with questionable type identities). |
| src/Microsoft.TestPlatform.CrossPlatEngine/Execution/BaseRunTests.cs | Threads test start/finish callbacks into FrameworkHandle via request handler. |
| src/Microsoft.TestPlatform.CrossPlatEngine/EventHandlers/TestRequestHandler.cs | Implements sending TestCaseStarting/TestCaseFinished messages from testhost. |
| src/Microsoft.TestPlatform.CrossPlatEngine/Adapter/TestExecutionRecorder.cs | Adds optional callbacks invoked on RecordStart/RecordEnd. |
| src/Microsoft.TestPlatform.CrossPlatEngine/Adapter/FrameworkHandle.cs | Plumbs optional callbacks through to base TestExecutionRecorder. |
| src/Microsoft.TestPlatform.CommunicationUtilities/TestRequestSender.cs | Tracks in-flight tests, appends crash diagnostics and --blame-crash suggestion. |
| src/Microsoft.TestPlatform.CommunicationUtilities/PublicAPI/PublicAPI.Unshipped.txt | Adds new message types, payloads, and interface methods to public API list. |
| src/Microsoft.TestPlatform.CommunicationUtilities/Messages/TestCaseStartingPayload.cs | Adds payload contract for test-start notifications. |
| src/Microsoft.TestPlatform.CommunicationUtilities/Messages/TestCaseFinishedPayload.cs | Adds payload contract for test-finish notifications. |
| src/Microsoft.TestPlatform.CommunicationUtilities/Messages/MessageType.cs | Adds new message type constants for start/finish tracking. |
| src/Microsoft.TestPlatform.CommunicationUtilities/Interfaces/ITestRequestHandler.cs | Extends request handler interface with start/finish send methods. |
| src/Microsoft.TestPlatform.CommunicationUtilities/InFlightTestTracker.cs | Adds in-flight tracker with inline-slot optimization. |
| src/Microsoft.TestPlatform.CommunicationUtilities/EventHandlers/TestRunEventsHandler.cs | Exposes underlying request handler for direct message sending. |
| public bool RemoveOldest() | ||
| { | ||
| // Shift slots down | ||
| if (Slot1 is not null) | ||
| { | ||
| Slot0 = Slot1; | ||
| StartTime0 = StartTime1; | ||
| Slot1 = Slot2; | ||
| StartTime1 = StartTime2; | ||
| Slot2 = Slot3; | ||
| StartTime2 = StartTime3; | ||
| Slot3 = null; | ||
| StartTime3 = default; | ||
|
|
||
| // Refill Slot3 from overflow if available | ||
| if (Overflow is { Count: > 0 }) | ||
| { | ||
| var (payload, startTime) = Overflow.Dequeue(); | ||
| Slot3 = payload; | ||
| StartTime3 = startTime; | ||
| } | ||
|
|
||
| return true; | ||
| } |
There was a problem hiding this comment.
InFlightTest mutates multiple fields (slot shifting + overflow queue) without any synchronization, while InFlightTestTracker is explicitly used in a multi-threaded context (message receive thread updating vs abort thread reading). This can lead to torn reads, inconsistent state, or even NullReferenceException in GetAll() (e.g., a slot becomes null after the is not null check but before dereferencing). Consider adding a lock inside InFlightTest that protects Add, RemoveOldest, and GetAll (or copying all slot references/start times to locals under lock before yielding), and similarly guarding overflow queue access.
| { | ||
| < 1 => $"{elapsed.Milliseconds}ms", | ||
| < 60 => $"{elapsed.TotalSeconds:F0}s", | ||
| _ => $"{elapsed.TotalMinutes:F0}m {elapsed.Seconds}s", |
There was a problem hiding this comment.
For durations >= 60 seconds, elapsed.TotalMinutes:F0 rounds instead of truncating, which can produce incorrect output (e.g., 1m59s becomes 2m 59s). Use a non-rounding minute value (e.g., integer truncation of TotalMinutes, or elapsed.Hours/elapsed.Minutes composition) to keep minutes consistent with the Seconds remainder.
| _ => $"{elapsed.TotalMinutes:F0}m {elapsed.Seconds}s", | |
| _ => $"{(int)elapsed.TotalMinutes}m {elapsed.Seconds}s", |
| Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.SendTestCaseFinished(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void | ||
| Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.SendTestCaseStarting(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void |
There was a problem hiding this comment.
These public API entries appear to reference Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler, but the implementation in this PR is in src/Microsoft.TestPlatform.CrossPlatEngine/EventHandlers/TestRequestHandler.cs (CrossPlatEngine). If left as-is, the public API baseline will not match the actual exported types and can break API validation tooling. Update the entry to the correct fully-qualified type name (namespace + assembly) that actually contains these members.
| Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.SendTestCaseFinished(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void | |
| Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.SendTestCaseStarting(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void | |
| Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.EventHandlers.TestRequestHandler.SendTestCaseFinished(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void | |
| Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.EventHandlers.TestRequestHandler.SendTestCaseStarting(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void |
| RunContext.SolutionDirectory = RunSettingsUtilities.GetSolutionDirectory(runConfig); | ||
| _runConfiguration = runConfig; | ||
|
|
||
| var requestHandler = (testRunEventsHandler as TestRunEventsHandler)?.RequestHandler; |
There was a problem hiding this comment.
This relies on downcasting IInternalTestRunEventsHandler to a concrete TestRunEventsHandler and then reaching into an internal property (RequestHandler) from another assembly. This couples CrossPlatEngine to a specific handler implementation and risks build breaks if internal visibility isn’t bridged via InternalsVisibleTo. A more robust approach is to plumb these callbacks through an interface (e.g., add optional start/finish reporting to IInternalTestRunEventsHandler or a dedicated capability interface) so you don’t need the concrete cast or internal member access.
| TestRunEventsHandler, | ||
| onTestCaseStarting: requestHandler is not null ? tc => requestHandler.SendTestCaseStarting(tc) : null, | ||
| onTestCaseFinished: requestHandler is not null ? tc => requestHandler.SendTestCaseFinished(tc) : null); |
There was a problem hiding this comment.
This relies on downcasting IInternalTestRunEventsHandler to a concrete TestRunEventsHandler and then reaching into an internal property (RequestHandler) from another assembly. This couples CrossPlatEngine to a specific handler implementation and risks build breaks if internal visibility isn’t bridged via InternalsVisibleTo. A more robust approach is to plumb these callbacks through an interface (e.g., add optional start/finish reporting to IInternalTestRunEventsHandler or a dedicated capability interface) so you don’t need the concrete cast or internal member access.
| // separate thread and may still be draining buffered messages. | ||
| if (!_inFlightTests.HasInFlightTests) | ||
| { | ||
| Thread.Sleep(50); |
There was a problem hiding this comment.
Using Thread.Sleep(50) for coordination is non-deterministic and can both (a) still miss buffered messages under load and (b) add avoidable latency in abort/failure paths. Prefer a deterministic signal from the receive loop (e.g., a ManualResetEventSlim/TaskCompletionSource that indicates the channel has drained/closed, waited with a short timeout) or a tracker-level “last update” timestamp and bounded spin/wait that doesn’t unconditionally block a thread.
| Thread.Sleep(50); | |
| SpinWait.SpinUntil(() => _inFlightTests.HasInFlightTests, 50); |
| } | ||
|
|
||
| /// <summary> | ||
| /// Adds another in-flight execution for the same Guid. Returns false if this should not happen (defensive). |
There was a problem hiding this comment.
The XML doc says “Returns false…” but the method returns void. This is misleading for future maintainers; update the comment to match the actual behavior (or change the method signature if a return value is intended).
| /// Adds another in-flight execution for the same Guid. Returns false if this should not happen (defensive). | |
| /// Adds another in-flight execution for the same Guid, using the overflow queue if the inline slots are full. |
- Add lock to InFlightTest protecting Add, RemoveOldest, GetAll from
concurrent access (message-receive thread vs abort thread). Change
GetAll from yield-based IEnumerable to snapshot-based IReadOnlyList
so consumers iterate a safe copy outside the lock.
- Fix duration rounding: use (int)elapsed.TotalMinutes instead of
TotalMinutes:F0 which rounds (1m59s was showing as '2m 59s').
- Replace Thread.Sleep(50) with SpinWait.SpinUntil for non-blocking
coordination when waiting for in-flight protocol messages.
- Replace concrete cast to TestRunEventsHandler in BaseRunTests with
ITestCaseLifecycleNotifier interface check. TestRunEventsHandler now
implements this internal interface, removing coupling to the concrete
type and the internal RequestHandler property.
- Fix XML doc on InFlightTest.Add ('Returns false' but returns void).
PublicAPI namespace entries are correct (TestRequestHandler uses
CommunicationUtilities namespace despite living in CrossPlatEngine
project) — no change needed there.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
When testhost crashes, users get an unhelpful
Test host process crashedmessage with no indication of which test was running or how to diagnose. This is the most upvoted open issue (#2952, 26 thumbs up).Solution
Always-on blame lite: lightweight real-time test tracking without requiring
--blame.New protocol messages
TestCaseStarting/TestCaseFinishedsent unbatched from testhost (~280 bytes per test). Console tracks in-flight tests inInFlightTestTrackerwith inline-slot optimization.On crash, users now see:
Testing
3 new integration tests (6 variants), all existing tests pass.
Backward compatibility
Older testhosts: console shows old behavior plus
--blame-crashsuggestion. Older consoles: unknown messages silently ignored.Fixes #2952