Skip to content

Report in-flight tests and suggest --blame-crash when testhost crashes#15680

Draft
nohwnd wants to merge 2 commits intomicrosoft:mainfrom
nohwnd:improve-crash-diagnostics-2952
Draft

Report in-flight tests and suggest --blame-crash when testhost crashes#15680
nohwnd wants to merge 2 commits intomicrosoft:mainfrom
nohwnd:improve-crash-diagnostics-2952

Conversation

@nohwnd
Copy link
Copy Markdown
Member

@nohwnd nohwnd commented Apr 15, 2026

Problem

When testhost crashes, users get an unhelpful Test host process crashed message with no indication of which test was running or how to diagnose. This is the most upvoted open issue (#2952, 26 thumbs up).

Solution

Always-on blame lite: lightweight real-time test tracking without requiring --blame.

New protocol messages TestCaseStarting/TestCaseFinished sent unbatched from testhost (~280 bytes per test). Console tracks in-flight tests in InFlightTestTracker with inline-slot optimization.

On crash, users now see:

Test host process crashed : <crash details>
Tests that were executing when the crash occurred:
  MyTest.CrashingMethod (running for 2s)
Consider using --blame-crash to collect a crash dump for further diagnosis.

Testing

3 new integration tests (6 variants), all existing tests pass.

Backward compatibility

Older testhosts: console shows old behavior plus --blame-crash suggestion. Older consoles: unknown messages silently ignored.

Fixes #2952

When testhost crashes, users see an unhelpful 'Test host process crashed'
message with no indication of which test was running or how to diagnose.

This adds lightweight real-time test tracking (always-on blame lite):

- New protocol messages TestCaseStarting/TestCaseFinished sent unbatched
  from testhost to console (~280 bytes per test, negligible overhead)
- Console tracks in-flight tests in ConcurrentDictionary with inline-slot
  optimization (no allocation for the common unique-Guid case)
- On crash, error message now includes running test names with elapsed time
- Always suggests --blame-crash for further diagnosis

Before:
  Test host process crashed

After:
  Test host process crashed : <crash details>
  Tests that were executing when the crash occurred:
    MyTest.CrashingMethod (running for 2s)
  Consider using --blame-crash to collect a crash dump for further diagnosis.

Fixes microsoft#2952

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 15, 2026 12:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds real-time in-flight test tracking so that when the testhost crashes, the console can report which tests were running and suggest --blame-crash for dump collection.

Changes:

  • Introduces TestCaseStarting/TestCaseFinished protocol messages and payload contracts.
  • Wires test-start/test-finish callbacks from testhost execution pipeline to the request handler for unbatched reporting.
  • Updates console-side sender to track in-flight tests and enrich abort/error output; adds/adjusts tests and public API baselines.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
test/Microsoft.TestPlatform.CommunicationUtilities.UnitTests/TestRequestSenderTests.cs Loosens assertion to accommodate augmented crash messaging.
test/Microsoft.TestPlatform.Acceptance.IntegrationTests/ExecutionTests.cs Adds crash-focused integration tests covering new stderr content.
src/Microsoft.TestPlatform.CrossPlatEngine/PublicAPI/PublicAPI.Unshipped.txt Records new unshipped public APIs (currently with questionable type identities).
src/Microsoft.TestPlatform.CrossPlatEngine/Execution/BaseRunTests.cs Threads test start/finish callbacks into FrameworkHandle via request handler.
src/Microsoft.TestPlatform.CrossPlatEngine/EventHandlers/TestRequestHandler.cs Implements sending TestCaseStarting/TestCaseFinished messages from testhost.
src/Microsoft.TestPlatform.CrossPlatEngine/Adapter/TestExecutionRecorder.cs Adds optional callbacks invoked on RecordStart/RecordEnd.
src/Microsoft.TestPlatform.CrossPlatEngine/Adapter/FrameworkHandle.cs Plumbs optional callbacks through to base TestExecutionRecorder.
src/Microsoft.TestPlatform.CommunicationUtilities/TestRequestSender.cs Tracks in-flight tests, appends crash diagnostics and --blame-crash suggestion.
src/Microsoft.TestPlatform.CommunicationUtilities/PublicAPI/PublicAPI.Unshipped.txt Adds new message types, payloads, and interface methods to public API list.
src/Microsoft.TestPlatform.CommunicationUtilities/Messages/TestCaseStartingPayload.cs Adds payload contract for test-start notifications.
src/Microsoft.TestPlatform.CommunicationUtilities/Messages/TestCaseFinishedPayload.cs Adds payload contract for test-finish notifications.
src/Microsoft.TestPlatform.CommunicationUtilities/Messages/MessageType.cs Adds new message type constants for start/finish tracking.
src/Microsoft.TestPlatform.CommunicationUtilities/Interfaces/ITestRequestHandler.cs Extends request handler interface with start/finish send methods.
src/Microsoft.TestPlatform.CommunicationUtilities/InFlightTestTracker.cs Adds in-flight tracker with inline-slot optimization.
src/Microsoft.TestPlatform.CommunicationUtilities/EventHandlers/TestRunEventsHandler.cs Exposes underlying request handler for direct message sending.

Comment on lines +64 to +87
public bool RemoveOldest()
{
// Shift slots down
if (Slot1 is not null)
{
Slot0 = Slot1;
StartTime0 = StartTime1;
Slot1 = Slot2;
StartTime1 = StartTime2;
Slot2 = Slot3;
StartTime2 = StartTime3;
Slot3 = null;
StartTime3 = default;

// Refill Slot3 from overflow if available
if (Overflow is { Count: > 0 })
{
var (payload, startTime) = Overflow.Dequeue();
Slot3 = payload;
StartTime3 = startTime;
}

return true;
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InFlightTest mutates multiple fields (slot shifting + overflow queue) without any synchronization, while InFlightTestTracker is explicitly used in a multi-threaded context (message receive thread updating vs abort thread reading). This can lead to torn reads, inconsistent state, or even NullReferenceException in GetAll() (e.g., a slot becomes null after the is not null check but before dereferencing). Consider adding a lock inside InFlightTest that protects Add, RemoveOldest, and GetAll (or copying all slot references/start times to locals under lock before yielding), and similarly guarding overflow queue access.

Copilot uses AI. Check for mistakes.
{
< 1 => $"{elapsed.Milliseconds}ms",
< 60 => $"{elapsed.TotalSeconds:F0}s",
_ => $"{elapsed.TotalMinutes:F0}m {elapsed.Seconds}s",
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For durations >= 60 seconds, elapsed.TotalMinutes:F0 rounds instead of truncating, which can produce incorrect output (e.g., 1m59s becomes 2m 59s). Use a non-rounding minute value (e.g., integer truncation of TotalMinutes, or elapsed.Hours/elapsed.Minutes composition) to keep minutes consistent with the Seconds remainder.

Suggested change
_ => $"{elapsed.TotalMinutes:F0}m {elapsed.Seconds}s",
_ => $"{(int)elapsed.TotalMinutes}m {elapsed.Seconds}s",

Copilot uses AI. Check for mistakes.
Comment on lines +2 to +3
Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.SendTestCaseFinished(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void
Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.SendTestCaseStarting(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These public API entries appear to reference Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler, but the implementation in this PR is in src/Microsoft.TestPlatform.CrossPlatEngine/EventHandlers/TestRequestHandler.cs (CrossPlatEngine). If left as-is, the public API baseline will not match the actual exported types and can break API validation tooling. Update the entry to the correct fully-qualified type name (namespace + assembly) that actually contains these members.

Suggested change
Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.SendTestCaseFinished(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void
Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestHandler.SendTestCaseStarting(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void
Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.EventHandlers.TestRequestHandler.SendTestCaseFinished(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void
Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.EventHandlers.TestRequestHandler.SendTestCaseStarting(Microsoft.VisualStudio.TestPlatform.ObjectModel.TestCase! testCase) -> void

Copilot uses AI. Check for mistakes.
RunContext.SolutionDirectory = RunSettingsUtilities.GetSolutionDirectory(runConfig);
_runConfiguration = runConfig;

var requestHandler = (testRunEventsHandler as TestRunEventsHandler)?.RequestHandler;
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on downcasting IInternalTestRunEventsHandler to a concrete TestRunEventsHandler and then reaching into an internal property (RequestHandler) from another assembly. This couples CrossPlatEngine to a specific handler implementation and risks build breaks if internal visibility isn’t bridged via InternalsVisibleTo. A more robust approach is to plumb these callbacks through an interface (e.g., add optional start/finish reporting to IInternalTestRunEventsHandler or a dedicated capability interface) so you don’t need the concrete cast or internal member access.

Copilot uses AI. Check for mistakes.
Comment on lines +165 to +167
TestRunEventsHandler,
onTestCaseStarting: requestHandler is not null ? tc => requestHandler.SendTestCaseStarting(tc) : null,
onTestCaseFinished: requestHandler is not null ? tc => requestHandler.SendTestCaseFinished(tc) : null);
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on downcasting IInternalTestRunEventsHandler to a concrete TestRunEventsHandler and then reaching into an internal property (RequestHandler) from another assembly. This couples CrossPlatEngine to a specific handler implementation and risks build breaks if internal visibility isn’t bridged via InternalsVisibleTo. A more robust approach is to plumb these callbacks through an interface (e.g., add optional start/finish reporting to IInternalTestRunEventsHandler or a dedicated capability interface) so you don’t need the concrete cast or internal member access.

Copilot uses AI. Check for mistakes.
// separate thread and may still be draining buffered messages.
if (!_inFlightTests.HasInFlightTests)
{
Thread.Sleep(50);
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Thread.Sleep(50) for coordination is non-deterministic and can both (a) still miss buffered messages under load and (b) add avoidable latency in abort/failure paths. Prefer a deterministic signal from the receive loop (e.g., a ManualResetEventSlim/TaskCompletionSource that indicates the channel has drained/closed, waited with a short timeout) or a tracker-level “last update” timestamp and bounded spin/wait that doesn’t unconditionally block a thread.

Suggested change
Thread.Sleep(50);
SpinWait.SpinUntil(() => _inFlightTests.HasInFlightTests, 50);

Copilot uses AI. Check for mistakes.
}

/// <summary>
/// Adds another in-flight execution for the same Guid. Returns false if this should not happen (defensive).
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The XML doc says “Returns false…” but the method returns void. This is misleading for future maintainers; update the comment to match the actual behavior (or change the method signature if a return value is intended).

Suggested change
/// Adds another in-flight execution for the same Guid. Returns false if this should not happen (defensive).
/// Adds another in-flight execution for the same Guid, using the overflow queue if the inline slots are full.

Copilot uses AI. Check for mistakes.
@nohwnd nohwnd marked this pull request as draft April 15, 2026 14:08
- Add lock to InFlightTest protecting Add, RemoveOldest, GetAll from
  concurrent access (message-receive thread vs abort thread). Change
  GetAll from yield-based IEnumerable to snapshot-based IReadOnlyList
  so consumers iterate a safe copy outside the lock.
- Fix duration rounding: use (int)elapsed.TotalMinutes instead of
  TotalMinutes:F0 which rounds (1m59s was showing as '2m 59s').
- Replace Thread.Sleep(50) with SpinWait.SpinUntil for non-blocking
  coordination when waiting for in-flight protocol messages.
- Replace concrete cast to TestRunEventsHandler in BaseRunTests with
  ITestCaseLifecycleNotifier interface check. TestRunEventsHandler now
  implements this internal interface, removing coupling to the concrete
  type and the internal RequestHandler property.
- Fix XML doc on InFlightTest.Add ('Returns false' but returns void).

PublicAPI namespace entries are correct (TestRequestHandler uses
CommunicationUtilities namespace despite living in CrossPlatEngine
project) — no change needed there.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"Test host process crashed" error hard to diagnose

2 participants