Handle ServerNotAvailableException in CollectLinuxCommand process probing #5705

Open
Copilot wants to merge 11 commits into main from copilot/fix-diagnostics-client-exception

Contributor

Copilot AI commented Feb 5, 2026

Summary

Handle exceptions in CollectLinuxCommand process probing to gracefully handle processes that cannot be resolved or connected to.

Builds on #5778 (Juan's simple catch-and-skip fix) by adding structured error reporting and covering an additional failure mode (IOException from mid-response disconnection).

Problem

When collect-linux --probe enumerates all published .NET processes, several exceptions can occur between discovery and probing:

  • ServerNotAvailableException: diagnostic socket not found (process exited before connection)
  • IOException ("Connection reset by peer"): process exited mid-IPC response (#5778's catch (DiagnosticToolException or DiagnosticsClientException) does not cover this)
  • UnsupportedCommandException: runtime too old to support GetProcessInfo
  • DiagnosticToolException: process exited between GetPublishedProcesses and ResolveProcess

Previously (before #5778), these propagated to the outer catch (Exception), printing a full stack trace to stderr and returning UnknownError. #5778 added a catch-and-skip for DiagnosticToolException and DiagnosticsClientException, but silently drops the processes and doesn't cover IOException.
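The gap described above can be sketched as a type check. This is a minimal illustration, not the tool's code: the exception classes below are local stand-ins so the snippet compiles on its own (the real ones ship with Microsoft.Diagnostics.NETCore.Client and the dotnet-trace tool), and `CatchCoverage` and its methods are hypothetical names. It assumes, as in the client library, that ServerNotAvailableException derives from DiagnosticsClientException, while IOException is unrelated to both filtered types.

```csharp
using System;
using System.IO;

// Stand-in exception types (assumption: they mirror the real hierarchy).
public class DiagnosticsClientException : Exception
{
    public DiagnosticsClientException(string message) : base(message) { }
}

public class ServerNotAvailableException : DiagnosticsClientException
{
    public ServerNotAvailableException(string message) : base(message) { }
}

public class DiagnosticToolException : Exception
{
    public DiagnosticToolException(string message) : base(message) { }
}

// Hypothetical helper: each method answers "would this filter catch ex?".
public static class CatchCoverage
{
    // Same shape as the #5778 filter: tool and client exceptions only.
    public static bool CaughtByNarrowFilter(Exception ex) =>
        ex is DiagnosticToolException or DiagnosticsClientException;

    // The filter widened to include IOException from a mid-response disconnect.
    public static bool CaughtByWidenedFilter(Exception ex) =>
        ex is DiagnosticToolException or DiagnosticsClientException or IOException;
}
```

With these definitions, a ServerNotAvailableException matches the narrow filter, but an IOException only matches the widened one, which is why it previously reached the outer catch (Exception).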

What this PR adds on top of #5778

  1. Structured probe results: UserEventsProbeResult enum (Supported, NotSupported, ConnectionFailed) replaces boolean, enabling distinct handling per outcome
  2. "Could not be probed" reporting: instead of silently skipping, processes that fail probing are reported in a dedicated section (console) or as unknown (CSV), so users know these processes existed
  3. IOException catch: covers the "Connection reset by peer" scenario observed in CI that #5778 misses
  4. Single-process graceful handling: --probe -p PID and collect-linux -p PID now handle connection failures with clean error messages instead of stack traces
  5. UnsupportedCommandException handling: ancient runtimes that don't support GetProcessInfo are reported as NotSupported instead of crashing
  6. Preserved original error messages: argument validation errors (-1 is not a valid process ID, Only one of --name or --process-id, etc.) still propagate with their original specific messages and ArgumentError return code
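A rough sketch of how items 1, 3, and 5 could fit together is shown here. It is not the PR's implementation: the exception classes are local stand-ins for the real client-library types, `ProbeSketch` is a hypothetical name, and the `queryRuntime` delegate is an assumed placeholder for the GetProcessInfo call plus the capability check.

```csharp
using System;
using System.IO;

public enum UserEventsProbeResult { Supported, NotSupported, ConnectionFailed }

// Stand-in exception types so the sketch is self-contained.
public class DiagnosticsClientException : Exception
{
    public DiagnosticsClientException(string message) : base(message) { }
}

public class ServerNotAvailableException : DiagnosticsClientException
{
    public ServerNotAvailableException(string message) : base(message) { }
}

public class UnsupportedCommandException : DiagnosticsClientException
{
    public UnsupportedCommandException(string message) : base(message) { }
}

public static class ProbeSketch
{
    // queryRuntime (hypothetical) returns true when the target runtime
    // supports the feature, and throws when the process cannot be queried.
    public static UserEventsProbeResult ProbeProcess(int pid, Func<int, bool> queryRuntime)
    {
        try
        {
            return queryRuntime(pid)
                ? UserEventsProbeResult.Supported
                : UserEventsProbeResult.NotSupported;
        }
        catch (UnsupportedCommandException)
        {
            // Runtime too old to answer the query: reported, not a crash.
            return UserEventsProbeResult.NotSupported;
        }
        catch (Exception ex) when (ex is ServerNotAvailableException or IOException)
        {
            // Socket missing, or the process exited while the response was read.
            return UserEventsProbeResult.ConnectionFailed;
        }
    }
}
```

Returning an enum rather than a bool lets callers decide per outcome whether to report, skip, or fail, which a two-valued result cannot express.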

Behavior

Non-probe mode (dotnet-trace collect-linux -p <pid>):

  • Connection failure: [ERROR] Unable to connect to process '<name> (<pid>)'. The process may have exited, or it doesn't have an accessible .NET diagnostic port. → TracingError
  • Argument errors: original messages preserved → ArgumentError

Single-process probe mode (dotnet-trace collect-linux --probe -p <pid>):

  • Connection failure: Could not probe process '<name> (<pid>)'. The process may have exited, or it doesn't have an accessible .NET diagnostic port. → Ok
  • Argument errors: original messages preserved → ArgumentError
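The two single-process modes above differ only in how a connection failure is reported. The following sketch shows that split under stated assumptions: `ReturnCode` is a stand-in enum listing only the members named in this PR, `SingleProcessSketch.Run` and its `resolve` and `tryConnect` delegates are hypothetical, and ArgumentException stands in for whatever exception the real argument validation throws.

```csharp
using System;

// Stand-in for the tool's return codes; only members named in this PR.
public enum ReturnCode { Ok, ArgumentError, TracingError, UnknownError }

public static class SingleProcessSketch
{
    private const string Reason =
        "The process may have exited, or it doesn't have an accessible .NET diagnostic port.";

    // resolve (hypothetical) validates -p/-n and throws on bad arguments;
    // tryConnect (hypothetical) returns false when the process is unreachable.
    public static (ReturnCode Code, string Message) Run(
        bool probeMode, string name, int pid, Action resolve, Func<bool> tryConnect)
    {
        try
        {
            resolve();
        }
        catch (ArgumentException ex)
        {
            // Validation errors keep their original, specific message.
            return (ReturnCode.ArgumentError, ex.Message);
        }

        if (tryConnect())
        {
            return (ReturnCode.Ok, string.Empty);
        }

        // Probe mode is informational, so an unreachable process is not an error.
        return probeMode
            ? (ReturnCode.Ok, $"Could not probe process '{name} ({pid})'. {Reason}")
            : (ReturnCode.TracingError, $"[ERROR] Unable to connect to process '{name} ({pid})'. {Reason}");
    }
}
```

Keeping resolution separate from the connection attempt is what lets argument errors return ArgumentError in both modes while connection failures diverge.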

Multi-process probe mode (dotnet-trace collect-linux --probe):

  • Console output shows ".NET processes that could not be probed" section when applicable
  • CSV output includes unknown value for unprobed processes
  • Processes that exit during name resolution are silently skipped
  • Processes that exit during probing are reported with name in the "could not be probed" section
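One way to picture the multi-process bucketing is the sketch below. It is illustrative only: `ProbeReport` and `ProbeReportBuilder` are hypothetical names, a null name is an assumed signal for "exited during name resolution", and the actual console and CSV layout produced by the command is not reproduced here.

```csharp
using System.Collections.Generic;

public enum UserEventsProbeResult { Supported, NotSupported, ConnectionFailed }

// Hypothetical container for the three report sections.
public sealed class ProbeReport
{
    public List<string> Supported { get; } = new();
    public List<string> Unsupported { get; } = new();
    public List<string> Unknown { get; } = new();   // "could not be probed"
}

public static class ProbeReportBuilder
{
    public static ProbeReport Build(
        IEnumerable<(int Pid, string Name, UserEventsProbeResult Result)> probes)
    {
        var report = new ProbeReport();
        foreach (var (pid, name, result) in probes)
        {
            // Assumption: a null name means the process exited during name
            // resolution, so it is skipped without being reported.
            if (name is null)
            {
                continue;
            }

            string label = $"{name} ({pid})";
            switch (result)
            {
                case UserEventsProbeResult.Supported:
                    report.Supported.Add(label);
                    break;
                case UserEventsProbeResult.NotSupported:
                    report.Unsupported.Add(label);
                    break;
                default:
                    // Exited during probing: still listed, with its name.
                    report.Unknown.Add(label);
                    break;
            }
        }
        return report;
    }
}
```

A caller would then print the Unknown list under the "could not be probed" heading only when it is non-empty, and emit the unknown value for those rows in CSV mode.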

Copilot AI changed the title from "[WIP] Investigate DiagnosticsClient.GetProcessInfo exception" to "Handle ServerNotAvailableException in CollectLinuxCommand process probing" Feb 5, 2026
Copilot AI requested a review from mdh1418 February 5, 2026 19:03
mdh1418 force-pushed the copilot/fix-diagnostics-client-exception branch from 8fe26f7 to fe76ee6 Compare February 6, 2026 17:21
…lpers

Add UserEventsProbeResult enum (Supported/NotSupported) to replace boolean return.
Introduce ProbeProcess helper for probing a single process.
Add GetAndProbeAllProcesses helper that enumerates and probes all published processes.
Update callers in CollectLinux and SupportsCollectLinux to use new helpers.
Update BuildProcessSupportCsv to use UserEventsProbeResult enum.
…cess probing

Add ProcessNotFound and ConnectionFailed values to UserEventsProbeResult enum.
Update ProbeProcess to catch DiagnosticToolException (process resolution failed) and
ServerNotAvailableException (diagnostic endpoint not accessible) separately.
Add FormatProcessIdentifier helper for clean display of process ID/name.
Add unknownProcesses/unknownCsv tracking for processes that could not be probed.
Update probe mode output to show 'Processes that could not be probed' section.
Include 'unknown' value in CSV output for unprobed processes.
Update non-probe mode to show distinct errors for each failure type.
Change '.NET process' to 'Process' in messages since arbitrary PIDs may not be .NET.

Fixes #5694
Document that results are categorized as supported, not supported, or unknown.
Clarify that unknown status occurs when diagnostic endpoint is not accessible.
…iled handling

Update test expectations to match new behavior:
- Add FormatProcessNotFoundError and FormatProcessIdentifier helpers
- Update ResolveProcessExceptions test data for ProcessNotFound handling
- Update probe error test cases for process resolution errors
- Tests now expect ReturnCode.TracingError for failures in non-probe mode
- Tests expect ReturnCode.Ok for probe mode with informational output
mdh1418 force-pushed the copilot/fix-diagnostics-client-exception branch from fe76ee6 to 0cbcf44 Compare February 6, 2026 19:54
mdh1418 marked this pull request as ready for review February 6, 2026 20:02
mdh1418 requested a review from a team as a code owner February 6, 2026 20:02
Contributor

Copilot AI left a comment

Pull request overview

This PR improves dotnet-trace collect-linux resilience by handling process-resolution and diagnostics-connection failures during “process probing” so the command no longer crashes when a target process can’t be resolved or connected to (e.g., exits between enumeration and probing, cross-container endpoint issues).

Changes:

  • Replaced boolean “supports” probing with a 4-state probe result (Supported/NotSupported/ProcessNotFound/ConnectionFailed) and updated user-facing output.
  • Updated machine-wide probe to track and report “unknown/unprobed” processes and emit unknown in CSV.
  • Adjusted functional tests to match new probe behaviors/messages (partially—some existing expectations still appear outdated).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/Tools/dotnet-trace/CommandLine/Commands/CollectLinuxCommand.cs | Introduces multi-state probing, catches DiagnosticToolException/ServerNotAvailableException, updates probe messaging and CSV output. |
| src/tests/dotnet-trace/CollectLinuxCommandFunctionalTests.cs | Updates/extends tests for new probe outcomes and adds helpers for the new process identifier/message formatting. |


mdh1418 and others added 3 commits March 6, 2026 18:38
…codes

Per review feedback, single-process paths (explicit -p PID or -n NAME)
now call CommandUtils.ResolveProcess separately so argument validation
errors propagate with original specific messages and ArgumentError
return code. ProbeProcess is only used for the resolved PID's runtime
check and connection attempt.

Restore '.NET process(es)' wording in probe output messages.
Remove unused FormatProcessIdentifier helper from both source and tests.
Revert probe error tests to expect ArgumentError with original messages.
Revert ResolveProcessExceptions test data to original error text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The supportedCsv, unsupportedCsv, and unknownCsv variables are always
non-null when the CsvToConsole and Csv output blocks are reached,
since generateCsv is true for those modes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mdh1418 and others added 2 commits March 9, 2026 23:43
Update connection failure messages per review: use 'diagnostic port'
instead of 'diagnostic endpoint', and reword to indicate the process
may not have a .NET diagnostic port rather than implying it exists
but is inaccessible.

Skip processes that exit during name resolution silently rather than
reporting them as unknown, per reviewer suggestion that users wouldn't
find it surprising.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member

mdh1418 commented Mar 16, 2026

Could I get another review, @noahfalk?

mdh1418 and others added 2 commits March 25, 2026 22:51
GetProcessInfo can throw IOException (e.g. 'Connection reset by peer')
when a process exits while the IPC response is being read. This is
distinct from ServerNotAvailableException which covers connection
failures. Observed in CI on main branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member

mdh1418 commented Mar 26, 2026

The IOException that one CI test run exhibited:

```
System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.
   ---> System.Net.Sockets.SocketException (104): Connection reset by peer
     at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 count)
     --- End of inner exception stack trace ---
     at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 count)
     at System.IO.Stream.Read(Span`1 buffer)
     at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)
     at System.IO.Stream.ReadAtLeastCore(Span`1 buffer, Int32 minimumBytes, Boolean throwOnEndOfStream)
     at System.IO.BinaryReader.ReadBytes(Int32 count)
     at Microsoft.Diagnostics.NETCore.Client.IpcHeader.Parse(BinaryReader reader) in /__w/1/s/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcHeader.cs:line 55
     at Microsoft.Diagnostics.NETCore.Client.IpcMessage.Parse(Stream stream) in /__w/1/s/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcMessage.cs:line 117
     at Microsoft.Diagnostics.NETCore.Client.IpcClient.Read(Stream stream) in /__w/1/s/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs:line 107
     at Microsoft.Diagnostics.NETCore.Client.IpcClient.SendMessageGetContinuation(IpcEndpoint endpoint, IpcMessage message) in /__w/1/s/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcClient.cs:line 44
     at Microsoft.Diagnostics.NETCore.Client.DiagnosticsClient.TryGetProcessInfo3() in /__w/1/s/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/DiagnosticsClient.cs:line 595
     at Microsoft.Diagnostics.NETCore.Client.DiagnosticsClient.GetProcessInfo() in /__w/1/s/src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/DiagnosticsClient.cs:line 539
     at Microsoft.Diagnostics.Tools.Trace.CollectLinuxCommandHandler.ProcessSupportsUserEventsIpcCommand(Int32 pid, String processName, Int32& resolvedPid, String& resolvedName, String& detectedRuntimeVersion) in /__w/1/s/src/Tools/dotnet-trace/CommandLine/Commands/CollectLinuxCommand.cs:line 319
     at Microsoft.Diagnostics.Tools.Trace.CollectLinuxCommandHandler.SupportsCollectLinux(CollectLinuxArgs args) in /__w/1/s/src/Tools/dotnet-trace/CommandLine/Commands/CollectLinuxCommand.cs:line 247
```

4 participants