
infra: publish Aspire CLI native AOT symbols (Win + Linux + macOS) to MSDL#17567

Draft
radical wants to merge 2 commits into microsoft:release/13.4 from radical:radical/cli-aot-pdb-publish

Member

@radical radical commented May 28, 2026

The Aspire CLI ships as a NativeAOT executable but its native debug symbols never reach MSDL/SymWeb. dotnet symbol --symbols against a shipped aspire (any platform) returns nothing, so customers and our own crash triage can't symbolicate stack traces from the CLI binary on any of Windows, Linux, or macOS.

$ dotnet symbol --symbols aspire.exe -o ./syms
Downloading from https://msdl.microsoft.com/download/symbols/
ERROR: Not Found

Root cause

ILC emits a symbol artifact next to the binary at artifacts/bin/Aspire.Cli/<config>/net10.0/<rid>/native/ on every NativeAOT build (aspire.pdb on Windows, aspire.dbg on Linux, aspire.dSYM/ on macOS — per Microsoft.NETCore.Native.targets), but it is then dropped on the floor:

  • <CopyOutputSymbolsToPublishDirectory>false</CopyOutputSymbolsToPublishDirectory> in src/Aspire.Cli/Aspire.Cli.csproj keeps it out of the publish dir.
  • The clipack archive (eng/clipack/Common.projitems) only stages the binary itself into the per-RID aspire-cli-<rid>.<zip|tar.gz>.
  • build_sign_native.yml only publishes packages/ as the native_archives_<rid> pipeline artifact, so the symbol artifact never reaches the downstream build stage.
  • The Windows build stage runs BuildAndTest.yml with /p:SkipNativeBuild=true, so it never produces the symbol artifact locally either.

End result: arcade's symbol-publishing infrastructure has no file to upload, even though it is fully wired up.

The fix

Arcade has two distinct symbol-publishing pipelines with different shape requirements; we use each for the platforms it supports:

  • Windows .pdb — loose-file path via FilesToPublishToSymbolServer (Publish.proj GatherPublishItems). Arcade's PrepLoosePdbsForPublish hard-filters loose files to .pdb/.dll, so this path is Windows-only.
  • Linux .dbg and macOS .dwarf — wrapped in a NuGet symbol package and routed via arcade's _ExistingSymbolPackage filter. SymbolUploadHelper.AddPackageToRequest opens the .symbols.nupkg with raw ZipFile.Open (SymbolUploadHelper.cs#L273), filters entries by an extension allowlist that includes .dbg/.dwarf/.so/.dylib (SymbolUploadHelper.cs#L37), and indexes with symbol.exe adddirectory. The symbol-server key is computed from the file's intrinsic build-id — ELF .note.gnu.build-id on Linux, Mach-O LC_UUID on macOS — giving SSQP keys <name>.dbg/elf-buildid-sym-<id>/_.debug (Linux) and _.dwarf/mach-uuid-sym-<uuid>/_.dwarf (macOS) that dotnet-symbol resolves on lookup.

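dotnet/runtime uses this same path for CoreCLR and libraries native symbols:

  • eng/liveBuilds.targets: the CoreCLR shared framework RuntimeFiles ItemGroup includes *.pdb;*.dbg;*.dwarf.
  • eng/native/functions.cmake: defaults dsymutil to its flat-DWARF output mode for macOS, producing flat .dwarf files instead of .dSYM bundles.
  • Microsoft.DotNet.ILCompiler.pkgproj: LibPackageExcludes for .dbg/.dwarf/.dSYM with the comment "exclude native symbols from ilc package (they are included in symbols package)".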

The macOS lookup side is implemented in MachOKeyGenerator.cs, with the protocol documented in SSQP_Key_Conventions.md.
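
As a rough illustration of the key shapes quoted above, the following Python sketch formats the two lookup keys and joins them onto a symbol-server base URL. The function names are hypothetical, and lowercasing the identifier and stripping dashes from the UUID are assumptions made here, not something this PR states.

```python
def elf_symbol_key(file_name: str, build_id_hex: str) -> str:
    """Format the Linux key shape quoted above: <name>.dbg/elf-buildid-sym-<id>/_.debug."""
    # Assumption: the build-id is written as lowercase hex.
    return f"{file_name}/elf-buildid-sym-{build_id_hex.lower()}/_.debug"


def macho_symbol_key(uuid_hex: str) -> str:
    """Format the macOS key shape quoted above: _.dwarf/mach-uuid-sym-<uuid>/_.dwarf."""
    # Assumption: the UUID is written as lowercase hex without dashes.
    normalized = uuid_hex.replace("-", "").lower()
    return f"_.dwarf/mach-uuid-sym-{normalized}/_.dwarf"


def lookup_url(server: str, key: str) -> str:
    """Join a symbol-server base URL and a key with exactly one slash."""
    return server.rstrip("/") + "/" + key
```

For example, `lookup_url("https://msdl.microsoft.com/download/symbols/", macho_symbol_key(some_uuid))` would give the URL a client requests for a given `LC_UUID`, under the assumptions above.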

Three coordinated edits plumb the symbol artifacts from build_sign_native into the build stage's working directory before arcade's -publish runs:

  • eng/pipelines/templates/build_sign_native.yml
    • Windows agents stage aspire.pdb into artifacts/native-symbols-staging/<rid>/.
    • Linux agents pack aspire.dbg into Aspire.Cli.<rid>.<version>.symbols.nupkg using a minimal hand-built zip + nuspec.
    • macOS agents extract the inner Mach-O DWARF from aspire.dSYM/Contents/Resources/DWARF/aspire, ship it as aspire.dwarf in the same .symbols.nupkg shape. The file's LC_UUID matches the binary's, so dotnet-symbol's Mach-O lookup resolves to it.
    • All three platforms publish under the new per-RID pipeline artifact native_symbols_<rid>.
  • eng/pipelines/azure-pipelines.yml and azure-pipelines-unofficial.yml — Windows build job adds two DownloadPipelineArtifact@2 tasks: **/aspire.pdb into artifacts/native-symbols/ (consumed by the FilesToPublishToSymbolServer glob), and **/Aspire.Cli.*.symbols.nupkg into artifacts/native-symbol-pkgs/. A subsequent pwsh step copies the symbol packages into packages/<config>/Shipping so arcade's manifest generation picks them up as Symbols assets. The existing **/Aspire.Cli*.nupkg download is tightened to exclude .symbols.nupkg so the same file isn't downloaded twice.
  • eng/Publishing.props — project-level FilesToPublishToSymbolServer glob for the Windows pdbs only (the loose-file path is .pdb/.dll-only). Linux and macOS symbol packages flow through arcade's existing .symbols.nupkg routing without needing a separate property.

Why this approach (and not the alternatives)

  • The .dSYM directory bundle is not separately published. The Apple-native automatic symbolication path (lldb / atos / Instruments via Spotlight UUID indexing) needs the bundle form and is tracked by dotnet/runtime#88286 ("Distribute macOS symbols as dSYM, not .dwarf"). For server-mediated symbolication via dotnet-symbol, the primary CLI crash-triage workflow, the flat .dwarf we ship is the working format that dotnet/runtime itself uses.
  • AutoGenerateSymbolPackages stays false. That property controls arcade's managed-PDB → .symbols.nupkg wrapper for shipping NuGet packages, independent of the symbol publishing here.
  • <CopyOutputSymbolsToPublishDirectory>false</CopyOutputSymbolsToPublishDirectory> stays set. Globbing directly from bin/<rid>/native/ avoids re-triggering the SymStore race that the comment at eng/clipack/Common.projitems warns about for the managed pdb, and it keeps a clean separation between the managed-pdb path (suppressed) and the native-pdb path (this PR).
  • Hand-built .symbols.nupkg rather than a NuGet Pack invocation. Arcade's SymbolUploadHelper opens the package with raw ZipFile.Open, not NuGet OPC validation, so OPC compliance is unnecessary. A minimal .nuspec + symbol payload zip is sufficient and avoids adding a NuGet Pack task to the build_sign_native job (and the cross-platform tooling that would require).
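
To make the "minimal .nuspec + symbol payload zip" idea concrete, here is a Python sketch of such a package. It is not the script the pipeline runs: the function name, the nuspec metadata values, the `Aspire.Cli.<rid>` package id, and placing the payload at the archive root are all assumptions for illustration.

```python
import zipfile
from pathlib import Path

# Hypothetical minimal nuspec; the metadata values are placeholders.
NUSPEC_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
  <metadata>
    <id>{package_id}</id>
    <version>{version}</version>
    <authors>Microsoft</authors>
    <description>Native debug symbols for {package_id}</description>
  </metadata>
</package>
"""


def pack_symbols_nupkg(symbol_file: Path, rid: str, version: str, out_dir: Path) -> Path:
    """Write Aspire.Cli.<rid>.<version>.symbols.nupkg containing a nuspec and one symbol file."""
    package_id = f"Aspire.Cli.{rid}"
    out_path = out_dir / f"{package_id}.{version}.symbols.nupkg"
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        nuspec = NUSPEC_TEMPLATE.format(package_id=package_id, version=version)
        zf.writestr(f"{package_id}.nuspec", nuspec)
        # Assumption: the payload sits at the archive root under its own file name.
        zf.write(symbol_file, arcname=symbol_file.name)
    return out_path
```

Because the consumer described above reads the archive as a plain zip, nothing beyond these two entries is needed in this sketch.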

Surprises and call-outs

  • The CI structure requires the full build_sign_native → pipeline artifact → build job download chain for the symbols to be on disk when -publish runs. A Publishing.props glob alone could not pick them up because the files aren't on the publishing agent without this plumbing.
  • This change cannot be PR-validated through GitHub Actions: azure-pipelines-public.yml does not run build_sign_native. Verified end-to-end on internal AzDO build 2985850 for the Windows path (an earlier revision): both 🟣Stage native AOT pdb steps succeed; the Windows build job's -publish log emits Uploading 'PdbArtifacts/Windows/native_symbols_win_<arch>/aspire.pdb' to the BAR with distinct relative paths per RID, keeping the same-named pdbs apart on MSDL via RelativePDBPath keying. A fresh AzDO build is in flight to exercise the Linux + macOS paths.
  • Path shape for the Linux .dbg is documented in Microsoft.NETCore.Native.targets (NativeOutputPath = $(OutputPath)native\, NativeSymbolExt = .dbg on Linux, StripSymbols=true default on non-Windows so debug info is split into the .dbg sidecar via objcopy --only-keep-debug + --add-gnu-debuglink).
  • macOS payload shape (aspire.dSYM/Contents/Resources/DWARF/aspire) confirmed locally on macOS NativeAOT publish; dwarfdump --uuid showed the inner file's UUID matches the binary's, which is the contract MachOFileKeyGenerator relies on for symbol-server lookup.

Contributor

github-actions Bot commented May 28, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17567

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17567"

Member

@joperezr joperezr left a comment

Assuming we have done a dry run and this works.

@radical radical changed the title infra: publish Aspire CLI native AOT pdbs (win-x64, win-arm64) to MSDL infra: publish Aspire CLI native AOT symbols (Windows + Linux) to MSDL May 28, 2026
@radical radical force-pushed the radical/cli-aot-pdb-publish branch 2 times, most recently from 6cc16ec to b432f9c Compare May 28, 2026 20:16
@radical radical changed the title infra: publish Aspire CLI native AOT symbols (Windows + Linux) to MSDL infra: publish Aspire CLI native AOT symbols (Win + Linux + macOS) to MSDL May 28, 2026
@radical radical force-pushed the radical/cli-aot-pdb-publish branch from b432f9c to a9ceac7 Compare May 28, 2026 21:43
Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

@radical radical force-pushed the radical/cli-aot-pdb-publish branch from 4a45433 to 9917ea1 Compare May 29, 2026 01:31
@davidfowl davidfowl added this to the 13.4 milestone May 29, 2026
@radical radical force-pushed the radical/cli-aot-pdb-publish branch 3 times, most recently from db2d323 to 7eb23a7 Compare May 29, 2026 20:46
radical and others added 2 commits May 29, 2026 17:49
… MSDL

The Aspire CLI ships as a NativeAOT executable but its native debug symbols
have never reached MSDL/SymWeb. `dotnet symbol --symbols` against a shipped
`aspire` (any platform) returns nothing, so customers (and our own crash
triage) cannot symbolicate stack traces from the CLI binary anywhere.

Root cause: ILC emits a symbol artifact next to the binary under
`artifacts/bin/Aspire.Cli/<config>/net10.0/<rid>/native/` on every NativeAOT
build (`aspire.pdb` on Windows, `aspire.dbg` on Linux, `aspire.dSYM/` on
macOS — see `Microsoft.NETCore.Native.targets`), but it is then dropped:

* `<CopyOutputSymbolsToPublishDirectory>false</CopyOutputSymbolsToPublishDirectory>`
  in `src/Aspire.Cli/Aspire.Cli.csproj` keeps it out of the publish dir.
* The clipack archive (`eng/clipack/Common.projitems`) only stages the binary
  itself into the per-RID `aspire-cli-<rid>.<zip|tar.gz>`.
* `build_sign_native.yml` only publishes `packages/` as the
  `native_archives_<rid>` pipeline artifact, so the symbol artifact never
  reaches the downstream `build` stage.
* The Windows `build` stage runs `BuildAndTest.yml` with
  `/p:SkipNativeBuild=true`, so it never produces the symbol artifact locally
  either.

Arcade has two distinct symbol-publishing pipelines, with different shape
requirements:

* **Windows `.pdb`**: loose-file path via `FilesToPublishToSymbolServer`
  (Publish.proj `GatherPublishItems`). Arcade's `PrepLoosePdbsForPublish`
  hard-filters loose files to `.pdb`/`.dll`, so this path is Windows-only.
* **Cross-platform `.symbols.nupkg`**: arcade's `_ExistingSymbolPackage`
  filter routes files ending in `.symbols.nupkg` to the Symbols asset
  category, which `SymbolUploadHelper.AddPackageToRequest` opens with raw
  `ZipFile.Open` (not NuGet OPC validation), filters by an extension
  allowlist `["", ".exe", ".dll", ".pdb", ".so", ".dbg", ".dylib", ".dwarf",
  ".r2rmap"]`, and indexes with `symbol.exe adddirectory`. The symbol-server
  key is computed from the file's intrinsic build-id — ELF
  `.note.gnu.build-id` on Linux, Mach-O `LC_UUID` on macOS — giving SSQP
  keys `<name>.dbg/elf-buildid-sym-<id>/_.debug` (Linux) and
  `_.dwarf/mach-uuid-sym-<uuid>/_.dwarf` (macOS) that `dotnet-symbol`
  resolves on lookup.
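
The allowlist step can be pictured with a short Python sketch. It only mirrors the extension list quoted above; the function name is hypothetical, and the case-insensitive comparison and the skipping of directory entries are assumptions, not a description of arcade's actual code.

```python
import os

# Extension allowlist as quoted above ("" covers extensionless files).
SYMBOL_EXTENSIONS = {"", ".exe", ".dll", ".pdb", ".so", ".dbg", ".dylib", ".dwarf", ".r2rmap"}


def indexable_entries(entry_names):
    """Return the archive entries whose extension is on the allowlist, in order."""
    kept = []
    for name in entry_names:
        if name.endswith("/"):
            continue  # Assumption: directory entries are never indexed.
        ext = os.path.splitext(name)[1].lower()  # Assumption: case-insensitive match.
        if ext in SYMBOL_EXTENSIONS:
            kept.append(name)
    return kept
```

Under these assumptions a package holding `aspire.dbg` and a `.nuspec` would contribute only `aspire.dbg` to the upload request.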

dotnet/runtime uses this same path for CoreCLR and libraries native symbols:

* https://github.com/dotnet/runtime/blob/main/eng/liveBuilds.targets#L122-L141
  — CoreCLR shared framework `RuntimeFiles` ItemGroup includes
  `*.pdb;*.dbg;*.dwarf`.
* https://github.com/dotnet/runtime/blob/main/eng/native/functions.cmake#L362-L431
  — defaults `dsymutil` to its flat-DWARF output mode for macOS, producing
  flat `.dwarf` files instead of `.dSYM` bundles.
* https://github.com/dotnet/runtime/blob/main/src/installer/pkg/projects/Microsoft.DotNet.ILCompiler/Microsoft.DotNet.ILCompiler.pkgproj#L54-L63
  — `LibPackageExcludes` for `.dbg/.dwarf/.dSYM` with the comment
  *"exclude native symbols from ilc package (they are included in symbols
  package)"*.

dotnet-symbol's macOS lookup uses the Mach-O `LC_UUID` and SSQP key
`_.dwarf/mach-uuid-sym-<uuid>/_.dwarf` — see
https://github.com/dotnet/symstore/blob/main/src/Microsoft.SymbolStore/KeyGenerators/MachOKeyGenerator.cs#L17-L21,L120-L127
and the protocol doc at
https://github.com/dotnet/symstore/blob/main/docs/specs/SSQP_Key_Conventions.md#L120-L132.

Fix (covers Windows, Linux, and macOS):

* `build_sign_native.yml`
  * Windows: stage `aspire.pdb` from `bin/<rid>/native/` into a per-RID
    staging dir and publish as the new `native_symbols_<rid>` pipeline
    artifact.
  * Linux: pack `aspire.dbg` into `Aspire.Cli.<rid>.<version>.symbols.nupkg`
    using a minimal hand-built zip + nuspec.
  * macOS: extract the inner Mach-O DWARF from
    `aspire.dSYM/Contents/Resources/DWARF/aspire`, ship it as `aspire.dwarf`
    in the same `.symbols.nupkg` shape (matches dotnet/runtime's flat-DWARF
    convention; the file's `LC_UUID` matches the binary's, so dotnet-symbol's
    Mach-O lookup resolves to it).
  All three platforms publish under the same `native_symbols_<rid>` artifact
  name.
* `azure-pipelines.yml` and `azure-pipelines-unofficial.yml` Windows `build`
  job: two parallel downloads — `**/aspire.pdb` into `artifacts/native-symbols/`
  (consumed by the FilesToPublishToSymbolServer glob below) and
  `**/Aspire.Cli.*.symbols.nupkg` into `artifacts/native-symbol-pkgs/`. A
  staging step then copies the symbol packages into `packages/<config>/Shipping`
  so arcade's manifest generation picks them up as Symbols assets. The
  existing `**/Aspire.Cli*.nupkg` download is tightened to exclude
  `.symbols.nupkg` to prevent duplicate downloads.
* `eng/Publishing.props`: project-level `FilesToPublishToSymbolServer` glob
  for the Windows pdbs only (the loose-file path is .pdb/.dll-only). Linux
  and macOS symbol packages flow through arcade's existing `.symbols.nupkg`
  routing without needing a separate property.

The `.dSYM` directory bundle itself is not separately published. The
Apple-native automatic symbolication path (lldb / atos / Instruments via
Spotlight UUID indexing) needs the bundle form and is tracked by
dotnet/runtime#88286. For server-mediated symbolication via dotnet-symbol —
the primary CLI crash-triage workflow — the flat .dwarf we ship is the
working format.

`AutoGenerateSymbolPackages` stays `false`. That property controls arcade's
managed-PDB → `.symbols.nupkg` wrapper for shipping NuGet packages and is
independent of the symbol publishing here.

Verified end-to-end on internal AzDO build 2985850 for the Windows path:
both `Stage native AOT pdb` steps succeed; `Publish native symbols` artifacts
contain the pdbs; the Windows `build` job downloads them; arcade's `-publish`
emits `Uploading 'PdbArtifacts/Windows/native_symbols_win_<arch>/aspire.pdb'`
to the BAR with distinct relative paths per RID. The new
`eng/scripts/validate-cli-symbols.ps1` script (separate commit) covers
end-to-end round-trip validation locally per RID without uploading anything.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…alidation

Companion to the pipeline changes that publish Aspire CLI native AOT
symbols (.pdb / .dbg / .dwarf) to MSDL. Validates the entire symbol
round-trip locally per-RID against a current build, without uploading
anything to MSDL.

`docs/ci/cli-native-symbols.md` covers the operating doctrine: when
to run the script (file-level triggers), how to triage a failed
check, the per-RID baseline for what a clean run looks like, the
mapping from each check to its production-pipeline counterpart, and
the criteria under which the script can be retired. Script usage /
parameters / examples live in the comment-based help (run
`Get-Help eng/scripts/validate-cli-symbols.ps1 -Detailed`).

Four checks per RID — A→D — each isolating one piece of the pipeline so
a failure points at the responsible link instead of the symptom:

Check A — build-id symmetry
  The binary and its symbol artifact must report the same identifier.
  On Windows the CodeView GUID+Age in the PE debug directory must match
  the PDB's PDB70 GUID+Age. On Linux the binary's
  `.note.gnu.build-id` ELF note must match the .dbg's. On macOS the
  binary's Mach-O `LC_UUID` must match the .dwarf's. If A fails, the
  pipeline produced a binary/symbol pair that no symbol-server lookup
  could ever resolve, no matter how the pipeline routes them.
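
For the macOS half of Check A, the comparison can be sketched in Python as below. This is an illustration, not the PowerShell script's logic: it handles only thin (non-fat) 64-bit little-endian Mach-O files, and the function names are hypothetical.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic, little-endian on disk
LC_UUID = 0x1B            # load command carrying the 16-byte UUID


def read_macho_uuid(data: bytes):
    """Return the LC_UUID as lowercase hex, or None if it cannot be found."""
    if len(data) < 32:
        return None
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _res = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        return None  # fat binaries and 32-bit files are out of scope for this sketch
    offset = 32  # size of mach_header_64
    for _ in range(ncmds):
        if offset + 8 > len(data):
            return None
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_UUID and cmdsize >= 24 and offset + 24 <= len(data):
            return data[offset + 8:offset + 24].hex()
        if cmdsize < 8:
            return None  # malformed load command; stop rather than loop forever
        offset += cmdsize
    return None


def uuids_match(binary: bytes, dwarf: bytes) -> bool:
    """True only when both files expose the same LC_UUID."""
    left = read_macho_uuid(binary)
    return left is not None and left == read_macho_uuid(dwarf)
```

Calling `uuids_match(open(binary_path, "rb").read(), open(dwarf_path, "rb").read())` gives the pass/fail signal for the pair; the paths are placeholders.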

Check B — .symbols.nupkg pack/extract
  Linux + macOS only (Windows uses arcade's loose-file path). Builds
  the per-RID `Aspire.Cli.<rid>.<version>.symbols.nupkg` the same way
  `build_sign_native.yml` does — minimal hand-built zip + nuspec —
  extracts it, and asserts the contained symbol file is byte-identical
  to the source. Guards against subtle pack-side regressions (zip
  options, file mode bits, path normalization) that would otherwise
  only surface when MSDL refused the upload.
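
The byte-identity assertion in Check B amounts to something like the following Python sketch. It assumes the symbol file is stored at the archive root under its original name, which is an illustrative choice; the function name is hypothetical.

```python
import hashlib
import zipfile
from pathlib import Path


def payload_is_byte_identical(nupkg: Path, source: Path) -> bool:
    """Compare the packed symbol file against the source file by SHA-256."""
    with zipfile.ZipFile(nupkg) as zf:
        try:
            # Assumption: the entry name equals the source file name.
            packed = zf.read(source.name)
        except KeyError:
            return False  # the payload is missing from the package
    return hashlib.sha256(packed).digest() == hashlib.sha256(source.read_bytes()).digest()
```

A `False` result points at the pack step rather than at the symbol server, which is the isolation this check is after.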

Check C — dotnet-symbol round-trip via local symstore
  Places the symbol file at its SSQP key under a temp symstore root,
  serves the root over a loopback HTTP listener, and runs
  `dotnet-symbol --symbols <binary> --server-path http://127.0.0.1:<port>`.
  Asserts dotnet-symbol downloaded a file, the downloaded file is
  byte-identical to the source, and (for nupkg-shaped artifacts) the
  SSQP key the script computed actually matches what dotnet-symbol
  requested. This is the highest-fidelity check of the protocol path
  end users actually traverse.
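
The "place the symbol file at its SSQP key" step of Check C can be sketched in Python as a copy into a directory tree that mirrors the key. The function name is hypothetical, and rejecting absolute or parent-relative keys is a precaution added here, not something the script is known to do.

```python
import shutil
from pathlib import Path, PurePosixPath


def place_in_symstore(root: Path, key: str, symbol_file: Path) -> Path:
    """Copy symbol_file to <root>/<key>, creating intermediate directories."""
    posix_key = PurePosixPath(key)
    parts = posix_key.parts
    if not parts or posix_key.is_absolute() or ".." in parts:
        raise ValueError(f"unsafe symbol key: {key!r}")
    dest = root.joinpath(*parts)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(symbol_file, dest)
    return dest
```

Serving `root` over HTTP then makes `http://127.0.0.1:<port>/<key>` resolve to the copied file, which is the URL shape a key-based lookup requests.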

Check D — symbol resolution against the downloaded file
  Check C only proves the protocol delivers byte-identical bytes for
  the right key; a same-sized zero-filled file with the right build-id
  would still pass C. Check D points the platform's native
  symbolicator at the file C just downloaded and asks it to resolve
  the binary's entry-point VA:

    macOS:   atos -o <dwarf> -l <__TEXT vmaddr> <entry-va>
             → "main (in aspire.dwarf) (main.cpp:228)"

    Linux:   addr2line -e <dbg> -f -C <entry-va>
             → "_start"

    Windows: llvm-symbolizer --obj=<exe> <entry-va>
             (binary + downloaded .pdb staged to the same temp dir so
              the PE's CodeView path finds the file dotnet-symbol
              delivered)

  Fails if the resolver returns empty, "??", or just echoes the
  address back — all "no debug info found" outputs. Skips with a
  warning if the platform tool is missing; D never fails the script
  just because tooling is absent.
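
One way to read the failure rule above is sketched in Python below. It inspects only the first line of the resolver output, so a resolved function name followed by an unknown source location still counts as resolved; that interpretation, and the function name, are assumptions rather than the script's exact behavior.

```python
def is_unresolved(output: str, address: int) -> bool:
    """True when symbolicator output carries no debug info for the address."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return True  # empty output
    first = lines[0]
    if first.startswith("??"):
        return True  # addr2line-style "unknown" marker
    try:
        # An output that is just the address (with or without 0x) is an echo.
        return int(first, 16) == address
    except ValueError:
        return False  # not a bare hex number, so treat it as a symbol name
```

For instance, `is_unresolved("0x4e300", 0x4e300)` is `True` (an echo), while `is_unresolved("_start", 0x4e300)` is `False`.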

Local HTTP server uses System.Net.HttpListener inside a Start-ThreadJob.
HttpListener ships with the .NET runtime that pwsh 7+ runs on, which
removes an external prerequisite (an earlier draft used
`python -m http.server` which silently no-ops on a stock Windows box
because `python.exe` resolves to the Microsoft Store App Execution
Alias stub). The listener is constructed and Start()ed on the main
thread; the job only consumes contexts. That split matters for
teardown: `HttpListener.GetContext()` is a blocking unmanaged
call that Stop-Job / cancellation tokens cannot interrupt — calling
`$listener.Close()` from the main thread is the only way to unblock
the job so it can terminate. Server readiness is verified by polling
with `Invoke-WebRequest` against a sentinel path until either the
listener answers (a 404 counts — the listener is up, the path just
doesn't exist) or a 6-second timeout elapses.
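
The readiness rule described above (any HTTP answer, including a 404, means the listener is up) can be sketched in Python as follows. The script itself uses `Invoke-WebRequest`; this is an analogous illustration with a hypothetical function name, and bypassing proxies for the loopback request is an assumption added here.

```python
import time
import urllib.error
import urllib.request


def wait_until_listening(url: str, timeout_s: float = 6.0, interval_s: float = 0.1) -> bool:
    """Poll url until the server answers with any HTTP status or the timeout elapses."""
    # Assumption: ignore proxy settings, since the target is a loopback listener.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=1.0):
                return True
        except urllib.error.HTTPError:
            return True  # an error status still proves the listener is accepting requests
        except (urllib.error.URLError, OSError):
            time.sleep(interval_s)  # connection refused or timed out: not up yet
    return False
```

`HTTPError` is caught before `URLError` on purpose: it is a subclass, and only the non-HTTP failures should trigger a retry.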

Windows ID extraction and entry-point VA extraction both use
System.Reflection.PortableExecutable.PEReader (also shipped with .NET):

  - **CodeView GUID+Age** (Check C): the ID dotnet-symbol itself
    computes from the PE to drive its lookup, so it's the only ID
    Check C strictly needs.
  - **ImageBase + AddressOfEntryPoint** (Check D): read from the PE
    optional header to compute the VA fed to llvm-symbolizer. With
    a non-PEReader fallback, Check D would silently bail on "could
    not determine binary entry-point VA" whenever the fallback tool
    is missing — pointing at the wrong root cause and making it look
    like the script could not analyse the binary at all.

`llvm-pdbutil` is still used to read the PDB-side ID for Check A's
symmetry comparison (no managed equivalent), and `llvm-readobj` stays
as a secondary fallback for both ID and VA paths if PEReader can't
read the records for any reason.
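
The entry-point VA that Check D feeds to the symbolicator is `ImageBase + AddressOfEntryPoint`. The script reads both fields through PEReader; the Python sketch below shows the same arithmetic by parsing the PE optional header directly. The function name is hypothetical and the code is an illustration only.

```python
import struct

PE32_MAGIC = 0x10B
PE32_PLUS_MAGIC = 0x20B


def entry_point_va(pe: bytes) -> int:
    """Return ImageBase + AddressOfEntryPoint from a PE image's optional header."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", pe, 0x3C)  # e_lfanew
    if pe[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", pe, opt)
    (entry_rva,) = struct.unpack_from("<I", pe, opt + 16)
    if magic == PE32_PLUS_MAGIC:
        (image_base,) = struct.unpack_from("<Q", pe, opt + 24)
    elif magic == PE32_MAGIC:
        (image_base,) = struct.unpack_from("<I", pe, opt + 28)
    else:
        raise ValueError(f"unknown optional header magic: {magic:#x}")
    return image_base + entry_rva
```

As a hypothetical example, an image base of 0x140000000 with an entry RVA of 0xee4bac would yield the VA 0x140ee4bac.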

Net effect on a clean Windows dev box without LLVM installed: A skips
with a clear note, B is N/A (Windows uses the loose-pdb path), C runs
and passes via PEReader-extracted ID, and D skips with the accurate
"no symbolicator available for win-x64" message rather than a
misleading VA-extraction skip — exactly parallel to how addr2line /
atos absence is reported on Linux/macOS. With LLVM present, all four
checks run.

Per-platform tooling matrix (skipped checks emit a warning, never an
error):

  Always:    pwsh 7+, dotnet-symbol global tool
  macOS:     atos (Xcode CLT) for D
  Linux:     addr2line + readelf (binutils) for A and D
  Windows:   llvm-pdbutil (LLVM) for A; llvm-symbolizer (LLVM) for D
             — PEReader handles binary-side ID and entry-point VA,
             so C runs without LLVM and D's skip-vs-pass decision is
             driven purely by whether llvm-symbolizer is present

Verified end-to-end:

  osx-arm64 → A/B/C/D all pass; atos resolves
              0x100058fd8 → main (main.cpp:228)
  linux-x64 → A/B/C/D all pass; addr2line resolves
              0x4e300 → _start
  win-x64   → with LLVM on PATH: A/C/D PASS; llvm-symbolizer resolves
              0x140ee4bac → wmainCRTStartup against the
              dotnet-symbol-downloaded PDB.
            → without LLVM: A SKIP, B N/A, C PASS via PEReader, D
              SKIP with "no symbolicator available for win-x64".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radical radical force-pushed the radical/cli-aot-pdb-publish branch from 7eb23a7 to 153b3db Compare May 29, 2026 21:49