infra: publish Aspire CLI native AOT symbols (Win + Linux + macOS) to MSDL #17567
radical wants to merge 2 commits into
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17567
Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17567"
cef8102 to
f86182b
Compare
joperezr
left a comment
Assuming we have done a dry run and this works.
6cc16ec to
b432f9c
Compare
b432f9c to
a9ceac7
Compare
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
4a45433 to
9917ea1
Compare
db2d323 to
7eb23a7
Compare
… MSDL

The Aspire CLI ships as a NativeAOT executable but its native debug symbols have never reached MSDL/SymWeb. `dotnet symbol --symbols` against a shipped `aspire` (any platform) returns nothing, so customers (and our own crash triage) cannot symbolicate stack traces from the CLI binary anywhere.

Root cause: ILC emits a symbol artifact next to the binary under `artifacts/bin/Aspire.Cli/<config>/net10.0/<rid>/native/` on every NativeAOT build (`aspire.pdb` on Windows, `aspire.dbg` on Linux, `aspire.dSYM/` on macOS; see `Microsoft.NETCore.Native.targets`), but it is then dropped:

* `<CopyOutputSymbolsToPublishDirectory>false</CopyOutputSymbolsToPublishDirectory>` in `src/Aspire.Cli/Aspire.Cli.csproj` keeps it out of the publish dir.
* The clipack archive (`eng/clipack/Common.projitems`) only stages the binary itself into the per-RID `aspire-cli-<rid>.<zip|tar.gz>`.
* `build_sign_native.yml` only publishes `packages/` as the `native_archives_<rid>` pipeline artifact, so the symbol artifact never reaches the downstream `build` stage.
* The Windows `build` stage runs `BuildAndTest.yml` with `/p:SkipNativeBuild=true`, so it never produces the symbol artifact locally either.

Arcade has two distinct symbol-publishing pipelines, with different shape requirements:

* **Windows `.pdb`**: loose-file path via `FilesToPublishToSymbolServer` (Publish.proj `GatherPublishItems`). Arcade's `PrepLoosePdbsForPublish` hard-filters loose files to `.pdb`/`.dll`, so this path is Windows-only.
* **Cross-platform `.symbols.nupkg`**: arcade's `_ExistingSymbolPackage` filter routes files ending in `.symbols.nupkg` to the Symbols asset category, which `SymbolUploadHelper.AddPackageToRequest` opens with raw `ZipFile.Open` (not NuGet OPC validation), filters by an extension allowlist `["", ".exe", ".dll", ".pdb", ".so", ".dbg", ".dylib", ".dwarf", ".r2rmap"]`, and indexes with `symbol.exe adddirectory`. The symbol-server key is computed from the file's intrinsic build-id (ELF `.note.gnu.build-id` on Linux, Mach-O `LC_UUID` on macOS), giving SSQP keys `<name>.dbg/elf-buildid-sym-<id>/_.debug` (Linux) and `_.dwarf/mach-uuid-sym-<uuid>/_.dwarf` (macOS) that `dotnet-symbol` resolves on lookup.

dotnet/runtime uses this same path for CoreCLR and libraries native symbols:

* https://github.com/dotnet/runtime/blob/main/eng/liveBuilds.targets#L122-L141 - CoreCLR shared framework `RuntimeFiles` ItemGroup includes `*.pdb;*.dbg;*.dwarf`.
* https://github.com/dotnet/runtime/blob/main/eng/native/functions.cmake#L362-L431 - defaults `dsymutil` to its flat-DWARF output mode for macOS, producing flat `.dwarf` files instead of `.dSYM` bundles.
* https://github.com/dotnet/runtime/blob/main/src/installer/pkg/projects/Microsoft.DotNet.ILCompiler/Microsoft.DotNet.ILCompiler.pkgproj#L54-L63 - `LibPackageExcludes` for `.dbg/.dwarf/.dSYM` with the comment *"exclude native symbols from ilc package (they are included in symbols package)"*.

dotnet-symbol's macOS lookup uses the Mach-O `LC_UUID` and SSQP key `_.dwarf/mach-uuid-sym-<uuid>/_.dwarf`; see https://github.com/dotnet/symstore/blob/main/src/Microsoft.SymbolStore/KeyGenerators/MachOKeyGenerator.cs#L17-L21,L120-L127 and the protocol doc at https://github.com/dotnet/symstore/blob/main/docs/specs/SSQP_Key_Conventions.md#L120-L132.

Fix (covers Windows, Linux, and macOS):

* `build_sign_native.yml`
  * Windows: stage `aspire.pdb` from `bin/<rid>/native/` into a per-RID staging dir and publish as the new `native_symbols_<rid>` pipeline artifact.
  * Linux: pack `aspire.dbg` into `Aspire.Cli.<rid>.<version>.symbols.nupkg` using a minimal hand-built zip + nuspec.
  * macOS: extract the inner Mach-O DWARF from `aspire.dSYM/Contents/Resources/DWARF/aspire`, ship it as `aspire.dwarf` in the same `.symbols.nupkg` shape (matches dotnet/runtime's flat-DWARF convention; the file's `LC_UUID` matches the binary's, so dotnet-symbol's Mach-O lookup resolves to it).
  * All three platforms publish under the same `native_symbols_<rid>` artifact name.
* `azure-pipelines.yml` and `azure-pipelines-unofficial.yml` Windows `build` job: two parallel downloads, `**/aspire.pdb` into `artifacts/native-symbols/` (consumed by the FilesToPublishToSymbolServer glob below) and `**/Aspire.Cli.*.symbols.nupkg` into `artifacts/native-symbol-pkgs/`. A staging step then copies the symbol packages into `packages/<config>/Shipping` so arcade's manifest generation picks them up as Symbols assets. The existing `**/Aspire.Cli*.nupkg` download is tightened to exclude `.symbols.nupkg` to prevent duplicate downloads.
* `eng/Publishing.props`: project-level `FilesToPublishToSymbolServer` glob for the Windows pdbs only (the loose-file path is .pdb/.dll-only). Linux and macOS symbol packages flow through arcade's existing `.symbols.nupkg` routing without needing a separate property.

The `.dSYM` directory bundle itself is not separately published. The Apple-native automatic symbolication path (lldb / atos / Instruments via Spotlight UUID indexing) needs the bundle form and is tracked by dotnet/runtime#88286. For server-mediated symbolication via dotnet-symbol, the primary CLI crash-triage workflow, the flat .dwarf we ship is the working format.

`AutoGenerateSymbolPackages` stays `false`. That property controls arcade's managed-PDB → `.symbols.nupkg` wrapper for shipping NuGet packages and is independent of the symbol publishing here.
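
As a rough illustration of the extension allowlist described above, the sketch below filters a list of package entry names in Python. The allowlist values are the ones quoted in this commit message; the helper name `is_indexable`, the sample entry names, and the case-insensitive comparison are assumptions made for the example and are not taken from arcade's `SymbolUploadHelper`.

```python
import posixpath

# Extension allowlist as quoted in the commit message above. The empty
# string admits extensionless files (for example ELF or Mach-O binaries).
SYMBOL_EXTENSIONS = {"", ".exe", ".dll", ".pdb", ".so", ".dbg", ".dylib", ".dwarf", ".r2rmap"}


def is_indexable(entry_name: str) -> bool:
    """Hypothetical helper: True if a zip entry name passes the allowlist.

    Assumption: the comparison is case-insensitive. Zip entry names use
    forward slashes, so posixpath is used to split them.
    """
    _, ext = posixpath.splitext(posixpath.basename(entry_name))
    return ext.lower() in SYMBOL_EXTENSIONS


# Illustrative entry names only; the real package layout may differ.
entries = [
    "Aspire.Cli.linux-x64.nuspec",
    "aspire.dbg",
    "aspire.dwarf",
    "aspire.dSYM/Contents/Info.plist",
]
kept = [name for name in entries if is_indexable(name)]
print(kept)
```

Under these assumptions the `.nuspec` and `.plist` entries are skipped, while the `.dbg` and `.dwarf` payloads are kept for indexing.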
Verified end-to-end on internal AzDO build 2985850 for the Windows path: both `Stage native AOT pdb` steps succeed; `Publish native symbols` artifacts contain the pdbs; the Windows `build` job downloads them; arcade's `-publish` emits `Uploading 'PdbArtifacts/Windows/native_symbols_win_<arch>/aspire.pdb'` to the BAR with distinct relative paths per RID.

The new `eng/scripts/validate-cli-symbols.ps1` script (separate commit) covers end-to-end round-trip validation locally per RID without uploading anything.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…alidation

Companion to the pipeline changes that publish Aspire CLI native AOT symbols (.pdb / .dbg / .dwarf) to MSDL. Validates the entire symbol round-trip locally per-RID against a current build, without uploading anything to MSDL.

`docs/ci/cli-native-symbols.md` covers the operating doctrine: when to run the script (file-level triggers), how to triage a failed check, the per-RID baseline for what a clean run looks like, the mapping from each check to its production-pipeline counterpart, and the criteria under which the script can be retired. Script usage / parameters / examples live in the comment-based help (run `Get-Help eng/scripts/validate-cli-symbols.ps1 -Detailed`).

Four checks per RID (A→D), each isolating one piece of the pipeline so a failure points at the responsible link instead of the symptom:

Check A: build-id symmetry
  The binary and its symbol artifact must report the same identifier. On Windows the CodeView GUID+Age in the PE debug directory must match the PDB's PDB70 GUID+Age. On Linux the binary's `.note.gnu.build-id` ELF note must match the .dbg's. On macOS the binary's Mach-O `LC_UUID` must match the .dwarf's. If A fails, the pipeline produced a binary/symbol pair that no symbol-server lookup could ever resolve, no matter how the pipeline routes them.

Check B: .symbols.nupkg pack/extract
  Linux + macOS only (Windows uses arcade's loose-file path). Builds the per-RID `Aspire.Cli.<rid>.<version>.symbols.nupkg` the same way `build_sign_native.yml` does (minimal hand-built zip + nuspec), extracts it, and asserts the contained symbol file is byte-identical to the source. Guards against subtle pack-side regressions (zip options, file mode bits, path normalization) that would otherwise only surface when MSDL refused the upload.
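
The pack/extract idea behind Check B can be sketched in a few lines of Python. This is not the script's implementation (which is PowerShell); the nuspec contents, the package id, the version string, and the stand-in payload below are all placeholders chosen for illustration.

```python
import io
import zipfile


def pack_symbols_nupkg(package_id: str, version: str, symbol_name: str, payload: bytes) -> bytes:
    """Build a minimal zip containing a nuspec plus one symbol file (illustrative only)."""
    # Hypothetical minimal nuspec; the real pipeline's nuspec is not shown in the text.
    nuspec = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<package><metadata>"
        f"<id>{package_id}</id><version>{version}</version>"
        "<authors>placeholder</authors><description>Native symbols</description>"
        "</metadata></package>\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{package_id}.nuspec", nuspec)
        archive.writestr(symbol_name, payload)
    return buffer.getvalue()


def extract_entry(nupkg: bytes, name: str) -> bytes:
    """Read one entry back out of the in-memory package."""
    with zipfile.ZipFile(io.BytesIO(nupkg)) as archive:
        return archive.read(name)


# Stand-in bytes for aspire.dbg; a real run would read the file from disk.
source = b"\x7fELF" + bytes(range(256)) * 4
nupkg = pack_symbols_nupkg("Aspire.Cli.linux-x64", "0.0.0-local", "aspire.dbg", source)
roundtrip = extract_entry(nupkg, "aspire.dbg")

# The core assertion of the check: what comes out equals what went in.
assert roundtrip == source, "symbol file changed during pack/extract"
```

Comparing the raw bytes keeps the check independent of zip metadata such as timestamps or compression level, which may legitimately differ between runs.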
Check C: dotnet-symbol round-trip via local symstore
  Places the symbol file at its SSQP key under a temp symstore root, serves the root over a loopback HTTP listener, and runs `dotnet-symbol --symbols <binary> --server-path http://127.0.0.1:<port>`. Asserts dotnet-symbol downloaded a file, the downloaded file is byte-identical to the source, and (for nupkg-shaped artifacts) the SSQP key the script computed actually matches what dotnet-symbol requested. This is the highest-fidelity check of the protocol path end users actually traverse.

Check D: symbol resolution against the downloaded file
  Check C only proves the protocol delivers byte-identical bytes for the right key; a same-sized zero-filled file with the right build-id would still pass C. Check D points the platform's native symbolicator at the file C just downloaded and asks it to resolve the binary's entry-point VA:

    macOS:   atos -o <dwarf> -l <__TEXT vmaddr> <entry-va> → "main (in aspire.dwarf) (main.cpp:228)"
    Linux:   addr2line -e <dbg> -f -C <entry-va> → "_start"
    Windows: llvm-symbolizer --obj=<exe> <entry-va> (binary + downloaded .pdb staged to the same temp dir so the PE's CodeView path finds the file dotnet-symbol delivered)

  Fails if the resolver returns empty, "??", or just echoes the address back, all of which are "no debug info found" outputs. Skips with a warning if the platform tool is missing; D never fails the script just because tooling is absent.

Local HTTP server uses System.Net.HttpListener inside a Start-ThreadJob. HttpListener ships with the .NET runtime that pwsh 7+ runs on, which removes an external prerequisite (an earlier draft used `python -m http.server`, which silently no-ops on a stock Windows box because `python.exe` resolves to the Microsoft Store App Execution Alias stub). The listener is constructed and Start()ed on the main thread; the job only consumes contexts. That split matters for teardown: `HttpListener.GetContext()` is a blocking unmanaged call that Stop-Job / cancellation tokens cannot interrupt, so calling `$listener.Close()` from the main thread is the only way to unblock the job so it can terminate. Server readiness is verified by polling with `Invoke-WebRequest` against a sentinel path until either the listener answers (a 404 counts: the listener is up, the path just doesn't exist) or a 6-second timeout elapses.

Windows ID extraction and entry-point VA extraction both use System.Reflection.PortableExecutable.PEReader (also shipped with .NET):

- **CodeView GUID+Age** (Check C): the ID dotnet-symbol itself computes from the PE to drive its lookup, so it's the only ID Check C strictly needs.
- **ImageBase + AddressOfEntryPoint** (Check D): read from the PE optional header to compute the VA fed to llvm-symbolizer. With a non-PEReader fallback, Check D would silently bail on "could not determine binary entry-point VA" whenever the fallback tool is missing, pointing at the wrong root cause and making it look like the script could not analyse the binary at all.

`llvm-pdbutil` is still used to read the PDB-side ID for Check A's symmetry comparison (no managed equivalent), and `llvm-readobj` stays as a secondary fallback for both ID and VA paths if PEReader can't read the records for any reason.

Net effect on a clean Windows dev box without LLVM installed: A skips with a clear note, B is N/A (Windows uses the loose-pdb path), C runs and passes via PEReader-extracted ID, and D skips with the accurate "no symbolicator available for win-x64" message rather than a misleading VA-extraction skip, exactly parallel to how addr2line / atos absence is reported on Linux/macOS. With LLVM present, all four checks run.
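
Check D's pass/fail rule (empty output, "??", or an echo of the address all mean "no debug info found") can be expressed as a small predicate. The Python sketch below is an assumption-laden illustration: the function name `resolved_symbol`, the decision to inspect only the first output line, and the hex comparison used to detect an echoed address are choices made for the example, not details of `validate-cli-symbols.ps1`.

```python
def resolved_symbol(resolver_output: str, address: str) -> bool:
    """Hypothetical predicate: did the symbolicator actually resolve the address?

    Returns False for the three "no debug info found" shapes named in the
    text: empty output, "??", or the address echoed back.
    """
    lines = resolver_output.strip().splitlines()
    if not lines:
        return False  # empty output
    first = lines[0].strip()
    if first.startswith("??"):
        return False  # addr2line-style unknown marker
    try:
        # An echoed address parses as hex and equals the requested VA.
        if int(first, 16) == int(address, 16):
            return False
    except ValueError:
        pass  # not a bare hex number, so it is not an echo
    return True


# Sample outputs taken from the verified results quoted in this commit message.
print(resolved_symbol("_start", "0x4e300"))
print(resolved_symbol("main (in aspire.dwarf) (main.cpp:228)", "0x100058fd8"))
print(resolved_symbol("??\n??:0", "0x4e300"))
print(resolved_symbol("0x4e300", "0x4e300"))
```

A missing platform tool is a separate case (a skip with a warning), so a caller would check for the tool before running the resolver and calling this predicate.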
Per-platform tooling matrix (skipped checks emit a warning, never an error):

  Always:  pwsh 7+, dotnet-symbol global tool
  macOS:   atos (Xcode CLT) for D
  Linux:   addr2line + readelf (binutils) for A and D
  Windows: llvm-pdbutil (LLVM) for A; llvm-symbolizer (LLVM) for D. PEReader handles binary-side ID and entry-point VA, so C runs without LLVM and D's skip-vs-pass decision is driven purely by whether llvm-symbolizer is present.

Verified end-to-end:

  osx-arm64 → A/B/C/D all pass; atos resolves 0x100058fd8 → main (main.cpp:228)
  linux-x64 → A/B/C/D all pass; addr2line resolves 0x4e300 → _start
  win-x64   → with LLVM on PATH: A/C/D PASS; llvm-symbolizer resolves 0x140ee4bac → wmainCRTStartup against the dotnet-symbol-downloaded PDB.
            → without LLVM: A SKIP, B N/A, C PASS via PEReader, D SKIP with "no symbolicator available for win-x64".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7eb23a7 to
153b3db
Compare
The Aspire CLI ships as a NativeAOT executable but its native debug symbols never reach MSDL/SymWeb.
`dotnet symbol --symbols` against a shipped `aspire` (any platform) returns nothing, so customers and our own crash triage can't symbolicate stack traces from the CLI binary on any of Windows, Linux, or macOS.

Root cause

ILC emits a symbol artifact next to the binary at `artifacts/bin/Aspire.Cli/<config>/net10.0/<rid>/native/` on every NativeAOT build (`aspire.pdb` on Windows, `aspire.dbg` on Linux, `aspire.dSYM/` on macOS, per `Microsoft.NETCore.Native.targets`), but it is then dropped on the floor:

* `<CopyOutputSymbolsToPublishDirectory>false</CopyOutputSymbolsToPublishDirectory>` in `src/Aspire.Cli/Aspire.Cli.csproj` keeps it out of the publish dir.
* The clipack archive (`eng/clipack/Common.projitems`) only stages the binary itself into the per-RID `aspire-cli-<rid>.<zip|tar.gz>`.
* `build_sign_native.yml` only publishes `packages/` as the `native_archives_<rid>` pipeline artifact, so the symbol artifact never reaches the downstream `build` stage.
* The Windows `build` stage runs `BuildAndTest.yml` with `/p:SkipNativeBuild=true`, so it never produces the symbol artifact locally either.

End result: arcade's symbol-publishing infrastructure has no file to upload, even though it is fully wired up.
The fix

Arcade has two distinct symbol-publishing pipelines with different shape requirements; we use each for the platforms it supports:

* Windows `.pdb`: loose-file path via `FilesToPublishToSymbolServer` (Publish.proj `GatherPublishItems`). Arcade's `PrepLoosePdbsForPublish` hard-filters loose files to `.pdb`/`.dll`, so this path is Windows-only.
* Linux `.dbg` and macOS `.dwarf`: wrapped in a NuGet symbol package and routed via arcade's `_ExistingSymbolPackage` filter. `SymbolUploadHelper.AddPackageToRequest` opens the `.symbols.nupkg` with raw `ZipFile.Open` (SymbolUploadHelper.cs#L273), filters entries by an extension allowlist that includes `.dbg`/`.dwarf`/`.so`/`.dylib` (SymbolUploadHelper.cs#L37), and indexes with `symbol.exe adddirectory`. The symbol-server key is computed from the file's intrinsic build-id (ELF `.note.gnu.build-id` on Linux, Mach-O `LC_UUID` on macOS), giving SSQP keys `<name>.dbg/elf-buildid-sym-<id>/_.debug` (Linux) and `_.dwarf/mach-uuid-sym-<uuid>/_.dwarf` (macOS) that `dotnet-symbol` resolves on lookup.

dotnet/runtime uses this exact same path for CoreCLR and libraries native symbols:

* liveBuilds.targets#L122-L141: CoreCLR shared framework `RuntimeFiles` ItemGroup includes `*.pdb;*.dbg;*.dwarf`.
* functions.cmake#L362-L431: defaults `dsymutil` to its flat-DWARF output mode for macOS, producing flat `.dwarf` files (`libcoreclr.dylib.dwarf` etc.) instead of `.dSYM` bundles.
* Microsoft.DotNet.ILCompiler.pkgproj#L54-L63: `LibPackageExcludes` for `.dbg`/`.dwarf`/`.dSYM` with the comment "exclude native symbols from ilc package (they are included in symbols package)".

The macOS lookup side is implemented in MachOKeyGenerator.cs, with the protocol documented in SSQP_Key_Conventions.md.

Three coordinated edits plumb the symbol artifacts from `build_sign_native` into the `build` stage's working directory before arcade's `-publish` runs:

* `eng/pipelines/templates/build_sign_native.yml`:
  * Windows: stage `aspire.pdb` into `artifacts/native-symbols-staging/<rid>/`.
  * Linux: pack `aspire.dbg` into `Aspire.Cli.<rid>.<version>.symbols.nupkg` using a minimal hand-built zip + nuspec.
  * macOS: extract the inner Mach-O DWARF from `aspire.dSYM/Contents/Resources/DWARF/aspire`, ship it as `aspire.dwarf` in the same `.symbols.nupkg` shape. The file's `LC_UUID` matches the binary's, so `dotnet-symbol`'s Mach-O lookup resolves to it.
  * All three platforms publish under the same artifact name, `native_symbols_<rid>`.
* `eng/pipelines/azure-pipelines.yml` and `azure-pipelines-unofficial.yml`: the Windows `build` job adds two `DownloadPipelineArtifact@2` tasks: `**/aspire.pdb` into `artifacts/native-symbols/` (consumed by the `FilesToPublishToSymbolServer` glob), and `**/Aspire.Cli.*.symbols.nupkg` into `artifacts/native-symbol-pkgs/`. A subsequent pwsh step copies the symbol packages into `packages/<config>/Shipping` so arcade's manifest generation picks them up as Symbols assets. The existing `**/Aspire.Cli*.nupkg` download is tightened to exclude `.symbols.nupkg` so the same file isn't downloaded twice.
* `eng/Publishing.props`: project-level `FilesToPublishToSymbolServer` glob for the Windows pdbs only (the loose-file path is `.pdb`/`.dll`-only). Linux and macOS symbol packages flow through arcade's existing `.symbols.nupkg` routing without needing a separate property.

Why this approach (and not the alternatives)
* The `.dSYM` directory bundle is not separately published. The Apple-native automatic symbolication path (lldb / atos / Instruments via Spotlight UUID indexing) needs the bundle form and is tracked by "Distribute macOS symbols as dSYM, not .dwarf" (dotnet/runtime#88286). For server-mediated symbolication via `dotnet-symbol`, the primary CLI crash-triage workflow, the flat `.dwarf` we ship is the working format that dotnet/runtime itself uses.
* `AutoGenerateSymbolPackages` stays `false`. That property controls arcade's managed-PDB → `.symbols.nupkg` wrapper for shipping NuGet packages, independent of the symbol publishing here.
* `<CopyOutputSymbolsToPublishDirectory>false</CopyOutputSymbolsToPublishDirectory>` is left unchanged. Globbing directly from `bin/<rid>/native/` avoids re-triggering the SymStore race that the comment at `eng/clipack/Common.projitems` warns about for the managed pdb. This keeps a clean separation between the managed-pdb path (suppressed) and the native-pdb path (this PR).
* A hand-built `.symbols.nupkg` rather than a NuGet `Pack` invocation. Arcade's `SymbolUploadHelper` opens the package with raw `ZipFile.Open`, not NuGet OPC validation, so OPC compliance is unnecessary. A minimal `.nuspec` + symbol payload zip is sufficient and avoids adding a NuGet `Pack` task to the build_sign_native job (and the cross-platform tooling that would require).

Surprises and call-outs
* Symbols must flow through `build_sign_native` → pipeline artifact → `build` job download to be on disk when `-publish` runs. A Publishing.props glob alone could not pick them up: the files aren't on the publishing agent without this plumbing.
* `azure-pipelines-public.yml` does not run `build_sign_native`. Verified end-to-end on internal AzDO build 2985850 for the Windows path (an earlier revision): both `🟣Stage native AOT pdb` steps succeed; the Windows `build` job's `-publish` log emits `Uploading 'PdbArtifacts/Windows/native_symbols_win_<arch>/aspire.pdb'` to the BAR with distinct relative paths per RID, keeping the same-named pdbs apart on MSDL via `RelativePDBPath` keying. A fresh AzDO build is in flight to exercise the Linux + macOS paths.
* The `.dbg` sidecar is documented in `Microsoft.NETCore.Native.targets` (`NativeOutputPath = $(OutputPath)native\`, `NativeSymbolExt = .dbg` on Linux, `StripSymbols=true` default on non-Windows so debug info is split into the `.dbg` sidecar via `objcopy --only-keep-debug` + `--add-gnu-debuglink`).
* The inner DWARF path (`aspire.dSYM/Contents/Resources/DWARF/aspire`) was confirmed locally on a macOS NativeAOT publish; `dwarfdump --uuid` showed the inner file's UUID matches the binary's, which is the contract `MachOFileKeyGenerator` relies on for symbol-server lookup.
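
To make the UUID contract concrete, here is a minimal Python sketch that pulls the UUID out of a `dwarfdump --uuid` output line and builds the macOS SSQP key in the `_.dwarf/mach-uuid-sym-<uuid>/_.dwarf` shape quoted above. The helper names, the sample UUID, and the sample paths are invented for illustration, and the lowercase, dash-free rendering of the UUID is an assumption about the key format that should be checked against SSQP_Key_Conventions.md.

```python
import re
import uuid

# Matches lines shaped like "UUID: <uuid> (<arch>) <path>" from `dwarfdump --uuid`.
_UUID_LINE = re.compile(r"UUID:\s*([0-9A-Fa-f-]{36})\s")


def parse_dwarfdump_uuid(line: str) -> uuid.UUID:
    """Hypothetical helper: extract the UUID from one dwarfdump --uuid line."""
    match = _UUID_LINE.search(line)
    if match is None:
        raise ValueError(f"no UUID found in: {line!r}")
    return uuid.UUID(match.group(1))


def macho_dwarf_ssqp_key(lc_uuid: uuid.UUID) -> str:
    """Build the macOS symbol key; assumes lowercase hex without dashes."""
    return f"_.dwarf/mach-uuid-sym-{lc_uuid.hex}/_.dwarf"


# Made-up sample lines; a real run would capture dwarfdump output for both files.
binary_line = "UUID: 4C4C4471-5555-3144-A1B2-C3D4E5F60718 (arm64) aspire"
dwarf_line = "UUID: 4C4C4471-5555-3144-A1B2-C3D4E5F60718 (arm64) aspire.dwarf"

binary_uuid = parse_dwarfdump_uuid(binary_line)
dwarf_uuid = parse_dwarfdump_uuid(dwarf_line)

# The lookup can only succeed when both files report the same LC_UUID.
assert binary_uuid == dwarf_uuid, "binary and .dwarf UUIDs differ"
key = macho_dwarf_ssqp_key(dwarf_uuid)
print(key)
```

Because the key depends only on the UUID, a mismatch between the binary and the shipped `.dwarf` would make the symbol unreachable no matter how the package is routed.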