ELF .comment section in shipped native binaries is non-deterministic #128158

@mthalman

Summary

The ELF .comment section in native binaries shipped with .NET embeds compiler/linker identification strings whose contents and ordering differ between two builds of the same source. The drift comes from two sources: (a) the linker concatenating per-object .comment entries in a non-deterministic order, and (b) the compiler/linker version itself changing across point releases of the toolchain image used to build.

Why this matters

A reproducibility validator that wants byte equality for shipped binaries today has only one option: pin the toolchain (clang/LLD) version that produced the original build and use the exact same version for the rebuild. That works, but it's a workaround with a structural failure mode: whenever the .NET runtime's toolchain rotates, there is a synchronization window during which the validator's pinned toolchain doesn't match what new builds use, and validation breaks until it catches up.

Unlike .debug_str (tracked separately as a sibling issue — see G in our investigation), .comment in a stripped binary contributes no information that consumers depend on at runtime, so this section is a pure metadata leak with no consumer-visible cost to changing it.

Background

.comment in an ELF object is a table of NUL-terminated identification strings emitted by the compiler and linker. For each input object that LLD links, its .comment entries are merged into the output .comment. The output ordering depends on the order in which LLD encounters unique strings across the input set, which in turn depends on input file order, link-time deduplication strategy, and the LLD version itself.

The section is not guaranteed stable across:

  • compiler / linker upgrades (point releases included);
  • input file ordering changes (parallel link, ar archive ordering, build-system globbing);
  • build-host or build-image changes that bring in different toolchain builds.
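For anyone who wants to look at the section without extra tooling, here is a minimal sketch of reading .comment and splitting it into entries. It assumes a little-endian ELF64 image (as for the linux-x64 files discussed here) and ignores extended section numbering; read_elf64_section and parse_comment are hypothetical helper names, not part of any existing validator.

```python
import struct


def read_elf64_section(image: bytes, wanted: str) -> bytes:
    """Return the raw bytes of section `wanted` from a little-endian ELF64 image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    # ELF64 header fields: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", image, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", image, 0x3A)

    def header(index: int):
        # Leading fields of a section header:
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        return struct.unpack_from("<IIQQQQ", image, e_shoff + index * e_shentsize)

    # The section-name string table holds the NUL-terminated section names.
    _, _, _, _, str_off, str_size = header(e_shstrndx)
    names = image[str_off:str_off + str_size]

    for i in range(e_shnum):
        sh_name, _, _, _, sh_offset, sh_size = header(i)
        end = names.index(b"\0", sh_name)
        if names[sh_name:end].decode("ascii") == wanted:
            return image[sh_offset:sh_offset + sh_size]
    raise KeyError(wanted)


def parse_comment(payload: bytes) -> list[str]:
    """Split a raw .comment payload into its identification strings."""
    # Entries are NUL-terminated. Empty pieces (for example from a leading
    # NUL byte, which is an assumption about how the producer pads the
    # section) are dropped.
    return [p.decode("utf-8", errors="replace") for p in payload.split(b"\0") if p]
```

Usage would be along the lines of parse_comment(read_elf64_section(open(path, "rb").read(), ".comment")), where path points at one of the shipped .so files.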

Observed behavior

Two builds of the same dotnet/dotnet VMR commit produce two different artifacts. Inspecting libclrjit.so:

Build A (.comment, 82 bytes):
  [0] "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
  [1] "Linker: LLD 22.1.3"
  [2] "clang version 22.1.3"

Build B (.comment, 82 bytes):
  [0] "Linker: LLD 22.1.4"
  [1] "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
  [2] "clang version 22.1.4"

The same pattern appears in libcoreclr.so, libhostfxr.so, libnethost.so, and the other ~20 native ELFs. Both ordering and version differ; we cannot isolate "ordering shift" from "version-driven shift" with only these two builds, but the section is provably not byte-stable across the LLD point-release bump that occurred between them.
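To make the two kinds of difference concrete, the sketch below classifies the libclrjit.so entries listed above. classify_drift and mask_versions are hypothetical helpers, and masking dotted version numbers is only a diagnostic aid: it shows that the order differs even when versions are ignored, but it cannot tell whether the reordering was itself caused by the toolchain bump.

```python
import re

# Entries copied from the Build A / Build B listings for libclrjit.so.
BUILD_A = [
    "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0",
    "Linker: LLD 22.1.3",
    "clang version 22.1.3",
]
BUILD_B = [
    "Linker: LLD 22.1.4",
    "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0",
    "clang version 22.1.4",
]


def classify_drift(a: list[str], b: list[str]) -> str:
    """Return 'identical', 'order-only', or 'content' for two entry lists."""
    if a == b:
        return "identical"
    if sorted(a) == sorted(b):
        return "order-only"
    return "content"


def mask_versions(entries: list[str]) -> list[str]:
    """Replace dotted version numbers (e.g. 22.1.3) with a placeholder."""
    return [re.sub(r"\d+(?:\.\d+)+", "<ver>", e) for e in entries]


print(classify_drift(BUILD_A, BUILD_B))                                # content
print(classify_drift(mask_versions(BUILD_A), mask_versions(BUILD_B)))  # order-only
```

The first result reflects the version strings changing; the second shows that the entry order still differs once versions are masked.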

The same .comment non-determinism also appears inside shipped static archives. libnethost.a (in the Microsoft.NETCore.App.Host.linux-x64 pack) is an ar archive of .o files; every member's .comment (22 bytes, single entry) drifts between builds:

Build A: "clang version 22.1.3"
Build B: "clang version 22.1.4"
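Checking every archive member means walking the ar container first. The following is a rough sketch of that step, assuming a regular GNU-style archive (not a thin archive); iter_ar_members is a hypothetical helper name. Each yielded member is an ELF relocatable object whose .comment can then be read with any ELF section reader.

```python
def iter_ar_members(archive: bytes):
    """Yield (name, data) for each object member of a GNU-style ar archive."""
    if archive[:8] != b"!<arch>\n":
        raise ValueError("not a regular ar archive")
    pos = 8
    long_names = b""
    while pos + 60 <= len(archive):
        # 60-byte member header: name[16] mtime[12] uid[6] gid[6]
        # mode[8] size[10] terminator[2]
        header = archive[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        raw_name = header[:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii"))
        data = archive[pos + 60:pos + 60 + size]
        pos += 60 + size + (size & 1)  # member data is padded to an even offset

        if raw_name == "//":
            long_names = data          # GNU long-name table
        elif raw_name in ("/", "/SYM64/"):
            continue                   # symbol index, not an object file
        elif raw_name.startswith("/"):
            # "/<offset>" points into the long-name table; names end with "/\n".
            start = int(raw_name[1:])
            end = long_names.index(b"/\n", start)
            yield long_names[start:end].decode("ascii"), data
        else:
            yield raw_name.rstrip("/"), data
```

A validator could iterate the members of libnethost.a from both builds in lockstep and compare them pairwise, which is how a per-member difference like the one above would surface.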

Affected files

  • Stripped binaries (.comment): all native ELFs shipped in the SDK (~24 files), including libcoreclr.so, libclrjit.so, libhostfxr.so, libcoreclrtraceptprovider.so, libmscordaccore.so, libmscordbi.so, libnethost.so, libhostpolicy.so, libSystem.*.Native.so, apphost, singlefilehost, dotnet, createdump.
  • Static archive members (.comment): packs/Microsoft.NETCore.App.Host.linux-x64/<ver>/runtimes/linux-x64/native/libnethost.a (and any other shipping .a).

Context

Found while building the SDK reproducibility validation test for the dotnet/dotnet VMR (dotnet/source-build#5486). Resolution of this issue is important to meet the goal of reproducible builds: dotnet/source-build#4963

Related issues:
