Draft
1 change: 1 addition & 0 deletions go.mod
@@ -36,6 +36,7 @@ require (
github.com/magefile/mage v1.16.1
github.com/mark3labs/mcp-go v0.45.0
github.com/mattn/go-isatty v0.0.20
github.com/mitchellh/hashstructure/v2 v2.0.2
github.com/muesli/termenv v0.16.0
github.com/nxadm/tail v1.4.11
github.com/opencontainers/selinux v1.13.1
2 changes: 2 additions & 0 deletions go.sum
@@ -229,6 +229,8 @@ github.com/mattn/go-runewidth v0.0.15/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh
github.com/mattn/go-runewidth v0.0.16/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
github.com/mattn/go-runewidth v0.0.21 h1:jJKAZiQH+2mIinzCJIaIG9Be1+0NR+5sz/lYEEjdM8w=
github.com/mattn/go-runewidth v0.0.21/go.mod h1:XBkDxAl56ILZc9knddidhrOlY5R/pDhgLpndooCuJAs=
github.com/mitchellh/hashstructure/v2 v2.0.2 h1:vGKWl0YJqUNxE8d+h8f6NJLcCJrgbhC4NcD46KavDd4=
github.com/mitchellh/hashstructure/v2 v2.0.2/go.mod h1:MG3aRVU/N29oo/V/IhBX8GR/zz4kQkprJgF2EVszyDE=
github.com/moby/docker-image-spec v1.3.1 h1:jMKff3w6PgbfSa69GfNg+zN/XLhfXJGnEx3Nl2EsFP0=
github.com/moby/docker-image-spec v1.3.1/go.mod h1:eKmb5VW8vQEh/BAr2yvVNvuiJuY6UIocYsFu/DxxRpo=
github.com/moby/go-archive v0.2.0 h1:zg5QDUM2mi0JIM9fdQZWC7U8+2ZfixfTYoHL7rWUcP8=
6 changes: 3 additions & 3 deletions internal/app/azldev/core/sources/synthistory.go
@@ -140,7 +140,7 @@ func buildSyntheticCommits(
return nil, nil
}

-projectRepo, err := openProjectRepo(configFilePath)
+projectRepo, err := OpenProjectRepo(configFilePath)
if err != nil {
return nil, err
}
@@ -206,9 +206,9 @@ func resolveConfigFilePath(config *projectconfig.ComponentConfig, componentName
return configFilePath, nil
}

-// openProjectRepo finds and opens the git repository containing configFilePath by
+// OpenProjectRepo finds and opens the git repository containing configFilePath by
 // walking up the directory tree.
-func openProjectRepo(configFilePath string) (*gogit.Repository, error) {
+func OpenProjectRepo(configFilePath string) (*gogit.Repository, error) {
repo, err := gogit.PlainOpenWithOptions(filepath.Dir(configFilePath), &gogit.PlainOpenOptions{
DetectDotGit: true,
})
13 changes: 13 additions & 0 deletions internal/fingerprint/doc.go
@@ -0,0 +1,13 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

// Package fingerprint computes deterministic identity fingerprints for components.
// A fingerprint captures all resolved build inputs so that changes to any input
// (config fields, spec content, overlay files, distro context, upstream refs, or
// Affects commit count) produce a different fingerprint.
//
// The primary entry point is [ComputeIdentity], which takes a resolved
// [projectconfig.ComponentConfig] and additional context, and returns a
// [ComponentIdentity] containing the overall fingerprint hash plus a breakdown
// of individual input hashes for debugging.
Comment on lines +5 to +12
Copilot AI Mar 31, 2026

Package docs state that a fingerprint captures "spec content" and "upstream refs" as part of the computed fingerprint. In the current implementation, spec/upstream identity appears to come only from IdentityOptions.SourceIdentity (and config hashing), while ComputeIdentity itself only hashes config + overlay source file contents. Please align the package documentation with the actual inputs, or extend ComputeIdentity to hash the additional claimed inputs directly.

Suggested change
// A fingerprint captures all resolved build inputs so that changes to any input
// (config fields, spec content, overlay files, distro context, upstream refs, or
// Affects commit count) produce a different fingerprint.
//
// The primary entry point is [ComputeIdentity], which takes a resolved
// [projectconfig.ComponentConfig] and additional context, and returns a
// [ComponentIdentity] containing the overall fingerprint hash plus a breakdown
// of individual input hashes for debugging.
// A fingerprint captures the resolved build inputs that [ComputeIdentity] hashes so
// that changes to any of those inputs (config fields, overlay files, distro context,
// or caller-provided source identity) produce a different fingerprint.
//
// The primary entry point is [ComputeIdentity], which takes a resolved
// [projectconfig.ComponentConfig] and additional context, and returns a
// [ComponentIdentity] containing the overall fingerprint hash plus a breakdown
// of the hashed input categories for debugging.

package fingerprint
170 changes: 170 additions & 0 deletions internal/fingerprint/fingerprint.go
@@ -0,0 +1,170 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

package fingerprint

import (
"crypto/sha256"
"encoding/hex"
"fmt"
"io"
"sort"
"strconv"

"github.com/microsoft/azure-linux-dev-tools/internal/global/opctx"
"github.com/microsoft/azure-linux-dev-tools/internal/projectconfig"
"github.com/microsoft/azure-linux-dev-tools/internal/utils/fileutils"
"github.com/mitchellh/hashstructure/v2"
)

// hashstructureTagName is the struct tag name used by hashstructure to determine
// field inclusion. Fields tagged with `fingerprint:"-"` are excluded.
const hashstructureTagName = "fingerprint"

// ComponentIdentity holds the computed fingerprint for a single component plus
// a breakdown of individual input hashes for debugging.
type ComponentIdentity struct {
// Fingerprint is the overall SHA256 hash combining all inputs.
Fingerprint string `json:"fingerprint"`
// Inputs provides the individual input hashes that were combined.
Inputs ComponentInputs `json:"inputs"`
}

// ComponentInputs contains the individual input hashes that comprise a component's
// fingerprint.
type ComponentInputs struct {
// ConfigHash is the hash of the resolved component config fields (uint64 from hashstructure).
ConfigHash uint64 `json:"configHash"`
Comment on lines +36 to +37
Copilot AI Mar 31, 2026

ConfigHash is a 64-bit non-cryptographic hash (hashstructure.Hash returns uint64). Since this value is a primary input to the overall fingerprint, two different configs could theoretically collide to the same ConfigHash and therefore to the same overall fingerprint when other inputs are unchanged. If this fingerprint is used as a cache key or identity boundary, consider hashing the config to a cryptographic digest (e.g., SHA-256 over a canonical encoding) instead of (or in addition to) a uint64.

Suggested change
// ConfigHash is the hash of the resolved component config fields (uint64 from hashstructure).
ConfigHash uint64 `json:"configHash"`
// ConfigHash is a cryptographic digest (for example, SHA256 over a canonical encoding)
// of the resolved component config fields.
ConfigHash string `json:"configHash"`
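
To make the suggestion above concrete, here is a minimal sketch of a SHA-256 digest over a canonical encoding, using only the standard library. It is not the PR's implementation: `exampleConfig` and `configDigest` are hypothetical names, `exampleConfig` is a stand-in for `projectconfig.ComponentConfig`, and JSON is assumed as the canonical encoding only because `encoding/json` emits struct fields in declaration order and map keys in sorted order.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// exampleConfig is a hypothetical stand-in for projectconfig.ComponentConfig.
type exampleConfig struct {
	Name    string            `json:"name"`
	Version string            `json:"version"`
	Defines map[string]string `json:"defines,omitempty"`
	// LocalPath is left out of the encoding, similar in spirit to a
	// fingerprint:"-" tag excluding a field from hashstructure.
	LocalPath string `json:"-"`
}

// configDigest returns "sha256:<hex>" computed over the JSON encoding of cfg.
// Equal configs yield equal bytes because map keys are sorted by encoding/json.
func configDigest(cfg exampleConfig) (string, error) {
	encoded, err := json.Marshal(cfg)
	if err != nil {
		return "", fmt.Errorf("encoding config:\n%w", err)
	}

	sum := sha256.Sum256(encoded)

	return "sha256:" + hex.EncodeToString(sum[:]), nil
}

func main() {
	// Sample values only; a and b differ in map insertion order and in the excluded field.
	a := exampleConfig{Name: "curl", Version: "8.0", Defines: map[string]string{"b": "2", "a": "1"}, LocalPath: "/tmp/a"}
	b := exampleConfig{Name: "curl", Version: "8.0", Defines: map[string]string{"a": "1", "b": "2"}, LocalPath: "/tmp/b"}
	c := exampleConfig{Name: "curl", Version: "8.1"}

	digests := make([]string, 0, 3)
	for _, cfg := range []exampleConfig{a, b, c} {
		digest, err := configDigest(cfg)
		if err != nil {
			panic(err)
		}
		digests = append(digests, digest)
	}

	fmt.Println(digests[0] == digests[1]) // true: same hashed fields
	fmt.Println(digests[0] == digests[2]) // false: version differs
}
```

One trade-off of this sketch is that it reuses the `json` tags for field selection, which may not match how the real config type is serialized; a dedicated encoder or tag would avoid coupling the two.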

// SourceIdentity is the opaque identity string for the component's source.
// For local specs this is a content hash; for upstream specs this is a commit hash.
SourceIdentity string `json:"sourceIdentity,omitempty"`
// OverlayFileHashes maps overlay source file paths to their SHA256 hashes.
OverlayFileHashes map[string]string `json:"overlayFileHashes,omitempty"`
// AffectsCommitCount is the number of "Affects: <component>" commits in the project repo.
AffectsCommitCount int `json:"affectsCommitCount"`
// Distro is the effective distro name.
Distro string `json:"distro"`
// DistroVersion is the effective distro version.
DistroVersion string `json:"distroVersion"`
}

// IdentityOptions holds additional inputs for computing a component's identity
// that are not part of the component config itself.
type IdentityOptions struct {
// AffectsCommitCount is the number of "Affects: <component>" commits.
AffectsCommitCount int
// SourceIdentity is the opaque identity string from a [sourceproviders.SourceIdentityProvider].
SourceIdentity string
}

// ComputeIdentity computes the fingerprint for a component from its resolved config
// and additional context. The fs parameter is used to read overlay source file
// contents for hashing; spec content identity is provided via opts.SourceIdentity.
func ComputeIdentity(
fs opctx.FS,
component projectconfig.ComponentConfig,
distroRef projectconfig.DistroReference,
opts IdentityOptions,
) (*ComponentIdentity, error) {
inputs := ComponentInputs{
AffectsCommitCount: opts.AffectsCommitCount,
SourceIdentity: opts.SourceIdentity,
Distro: distroRef.Name,
DistroVersion: distroRef.Version,
}

// 1. Verify all source files have a hash. Without a hash the fingerprint
// cannot detect content changes, so we refuse to compute one.
for i := range component.SourceFiles {
if component.SourceFiles[i].Hash == "" {
return nil, fmt.Errorf(
"source file %#q has no hash; cannot compute a deterministic fingerprint",
component.SourceFiles[i].Filename,
)
}
}

// 2. Hash the resolved config struct (excluding fingerprint:"-" fields).
configHash, err := hashstructure.Hash(component, hashstructure.FormatV2, &hashstructure.HashOptions{
TagName: hashstructureTagName,
})
if err != nil {
return nil, fmt.Errorf("hashing component config:\n%w", err)
}

inputs.ConfigHash = configHash

// 3. Hash overlay source file contents.
overlayHashes, err := hashOverlayFiles(fs, component.Overlays)
if err != nil {
return nil, fmt.Errorf("hashing overlay files:\n%w", err)
}

inputs.OverlayFileHashes = overlayHashes

// 4. Combine all inputs into the overall fingerprint.
return &ComponentIdentity{
Fingerprint: combineInputs(inputs),
Inputs: inputs,
}, nil
}

// hashOverlayFiles computes SHA256 hashes for all overlay source files that reference
// local files. Returns a map of source path to hex hash, or an empty map if no overlay
// source files exist.
func hashOverlayFiles(
fs opctx.FS,
overlays []projectconfig.ComponentOverlay,
) (map[string]string, error) {
hashes := make(map[string]string)

for _, overlay := range overlays {
if overlay.Source == "" {
continue
}

fileHash, err := fileutils.ComputeFileHash(fs, fileutils.HashTypeSHA256, overlay.Source)
if err != nil {
return nil, fmt.Errorf("hashing overlay source %#q:\n%w", overlay.Source, err)
}

hashes[overlay.Source] = fileHash
}

return hashes, nil
}

// combineInputs deterministically combines all input hashes into a single SHA256 fingerprint.
func combineInputs(inputs ComponentInputs) string {
hasher := sha256.New()

// Write each input in a fixed order with field labels for domain separation.
writeField(hasher, "config_hash", strconv.FormatUint(inputs.ConfigHash, 10))
writeField(hasher, "source_identity", inputs.SourceIdentity)
writeField(hasher, "affects_commit_count", strconv.Itoa(inputs.AffectsCommitCount))
writeField(hasher, "distro", inputs.Distro)
writeField(hasher, "distro_version", inputs.DistroVersion)

// Overlay file hashes in sorted key order for determinism.
if len(inputs.OverlayFileHashes) > 0 {
keys := make([]string, 0, len(inputs.OverlayFileHashes))
for key := range inputs.OverlayFileHashes {
keys = append(keys, key)
}

sort.Strings(keys)

for _, key := range keys {
writeField(hasher, "overlay:"+key, inputs.OverlayFileHashes[key])
}
}

return "sha256:" + hex.EncodeToString(hasher.Sum(nil))
}

// writeField writes a labeled value to the hasher for domain separation.
func writeField(writer io.Writer, label string, value string) {
// Use label=value\n format. Length-prefixing the label prevents
// collisions between field names that are prefixes of each other.
fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value)
Comment on lines +167 to +169
Copilot AI Mar 31, 2026

writeField uses a newline-delimited label=value\n encoding without length-prefixing or escaping the value. If any value can contain \n (or other delimiters), different logical inputs can produce the same byte stream and therefore the same fingerprint. To make the encoding unambiguous, consider length-prefixing (or otherwise escaping) the value as well (e.g., len(label):label=len(value):value).

Suggested change
// Use label=value\n format. Length-prefixing the label prevents
// collisions between field names that are prefixes of each other.
fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value)
// Use len(label):label=len(value):value\n format. Length-prefixing both the
// label and the value prevents collisions even when values contain delimiters
// such as '=', ':' or '\n'.
fmt.Fprintf(writer, "%d:%s=%d:%s\n", len(label), label, len(value), value)
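
The ambiguity described in this comment can be reproduced with a small standalone sketch. The `field` type, both `encode*` helpers, and the sample values are hypothetical and exist only for illustration; the two format strings are the ones shown in the original line and in the suggested change.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical label/value pair used only for this demonstration.
type field struct{ label, value string }

// encodeLabelPrefixed mirrors the current format: only the label is length-prefixed.
func encodeLabelPrefixed(fields []field) string {
	var sb strings.Builder
	for _, f := range fields {
		fmt.Fprintf(&sb, "%d:%s=%s\n", len(f.label), f.label, f.value)
	}

	return sb.String()
}

// encodeBothPrefixed mirrors the suggested format: label and value are both length-prefixed.
func encodeBothPrefixed(fields []field) string {
	var sb strings.Builder
	for _, f := range fields {
		fmt.Fprintf(&sb, "%d:%s=%d:%s\n", len(f.label), f.label, len(f.value), f.value)
	}

	return sb.String()
}

func main() {
	// One field whose value embeds a newline followed by something that looks like a second field.
	one := []field{{"distro_version", "3.0\n9:overlay:k=abc"}}
	// Two genuine fields.
	two := []field{{"distro_version", "3.0"}, {"overlay:k", "abc"}}

	fmt.Println(encodeLabelPrefixed(one) == encodeLabelPrefixed(two)) // true: byte streams collide
	fmt.Println(encodeBothPrefixed(one) == encodeBothPrefixed(two))   // false: value lengths disambiguate
}
```

Whether such values can occur in practice depends on how the distro version and source identity strings are validated upstream, which this sketch does not assume.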

}
Comment on lines +165 to +170
Copilot AI Mar 31, 2026

writeField ignores the error returned by fmt.Fprintf. With errcheck enabled in this repo, this will fail linting. Please handle the error (or redesign writeField so writes cannot fail) and propagate/record failures appropriately.
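
One possible way to address this, sketched under assumptions: have the helper return the error and let the caller propagate it. `writeFieldChecked`, `fingerprintOf`, and `failingWriter` are hypothetical names, not part of the PR, and the format string is the one currently in the diff. Note that the `hash.Hash` documentation states that `Write` never returns an error, so with a SHA-256 hasher the error path is not expected to trigger; the check mainly keeps the helper correct for arbitrary `io.Writer` values.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
)

// writeFieldChecked is a hypothetical variant of writeField that reports write failures.
func writeFieldChecked(writer io.Writer, label, value string) error {
	if _, err := fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value); err != nil {
		return fmt.Errorf("writing field %#q:\n%w", label, err)
	}

	return nil
}

// fingerprintOf hashes label/value pairs in the given order and propagates any write error.
func fingerprintOf(fields [][2]string) (string, error) {
	hasher := sha256.New()

	for _, f := range fields {
		if err := writeFieldChecked(hasher, f[0], f[1]); err != nil {
			return "", err
		}
	}

	return "sha256:" + hex.EncodeToString(hasher.Sum(nil)), nil
}

// failingWriter always fails; it exists only to exercise the error path.
type failingWriter struct{}

func (failingWriter) Write([]byte) (int, error) { return 0, errors.New("write failed") }

func main() {
	// Sample values only.
	fp, err := fingerprintOf([][2]string{{"distro", "azurelinux"}, {"distro_version", "3.0"}})
	fmt.Println(fp, err)

	// The wrapped error surfaces when the underlying writer fails.
	fmt.Println(writeFieldChecked(failingWriter{}, "distro", "azurelinux"))
}
```

Adopting this shape would mean `combineInputs` also returns an error, which `ComputeIdentity` already has a path to report.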
