Skip to content

feat(component): Add deterministic component fingerprints#47

Draft
dmcilvaney wants to merge 2 commits intomicrosoft:mainfrom
dmcilvaney:damcilva/component_change_detection_parts/5
Draft

feat(component): Add deterministic component fingerprints#47
dmcilvaney wants to merge 2 commits intomicrosoft:mainfrom
dmcilvaney:damcilva/component_change_detection_parts/5

Conversation

@dmcilvaney
Copy link
Copy Markdown
Contributor

No description provided.

Copilot AI review requested due to automatic review settings March 31, 2026 00:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new internal fingerprinting utility intended to compute deterministic “component fingerprints” from resolved component configuration plus additional build context.

Changes:

  • Introduces internal/fingerprint package with ComputeIdentity to produce a SHA256-based fingerprint and an inputs breakdown.
  • Adds a comprehensive test suite covering many config/input variations.
  • Exports OpenProjectRepo in synthetic history code and adds hashstructure/v2 dependency.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
internal/fingerprint/fingerprint.go New fingerprint computation logic (config hashing + overlay file hashing + SHA256 combiner).
internal/fingerprint/fingerprint_test.go New tests asserting determinism and expected sensitivity/insensitivity to various inputs.
internal/fingerprint/doc.go Package documentation for the new fingerprint module.
internal/app/azldev/core/sources/synthistory.go Renames openProjectRepo to exported OpenProjectRepo and updates call site.
go.mod Adds github.com/mitchellh/hashstructure/v2 dependency.
go.sum Adds checksums for the new dependency.

Comment on lines +76 to +83
// 1. Hash the resolved config struct (excluding fingerprint:"-" fields).
configHash, err := hashstructure.Hash(component, hashstructure.FormatV2, &hashstructure.HashOptions{
TagName: hashstructureTagName,
})
if err != nil {
return nil, fmt.Errorf("hashing component config:\n%w", err)
}

Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashstructure.Hash is configured to respect fingerprint struct tags, but the projectconfig types being hashed (e.g., ComponentConfig, ComponentOverlay, DistroReference, build configs) currently do not define any fingerprint:"-" tags. As a result, non-functional / non-deterministic fields like ComponentConfig.Name, ComponentConfig.SourceConfigFile, overlay Description, and DistroReference.Snapshot will be included in the hash and will change fingerprints unexpectedly (and will also break the tests that expect these fields to be excluded). Consider either adding the needed fingerprint:"-" tags to the relevant projectconfig structs, or hashing a purpose-built fingerprint struct that includes only deterministic build inputs.

Copilot uses AI. Check for mistakes.
Comment on lines +154 to +159
// writeField writes a labeled value to the hasher for domain separation.
func writeField(writer io.Writer, label string, value string) {
// Use label=value\n format. Length-prefixing the label prevents
// collisions between field names that are prefixes of each other.
fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value)
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeField ignores the error returned by fmt.Fprintf. With errcheck enabled in this repo, this will fail linting. Please handle the error (or redesign writeField so writes cannot fail) and propagate/record failures appropriately.

Copilot uses AI. Check for mistakes.
Comment on lines +61 to +62
// and additional context. The fs parameter is used to read spec file and overlay
// source file contents for hashing.
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ComputeIdentity doc comment says the fs parameter is used to read "spec file" contents for hashing, but the implementation only uses fs to hash overlay source files. Either include spec content hashing here, or update the comment/docs to reflect that spec identity is expected to be provided via opts.SourceIdentity.

Suggested change
// and additional context. The fs parameter is used to read spec file and overlay
// source file contents for hashing.
// and additional context. Spec identity is provided via opts.SourceIdentity; the fs
// parameter is used to read overlay source file contents for hashing.

Copilot uses AI. Check for mistakes.
Comment on lines +170 to +204
func TestComputeIdentity_ExcludedFieldsDoNotChange(t *testing.T) {
ctx := newTestFS(t, map[string]string{
"/specs/test.spec": "Name: testpkg\nVersion: 1.0",
})
distro := baseDistroRef()

// Base component.
comp := baseComponent()
fpBase := computeFingerprint(t, ctx, comp, distro, 0)

// Changing Name (fingerprint:"-") should NOT change fingerprint.
compName := baseComponent()
compName.Name = "different-name"
fpName := computeFingerprint(t, ctx, compName, distro, 0)
assert.Equal(t, fpBase, fpName, "changing Name must NOT change fingerprint")

// Changing Build.Failure.Expected (fingerprint:"-") should NOT change fingerprint.
compFailure := baseComponent()
compFailure.Build.Failure.Expected = true
compFailure.Build.Failure.ExpectedReason = "known issue"
fpFailure := computeFingerprint(t, ctx, compFailure, distro, 0)
assert.Equal(t, fpBase, fpFailure, "changing failure.expected must NOT change fingerprint")

// Changing Build.Hints.Expensive (fingerprint:"-") should NOT change fingerprint.
compHints := baseComponent()
compHints.Build.Hints.Expensive = true
fpHints := computeFingerprint(t, ctx, compHints, distro, 0)
assert.Equal(t, fpBase, fpHints, "changing hints.expensive must NOT change fingerprint")

// Changing Build.Check.SkipReason (fingerprint:"-") should NOT change fingerprint.
compReason := baseComponent()
compReason.Build.Check.SkipReason = "tests require network"
fpReason := computeFingerprint(t, ctx, compReason, distro, 0)
assert.Equal(t, fpBase, fpReason, "changing check.skip_reason must NOT change fingerprint")
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests assume various config fields are excluded from the config hash via fingerprint:"-" tags (e.g., ComponentConfig.Name, Build.Failure.Expected, Build.Hints.Expensive, Build.Check.SkipReason, overlay Description, and upstream distro Snapshot). The projectconfig structs currently do not define any fingerprint tags, so these expectations will fail unless the exclusion mechanism is implemented in production code (or the tests are updated to match the actual hashing behavior).

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +26
func newTestFS(t *testing.T, files map[string]string) *testctx.TestCtx {
t.Helper()

ctx := testctx.NewCtx()

for path, content := range files {
err := afero.WriteFile(ctx.FS(), path, []byte(content), 0o644)
require.NoError(t, err)
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newTestFS writes files using afero.WriteFile directly. Elsewhere in the repo the convention is to use fileutils.WriteFile (and fileperms for permissions) when writing via an opctx.FS. Aligning with that pattern keeps file operations consistent and avoids importing afero in tests unnecessarily.

Copilot uses AI. Check for mistakes.
@@ -208,7 +208,7 @@ func resolveConfigFilePath(config *projectconfig.ComponentConfig, componentName

// openProjectRepo finds and opens the git repository containing configFilePath by
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenProjectRepo is now exported, but its leading comment still starts with openProjectRepo. With revive's exported-symbol checks enabled, exported functions must have comments that begin with the function name. Please update the comment to start with OpenProjectRepo ....

Suggested change
// openProjectRepo finds and opens the git repository containing configFilePath by
// OpenProjectRepo finds and opens the git repository containing configFilePath by

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +12
// A fingerprint captures all resolved build inputs so that changes to any input
// (config fields, spec content, overlay files, distro context, upstream refs, or
// Affects commit count) produce a different fingerprint.
//
// The primary entry point is [ComputeIdentity], which takes a resolved
// [projectconfig.ComponentConfig] and additional context, and returns a
// [ComponentIdentity] containing the overall fingerprint hash plus a breakdown
// of individual input hashes for debugging.
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Package docs state that a fingerprint captures "spec content" and "upstream refs" as part of the computed fingerprint. In the current implementation, spec/upstream identity appears to come only from IdentityOptions.SourceIdentity (and config hashing), while ComputeIdentity itself only hashes config + overlay source file contents. Please align the package documentation with the actual inputs, or extend ComputeIdentity to hash the additional claimed inputs directly.

Suggested change
// A fingerprint captures all resolved build inputs so that changes to any input
// (config fields, spec content, overlay files, distro context, upstream refs, or
// Affects commit count) produce a different fingerprint.
//
// The primary entry point is [ComputeIdentity], which takes a resolved
// [projectconfig.ComponentConfig] and additional context, and returns a
// [ComponentIdentity] containing the overall fingerprint hash plus a breakdown
// of individual input hashes for debugging.
// A fingerprint captures the resolved build inputs that [ComputeIdentity] hashes so
// that changes to any of those inputs (config fields, overlay files, distro context,
// or caller-provided source identity) produce a different fingerprint.
//
// The primary entry point is [ComputeIdentity], which takes a resolved
// [projectconfig.ComponentConfig] and additional context, and returns a
// [ComponentIdentity] containing the overall fingerprint hash plus a breakdown
// of the hashed input categories for debugging.

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +37
// ConfigHash is the hash of the resolved component config fields (uint64 from hashstructure).
ConfigHash uint64 `json:"configHash"`
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfigHash is a 64-bit non-cryptographic hash (hashstructure.Hash returns uint64). Since this value is a primary input to the overall fingerprint, two different configs could theoretically collide to the same ConfigHash and therefore to the same overall fingerprint when other inputs are unchanged. If this fingerprint is used as a cache key or identity boundary, consider hashing the config to a cryptographic digest (e.g., SHA-256 over a canonical encoding) instead of (or in addition to) a uint64.

Suggested change
// ConfigHash is the hash of the resolved component config fields (uint64 from hashstructure).
ConfigHash uint64 `json:"configHash"`
// ConfigHash is a cryptographic digest (for example, SHA256 over a canonical encoding)
// of the resolved component config fields.
ConfigHash string `json:"configHash"`

Copilot uses AI. Check for mistakes.
@dmcilvaney dmcilvaney force-pushed the damcilva/component_change_detection_parts/5 branch from a7d028e to 3fb8df7 Compare March 31, 2026 01:06
Copilot AI review requested due to automatic review settings March 31, 2026 01:07
@dmcilvaney dmcilvaney force-pushed the damcilva/component_change_detection_parts/5 branch from 3fb8df7 to 759c099 Compare March 31, 2026 01:07
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.

Comment on lines +76 to +80
// 1. Hash the resolved config struct (excluding fingerprint:"-" fields).
configHash, err := hashstructure.Hash(component, hashstructure.FormatV2, &hashstructure.HashOptions{
TagName: hashstructureTagName,
})
if err != nil {
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashstructure.Hash is configured to look for struct tags named fingerprint to exclude fields (fingerprint:"-"), but there are currently no such tags on internal/projectconfig structs. As a result, all exported fields (including ComponentConfig.Name, SourceConfigFile, overlay Description, source file Origin, distro Snapshot, etc.) will be included in ConfigHash, which contradicts the intended exclusions and the added tests. Either add the fingerprint:"-" tags to the relevant config structs, or change the hashing strategy/tag name to match existing tags and update the tests/expectations accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines +156 to +158
// Use label=value\n format. Length-prefixing the label prevents
// collisions between field names that are prefixes of each other.
fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeField uses a newline-delimited label=value\n encoding without length-prefixing or escaping the value. If any value can contain \n (or other delimiters), different logical inputs can produce the same byte stream and therefore the same fingerprint. To make the encoding unambiguous, consider length-prefixing (or otherwise escaping) the value as well (e.g., len(label):label=len(value):value).

Suggested change
// Use label=value\n format. Length-prefixing the label prevents
// collisions between field names that are prefixes of each other.
fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value)
// Use len(label):label=len(value):value\n format. Length-prefixing both the
// label and the value prevents collisions even when values contain delimiters
// such as '=', ':' or '\n'.
fmt.Fprintf(writer, "%d:%s=%d:%s\n", len(label), label, len(value), value)

Copilot uses AI. Check for mistakes.
Comment on lines +206 to +226
func TestComputeIdentity_OverlayDescriptionExcluded(t *testing.T) {
ctx := newTestFS(t, map[string]string{
"/specs/test.spec": "Name: testpkg\nVersion: 1.0",
})
distro := baseDistroRef()

comp1 := baseComponent()
comp1.Overlays = []projectconfig.ComponentOverlay{
{Type: "spec-set-tag", Tag: "Release", Value: "2%{?dist}"},
}

comp2 := baseComponent()
comp2.Overlays = []projectconfig.ComponentOverlay{
{Type: "spec-set-tag", Tag: "Release", Value: "2%{?dist}", Description: "bumped release"},
}

fp1 := computeFingerprint(t, ctx, comp1, distro, 0)
fp2 := computeFingerprint(t, ctx, comp2, distro, 0)

assert.Equal(t, fp1, fp2, "overlay description must NOT change fingerprint")
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test expects overlay Description to be excluded from the fingerprint, but projectconfig.ComponentOverlay.Description currently has no fingerprint:"-" tag and will be included in hashstructure.Hash(component, ...). As written, comp1 and comp2 will hash differently and the assertion will fail unless the overlay struct is updated to exclude Description (or the fingerprint strategy is changed).

Copilot uses AI. Check for mistakes.
fp1 := computeFingerprint(t, ctx, comp1, distro, 0)
fp2 := computeFingerprint(t, ctx, comp2, distro, 0)

assert.Equal(t, fp1, fp2, "changing source file origin URL must NOT change fingerprint")
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test expects a change in SourceFileReference.Origin.Uri to not affect the fingerprint, but projectconfig.SourceFileReference.Origin is currently part of the hashed ComponentConfig and has no fingerprint:"-" exclusion tag. With the current implementation, changing the origin URL will change ConfigHash and the fingerprint, so this assertion will fail unless Origin (or at least Origin.Uri) is excluded from hashing.

Suggested change
assert.Equal(t, fp1, fp2, "changing source file origin URL must NOT change fingerprint")
assert.NotEqual(t, fp1, fp2, "changing source file origin URL must change fingerprint")

Copilot uses AI. Check for mistakes.
Comment on lines +571 to +597
func TestComputeIdentity_SnapshotChangeDoesNotAffectFingerprint(t *testing.T) {
ctx := newTestFS(t, nil)

comp := projectconfig.ComponentConfig{
Spec: projectconfig.SpecSource{
SourceType: projectconfig.SpecSourceTypeUpstream,
UpstreamName: "curl",
UpstreamCommit: "abc1234",
UpstreamDistro: projectconfig.DistroReference{
Name: "fedora",
Version: "41",
Snapshot: "2025-01-01T00:00:00Z",
},
},
}
distro := baseDistroRef()

fp1 := computeFingerprint(t, ctx, comp, distro, 0)

// Change only the snapshot timestamp.
comp.Spec.UpstreamDistro.Snapshot = "2026-06-15T00:00:00Z"
fp2 := computeFingerprint(t, ctx, comp, distro, 0)

assert.Equal(t, fp1, fp2,
"changing upstream distro snapshot must NOT change fingerprint "+
"(snapshot is excluded; resolved commit hash is what matters)")
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test expects Spec.UpstreamDistro.Snapshot changes to be excluded from the fingerprint, but projectconfig.DistroReference.Snapshot is currently a normal exported field with no fingerprint:"-" tag, so it will be included in ConfigHash and will change the computed fingerprint. Either exclude Snapshot from the hashed config (via tags or hashing options), or update the test/behavior to reflect that snapshot is part of the identity.

Copilot uses AI. Check for mistakes.
@dmcilvaney dmcilvaney marked this pull request as draft March 31, 2026 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants