feat(component): Add deterministic component fingerprints#47
feat(component): Add deterministic component fingerprints#47dmcilvaney wants to merge 2 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new internal fingerprinting utility intended to compute deterministic “component fingerprints” from resolved component configuration plus additional build context.
Changes:
- Introduces
internal/fingerprintpackage withComputeIdentityto produce a SHA256-based fingerprint and an inputs breakdown. - Adds a comprehensive test suite covering many config/input variations.
- Exports
OpenProjectRepoin synthetic history code and addshashstructure/v2dependency.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/fingerprint/fingerprint.go | New fingerprint computation logic (config hashing + overlay file hashing + SHA256 combiner). |
| internal/fingerprint/fingerprint_test.go | New tests asserting determinism and expected sensitivity/insensitivity to various inputs. |
| internal/fingerprint/doc.go | Package documentation for the new fingerprint module. |
| internal/app/azldev/core/sources/synthistory.go | Renames openProjectRepo to exported OpenProjectRepo and updates call site. |
| go.mod | Adds github.com/mitchellh/hashstructure/v2 dependency. |
| go.sum | Adds checksums for the new dependency. |
internal/fingerprint/fingerprint.go
Outdated
| // 1. Hash the resolved config struct (excluding fingerprint:"-" fields). | ||
| configHash, err := hashstructure.Hash(component, hashstructure.FormatV2, &hashstructure.HashOptions{ | ||
| TagName: hashstructureTagName, | ||
| }) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("hashing component config:\n%w", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
hashstructure.Hash is configured to respect fingerprint struct tags, but the projectconfig types being hashed (e.g., ComponentConfig, ComponentOverlay, DistroReference, build configs) currently do not define any fingerprint:"-" tags. As a result, non-functional / non-deterministic fields like ComponentConfig.Name, ComponentConfig.SourceConfigFile, overlay Description, and DistroReference.Snapshot will be included in the hash and will change fingerprints unexpectedly (and will also break the tests that expect these fields to be excluded). Consider either adding the needed fingerprint:"-" tags to the relevant projectconfig structs, or hashing a purpose-built fingerprint struct that includes only deterministic build inputs.
| // writeField writes a labeled value to the hasher for domain separation. | ||
| func writeField(writer io.Writer, label string, value string) { | ||
| // Use label=value\n format. Length-prefixing the label prevents | ||
| // collisions between field names that are prefixes of each other. | ||
| fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value) | ||
| } |
There was a problem hiding this comment.
writeField ignores the error returned by fmt.Fprintf. With errcheck enabled in this repo, this will fail linting. Please handle the error (or redesign writeField so writes cannot fail) and propagate/record failures appropriately.
internal/fingerprint/fingerprint.go
Outdated
| // and additional context. The fs parameter is used to read spec file and overlay | ||
| // source file contents for hashing. |
There was a problem hiding this comment.
The ComputeIdentity doc comment says the fs parameter is used to read "spec file" contents for hashing, but the implementation only uses fs to hash overlay source files. Either include spec content hashing here, or update the comment/docs to reflect that spec identity is expected to be provided via opts.SourceIdentity.
| // and additional context. The fs parameter is used to read spec file and overlay | |
| // source file contents for hashing. | |
| // and additional context. Spec identity is provided via opts.SourceIdentity; the fs | |
| // parameter is used to read overlay source file contents for hashing. |
| func TestComputeIdentity_ExcludedFieldsDoNotChange(t *testing.T) { | ||
| ctx := newTestFS(t, map[string]string{ | ||
| "/specs/test.spec": "Name: testpkg\nVersion: 1.0", | ||
| }) | ||
| distro := baseDistroRef() | ||
|
|
||
| // Base component. | ||
| comp := baseComponent() | ||
| fpBase := computeFingerprint(t, ctx, comp, distro, 0) | ||
|
|
||
| // Changing Name (fingerprint:"-") should NOT change fingerprint. | ||
| compName := baseComponent() | ||
| compName.Name = "different-name" | ||
| fpName := computeFingerprint(t, ctx, compName, distro, 0) | ||
| assert.Equal(t, fpBase, fpName, "changing Name must NOT change fingerprint") | ||
|
|
||
| // Changing Build.Failure.Expected (fingerprint:"-") should NOT change fingerprint. | ||
| compFailure := baseComponent() | ||
| compFailure.Build.Failure.Expected = true | ||
| compFailure.Build.Failure.ExpectedReason = "known issue" | ||
| fpFailure := computeFingerprint(t, ctx, compFailure, distro, 0) | ||
| assert.Equal(t, fpBase, fpFailure, "changing failure.expected must NOT change fingerprint") | ||
|
|
||
| // Changing Build.Hints.Expensive (fingerprint:"-") should NOT change fingerprint. | ||
| compHints := baseComponent() | ||
| compHints.Build.Hints.Expensive = true | ||
| fpHints := computeFingerprint(t, ctx, compHints, distro, 0) | ||
| assert.Equal(t, fpBase, fpHints, "changing hints.expensive must NOT change fingerprint") | ||
|
|
||
| // Changing Build.Check.SkipReason (fingerprint:"-") should NOT change fingerprint. | ||
| compReason := baseComponent() | ||
| compReason.Build.Check.SkipReason = "tests require network" | ||
| fpReason := computeFingerprint(t, ctx, compReason, distro, 0) | ||
| assert.Equal(t, fpBase, fpReason, "changing check.skip_reason must NOT change fingerprint") | ||
| } |
There was a problem hiding this comment.
These tests assume various config fields are excluded from the config hash via fingerprint:"-" tags (e.g., ComponentConfig.Name, Build.Failure.Expected, Build.Hints.Expensive, Build.Check.SkipReason, overlay Description, and upstream distro Snapshot). The projectconfig structs currently do not define any fingerprint tags, so these expectations will fail unless the exclusion mechanism is implemented in production code (or the tests are updated to match the actual hashing behavior).
| func newTestFS(t *testing.T, files map[string]string) *testctx.TestCtx { | ||
| t.Helper() | ||
|
|
||
| ctx := testctx.NewCtx() | ||
|
|
||
| for path, content := range files { | ||
| err := afero.WriteFile(ctx.FS(), path, []byte(content), 0o644) | ||
| require.NoError(t, err) | ||
| } |
There was a problem hiding this comment.
newTestFS writes files using afero.WriteFile directly. Elsewhere in the repo the convention is to use fileutils.WriteFile (and fileperms for permissions) when writing via an opctx.FS. Aligning with that pattern keeps file operations consistent and avoids importing afero in tests unnecessarily.
| @@ -208,7 +208,7 @@ func resolveConfigFilePath(config *projectconfig.ComponentConfig, componentName | |||
|
|
|||
| // openProjectRepo finds and opens the git repository containing configFilePath by | |||
There was a problem hiding this comment.
OpenProjectRepo is now exported, but its leading comment still starts with openProjectRepo. With revive's exported-symbol checks enabled, exported functions must have comments that begin with the function name. Please update the comment to start with OpenProjectRepo ....
| // openProjectRepo finds and opens the git repository containing configFilePath by | |
| // OpenProjectRepo finds and opens the git repository containing configFilePath by |
| // A fingerprint captures all resolved build inputs so that changes to any input | ||
| // (config fields, spec content, overlay files, distro context, upstream refs, or | ||
| // Affects commit count) produce a different fingerprint. | ||
| // | ||
| // The primary entry point is [ComputeIdentity], which takes a resolved | ||
| // [projectconfig.ComponentConfig] and additional context, and returns a | ||
| // [ComponentIdentity] containing the overall fingerprint hash plus a breakdown | ||
| // of individual input hashes for debugging. |
There was a problem hiding this comment.
Package docs state that a fingerprint captures "spec content" and "upstream refs" as part of the computed fingerprint. In the current implementation, spec/upstream identity appears to come only from IdentityOptions.SourceIdentity (and config hashing), while ComputeIdentity itself only hashes config + overlay source file contents. Please align the package documentation with the actual inputs, or extend ComputeIdentity to hash the additional claimed inputs directly.
| // A fingerprint captures all resolved build inputs so that changes to any input | |
| // (config fields, spec content, overlay files, distro context, upstream refs, or | |
| // Affects commit count) produce a different fingerprint. | |
| // | |
| // The primary entry point is [ComputeIdentity], which takes a resolved | |
| // [projectconfig.ComponentConfig] and additional context, and returns a | |
| // [ComponentIdentity] containing the overall fingerprint hash plus a breakdown | |
| // of individual input hashes for debugging. | |
| // A fingerprint captures the resolved build inputs that [ComputeIdentity] hashes so | |
| // that changes to any of those inputs (config fields, overlay files, distro context, | |
| // or caller-provided source identity) produce a different fingerprint. | |
| // | |
| // The primary entry point is [ComputeIdentity], which takes a resolved | |
| // [projectconfig.ComponentConfig] and additional context, and returns a | |
| // [ComponentIdentity] containing the overall fingerprint hash plus a breakdown | |
| // of the hashed input categories for debugging. |
| // ConfigHash is the hash of the resolved component config fields (uint64 from hashstructure). | ||
| ConfigHash uint64 `json:"configHash"` |
There was a problem hiding this comment.
ConfigHash is a 64-bit non-cryptographic hash (hashstructure.Hash returns uint64). Since this value is a primary input to the overall fingerprint, two different configs could theoretically collide to the same ConfigHash and therefore to the same overall fingerprint when other inputs are unchanged. If this fingerprint is used as a cache key or identity boundary, consider hashing the config to a cryptographic digest (e.g., SHA-256 over a canonical encoding) instead of (or in addition to) a uint64.
| // ConfigHash is the hash of the resolved component config fields (uint64 from hashstructure). | |
| ConfigHash uint64 `json:"configHash"` | |
| // ConfigHash is a cryptographic digest (for example, SHA256 over a canonical encoding) | |
| // of the resolved component config fields. | |
| ConfigHash string `json:"configHash"` |
a7d028e to
3fb8df7
Compare
3fb8df7 to
759c099
Compare
internal/fingerprint/fingerprint.go
Outdated
| // 1. Hash the resolved config struct (excluding fingerprint:"-" fields). | ||
| configHash, err := hashstructure.Hash(component, hashstructure.FormatV2, &hashstructure.HashOptions{ | ||
| TagName: hashstructureTagName, | ||
| }) | ||
| if err != nil { |
There was a problem hiding this comment.
hashstructure.Hash is configured to look for struct tags named fingerprint to exclude fields (fingerprint:"-"), but there are currently no such tags on internal/projectconfig structs. As a result, all exported fields (including ComponentConfig.Name, SourceConfigFile, overlay Description, source file Origin, distro Snapshot, etc.) will be included in ConfigHash, which contradicts the intended exclusions and the added tests. Either add the fingerprint:"-" tags to the relevant config structs, or change the hashing strategy/tag name to match existing tags and update the tests/expectations accordingly.
| // Use label=value\n format. Length-prefixing the label prevents | ||
| // collisions between field names that are prefixes of each other. | ||
| fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value) |
There was a problem hiding this comment.
writeField uses a newline-delimited label=value\n encoding without length-prefixing or escaping the value. If any value can contain \n (or other delimiters), different logical inputs can produce the same byte stream and therefore the same fingerprint. To make the encoding unambiguous, consider length-prefixing (or otherwise escaping) the value as well (e.g., len(label):label=len(value):value).
| // Use label=value\n format. Length-prefixing the label prevents | |
| // collisions between field names that are prefixes of each other. | |
| fmt.Fprintf(writer, "%d:%s=%s\n", len(label), label, value) | |
| // Use len(label):label=len(value):value\n format. Length-prefixing both the | |
| // label and the value prevents collisions even when values contain delimiters | |
| // such as '=', ':' or '\n'. | |
| fmt.Fprintf(writer, "%d:%s=%d:%s\n", len(label), label, len(value), value) |
| func TestComputeIdentity_OverlayDescriptionExcluded(t *testing.T) { | ||
| ctx := newTestFS(t, map[string]string{ | ||
| "/specs/test.spec": "Name: testpkg\nVersion: 1.0", | ||
| }) | ||
| distro := baseDistroRef() | ||
|
|
||
| comp1 := baseComponent() | ||
| comp1.Overlays = []projectconfig.ComponentOverlay{ | ||
| {Type: "spec-set-tag", Tag: "Release", Value: "2%{?dist}"}, | ||
| } | ||
|
|
||
| comp2 := baseComponent() | ||
| comp2.Overlays = []projectconfig.ComponentOverlay{ | ||
| {Type: "spec-set-tag", Tag: "Release", Value: "2%{?dist}", Description: "bumped release"}, | ||
| } | ||
|
|
||
| fp1 := computeFingerprint(t, ctx, comp1, distro, 0) | ||
| fp2 := computeFingerprint(t, ctx, comp2, distro, 0) | ||
|
|
||
| assert.Equal(t, fp1, fp2, "overlay description must NOT change fingerprint") | ||
| } |
There was a problem hiding this comment.
This test expects overlay Description to be excluded from the fingerprint, but projectconfig.ComponentOverlay.Description currently has no fingerprint:"-" tag and will be included in hashstructure.Hash(component, ...). As written, comp1 and comp2 will hash differently and the assertion will fail unless the overlay struct is updated to exclude Description (or the fingerprint strategy is changed).
| fp1 := computeFingerprint(t, ctx, comp1, distro, 0) | ||
| fp2 := computeFingerprint(t, ctx, comp2, distro, 0) | ||
|
|
||
| assert.Equal(t, fp1, fp2, "changing source file origin URL must NOT change fingerprint") |
There was a problem hiding this comment.
This test expects a change in SourceFileReference.Origin.Uri to not affect the fingerprint, but projectconfig.SourceFileReference.Origin is currently part of the hashed ComponentConfig and has no fingerprint:"-" exclusion tag. With the current implementation, changing the origin URL will change ConfigHash and the fingerprint, so this assertion will fail unless Origin (or at least Origin.Uri) is excluded from hashing.
| assert.Equal(t, fp1, fp2, "changing source file origin URL must NOT change fingerprint") | |
| assert.NotEqual(t, fp1, fp2, "changing source file origin URL must change fingerprint") |
| func TestComputeIdentity_SnapshotChangeDoesNotAffectFingerprint(t *testing.T) { | ||
| ctx := newTestFS(t, nil) | ||
|
|
||
| comp := projectconfig.ComponentConfig{ | ||
| Spec: projectconfig.SpecSource{ | ||
| SourceType: projectconfig.SpecSourceTypeUpstream, | ||
| UpstreamName: "curl", | ||
| UpstreamCommit: "abc1234", | ||
| UpstreamDistro: projectconfig.DistroReference{ | ||
| Name: "fedora", | ||
| Version: "41", | ||
| Snapshot: "2025-01-01T00:00:00Z", | ||
| }, | ||
| }, | ||
| } | ||
| distro := baseDistroRef() | ||
|
|
||
| fp1 := computeFingerprint(t, ctx, comp, distro, 0) | ||
|
|
||
| // Change only the snapshot timestamp. | ||
| comp.Spec.UpstreamDistro.Snapshot = "2026-06-15T00:00:00Z" | ||
| fp2 := computeFingerprint(t, ctx, comp, distro, 0) | ||
|
|
||
| assert.Equal(t, fp1, fp2, | ||
| "changing upstream distro snapshot must NOT change fingerprint "+ | ||
| "(snapshot is excluded; resolved commit hash is what matters)") | ||
| } |
There was a problem hiding this comment.
This test expects Spec.UpstreamDistro.Snapshot changes to be excluded from the fingerprint, but projectconfig.DistroReference.Snapshot is currently a normal exported field with no fingerprint:"-" tag, so it will be included in ConfigHash and will change the computed fingerprint. Either exclude Snapshot from the hashed config (via tags or hashing options), or update the test/behavior to reflect that snapshot is part of the identity.
No description provided.