gix-ref: skip name validation in packed-refs binary search#2604
gix-ref: skip name validation in packed-refs binary search#2604Nik B (nikicat) wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8726942cfc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
| match packed::decode::name_at_record_start(line, self.object_hash) { | ||
| Some(name) => name, | ||
| None => { | ||
| encountered_parse_failure = true; | ||
| }) | ||
| .unwrap_or(&[]) | ||
| &[] |
There was a problem hiding this comment.
Surface malformed packed-refs records as parse failures
This match only treats shape mismatches as parse failures, so records with the right <hash><space><name><newline> layout but invalid hash/refname content are now considered comparable and no longer flip encountered_parse_failure. In a sorted packed-refs file containing such a malformed line (that is not the exact lookup target), try_find_full_name can return Ok(None) instead of Error::Parse, which hides on-disk corruption that the previous decode::reference-based comparison reported whenever the binary search touched that record.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
The pre-PR detection here was already opportunistic, not a guarantee: encountered_parse_failure only flipped for records the binary search happened to touch (~O(log n) per query), so corruption in untouched lines was never reported by try_find_full_name either. Skipping the full gix_validate::reference::name call on those comparisons is the whole point of the change.
Full validation still runs in the paths that matter:
- on a match,
try_find_full_namere-parses throughdecode::referenceand returnsErr(Parse)on bad content - full iteration (
buf.iter(), used byloose_iter/iter_packed) validates every record
Leaving as-is.
`packed::Buffer::binary_search_by` calls `decode::reference` on every comparison step (~17 per lookup on a 154k-ref store), and each decode runs `gix_validate::reference::name` -> `gix_validate::tag::name_inner` plus full hex validation on the hash. The binary search only needs the name bytes for ordered comparison; the final match in `try_find_full_name` already re-parses the record through `decode::reference` and surfaces any parse error. Introduce `decode::name_at_record_start`, which scans for the `<hash><space><name><newline>` shape and returns just the name slice without validating either piece, and use it from the binary search. Parse-failure detection is preserved via the same flag the previous implementation set. On a wide-refs incremental fetch (gitaur AUR mirror, 154k branches) this is the single largest CPU sink — `gix_validate::tag::name_inner` shows up at ~14% of total fetch CPU in `samply`, all of it from this loop. End-to-end on the same fetch the change brings wall time from ~11.0s to ~7.7s (~30%) and user CPU from ~8.1s to ~5.1s (~37%). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
8726942 to
a8f7474
Compare
|
Pushed a small follow-up that elides the redundant lifetime on |
…wins Combines two upstream PRs that are not yet released: - GitoxideLabs/gitoxide#2604 — gix-ref: skip name validation in packed-refs binary search. - GitoxideLabs/gitoxide#2605 — gix: cache packed-refs in a HashMap during fetch update_refs. On `gitaur -Sy` against the AUR mirror (154k branches in packed-refs, warm cache, no incoming updates) the combo brings wall time from ~11.0s on stock gix 0.83 to ~3.4-5.0s, putting gitaur on par with system `git fetch` on the same workload. Profile evidence: docs/PROFILING.md plus the `gitaur-bc.json.gz` samply capture. Revert to a versioned crates.io dependency once both PRs land and a gix release ships with them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cover the comparator's contract directly: valid records terminated by `\n` or `\r`, the empty-name and missing-space rejections, truncated input, inputs shorter than the hash, SHA-256 records, and the "only the first line" rule that lets the match-site decoder handle the optional peeled `^<hash>` follow-up. The new helper sits inside the hot binary-search loop, so pinning its shape contract independently of `decode::reference` guards against accidental drift if either is touched later. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
d778df6 to
0cf8980
Compare
Two small polish changes to the packed-refs binary-search hot path: - Reword the comment in `binary_search_by` to talk about `log₂(n)` redundant `gix_validate::reference::name` calls rather than the concrete "~17×" figure tied to a 154k-ref store, so the comment ages with the codebase rather than with a specific repo's ref count. - Expand the doc on `decode::name_at_record_start` to make the parse-failure contract explicit: returning `None` is the signal for the caller to flip its parse-failure flag, which is what lets a search miss against a corrupt packed-refs surface as `Error::Parse` instead of `Ok(None)`. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
packed::Buffer::binary_search_bycurrently callsdecode::referenceon every comparison step of the binary search (~17 per lookup on a 154k-ref store). Eachdecode::referencerunsgix_validate::reference::name→gix_validate::tag::name_innerand full hex validation on the hash. The binary search only needs the name bytes for ordered comparison; the final match intry_find_full_namealready re-parses the record throughdecode::referenceand surfaces any parse error.This PR introduces
decode::name_at_record_start, which scans for the<hash><space><name><newline>shape and returns just the name slice without validating either piece, and uses it from the binary search. Parse-failure detection is preserved via the same flag the previous implementation set.Measured impact
Profiled on a real-world wide-refs incremental fetch —
gitaur -Syagainst the AUR mirror (154k branches in packed-refs, ~12 MB file) on a warm cache with no incoming ref updates, sampled withsamply:gix_validate::tag::name_innerself timegix_ref::packed::decode::referenceself timegix_validate::tag::name_innerwas the single largest CPU sink, all of it coming from inside this loop.Test coverage
cargo test -p gix-refpasses locally (161/161). The existing tests fortry_find/try_find_full_nameexercise both the hit and miss paths against packed-refs fixtures; behavior on malformed records is preserved by detecting the shape mismatch and setting the sameencountered_parse_failureflag as before, so a corrupt packed-refs file still surfaces asError::Parse.AI disclosure
Per
CONTRIBUTING.md, this change was drafted with the help of Claude Opus 4.7 (see theCo-authored-bytrailer on the commit). The investigation, profiling methodology, and final review were performed by me.