gix: cache packed-refs in a HashMap during fetch update_refs#2605
gix: cache packed-refs in a HashMap during fetch update_refs#2605Nik B (nikicat) wants to merge 3 commits into
Conversation
`fetch::refs::update()` calls `repo.try_find_reference(name)` once per advertised mapping. Each call does a filesystem stat for a loose ref followed by a binary search over `packed-refs` — and on wide-refs mirrors (e.g. AUR's 154k branches) that loop dominates wall time. Snapshot `packed-refs` into a `HashMap<BString, gix_ref::Reference>` once and enumerate loose refs into a `HashSet` once at the start of `update()`. Per-mapping lookups consult the HashMap when the name has no loose shadow, falling back to `repo.try_find_reference()` only when a loose ref exists (preserving the precedence rule) or when the snapshot couldn't be built. The optimization is purely additive: any failure to build the snapshots falls back to the unchanged slow path. Measured on a `gitaur -Sy` against the AUR mirror (154k packed refs, ~200 loose refs, warm cache, no incoming updates): unpatched 11.0s wall / 8.1s user CPU (C) alone 8.0s wall / 5.8s user CPU (~27% faster) (B) + (C) 5.0s wall / 3.3s user CPU (~55% faster, ≈ git CLI) (B) is the sibling change in gix-ref that drops name validation from the binary-search comparator; (C) and (B) target different phases — the have-set build still drives binary searches (helped by B), while update_refs itself is now a hash lookup (helped by C). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa9dc5550a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
| let packed: HashMap<BString, gix_ref::Reference> = iter | ||
| .filter_map(Result::ok) | ||
| .map(|r| { | ||
| let name = r.name.as_bstr().to_owned(); |
There was a problem hiding this comment.
Preserve namespace when keying packed-refs snapshot
Store entries in packed using namespace-aware names: this code keys by r.name as read from packed-refs, but lookups are performed with local mapping names (fp.lookup(name.as_bstr())) that are not namespace-prefixed. In repositories with core.refsNamespace set, try_find_reference() applies/strips the namespace during lookup, while this fast path does not, so existing packed refs can be misclassified as missing (or matched against non-namespaced refs), leading to incorrect update planning and possible transaction failures during fetch.
Useful? React with 👍 / 👎.
| let buf = repo.refs.cached_packed_buffer().ok().flatten()?; | ||
| let iter = buf.iter().ok()?; | ||
| let packed: HashMap<BString, gix_ref::Reference> = iter | ||
| .filter_map(Result::ok) |
There was a problem hiding this comment.
Stop discarding ref-parse errors in snapshot construction
Do not ignore iterator errors here: filter_map(Result::ok) silently drops malformed packed/loose refs while building the fast-path snapshot. That changes behavior versus repo.try_find_reference(); for example, a malformed loose ref that shadows a packed ref would now be skipped, allowing the packed ref to be used instead of surfacing the corruption. The fast path should bail out to the slow path (or propagate) on iterator errors to preserve existing semantics.
Useful? React with 👍 / 👎.
…wins Combines two upstream PRs that are not yet released: - GitoxideLabs/gitoxide#2604 — gix-ref: skip name validation in packed-refs binary search. - GitoxideLabs/gitoxide#2605 — gix: cache packed-refs in a HashMap during fetch update_refs. On `gitaur -Sy` against the AUR mirror (154k branches in packed-refs, warm cache, no incoming updates) the combo brings wall time from ~11.0s on stock gix 0.83 to ~3.4-5.0s, putting gitaur on par with system `git fetch` on the same workload. Profile evidence: docs/PROFILING.md plus the `gitaur-bc.json.gz` samply capture. Revert to a versioned crates.io dependency once both PRs land and a gix release ships with them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The packed-refs fast path keyed entries by their raw on-disk names, which include the namespace prefix when `core.refsNamespace` is set, while lookups use namespace-stripped local names — packed refs would be misclassified as missing. Bail out when a namespace is configured so the slow path applies/strips it correctly. Also stop dropping iterator errors via `filter_map(Result::ok)`. A malformed loose ref shadowing a packed entry would otherwise let the fast path return the packed value instead of surfacing the corruption the slow path raises.
|
Superseded by #2608, which moves the optimization inside |
Summary
fetch::refs::update()callsrepo.try_find_reference(name)once per advertised mapping. Each call does a filesystem stat for a loose ref followed by a binary search overpacked-refs— and on wide-refs mirrors (e.g. AUR's 154k branches) that loop dominates wall time.This PR snapshots
packed-refsinto aHashMap<BString, gix_ref::Reference>once and enumerates loose ref names into aHashSetonce at the start ofupdate(). Per-mapping lookups consult the HashMap when the name has no loose shadow, falling back torepo.try_find_reference()only when a loose ref exists (preserving the loose-shadows-packed precedence) or when the snapshot couldn't be built.The optimization is purely additive: any failure to build the snapshots makes
build_lookup_fast_pathreturnNoneand the loop falls back to the unchanged slow path.Measured impact
gitaur -Syagainst the AUR mirror (154k packed refs, ~200 loose refs, warm cache, no incoming updates):The two changes target different fetch phases:
packed::Buffer::binary_search_by, which speeds up the negotiation have-set build phase (still does many binary searches viaNegotiate::mark_complete).Together they bring gitaur's incremental fetch on the AUR mirror to roughly the same wall time as
git fetch, down from ~2.2× slower previously.Trade-offs / scope
packed-refsperupdate()call (cost amortized across all mappings). For small repos with few refs the overhead is in the microseconds — not benchmarked specifically, but the HashMap insertions are O(n) where n is the packed-refs entry count, and agitoxidesmoke run on a small repo showed no regression.repo.refs.loose_iter()and bails to the slow path on iteration error. For repos where most refs are loose this is essentially the same work as today; for mirror-style repos with packed-heavy state it's the right shape.packed::Bufferitself (e.g. a lazy index of name→offset), to also accelerate non-fetch callers oftry_find_full_name. That's a larger change touching a cached data structure — happy to discuss whether you'd prefer the optimization in that form instead, but this localized version was easier to scope, measure, and review independently.Test coverage
cargo test -p gix-refpasses (161/161).cargo test -p gix --features blocking-network-client --test gix remote::fetch::blocking_and_async_iopasses the 7 tests that pass on main (3 tests fail with the same hash-mismatch on main without this PR — pre-existing, fixture-dependent).AI disclosure
Per
CONTRIBUTING.md, this change was drafted with the help of Claude Opus 4.7 (see theCo-authored-by:trailer on the commit). Investigation, profiling, and review were done by me.