fix(gix-pack)!: accept non-canonical pack entry size headers#2616
Open
_WD_ (0WD0) wants to merge 1 commit into
Open
fix(gix-pack)!: accept non-canonical pack entry size headers#2616_WD_ (0WD0) wants to merge 1 commit into
_WD_ (0WD0) wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adjusts pack entry parsing to accept Git-compatible non-canonical pack entry size encodings by preserving the actual encoded header byte length, preventing offset math from relying on recomputed canonical header lengths.
Changes:
- Store the actual consumed pack entry header byte length on
data::Entryand use it for offset computations. - Stop rejecting non-canonical size encodings during pack entry header parsing.
- Add regression tests to ensure non-canonical encodings decode correctly and keep delta base offsets correct.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| gix-pack/tests/pack/data/fuzzed.rs | Adds an integration-style regression test for decoding a pack with a non-canonical entry header. |
| gix-pack/src/data/mod.rs | Extends data::Entry to persist the actual encoded header length (needed to accept non-canonical encodings safely). |
| gix-pack/src/data/entry/mod.rs | Updates offset calculations to rely on the stored header length and fall back when unknown. |
| gix-pack/src/data/entry/decode.rs | Removes canonical-encoding rejection, records consumed header bytes, and updates tests accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
C Git accepts overlong pack entry size encodings, and real servers can send them when reusing existing pack data. Accept these headers while recording the actual header length consumed from the pack. Keeping the actual header length avoids recomputing a canonical length from the decoded size, which would break pack offset reconstruction and ofs-delta base offset calculations for non-canonical entries. BREAKING CHANGE: gix_pack::data::Entry now has a public encoded_header_size field.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Accept non-canonical pack entry size headers in
gix-pack, matching C Git's read behavior.Details
C Git accepts overlong pack entry size encodings. For example,
b3 00decodes to a blob with size3, even though the canonical encoding would be33.This can appear in real fetch/clone traffic when a server reuses existing pack data. A real-world example is web3infra-foundation/mega (tested on commit d2d797b333e8aa8a78fc1d8f5c6cae33360bc9bc), where GitHub sends a pack entry for
moon/.nvmrcwith bytes startingb3 00. C Git can read it andgit fsck --fullaccepts it, butgix-packcurrently rejects it with:This change removes the canonical-size-length rejection and records the actual number of header bytes consumed.
Preserving the actual header length is important: accepting non-canonical headers while recomputing a canonical length would break pack_offset() reconstruction and ofs-delta base offset calculations.
Pack writing still emits canonical headers.