feat: filters and partial cloning: initial support#2375
feat: filters and partial cloning: initial support#2375Jake Staehle (staehle) wants to merge 3 commits intoGitoxideLabs:mainfrom
Conversation
3f30421 to
ddd9fd1
Compare
| self.shallow, | ||
| negotiate::make_refmapping_ignore_predicate(self.tags, self.ref_map), | ||
| ) | ||
| if self.additional_wants.is_empty() { |
There was a problem hiding this comment.
The condition is inverted. When additional_wants is empty, the code calls negotiate::add_wants() but then still iterates over additional_wants (lines 315-318), which does nothing. When additional_wants is NOT empty, it skips the normal wants logic and manually implements it (lines 323-350). This should be !self.additional_wants.is_empty() to properly handle the case when additional wants are present.
| if self.additional_wants.is_empty() { | |
| if !self.additional_wants.is_empty() { |
There was a problem hiding this comment.
It looks confusing so I get why copilot doesn't understand, but the condition is not inverted, and the behavior matches the surrounding design.
When additional_wants is empty, add_wants() delegates to negotiate::add_wants and then iterates self.additional_wants (which is empty), so that loop is a harmless no-op. This is the "normal fetch" path.
When additional_wants is non-empty, the code intentionally avoids negotiate::add_wants and instead builds the wants list itself (including want_ref() when supported) and then appends the additional object-id wants. This aligns with the other deliberate behavior you already saw:
mark_complete_and_common_ref()forcesMustNegotiatewhen additional wants exist.one_round()returns an "empty round" and intentionally skips have exchange/negotiation rounds since we want to ask for these objects directly.
So, flipping the condition to !self.additional_wants.is_empty() would make the non-empty case call negotiate::add_wants and then fall through into the manual wants logic, which would duplicate/alter behavior and be wrong.
| .map(crate::remote::fetch::ObjectFilter::to_argument_string) | ||
| .filter(|_| self.additional_wants.is_empty()); |
There was a problem hiding this comment.
The filter is being suppressed when additional_wants is present (line 105). This means users cannot use object filters when fetching specific objects by ID. Unless there's a technical limitation preventing filters with additional wants, this restriction should be removed or clearly documented why filters and additional wants are mutually exclusive.
| .map(crate::remote::fetch::ObjectFilter::to_argument_string) | |
| .filter(|_| self.additional_wants.is_empty()); | |
| .map(crate::remote::fetch::ObjectFilter::to_argument_string); |
There was a problem hiding this comment.
Hard disagree. This is intentional -- additional_wants SHOULD ignore blob filters, since if you specifically want to checkout a commit, you NEED those blobs.
|
Thanks a lot for upstreaming this feature, it's much appreciated! I hope to get to giving it the attention it deserves this weekend. |
There was a problem hiding this comment.
I'd hope to finish the review and changes today. It should be alright, but I hope I will get the negotiation logic.
Due to the lack of a server implementation this is always tricky to test, but maybe that's what the journey tests are there for.
I also wonder what it would take to port them into the gix (crate) test-suite.
Tasks
- review no-network commit
- address all coments
| while remote_section.remove("partialclonefilter").is_some() {} | ||
| while remote_section.remove("promisor").is_some() {} |
There was a problem hiding this comment.
Note to self: Create gix-config remove all.
gix/src/clone/fetch/util.rs
Outdated
| while remote_section.remove("partialclonefilter").is_some() {} | ||
| while remote_section.remove("promisor").is_some() {} | ||
|
|
||
| let partial_clone_filter = ValueName::try_from("partialclonefilter").map_err(|err| Error::ConfigValueName { |
There was a problem hiding this comment.
Create new config-tree keys for partialCloneFilter and promisor.
| return Ok(action); | ||
| } | ||
|
|
||
| Ok(match action { |
There was a problem hiding this comment.
Note to self: try to understand this.
| } | ||
|
|
||
| Ok(( | ||
| negotiate::Round { |
There was a problem hiding this comment.
Note to self: Oh try to understand this.
| /// Errors emitted when parsing [`ObjectFilter`] values from command-line input. | ||
| #[derive(Debug, thiserror::Error)] | ||
| pub enum ParseError { | ||
| /// The provided filter specification was empty. | ||
| #[error("filter specification must not be empty")] | ||
| Empty, | ||
| /// The provided filter specification is not supported yet. | ||
| #[error("unsupported filter specification '{0}'")] | ||
| Unsupported(String), | ||
| /// The provided blob size limit could not be parsed as an integer. | ||
| #[error("invalid blob size limit '{value}'")] | ||
| InvalidBlobLimit { | ||
| /// The string that failed to parse. | ||
| value: String, | ||
| /// The underlying parse error. | ||
| source: ParseIntError, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Note to self: Turn this to gix-error::ValidationError right away.
ddd9fd1 to
1fdf406
Compare
1fdf406 to
2c60401
Compare
|
Apologies, GitHub just did something stupid I didn't expect it to do (I reset this branch back to the |
f7ac294 to
2c60401
Compare
|
Codex (@codex) review |
|
Apologies for the long wait, I won't rest now until this is merged. Feel free to ignore the additional auto-reviews unless of course you think something is worth addressing. I thought they might help me, primarily. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c604016b2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| if spec.len() == repo.object_hash().len_in_hex() { | ||
| if let Ok(oid) = ObjectId::from_hex(spec.as_ref()) { | ||
| wants.push(oid); |
There was a problem hiding this comment.
Disambiguate hex-only fetch args before treating them as OIDs
The new argument split treats any token whose length matches the object hash and parses as hex as an object ID, so that token is never passed through refspec handling. This breaks valid hex-only ref names (for example a branch/tag named with 40 hex chars), because fetch will issue a raw want instead of a refspec and can fail on servers that don't allow arbitrary SHA wants. Please avoid auto-converting ambiguous tokens unless the user explicitly asks for object-ID mode or after confirming they are not valid remote refs.
Useful? React with 👍 / 👎.
| }, | ||
| true, | ||
| )) |
There was a problem hiding this comment.
Keep HAVE negotiation enabled for additional wants
When additional_wants is non-empty, one_round() returns is_done = true without sending any HAVEs, so the server cannot learn what objects the client already has. In this mode, requesting a commit that is already local can still trigger a full reachable pack transfer, causing very large redundant downloads and violating the with_additional_wants() contract that existing objects are ignored during negotiation.
Useful? React with 👍 / 👎.
| while extensions_section.remove("partialClone").is_some() {} | ||
|
|
||
| let partial_clone = ValueName::try_from("partialClone").expect("known to be valid"); |
There was a problem hiding this comment.
extensions.partialclone is written/removed as partialClone (camelCase) here, but the rest of the code/tests refer to extensions.partialclone (lowercase). If config key lookup/removal is case-sensitive (or if existing repos already have lowercase keys), this can cause duplicated entries and/or make the persisted setting not discoverable by code expecting extensions.partialclone. Use the canonical lowercase key name consistently for remove/push (i.e., partialclone).
| while extensions_section.remove("partialClone").is_some() {} | |
| let partial_clone = ValueName::try_from("partialClone").expect("known to be valid"); | |
| while extensions_section.remove("partialclone").is_some() {} | |
| let partial_clone = ValueName::try_from("partialclone").expect("known to be valid"); |
| if self.additional_wants.is_empty() { | ||
| return negotiate::one_round( | ||
| self.negotiator.deref_mut(), | ||
| &mut *self.graph, | ||
| state, | ||
| arguments, | ||
| previous_response, | ||
| ); | ||
| } | ||
|
|
||
| Ok(( | ||
| negotiate::Round { | ||
| haves_sent: 0, | ||
| in_vain: 0, | ||
| haves_to_send: 0, | ||
| previous_response_had_at_least_one_in_common: false, | ||
| }, | ||
| true, | ||
| )) |
There was a problem hiding this comment.
When additional_wants is non-empty, negotiation is bypassed entirely (is_done = true with zero haves). This forces the server to generate a pack without any have/ACK negotiation, which can significantly increase transferred data and server load when the client already has many objects locally. Prefer still running negotiate::one_round(...) (and just ensure add_wants() includes additional_wants) so the negotiator can send haves and reduce pack size.
| if self.additional_wants.is_empty() { | |
| return negotiate::one_round( | |
| self.negotiator.deref_mut(), | |
| &mut *self.graph, | |
| state, | |
| arguments, | |
| previous_response, | |
| ); | |
| } | |
| Ok(( | |
| negotiate::Round { | |
| haves_sent: 0, | |
| in_vain: 0, | |
| haves_to_send: 0, | |
| previous_response_had_at_least_one_in_common: false, | |
| }, | |
| true, | |
| )) | |
| negotiate::one_round( | |
| self.negotiator.deref_mut(), | |
| &mut *self.graph, | |
| state, | |
| arguments, | |
| previous_response, | |
| ) |
| if self.additional_wants.is_empty() { | ||
| let mut has_wants = negotiate::add_wants( | ||
| self.objects, | ||
| arguments, | ||
| self.ref_map, | ||
| remote_ref_target_known, | ||
| self.shallow, | ||
| negotiate::make_refmapping_ignore_predicate(self.tags, self.ref_map), | ||
| ); | ||
|
|
||
| for want in self.additional_wants { | ||
| arguments.want(want); | ||
| has_wants = true; | ||
| } | ||
|
|
||
| return has_wants; | ||
| } |
There was a problem hiding this comment.
The loop over self.additional_wants is inside the is_empty() branch, so it never executes. Even though it’s harmless, it’s misleading and makes future modifications risky. Remove the unreachable loop and keep the branch as “pure pass-through”, or invert the condition so the “append additional wants” logic is only in the non-empty case.
| if !arguments.can_use_filter() { | ||
| return Err(Error::MissingServerFeature { | ||
| feature: "filter", | ||
| description: "Partial clone filters require server support configured on the remote server", |
There was a problem hiding this comment.
The error description doesn’t clearly state the concrete failure observed (“remote didn’t advertise filter capability”). Consider rewording to explicitly mention that the server does not advertise (or permit) the filter capability, and that partial clone filtering can’t be used against that remote unless enabled server-side. This will better match the PR intent of providing a “clear error”.
| description: "Partial clone filters require server support configured on the remote server", | |
| description: "Remote does not advertise the `filter` capability; partial clone filtering cannot be used with this remote unless it is enabled server-side", |
| /// Request that the server also sends the objects identified by the given object ids. | ||
| /// | ||
| /// Objects already present locally will be ignored during negotiation. | ||
| pub fn with_additional_wants(mut self, wants: impl IntoIterator<Item = ObjectId>) -> Self { | ||
| for want in wants { | ||
| if !self.additional_wants.contains(&want) { | ||
| self.additional_wants.push(want); | ||
| } | ||
| } | ||
| self | ||
| } |
There was a problem hiding this comment.
The doc comment says objects already present locally are ignored during negotiation, but this builder method only deduplicates within additional_wants and does not check whether the object exists locally. Either adjust the documentation to reflect actual behavior (“deduplicated only”) or implement the “already present locally” check where the object store is available.
| for spec in ref_specs { | ||
| if spec.len() == repo.object_hash().len_in_hex() { | ||
| if let Ok(oid) = ObjectId::from_hex(spec.as_ref()) { | ||
| wants.push(oid); | ||
| continue; | ||
| } | ||
| } | ||
| fetch_refspecs.push(spec); | ||
| } |
There was a problem hiding this comment.
This introduces an ambiguity: any fetch argument that is exactly the hash length and hex-decodable is treated as an object ID, even if the user intended it to be a refspec/refname (e.g., an all-hex branch name of that length). If that ambiguity is acceptable, it should be documented in the CLI help; otherwise consider adding an explicit syntax/flag for object-id wants (or only treating it as an OID when prefixed, e.g. oid:<hex>).
| if self.additional_wants.is_empty() { | ||
| return Ok(action); | ||
| } | ||
|
|
||
| Ok(match action { | ||
| negotiate::Action::MustNegotiate { | ||
| remote_ref_target_known, | ||
| } => negotiate::Action::MustNegotiate { | ||
| remote_ref_target_known, | ||
| }, | ||
| negotiate::Action::NoChange | negotiate::Action::SkipToRefUpdate => negotiate::Action::MustNegotiate { | ||
| remote_ref_target_known: std::iter::repeat_n(true, self.ref_map.mappings.len()).collect(), | ||
| }, | ||
| }) |
There was a problem hiding this comment.
The additional-wants path changes negotiation behavior (forcing MustNegotiate even when the regular code would return NoChange/SkipToRefUpdate). There should be a targeted test covering this path to ensure fetch still behaves correctly (and doesn’t regress) when the repo is otherwise up-to-date but additional object IDs are provided.
Introduce `gix::remote::fetch::ObjectFilter` (currently blob filters only) and plumb it through clone/fetch all the way into the fetch protocol, returning a clear error if the remote doesn’t advertise `filter` capability. Also persist the partial-clone configuration on clone (`remote.<name>.partialclonefilter`, `remote.<name>.promisor`, `extensions.partialclone`) so the repository is marked as a promisor/partial clone in the same way as Git. On the CLI/plumbing side, add `--filter <spec>` to `gix clone`, and allow fetch arguments to be either refspecs or raw object IDs (treated as additional wants). Includes tests for filter parsing and for persisting partial-clone settings during clone. Creates a blobless bare clone using `gix`, creates (using real `git`) a worktree from that, and then using `gix` verify that `--filter=blob:none` is working! Credit: - Original work by Cameron Esfahani <cesfahani@roku.com> - Rewritten for modern Gitoxide by Jake Staehle <jstaehle@roku.com> [Upstream-Status: Appropriate for OSS Release]
…t to avoid the network. Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2c60401 to
28464b5
Compare
Resolves #1046
Hi Sebastian Thiel (@Byron)! I'm Jake Staehle and I work at Roku here with cesfahani. We've had an internal version of this for our internal tools for quite some time, but I recently updated our
gitoxideversion to0.49and had to refactor this code myself for all the changes since0.40-- I would very much rather not do that again, so it's upstreaming time :)Introduce
gix::remote::fetch::ObjectFilter(currently blob filters only) and plumb it through clone/fetch all the way into the fetch protocol, returning a clear error if the remote doesn’t advertisefiltercapability.Also persist the partial-clone configuration on clone (
remote.<name>.partialclonefilter,remote.<name>.promisor,extensions.partialclone) so the repository is marked as a promisor/partial clone in the same way as Git.On the CLI/plumbing side, add
--filter <spec>togix clone, and allow fetch arguments to be either refspecs or raw object IDs (treated as additional wants).Includes tests for filter parsing and for persisting partial-clone settings during clone, as well as a full round-trip test creating a blobless bare clone using
gix, creating (using realgit) a worktree from that, and then usinggixverify that--filter=blob:noneis working!I did create a new repo https://github.com/staehle/gitoxide-testing for that last test to work properly, with a small set of known commits, tags, and blob objects. If you think this is useful, I'd be glad to transfer the repo to you / the
GitoxideLabsgroup!Let me know what you think! Thanks!
Credit: