Skip to content

feat: filters and partial cloning: initial support#2375

Open
Jake Staehle (staehle) wants to merge 3 commits intoGitoxideLabs:mainfrom
staehle:user/jstaehle/partial-clone-filters-oss
Open

feat: filters and partial cloning: initial support#2375
Jake Staehle (staehle) wants to merge 3 commits intoGitoxideLabs:mainfrom
staehle:user/jstaehle/partial-clone-filters-oss

Conversation

@staehle
Copy link

Resolves #1046

Hi Sebastian Thiel (@Byron)! I'm Jake Staehle and I work at Roku here with cesfahani. We've had an internal version of this for our internal tools for quite some time, but I recently updated our gitoxide version to 0.49 and had to refactor this code myself for all the changes since 0.40 -- I would very much rather not do that again, so it's upstreaming time :)

Introduce gix::remote::fetch::ObjectFilter (currently blob filters only) and plumb it through clone/fetch all the way into the fetch protocol, returning a clear error if the remote doesn’t advertise filter capability.

Also persist the partial-clone configuration on clone (remote.<name>.partialclonefilter, remote.<name>.promisor, extensions.partialclone) so the repository is marked as a promisor/partial clone in the same way as Git.

On the CLI/plumbing side, add --filter <spec> to gix clone, and allow fetch arguments to be either refspecs or raw object IDs (treated as additional wants).

Includes tests for filter parsing and for persisting partial-clone settings during clone, as well as a full round-trip test creating a blobless bare clone using gix, creating (using real git) a worktree from that, and then using gix verify that --filter=blob:none is working!

I did create a new repo https://github.com/staehle/gitoxide-testing for that last test to work properly, with a small set of known commits, tags, and blob objects. If you think this is useful, I'd be glad to transfer the repo to you / the GitoxideLabs group!

Let me know what you think! Thanks!

Credit:

Copilot AI review requested due to automatic review settings January 13, 2026 17:41
@staehle Jake Staehle (staehle) force-pushed the user/jstaehle/partial-clone-filters-oss branch 2 times, most recently from 3f30421 to ddd9fd1 Compare January 13, 2026 18:26
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

self.shallow,
negotiate::make_refmapping_ignore_predicate(self.tags, self.ref_map),
)
if self.additional_wants.is_empty() {
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition is inverted. When additional_wants is empty, the code calls negotiate::add_wants() but then still iterates over additional_wants (lines 315-318), which does nothing. When additional_wants is NOT empty, it skips the normal wants logic and manually implements it (lines 323-350). This should be !self.additional_wants.is_empty() to properly handle the case when additional wants are present.

Suggested change
if self.additional_wants.is_empty() {
if !self.additional_wants.is_empty() {

Copilot uses AI. Check for mistakes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks confusing so I get why copilot doesn't understand, but the condition is not inverted, and the behavior matches the surrounding design.

When additional_wants is empty, add_wants() delegates to negotiate::add_wants and then iterates self.additional_wants (which is empty), so that loop is a harmless no-op. This is the "normal fetch" path.

When additional_wants is non-empty, the code intentionally avoids negotiate::add_wants and instead builds the wants list itself (including want_ref() when supported) and then appends the additional object-id wants. This aligns with the other deliberate behavior you already saw:

  • mark_complete_and_common_ref() forces MustNegotiate when additional wants exist.
  • one_round() returns an "empty round" and intentionally skips have exchange/negotiation rounds since we want to ask for these objects directly.

So, flipping the condition to !self.additional_wants.is_empty() would make the non-empty case call negotiate::add_wants and then fall through into the manual wants logic, which would duplicate/alter behavior and be wrong.

Comment on lines +104 to +105
.map(crate::remote::fetch::ObjectFilter::to_argument_string)
.filter(|_| self.additional_wants.is_empty());
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filter is being suppressed when additional_wants is present (line 105). This means users cannot use object filters when fetching specific objects by ID. Unless there's a technical limitation preventing filters with additional wants, this restriction should be removed or clearly documented why filters and additional wants are mutually exclusive.

Suggested change
.map(crate::remote::fetch::ObjectFilter::to_argument_string)
.filter(|_| self.additional_wants.is_empty());
.map(crate::remote::fetch::ObjectFilter::to_argument_string);

Copilot uses AI. Check for mistakes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard disagree. This is intentional -- additional_wants SHOULD ignore blob filters, since if you specifically want to checkout a commit, you NEED those blobs.

@Byron
Copy link
Member

Thanks a lot for upstreaming this feature, it's much appreciated!

I hope to get to giving it the attention it deserves this weekend.

Copy link
Member

@Byron Sebastian Thiel (Byron) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd hope to finish the review and changes today. It should be alright, but I hope I will get the negotiation logic.

Due to the lack of a server implementation this is always tricky to test, but maybe that's what the journey tests are there for.

I also wonder what it would take to port them into the gix (crate) test-suite.

Tasks

  • review no-network commit
  • address all coments

Comment on lines +38 to +39
while remote_section.remove("partialclonefilter").is_some() {}
while remote_section.remove("promisor").is_some() {}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: Create gix-config remove all.

while remote_section.remove("partialclonefilter").is_some() {}
while remote_section.remove("promisor").is_some() {}

let partial_clone_filter = ValueName::try_from("partialclonefilter").map_err(|err| Error::ConfigValueName {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create new config-tree keys for partialCloneFilter and promisor.

return Ok(action);
}

Ok(match action {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: try to understand this.

}

Ok((
negotiate::Round {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: Oh try to understand this.

Comment on lines +32 to +49
/// Errors emitted when parsing [`ObjectFilter`] values from command-line input.
#[derive(Debug, thiserror::Error)]
pub enum ParseError {
/// The provided filter specification was empty.
#[error("filter specification must not be empty")]
Empty,
/// The provided filter specification is not supported yet.
#[error("unsupported filter specification '{0}'")]
Unsupported(String),
/// The provided blob size limit could not be parsed as an integer.
#[error("invalid blob size limit '{value}'")]
InvalidBlobLimit {
/// The string that failed to parse.
value: String,
/// The underlying parse error.
source: ParseIntError,
},
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: Turn this to gix-error::ValidationError right away.

@staehle
Copy link
Author

Jake Staehle (staehle) commented Feb 9, 2026

Apologies, GitHub just did something stupid I didn't expect it to do

(I reset this branch back to the 2c60401 commit you were working on)

@staehle Jake Staehle (staehle) force-pushed the user/jstaehle/partial-clone-filters-oss branch from f7ac294 to 2c60401 Compare February 9, 2026 19:31
@Byron
Copy link
Member

Codex (@codex) review

@Byron
Copy link
Member

Apologies for the long wait, I won't rest now until this is merged. Feel free to ignore the additional auto-reviews unless of course you think something is worth addressing. I thought they might help me, primarily.

Copy link
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c604016b2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment on lines +63 to +65
if spec.len() == repo.object_hash().len_in_hex() {
if let Ok(oid) = ObjectId::from_hex(spec.as_ref()) {
wants.push(oid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Disambiguate hex-only fetch args before treating them as OIDs

The new argument split treats any token whose length matches the object hash and parses as hex as an object ID, so that token is never passed through refspec handling. This breaks valid hex-only ref names (for example a branch/tag named with 40 hex chars), because fetch will issue a raw want instead of a refspec and can fail on servers that don't allow arbitrary SHA wants. Please avoid auto-converting ambiguous tokens unless the user explicitly asks for object-ID mode or after confirming they are not valid remote refs.

Useful? React with 👍 / 👎.

Comment on lines +375 to +377
},
true,
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep HAVE negotiation enabled for additional wants

When additional_wants is non-empty, one_round() returns is_done = true without sending any HAVEs, so the server cannot learn what objects the client already has. In this mode, requesting a commit that is already local can still trigger a full reachable pack transfer, causing very large redundant downloads and violating the with_additional_wants() contract that existing objects are ignored during negotiation.

Useful? React with 👍 / 👎.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 7 comments.

Comment on lines +48 to +50
while extensions_section.remove("partialClone").is_some() {}

let partial_clone = ValueName::try_from("partialClone").expect("known to be valid");
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extensions.partialclone is written/removed as partialClone (camelCase) here, but the rest of the code/tests refer to extensions.partialclone (lowercase). If config key lookup/removal is case-sensitive (or if existing repos already have lowercase keys), this can cause duplicated entries and/or make the persisted setting not discoverable by code expecting extensions.partialclone. Use the canonical lowercase key name consistently for remove/push (i.e., partialclone).

Suggested change
while extensions_section.remove("partialClone").is_some() {}
let partial_clone = ValueName::try_from("partialClone").expect("known to be valid");
while extensions_section.remove("partialclone").is_some() {}
let partial_clone = ValueName::try_from("partialclone").expect("known to be valid");

Copilot uses AI. Check for mistakes.
Comment on lines +359 to +377
if self.additional_wants.is_empty() {
return negotiate::one_round(
self.negotiator.deref_mut(),
&mut *self.graph,
state,
arguments,
previous_response,
);
}

Ok((
negotiate::Round {
haves_sent: 0,
in_vain: 0,
haves_to_send: 0,
previous_response_had_at_least_one_in_common: false,
},
true,
))
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When additional_wants is non-empty, negotiation is bypassed entirely (is_done = true with zero haves). This forces the server to generate a pack without any have/ACK negotiation, which can significantly increase transferred data and server load when the client already has many objects locally. Prefer still running negotiate::one_round(...) (and just ensure add_wants() includes additional_wants) so the negotiator can send haves and reduce pack size.

Suggested change
if self.additional_wants.is_empty() {
return negotiate::one_round(
self.negotiator.deref_mut(),
&mut *self.graph,
state,
arguments,
previous_response,
);
}
Ok((
negotiate::Round {
haves_sent: 0,
in_vain: 0,
haves_to_send: 0,
previous_response_had_at_least_one_in_common: false,
},
true,
))
negotiate::one_round(
self.negotiator.deref_mut(),
&mut *self.graph,
state,
arguments,
previous_response,
)

Copilot uses AI. Check for mistakes.
Comment on lines +305 to +321
if self.additional_wants.is_empty() {
let mut has_wants = negotiate::add_wants(
self.objects,
arguments,
self.ref_map,
remote_ref_target_known,
self.shallow,
negotiate::make_refmapping_ignore_predicate(self.tags, self.ref_map),
);

for want in self.additional_wants {
arguments.want(want);
has_wants = true;
}

return has_wants;
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop over self.additional_wants is inside the is_empty() branch, so it never executes. Even though it’s harmless, it’s misleading and makes future modifications risky. Remove the unreachable loop and keep the branch as “pure pass-through”, or invert the condition so the “append additional wants” logic is only in the non-empty case.

Copilot uses AI. Check for mistakes.
if !arguments.can_use_filter() {
return Err(Error::MissingServerFeature {
feature: "filter",
description: "Partial clone filters require server support configured on the remote server",
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error description doesn’t clearly state the concrete failure observed (“remote didn’t advertise filter capability”). Consider rewording to explicitly mention that the server does not advertise (or permit) the filter capability, and that partial clone filtering can’t be used against that remote unless enabled server-side. This will better match the PR intent of providing a “clear error”.

Suggested change
description: "Partial clone filters require server support configured on the remote server",
description: "Remote does not advertise the `filter` capability; partial clone filtering cannot be used with this remote unless it is enabled server-side",

Copilot uses AI. Check for mistakes.
Comment on lines +240 to +250
/// Request that the server also sends the objects identified by the given object ids.
///
/// Objects already present locally will be ignored during negotiation.
pub fn with_additional_wants(mut self, wants: impl IntoIterator<Item = ObjectId>) -> Self {
for want in wants {
if !self.additional_wants.contains(&want) {
self.additional_wants.push(want);
}
}
self
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment says objects already present locally are ignored during negotiation, but this builder method only deduplicates within additional_wants and does not check whether the object exists locally. Either adjust the documentation to reflect actual behavior (“deduplicated only”) or implement the “already present locally” check where the object store is available.

Copilot uses AI. Check for mistakes.
Comment on lines +62 to +70
for spec in ref_specs {
if spec.len() == repo.object_hash().len_in_hex() {
if let Ok(oid) = ObjectId::from_hex(spec.as_ref()) {
wants.push(oid);
continue;
}
}
fetch_refspecs.push(spec);
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces an ambiguity: any fetch argument that is exactly the hash length and hex-decodable is treated as an object ID, even if the user intended it to be a refspec/refname (e.g., an all-hex branch name of that length). If that ambiguity is acceptable, it should be documented in the CLI help; otherwise consider adding an explicit syntax/flag for object-id wants (or only treating it as an OID when prefixed, e.g. oid:<hex>).

Copilot uses AI. Check for mistakes.
Comment on lines +288 to +301
if self.additional_wants.is_empty() {
return Ok(action);
}

Ok(match action {
negotiate::Action::MustNegotiate {
remote_ref_target_known,
} => negotiate::Action::MustNegotiate {
remote_ref_target_known,
},
negotiate::Action::NoChange | negotiate::Action::SkipToRefUpdate => negotiate::Action::MustNegotiate {
remote_ref_target_known: std::iter::repeat_n(true, self.ref_map.mappings.len()).collect(),
},
})
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional-wants path changes negotiation behavior (forcing MustNegotiate even when the regular code would return NoChange/SkipToRefUpdate). There should be a targeted test covering this path to ensure fetch still behaves correctly (and doesn’t regress) when the repo is otherwise up-to-date but additional object IDs are provided.

Copilot uses AI. Check for mistakes.
Jake Staehle (staehle) and others added 3 commits March 26, 2026 13:26
Introduce `gix::remote::fetch::ObjectFilter` (currently blob filters only)
and plumb it through clone/fetch all the way into the fetch protocol,
returning a clear error if the remote doesn’t advertise `filter` capability.

Also persist the partial-clone configuration on clone
(`remote.<name>.partialclonefilter`, `remote.<name>.promisor`,
`extensions.partialclone`) so the repository is marked as a
promisor/partial clone in the same way as Git.

On the CLI/plumbing side, add `--filter <spec>` to `gix clone`, and allow
fetch arguments to be either refspecs or raw object IDs
(treated as additional wants).

Includes tests for filter parsing and for persisting partial-clone settings during clone.

Creates a blobless bare clone using `gix`,
creates (using real `git`) a worktree from that,
and then using `gix` verify that `--filter=blob:none` is working!

Credit:
 - Original work by Cameron Esfahani <cesfahani@roku.com>
 - Rewritten for modern Gitoxide by Jake Staehle <jstaehle@roku.com>

[Upstream-Status: Appropriate for OSS Release]
…t to avoid the network.

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
@Byron Sebastian Thiel (Byron) force-pushed the user/jstaehle/partial-clone-filters-oss branch from 2c60401 to 28464b5 Compare March 26, 2026 05:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Minimal implementation of partial clones & promisor objects

3 participants