feat: filters and partial cloning: initial support by staehle · Pull Request #2375 · GitoxideLabs/gitoxide

Jake Staehle (staehle) · 2026-01-13T17:41:35Z

Resolves #1046

Hi Sebastian Thiel (@Byron)! I'm Jake Staehle and I work at Roku here with cesfahani. We've had an internal version of this for our internal tools for quite some time, but I recently updated our gitoxide version to 0.49 and had to refactor this code myself for all the changes since 0.40 -- I would very much rather not do that again, so it's upstreaming time :)

Introduce gix::remote::fetch::ObjectFilter (currently blob filters only) and plumb it through clone/fetch all the way into the fetch protocol, returning a clear error if the remote doesn’t advertise filter capability.

Also persist the partial-clone configuration on clone (remote.<name>.partialclonefilter, remote.<name>.promisor, extensions.partialclone) so the repository is marked as a promisor/partial clone in the same way as Git.

On the CLI/plumbing side, add --filter <spec> to gix clone, and allow fetch arguments to be either refspecs or raw object IDs (treated as additional wants).

Includes tests for filter parsing and for persisting partial-clone settings during clone, as well as a full round-trip test creating a blobless bare clone using gix, creating (using real git) a worktree from that, and then using gix verify that --filter=blob:none is working!

I did create a new repo https://github.com/staehle/gitoxide-testing for that last test to work properly, with a small set of known commits, tags, and blob objects. If you think this is useful, I'd be glad to transfer the repo to you / the GitoxideLabs group!

Let me know what you think! Thanks!

Credit:

Original work by Cameron Esfahani cesfahani@roku.com
Rewritten for modern Gitoxide by Jake Staehle jstaehle@roku.com

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot · 2026-01-13T17:45:30Z

gix/src/remote/connection/fetch/receive_pack.rs

-            self.shallow,
-            negotiate::make_refmapping_ignore_predicate(self.tags, self.ref_map),
-        )
+        if self.additional_wants.is_empty() {


The condition is inverted. When additional_wants is empty, the code calls negotiate::add_wants() but then still iterates over additional_wants (lines 315-318), which does nothing. When additional_wants is NOT empty, it skips the normal wants logic and manually implements it (lines 323-350). This should be !self.additional_wants.is_empty() to properly handle the case when additional wants are present.

Suggested change

if self.additional_wants.is_empty() {

if !self.additional_wants.is_empty() {

It looks confusing so I get why copilot doesn't understand, but the condition is not inverted, and the behavior matches the surrounding design.

When additional_wants is empty, add_wants() delegates to negotiate::add_wants and then iterates self.additional_wants (which is empty), so that loop is a harmless no-op. This is the "normal fetch" path.

When additional_wants is non-empty, the code intentionally avoids negotiate::add_wants and instead builds the wants list itself (including want_ref() when supported) and then appends the additional object-id wants. This aligns with the other deliberate behavior you already saw:

mark_complete_and_common_ref() forces MustNegotiate when additional wants exist.

one_round() returns an "empty round" and intentionally skips have exchange/negotiation rounds since we want to ask for these objects directly.

So, flipping the condition to !self.additional_wants.is_empty() would make the non-empty case call negotiate::add_wants and then fall through into the manual wants logic, which would duplicate/alter behavior and be wrong.

Copilot · 2026-01-13T17:45:30Z

gix/src/remote/connection/fetch/receive_pack.rs

+            .map(crate::remote::fetch::ObjectFilter::to_argument_string)
+            .filter(|_| self.additional_wants.is_empty());


The filter is being suppressed when additional_wants is present (line 105). This means users cannot use object filters when fetching specific objects by ID. Unless there's a technical limitation preventing filters with additional wants, this restriction should be removed or clearly documented why filters and additional wants are mutually exclusive.

Suggested change

.map(crate::remote::fetch::ObjectFilter::to_argument_string)

.filter(|_| self.additional_wants.is_empty());

.map(crate::remote::fetch::ObjectFilter::to_argument_string);

Hard disagree. This is intentional -- additional_wants SHOULD ignore blob filters, since if you specifically want to checkout a commit, you NEED those blobs.

Sebastian Thiel (Byron) · 2026-01-19T19:47:13Z

Thanks a lot for upstreaming this feature, it's much appreciated!

I hope to get to giving it the attention it deserves this weekend.

Sebastian Thiel (Byron)

I'd hope to finish the review and changes today. It should be alright, but I hope I will get the negotiation logic.

Due to the lack of a server implementation this is always tricky to test, but maybe that's what the journey tests are there for.

I also wonder what it would take to port them into the gix (crate) test-suite.

Tasks

review no-network commit
address all coments

Sebastian Thiel (Byron) · 2026-02-01T16:20:15Z

gix/src/clone/fetch/util.rs

+        while remote_section.remove("partialclonefilter").is_some() {}
+        while remote_section.remove("promisor").is_some() {}


Note to self: Create gix-config remove all.

Sebastian Thiel (Byron) · 2026-02-01T16:27:01Z

gix/src/clone/fetch/util.rs

+        while remote_section.remove("partialclonefilter").is_some() {}
+        while remote_section.remove("promisor").is_some() {}
+
+        let partial_clone_filter = ValueName::try_from("partialclonefilter").map_err(|err| Error::ConfigValueName {


Create new config-tree keys for partialCloneFilter and promisor.

Sebastian Thiel (Byron) · 2026-02-01T16:34:39Z

gix/src/remote/connection/fetch/receive_pack.rs

+            return Ok(action);
+        }
+
+        Ok(match action {


Note to self: try to understand this.

Sebastian Thiel (Byron) · 2026-02-01T16:40:12Z

gix/src/remote/connection/fetch/receive_pack.rs

+        }
+
+        Ok((
+            negotiate::Round {


Note to self: Oh try to understand this.

Sebastian Thiel (Byron) · 2026-02-01T16:40:57Z

gix/src/remote/fetch/object_filter.rs

+/// Errors emitted when parsing [`ObjectFilter`] values from command-line input.
+#[derive(Debug, thiserror::Error)]
+pub enum ParseError {
+    /// The provided filter specification was empty.
+    #[error("filter specification must not be empty")]
+    Empty,
+    /// The provided filter specification is not supported yet.
+    #[error("unsupported filter specification '{0}'")]
+    Unsupported(String),
+    /// The provided blob size limit could not be parsed as an integer.
+    #[error("invalid blob size limit '{value}'")]
+    InvalidBlobLimit {
+        /// The string that failed to parse.
+        value: String,
+        /// The underlying parse error.
+        source: ParseIntError,
+    },
+}


Note to self: Turn this to gix-error::ValidationError right away.

Jake Staehle (staehle) · 2026-02-09T19:29:17Z

Apologies, GitHub just did something stupid I didn't expect it to do

(I reset this branch back to the 2c60401 commit you were working on)

Sebastian Thiel (Byron) · 2026-03-25T12:40:47Z

Codex (@codex) review

Sebastian Thiel (Byron) · 2026-03-25T12:42:29Z

Apologies for the long wait, I won't rest now until this is merged. Feel free to ignore the additional auto-reviews unless of course you think something is worth addressing. I thought they might help me, primarily.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c604016b2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

chatgpt-codex-connector · 2026-03-25T12:46:46Z

gitoxide-core/src/repository/fetch.rs

+            if spec.len() == repo.object_hash().len_in_hex() {
+                if let Ok(oid) = ObjectId::from_hex(spec.as_ref()) {
+                    wants.push(oid);


Disambiguate hex-only fetch args before treating them as OIDs

The new argument split treats any token whose length matches the object hash and parses as hex as an object ID, so that token is never passed through refspec handling. This breaks valid hex-only ref names (for example a branch/tag named with 40 hex chars), because fetch will issue a raw want instead of a refspec and can fail on servers that don't allow arbitrary SHA wants. Please avoid auto-converting ambiguous tokens unless the user explicitly asks for object-ID mode or after confirming they are not valid remote refs.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-25T12:46:46Z

gix/src/remote/connection/fetch/receive_pack.rs

+            },
+            true,
+        ))


Keep HAVE negotiation enabled for additional wants

When additional_wants is non-empty, one_round() returns is_done = true without sending any HAVEs, so the server cannot learn what objects the client already has. In this mode, requesting a commit that is already local can still trigger a full reachable pack transfer, causing very large redundant downloads and violating the with_additional_wants() contract that existing objects are ignored during negotiation.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 7 comments.

Copilot · 2026-03-25T12:48:47Z

gix/src/clone/fetch/util.rs

+        while extensions_section.remove("partialClone").is_some() {}
+
+        let partial_clone = ValueName::try_from("partialClone").expect("known to be valid");


extensions.partialclone is written/removed as partialClone (camelCase) here, but the rest of the code/tests refer to extensions.partialclone (lowercase). If config key lookup/removal is case-sensitive (or if existing repos already have lowercase keys), this can cause duplicated entries and/or make the persisted setting not discoverable by code expecting extensions.partialclone. Use the canonical lowercase key name consistently for remove/push (i.e., partialclone).

Suggested change

while extensions_section.remove("partialClone").is_some() {}

let partial_clone = ValueName::try_from("partialClone").expect("known to be valid");

while extensions_section.remove("partialclone").is_some() {}

let partial_clone = ValueName::try_from("partialclone").expect("known to be valid");

Copilot · 2026-03-25T12:48:48Z

gix/src/remote/connection/fetch/receive_pack.rs

+        if self.additional_wants.is_empty() {
+            return negotiate::one_round(
+                self.negotiator.deref_mut(),
+                &mut *self.graph,
+                state,
+                arguments,
+                previous_response,
+            );
+        }
+
+        Ok((
+            negotiate::Round {
+                haves_sent: 0,
+                in_vain: 0,
+                haves_to_send: 0,
+                previous_response_had_at_least_one_in_common: false,
+            },
+            true,
+        ))


When additional_wants is non-empty, negotiation is bypassed entirely (is_done = true with zero haves). This forces the server to generate a pack without any have/ACK negotiation, which can significantly increase transferred data and server load when the client already has many objects locally. Prefer still running negotiate::one_round(...) (and just ensure add_wants() includes additional_wants) so the negotiator can send haves and reduce pack size.

Suggested change

if self.additional_wants.is_empty() {

return negotiate::one_round(

self.negotiator.deref_mut(),

&mut *self.graph,

state,

arguments,

previous_response,

);

}

Ok((

negotiate::Round {

haves_sent: 0,

in_vain: 0,

haves_to_send: 0,

previous_response_had_at_least_one_in_common: false,

},

true,

))

negotiate::one_round(

self.negotiator.deref_mut(),

&mut *self.graph,

state,

arguments,

previous_response,

)

Copilot · 2026-03-25T12:48:48Z

gix/src/remote/connection/fetch/receive_pack.rs

+        if self.additional_wants.is_empty() {
+            let mut has_wants = negotiate::add_wants(
+                self.objects,
+                arguments,
+                self.ref_map,
+                remote_ref_target_known,
+                self.shallow,
+                negotiate::make_refmapping_ignore_predicate(self.tags, self.ref_map),
+            );
+
+            for want in self.additional_wants {
+                arguments.want(want);
+                has_wants = true;
+            }
+
+            return has_wants;
+        }


The loop over self.additional_wants is inside the is_empty() branch, so it never executes. Even though it’s harmless, it’s misleading and makes future modifications risky. Remove the unreachable loop and keep the branch as “pure pass-through”, or invert the condition so the “append additional wants” logic is only in the non-empty case.

Copilot · 2026-03-25T12:48:48Z

gix-protocol/src/fetch/function.rs

+        if !arguments.can_use_filter() {
+            return Err(Error::MissingServerFeature {
+                feature: "filter",
+                description: "Partial clone filters require server support configured on the remote server",


The error description doesn’t clearly state the concrete failure observed (“remote didn’t advertise filter capability”). Consider rewording to explicitly mention that the server does not advertise (or permit) the filter capability, and that partial clone filtering can’t be used against that remote unless enabled server-side. This will better match the PR intent of providing a “clear error”.

Suggested change

description: "Partial clone filters require server support configured on the remote server",

description: "Remote does not advertise the `filter` capability; partial clone filtering cannot be used with this remote unless it is enabled server-side",

Copilot · 2026-03-25T12:48:49Z

gix/src/remote/connection/fetch/mod.rs

+    /// Request that the server also sends the objects identified by the given object ids.
+    ///
+    /// Objects already present locally will be ignored during negotiation.
+    pub fn with_additional_wants(mut self, wants: impl IntoIterator<Item = ObjectId>) -> Self {
+        for want in wants {
+            if !self.additional_wants.contains(&want) {
+                self.additional_wants.push(want);
+            }
+        }
+        self
+    }


The doc comment says objects already present locally are ignored during negotiation, but this builder method only deduplicates within additional_wants and does not check whether the object exists locally. Either adjust the documentation to reflect actual behavior (“deduplicated only”) or implement the “already present locally” check where the object store is available.

Copilot · 2026-03-25T12:48:49Z

gitoxide-core/src/repository/fetch.rs

+        for spec in ref_specs {
+            if spec.len() == repo.object_hash().len_in_hex() {
+                if let Ok(oid) = ObjectId::from_hex(spec.as_ref()) {
+                    wants.push(oid);
+                    continue;
+                }
+            }
+            fetch_refspecs.push(spec);
+        }


This introduces an ambiguity: any fetch argument that is exactly the hash length and hex-decodable is treated as an object ID, even if the user intended it to be a refspec/refname (e.g., an all-hex branch name of that length). If that ambiguity is acceptable, it should be documented in the CLI help; otherwise consider adding an explicit syntax/flag for object-id wants (or only treating it as an OID when prefixed, e.g. oid:<hex>).

Copilot · 2026-03-25T12:48:49Z

gix/src/remote/connection/fetch/receive_pack.rs

+        if self.additional_wants.is_empty() {
+            return Ok(action);
+        }
+
+        Ok(match action {
+            negotiate::Action::MustNegotiate {
+                remote_ref_target_known,
+            } => negotiate::Action::MustNegotiate {
+                remote_ref_target_known,
+            },
+            negotiate::Action::NoChange | negotiate::Action::SkipToRefUpdate => negotiate::Action::MustNegotiate {
+                remote_ref_target_known: std::iter::repeat_n(true, self.ref_map.mappings.len()).collect(),
+            },
+        })


The additional-wants path changes negotiation behavior (forcing MustNegotiate even when the regular code would return NoChange/SkipToRefUpdate). There should be a targeted test covering this path to ensure fetch still behaves correctly (and doesn’t regress) when the repo is otherwise up-to-date but additional object IDs are provided.

Introduce `gix::remote::fetch::ObjectFilter` (currently blob filters only) and plumb it through clone/fetch all the way into the fetch protocol, returning a clear error if the remote doesn’t advertise `filter` capability. Also persist the partial-clone configuration on clone (`remote.<name>.partialclonefilter`, `remote.<name>.promisor`, `extensions.partialclone`) so the repository is marked as a promisor/partial clone in the same way as Git. On the CLI/plumbing side, add `--filter <spec>` to `gix clone`, and allow fetch arguments to be either refspecs or raw object IDs (treated as additional wants). Includes tests for filter parsing and for persisting partial-clone settings during clone. Creates a blobless bare clone using `gix`, creates (using real `git`) a worktree from that, and then using `gix` verify that `--filter=blob:none` is working! Credit: - Original work by Cameron Esfahani <cesfahani@roku.com> - Rewritten for modern Gitoxide by Jake Staehle <jstaehle@roku.com> [Upstream-Status: Appropriate for OSS Release]

-

…t to avoid the network. Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings January 13, 2026 17:41

Copilot started reviewing on behalf of Jake Staehle (staehle) January 13, 2026 17:42 View session

Jake Staehle (staehle) force-pushed the user/jstaehle/partial-clone-filters-oss branch 2 times, most recently from 3f30421 to ddd9fd1 Compare January 13, 2026 18:26

Copilot AI reviewed Jan 13, 2026

View reviewed changes

Sebastian Thiel (Byron) self-assigned this Feb 1, 2026

Sebastian Thiel (Byron) reviewed Feb 1, 2026

View reviewed changes

Sebastian Thiel (Byron) force-pushed the user/jstaehle/partial-clone-filters-oss branch from ddd9fd1 to 1fdf406 Compare February 1, 2026 16:47

LOCI-AI (loci-dev) mentioned this pull request Feb 1, 2026

UPSTREAM PR #2375: feat: filters and partial cloning: initial support auroralabs-loci/gitoxide#13

Open

Sebastian Thiel (Byron) force-pushed the user/jstaehle/partial-clone-filters-oss branch from 1fdf406 to 2c60401 Compare February 1, 2026 18:19

LOCI-AI (loci-dev) mentioned this pull request Feb 2, 2026

UPSTREAM PR #2375: feat: filters and partial cloning: initial support auroralabs-loci/gitoxide#14

Open

Sebastian Thiel (Byron) mentioned this pull request Feb 8, 2026

Reorganise Error-Handling around Exn #2351

Open

72 tasks

Jake Staehle (staehle) force-pushed the user/jstaehle/partial-clone-filters-oss branch from f7ac294 to 2c60401 Compare February 9, 2026 19:31

Sebastian Thiel (Byron) requested a review from Copilot March 25, 2026 12:40

Copilot started reviewing on behalf of Sebastian Thiel (Byron) March 25, 2026 12:41 View session

chatgpt-codex-connector bot reviewed Mar 25, 2026

View reviewed changes

Copilot AI reviewed Mar 25, 2026

View reviewed changes

Jake Staehle (staehle) and others added 3 commits March 26, 2026 13:26

refactor

bb48bae

-

Turn https://github.com/staehle/gitoxide-testing` into a local scrip…

28464b5

…t to avoid the network. Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

Sebastian Thiel (Byron) force-pushed the user/jstaehle/partial-clone-filters-oss branch from 2c60401 to 28464b5 Compare March 26, 2026 05:26

	if self.additional_wants.is_empty() {
	if !self.additional_wants.is_empty() {

		.map(crate::remote::fetch::ObjectFilter::to_argument_string)
		.filter(\|_\| self.additional_wants.is_empty());

		while remote_section.remove("partialclonefilter").is_some() {}
		while remote_section.remove("promisor").is_some() {}

		while extensions_section.remove("partialClone").is_some() {}

		let partial_clone = ValueName::try_from("partialClone").expect("known to be valid");

	description: "Partial clone filters require server support configured on the remote server",
	description: "Remote does not advertise the `filter` capability; partial clone filtering cannot be used with this remote unless it is enabled server-side",

Uh oh!

Conversation

Jake Staehle (staehle) commented Jan 13, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

Jake Staehle (staehle) Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

Jake Staehle (staehle) Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

Sebastian Thiel (Byron) commented Jan 19, 2026

Uh oh!

Sebastian Thiel (Byron) left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Tasks

Uh oh!

Sebastian Thiel (Byron) Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Sebastian Thiel (Byron) Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Sebastian Thiel (Byron) Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Sebastian Thiel (Byron) Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Sebastian Thiel (Byron) Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Jake Staehle (staehle) commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Sebastian Thiel (Byron) commented Mar 25, 2026

Uh oh!

Sebastian Thiel (Byron) commented Mar 25, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Sebastian Thiel (Byron) left a comment •

edited

Loading

Jake Staehle (staehle) commented Feb 9, 2026 •

edited

Loading