Redesigned Components iterator to use front and back indexing instead mutating and subslicing path field#156496

Open
asder8215 wants to merge 2 commits into
rust-lang:mainfrom
asder8215:components_rewrite

@asder8215
Contributor

@asder8215 asder8215 commented May 12, 2026

This PR completely changes how Components<'_> is implemented. Currently, the Components<'_> iterator 'consumes' components by mutating its path field into a subslice that holds the remaining unconsumed path components (the consumed component is what Components::next or Components::next_back returns). This PR instead keeps the path field unmodified and uses front and back indices to distinguish consumed from unconsumed components.
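To make the indexing idea concrete, here is a minimal sketch. It is not the code in this PR: `IndexedSegments` is a hypothetical, simplified type that only splits a `&str` on `/` and ignores prefixes, root directories, and `.` handling. It only illustrates that the source slice stays untouched while two indices move toward each other.

```rust
/// Hypothetical, simplified stand-in for the indexing idea (not the PR's type):
/// the source string is never re-sliced; only two indices move.
struct IndexedSegments<'a> {
    path: &'a str,
    front: usize, // start of the unconsumed region
    back: usize,  // end (exclusive) of the unconsumed region
}

impl<'a> IndexedSegments<'a> {
    fn new(path: &'a str) -> Self {
        IndexedSegments { path, front: 0, back: path.len() }
    }

    /// The unconsumed region, borrowed from the original string without cloning.
    fn remaining(&self) -> &'a str {
        &self.path[self.front..self.back]
    }
}

impl<'a> Iterator for IndexedSegments<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        while self.front < self.back {
            let rest = &self.path[self.front..self.back];
            let len = rest.find('/').unwrap_or(rest.len());
            let segment = &rest[..len];
            // Advance past the segment and one separator, if present.
            self.front = (self.front + len + 1).min(self.back);
            if !segment.is_empty() {
                return Some(segment);
            }
        }
        None
    }
}

impl<'a> DoubleEndedIterator for IndexedSegments<'a> {
    fn next_back(&mut self) -> Option<&'a str> {
        while self.front < self.back {
            let rest = &self.path[self.front..self.back];
            let start = rest.rfind('/').map_or(0, |i| i + 1);
            let segment = &rest[start..];
            // Retreat before the segment and one separator, if present.
            self.back = (self.front + start).saturating_sub(1).max(self.front);
            if !segment.is_empty() {
                return Some(segment);
            }
        }
        None
    }
}
```

With "foo/bar/baz", calling `next()` yields "foo" and leaves `remaining()` as "bar/baz"; a following `next_back()` yields "baz" and leaves "bar".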

This benefits functions like Components::as_path, which is used in multiple areas of the standard library. Previously, as_path had to clone the Components<'_> iterator internally to present the unconsumed path, because the original approach of consuming components by mutating path would not allow the returned &'a Path to stay valid after a later Components::next or Components::next_back call. Since the current Components iterator is 64 bytes, calling Components::as_path after each Components::next/Components::next_back means cloning 64 bytes again and again, which is unfortunate when each path component is only a few bytes (e.g., "foo/bar/baz").
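For context, this is the calling pattern I have in mind, written against the stable public API only. The helper name `remaining_after_each_step` is made up for this example.

```rust
use std::path::{Path, PathBuf};

/// Collects what `Components::as_path` reports after each `next` call.
/// (Hypothetical helper, used only to illustrate the access pattern.)
fn remaining_after_each_step(path: &Path) -> Vec<PathBuf> {
    let mut comps = path.components();
    let mut remaining = Vec::new();
    while comps.next().is_some() {
        // `as_path` borrows from the original path, not from `comps`,
        // so the result could outlive later `next` calls.
        remaining.push(comps.as_path().to_path_buf());
    }
    remaining
}
```

For "foo/bar/baz" this records "bar/baz", then "baz", then the empty path, so as_path runs once per component.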

On the point of size, the indexing strategy also lets this PR shrink Components<'_> from 64 bytes to 40 bytes. A large chunk of Components<'_> was taken up by the Option<Prefix> field (40 bytes). Instead of storing it, we determine whether a prefix exists and is still unconsumed by calling parse_prefix on the path field (which I think is inexpensive, since I believe Windows prefixes are not that long) and checking whether our first_comp field is Some(_) or None. The front index already encodes the prefix length if a prefix exists, so we don't need to parse the prefix again within Components::next or Components::next_back.
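The sizes can be checked on any toolchain with `std::mem::size_of`. The sketch below only reports whatever the compiler in use produces. The 64 and 40 byte figures above are from my 64-bit machine, so I am not hard-coding them here.

```rust
use std::mem::size_of;
use std::path::{Components, Prefix};

/// Returns (size of `Components`, size of `Option<Prefix>`) in bytes
/// for the toolchain and target this is compiled with.
fn layout_sizes() -> (usize, usize) {
    (
        size_of::<Components<'static>>(),
        size_of::<Option<Prefix<'static>>>(),
    )
}
```

Running this before and after the change is how the size difference can be confirmed.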

Because Components<'_> no longer has the Option<Prefix> field, all the prefix helper functions in Components<'_> have been removed in favor of calling parse_prefix, Prefix::is_verbatim, Prefix::is_drive, etc.
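As a rough illustration of querying the prefix on demand instead of caching it, here is a sketch that uses only the public API. parse_prefix and Prefix::is_drive are, as far as I know, internal to std, so the sketch goes through `Component::Prefix` and `Prefix::is_verbatim` instead. The function names are made up for this example.

```rust
use std::path::{Component, Path, Prefix};

/// Returns the prefix kind of `path` if its first component is a prefix.
/// On non-Windows targets this is always `None`, since no prefix is parsed there.
fn leading_prefix(path: &Path) -> Option<Prefix<'_>> {
    match path.components().next() {
        Some(Component::Prefix(prefix_component)) => Some(prefix_component.kind()),
        _ => None,
    }
}

/// True if the path starts with a verbatim prefix such as `\\?\C:`.
fn starts_with_verbatim_prefix(path: &Path) -> bool {
    leading_prefix(path).map_or(false, |prefix| prefix.is_verbatim())
}
```

The point is that the prefix is recomputed from the path when it is needed rather than stored in the iterator.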

I'm curious whether this redesign of Components<'_> improves Path equality, which @clarfonthey pointed out is slow in #154521; I haven't benchmarked this yet, though.

Right now, when tested locally on my PC (Fedora), it passes all the standard library tests and rust-analyzer doesn't crash (I had a few crash reports from rust-analyzer early on while experimenting with Components<'_>, related to threads using Path::components, but those are all resolved now). I have not tested this on Windows yet, and I will probably need someone to help me test on that platform, as my Windows VM is not working well enough to run the standard library test suite.

There is a lot going on here, and there may be better approaches, ways to improve this implementation, or neater ways to write the code. I am open to any advice or feedback on this approach.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 12, 2026
@rustbot
Collaborator

rustbot commented May 12, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @ChrisDenton, libs
  • @ChrisDenton, libs expanded to 8 candidates

@asder8215 asder8215 force-pushed the components_rewrite branch from 1627e2f to 33e69e1 Compare May 12, 2026 09:09
@rustbot
Collaborator

rustbot commented May 12, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@asder8215 asder8215 force-pushed the components_rewrite branch from 33e69e1 to ed9d33d Compare May 12, 2026 17:05
@asder8215 asder8215 force-pushed the components_rewrite branch from ed9d33d to 0b0f84c Compare May 12, 2026 17:19
… of mutating and subslicing path field; as a result, Components iterator memory size goes from 64 bytes to 40 bytes and as_path does not use cloning at all
@asder8215 asder8215 force-pushed the components_rewrite branch from 0b0f84c to 8ed33ea Compare May 12, 2026 22:05
@asder8215
Contributor Author

asder8215 commented May 13, 2026

I benchmarked this Components<'_> implementation with a path of 1000 components (each component is the character "a" repeated 64 times). The following cases are benchmarked:

fn components_iter(path: &Path) {
    let comps = path.components();
    let mut comp_count = 0;
    for comp in comps {
        comp_count += 1;
    }
}

fn components_next_iter(path: &Path) {
    let mut comps = path.iter();
    let mut comp_count = 0;
    while let Some(comp) = comps.next() {
        // let path = comps.as_path();
        comp_count += 1;
    }
}

fn components_next_back_iter(path: &Path) {
    let mut comps = path.iter();
    let mut comp_count = 0;
    while let Some(comp) = comps.next_back() {
        // let path = comps.as_path();
        comp_count += 1;
    }
}

fn path_iter(path: &Path) {
    let comps = path.iter();
    let mut comp_count = 0;
    for comp in comps {
        comp_count += 1;
    }
}

fn as_path_iter(path: &Path) {
    let mut comps = path.iter();
    let mut comp_count = 0;
    while let Some(comp) = comps.next() {
        let path = comps.as_path();
        comp_count += 1;
    }
}

fn eq_comps(path: &Path, other_path: &Path) {
    path.components() == other_path.components();
}

fn compare_comps(path: &Path, other_path: &Path) {
    let comp = path.components();
    // Compare against the second argument rather than `path` again.
    let other_comp = other_path.components();
    comp.cmp(other_comp);
}

(Similar code with Components<'_> extracted from this PR and placed in a separate benchmark file)

Benchmark function:

fn bench_components(c: &mut Criterion) {
    // Build "/" followed by 1000 components of 64 'a' characters each.
    let mut path = String::from("/");
    let mut segment = "a".repeat(64);
    segment.push('/');

    for _ in 0..1000 {
        path.push_str(&segment);
    }

    c.bench_function("Std Components", |b| {
        b.iter(|| {
            // Use black_box to prevent compiler optimizations from 
            // skipping the code you want to measure
            black_box(components_iter(black_box(path.as_ref())))
        })
    });

    c.bench_function("Std Components Next", |b| {
        b.iter(|| {
            // Use black_box to prevent compiler optimizations from 
            // skipping the code you want to measure
            black_box(components_next_iter(black_box(path.as_ref())))
        })
    });

    c.bench_function("Std Components Next Back", |b| {
        b.iter(|| {
            // Use black_box to prevent compiler optimizations from 
            // skipping the code you want to measure
            black_box(components_next_back_iter(black_box(path.as_ref())))
        })
    });

    c.bench_function("Std Path Iter", |b| {
        b.iter(|| {
            // Use black_box to prevent compiler optimizations from 
            // skipping the code you want to measure
            black_box(path_iter(black_box(path.as_ref())))
        })
    });

    c.bench_function("Std As Path Iter", |b| {
        b.iter(|| {
            // Use black_box to prevent compiler optimizations from 
            // skipping the code you want to measure
            black_box(as_path_iter(black_box(path.as_ref())))
        })
    });

    c.bench_function("Std Eq Comps", |b| {
        b.iter(|| {
            // Use black_box to prevent compiler optimizations from 
            // skipping the code you want to measure
            black_box(eq_comps(black_box(path.as_ref()), black_box(path.as_ref())))
        })
    });

    c.bench_function("Std Compare Comps", |b| {
        b.iter(|| {
            // Use black_box to prevent compiler optimizations from 
            // skipping the code you want to measure
            black_box(compare_comps(black_box(path.as_ref()), black_box(path.as_ref())))
        })
    });
}

(Similar benchmark code with this PR, ending with "Rewrite")

The benchmark timings:

Components Rewrite      time:   [34.542 µs 34.625 µs 34.719 µs]
                        change: [−0.2577% −0.0247% +0.1992%] (p = 0.84 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  7 (7.00%) high mild
  9 (9.00%) high severe

Components Next Rewrite time:   [34.003 µs 34.032 µs 34.070 µs]
                        change: [−0.0358% +0.3069% +0.6599%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe

Components Next Back Rewrite
                        time:   [33.886 µs 33.901 µs 33.919 µs]
                        change: [−0.5481% −0.5050% −0.4550%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

Path Iter Rewrite       time:   [33.949 µs 34.025 µs 34.107 µs]
                        change: [+0.2680% +0.6786% +1.1680%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

As Path Iter Rewrite    time:   [34.741 µs 34.801 µs 34.881 µs]
                        change: [+0.5618% +0.9272% +1.3623%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe

Eq Comps Rewrite        time:   [3.7759 ns 3.7849 ns 3.7968 ns]
                        change: [−1.5996% −1.0255% −0.5037%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

Compare Comps Rewrite   time:   [70.722 µs 70.752 µs 70.790 µs]
                        change: [−1.0925% −0.6107% −0.2103%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Std Components          time:   [21.290 µs 21.355 µs 21.427 µs]
                        change: [−0.8364% −0.2580% +0.2856%] (p = 0.38 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Std Components Next     time:   [21.516 µs 21.702 µs 21.919 µs]
                        change: [+1.6392% +2.1751% +2.7898%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

Std Components Next Back
                        time:   [35.998 µs 36.013 µs 36.032 µs]
                        change: [−1.5651% −1.0293% −0.5762%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) high mild
  9 (9.00%) high severe

Std Path Iter           time:   [21.254 µs 21.311 µs 21.382 µs]
                        change: [+1.5226% +2.8646% +4.3798%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

Std As Path Iter        time:   [81.911 µs 81.992 µs 82.093 µs]
                        change: [+0.3076% +0.7163% +1.2456%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

Std Eq Comps            time:   [543.10 ns 544.15 ns 545.07 ns]
                        change: [−0.8566% −0.6767% −0.4746%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Std Compare Comps       time:   [46.165 µs 46.262 µs 46.356 µs]
                        change: [+0.0378% +0.5060% +1.0238%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Edit: I realized that Components<'_> equality under this PR's approach should compare the subslices delimited by the front and back indices in the fast path (the previous version did not produce an incorrect end result, but the fast path now correctly checks equality between two Components<'_> iterators through the subslices they refer to). I updated the benchmark numbers for Comps equality; everything else is unaffected by this change.
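A rough sketch of that fast path, under simplifying assumptions: `Cursor` is a hypothetical type holding a `&str` plus front and back indices. The slow path here simply defers to the existing `Path::components` comparison from the back, and prefix handling is ignored entirely. The PR's actual check may differ in its details.

```rust
use std::path::Path;

/// Hypothetical cursor: the unconsumed region is `path[front..back]`.
struct Cursor<'a> {
    path: &'a str,
    front: usize,
    back: usize,
}

impl<'a> Cursor<'a> {
    fn remaining(&self) -> &'a str {
        &self.path[self.front..self.back]
    }
}

fn cursors_equal(a: &Cursor<'_>, b: &Cursor<'_>) -> bool {
    // Fast path: the unconsumed byte ranges are identical.
    if a.remaining() == b.remaining() {
        return true;
    }
    // Slow path: component-wise comparison from the back, which also
    // treats e.g. "foo//bar" and "foo/bar" as equal.
    Path::new(a.remaining())
        .components()
        .rev()
        .eq(Path::new(b.remaining()).components().rev())
}
```

The fast path compares only the regions between the indices, so two iterators over different source strings can still hit it once their unconsumed parts match.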

…ity, added safety comments, and check for root dir after Prefix component (e.g., '\\?\checkout\src\tools' should produce Prefix, RootDir, Normal, Normal, None, ...) in Components::parse_single_component
@asder8215 asder8215 force-pushed the components_rewrite branch from 2151b8f to 83cdbed Compare May 13, 2026 22:21
@asder8215
Contributor Author

These are the benchmark results without black_box:

Components Rewrite      time:   [35.153 µs 35.373 µs 35.610 µs]
                        change: [+2.4138% +3.5551% +5.0408%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  11 (11.00%) high mild
  2 (2.00%) high severe

Components Next Rewrite time:   [34.616 µs 34.716 µs 34.830 µs]
                        change: [+1.9506% +2.6634% +3.5368%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

Components Next Back Rewrite
                        time:   [35.194 µs 35.291 µs 35.390 µs]
                        change: [+4.0758% +4.2776% +4.4781%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild

Path Iter Rewrite       time:   [34.765 µs 34.844 µs 34.934 µs]
                        change: [+1.4297% +1.9768% +2.4675%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

As Path Iter Rewrite    time:   [35.900 µs 36.098 µs 36.319 µs]
                        change: [+3.1306% +4.1251% +5.1819%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  9 (9.00%) high mild
  3 (3.00%) high severe

Eq Comps Rewrite        time:   [3.6167 ns 3.6299 ns 3.6451 ns]
                        change: [−4.7385% −4.3880% −4.0336%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Compare Comps Rewrite   time:   [72.708 µs 73.231 µs 73.832 µs]
                        change: [+3.6861% +4.3403% +5.0362%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Std Components          time:   [20.746 µs 20.845 µs 20.962 µs]
                        change: [−3.7883% −3.0503% −2.1791%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

Std Components Next     time:   [20.897 µs 21.043 µs 21.222 µs]
                        change: [−2.5946% −1.7931% −1.0059%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) high mild

Std Components Next Back
                        time:   [36.440 µs 36.720 µs 37.095 µs]
                        change: [+0.7005% +1.1991% +1.9175%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

Std Path Iter           time:   [20.753 µs 20.901 µs 21.126 µs]
                        change: [−2.7912% −2.3811% −1.8853%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

Std As Path Iter        time:   [82.370 µs 82.803 µs 83.267 µs]
                        change: [−0.9980% −0.3092% +0.3951%] (p = 0.39 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Std Eq Comps            time:   [554.22 ns 556.83 ns 559.74 ns]
                        change: [+2.2678% +2.9113% +3.5816%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Std Compare Comps       time:   [46.039 µs 46.350 µs 46.689 µs]
                        change: [−0.2813% +0.8170% +2.0435%] (p = 0.16 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

The results of these benchmarks seem to indicate that Components::as_path (roughly 36 µs vs. 83 µs), Components::next_back (roughly 35 µs vs. 37 µs), and Components<'_> equality (roughly 3.6 ns vs. 557 ns; equality uses Components::next_back under the hood) perform better with this PR's implementation of Components<'_> than with the current one. The trade-off is that Components::next (roughly 35 µs vs. 21 µs), and as a result comparing components (roughly 73 µs vs. 46 µs; comparison uses Components::next under the hood), performs worse in this PR than in the current Components<'_>.

I do think the trade-off is worth it, though, since a lot of Path equality checks (rather than Path comparisons) occur in the standard library and would benefit from this performance improvement.
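For reference, these are the two caller-facing operations being traded off, shown with the stable API only. The helper names are made up for this example.

```rust
use std::cmp::Ordering;
use std::path::Path;

/// Path equality: compares normalized components, so redundant or
/// trailing separators do not matter.
fn paths_equal(a: &str, b: &str) -> bool {
    Path::new(a) == Path::new(b)
}

/// Path ordering: compares components front to back.
fn paths_ordering(a: &str, b: &str) -> Ordering {
    Path::new(a).cmp(Path::new(b))
}
```

Equality is the operation that got faster in my measurements, and ordering is the one that got slower.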
