Redesigned Components iterator to use front and back indexing instead of mutating and subslicing path field #156496
asder8215 wants to merge 2 commits into …
rustbot has assigned @Mark-Simulacrum.
1627e2f to 33e69e1
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
33e69e1 to ed9d33d
ed9d33d to 0b0f84c
… of mutating and subslicing `path` field; as a result, the `Components` iterator's memory size goes from 64 bytes to 40 bytes, and `as_path` does not clone at all
0b0f84c to 8ed33ea
I benchmarked this:

```rust
use std::cmp::Ordering;
use std::hint::black_box;
use std::path::Path;

// Each helper returns a value so the work stays observable through `black_box`.

fn components_iter(path: &Path) -> usize {
    let comps = path.components();
    let mut comp_count = 0;
    for _comp in comps {
        comp_count += 1;
    }
    comp_count
}

fn components_next_iter(path: &Path) -> usize {
    let mut comps = path.iter();
    let mut comp_count = 0;
    while let Some(_comp) = comps.next() {
        // let path = comps.as_path();
        comp_count += 1;
    }
    comp_count
}

fn components_next_back_iter(path: &Path) -> usize {
    let mut comps = path.iter();
    let mut comp_count = 0;
    while let Some(_comp) = comps.next_back() {
        // let path = comps.as_path();
        comp_count += 1;
    }
    comp_count
}

fn path_iter(path: &Path) -> usize {
    let comps = path.iter();
    let mut comp_count = 0;
    for _comp in comps {
        comp_count += 1;
    }
    comp_count
}

fn as_path_iter(path: &Path) -> usize {
    let mut comps = path.iter();
    let mut comp_count = 0;
    while let Some(_comp) = comps.next() {
        // Keep the `as_path` call from being optimized away.
        black_box(comps.as_path());
        comp_count += 1;
    }
    comp_count
}

fn eq_comps(path: &Path, other_path: &Path) -> bool {
    path.components() == other_path.components()
}

fn compare_comps(path: &Path, other_path: &Path) -> Ordering {
    let comp = path.components();
    // Compare against the components of `other_path`.
    let other_comp = other_path.components();
    comp.cmp(other_comp)
}
```

(Similar code with …)

Benchmark function:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_components(c: &mut Criterion) {
    // "/" followed by 1000 segments of 64 'a's, each ending in '/'.
    let mut path = String::from("/");
    let segment = format!("{}/", "a".repeat(64));
    for _ in 0..1000 {
        path.push_str(&segment);
    }
    let path = Path::new(&path);

    // Use black_box to prevent compiler optimizations from
    // skipping the code you want to measure.
    c.bench_function("Std Components", |b| {
        b.iter(|| black_box(components_iter(black_box(path))))
    });
    c.bench_function("Std Components Next", |b| {
        b.iter(|| black_box(components_next_iter(black_box(path))))
    });
    c.bench_function("Std Components Next Back", |b| {
        b.iter(|| black_box(components_next_back_iter(black_box(path))))
    });
    c.bench_function("Std Path Iter", |b| {
        b.iter(|| black_box(path_iter(black_box(path))))
    });
    c.bench_function("Std As Path Iter", |b| {
        b.iter(|| black_box(as_path_iter(black_box(path))))
    });
    c.bench_function("Std Eq Comps", |b| {
        b.iter(|| black_box(eq_comps(black_box(path), black_box(path))))
    });
    c.bench_function("Std Compare Comps", |b| {
        b.iter(|| black_box(compare_comps(black_box(path), black_box(path))))
    });
}

criterion_group!(benches, bench_components);
criterion_main!(benches);
```

(Similar benchmark code with this PR, ending with "Rewrite".)

The benchmark timings:

Edit: I realized from …
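The `as_path_iter` benchmark calls `as_path` after every `next()`. As a small std-only illustration of what that call returns on a short relative path, here is a sketch; the helper names `remaining_after_each_next` and `trim_both_ends` are hypothetical and not part of the PR.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: after each `next()`, record what `Components::as_path`
/// reports as the still-unconsumed part of the path.
fn remaining_after_each_next(path: &Path) -> Vec<String> {
    let mut comps = path.components();
    let mut remaining = Vec::new();
    while comps.next().is_some() {
        // `as_path` returns `&'a Path`, borrowed from the original path.
        remaining.push(comps.as_path().to_string_lossy().into_owned());
    }
    remaining
}

/// Hypothetical helper: consume one component from each end, then return
/// both consumed components and the unconsumed middle.
fn trim_both_ends(path: &Path) -> (Option<Component<'_>>, Option<Component<'_>>, &Path) {
    let mut comps = path.components();
    let first = comps.next();
    let last = comps.next_back();
    (first, last, comps.as_path())
}
```

For `"foo/bar/baz"`, the remainders after each `next()` are `"bar/baz"`, `"baz"` and `""`; consuming from both ends leaves `"bar"`.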
…ity, added safety comments, and check for root dir after Prefix component (e.g., '\\?\checkout\src\tools' should produce Prefix, RootDir, Normal, Normal, None, ...) in Components::parse_single_component
2151b8f to 83cdbed
These are the benchmark results without … The results of these benchmarks seem to indicate that the performance of … I do think the trade-off is worth it, though, since there is a lot of `Path` equality (rather than `Path` comparison) occurring in the standard library that would benefit from this performance improvement.
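For context on why `Path` equality exercises this code: std's `PartialEq` for `Path` compares the two paths' `Components`, so equality is component-wise rather than byte-wise. A small std-only illustration; the helper names `paths_equal` and `components_equal` are hypothetical.

```rust
use std::path::Path;

/// Hypothetical helper: `Path`'s `==` compares component sequences,
/// not raw bytes.
fn paths_equal(a: &str, b: &str) -> bool {
    Path::new(a) == Path::new(b)
}

/// Hypothetical helper: the same check spelled out as a plain iterator
/// comparison over `Components`.
fn components_equal(a: &str, b: &str) -> bool {
    Iterator::eq(Path::new(a).components(), Path::new(b).components())
}
```

Repeated separators, trailing separators and non-leading `.` are normalized away by `Components`, so `"a//b"`, `"a/b/"` and `"a/./b"` all compare equal to `"a/b"`. A leading `.` and any `..` are kept, so `"./a"` differs from `"a"` and `"a/../b"` differs from `"b"`.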
This PR entirely changes how `Components<'_>` is implemented. Currently, the `Components<'_>` iterator "consumes" components by mutating its `path` field to a subslice that represents the remaining unconsumed path components (the consumed component is what `Components::next` or `Components::next_back` returns). This PR instead keeps the `path` field intact and uses front and back indices to extract consumed and unconsumed components.

This PR benefits functions like `Components::as_path`, which is used in multiple places in the standard library. Previously, the `Components<'_>` iterator had to be cloned inside `as_path` to present the unconsumed path, because the original consuming behavior on `path` would not allow the `&'a Path` returned from `Components::as_path` to last after a `Components::next` or `Components::next_back` call. Since the current `Components` iterator is 64 bytes, calling `Components::as_path` after each `Components::next`/`Components::next_back` means cloning 64 bytes again and again, which is unfortunate when each path component is only a few bytes (e.g., "foo/bar/baz").

On the point of size, the indexing strategy also shrinks `Components<'_>` from 64 bytes to 40 bytes. A large chunk of `Components<'_>` was taken up by the `Option<Prefix>` field (40 bytes). Instead, whether a prefix exists and is still unconsumed is determined by calling `parse_prefix` on the `path` field (which I believe is inexpensive, since Windows prefixes are not that long) and checking whether the `first_comp` field is `Some(_)` or `None`. The front index already includes the prefix length when a prefix exists, so `Components::next` and `Components::next_back` do not need to parse the prefix again.

Because `Components<'_>` no longer has an `Option<Prefix>` field, all of its prefix helper functions have been removed in favor of calling `parse_prefix`, `Prefix::is_verbatim`, `Prefix::is_drive`, etc.

I'm curious whether this redesign of `Components<'_>` improves `Path` equality, which @clarfonthey pointed out as slow in #154521; I haven't benchmarked this, though.

Right now, when I tested it locally on my PC (Fedora), it passed all the standard library tests, and rust-analyzer didn't crash on me (I had a few rust-analyzer crash reports early on, when I was experimenting with `Components<'_>` and something involving threads using `Path::components`, but that's all resolved now). I have not tested this on Windows yet, and I would probably need someone to help me test on that platform, as my Windows VM does not work properly for running the standard library test suite.

There are a lot of things being done here, and there may be better approaches, or ways I could improve this implementation or write the code more neatly. I am open to any advice or feedback on this approach.
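To make the front/back indexing idea concrete, here is a heavily simplified sketch. `IndexedComponents` and `as_str` are hypothetical names, not the PR's code, and the sketch only handles `/`-separated components (no prefix, root directory, `.` or `..` handling as in the real `Components<'_>`). It shows the two properties described above: the input slice is never mutated, and the unconsumed remainder is derived from `&self` without cloning the iterator.

```rust
/// Hypothetical, simplified model: the full input is kept intact and two
/// byte indices mark the unconsumed window.
#[derive(Clone, Debug)]
struct IndexedComponents<'a> {
    path: &'a str, // the full input, never re-sliced
    front: usize,  // start of the unconsumed window (byte index)
    back: usize,   // end of the unconsumed window (byte index, exclusive)
}

impl<'a> IndexedComponents<'a> {
    fn new(path: &'a str) -> Self {
        IndexedComponents { path, front: 0, back: path.len() }
    }

    /// Unconsumed part of the path, computed from the indices alone. The
    /// result borrows from the original `&'a str`, not from `self`.
    fn as_str(&self) -> &'a str {
        let path: &'a str = self.path;
        path[self.front..self.back].trim_matches('/')
    }
}

impl<'a> Iterator for IndexedComponents<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let path: &'a str = self.path;
        let bytes = path.as_bytes();
        // Skip separators at the front of the window.
        while self.front < self.back && bytes[self.front] == b'/' {
            self.front += 1;
        }
        if self.front == self.back {
            return None;
        }
        // Advance the front index to the end of this component.
        let start = self.front;
        while self.front < self.back && bytes[self.front] != b'/' {
            self.front += 1;
        }
        Some(&path[start..self.front])
    }
}

impl<'a> DoubleEndedIterator for IndexedComponents<'a> {
    fn next_back(&mut self) -> Option<&'a str> {
        let path: &'a str = self.path;
        let bytes = path.as_bytes();
        // Skip separators at the back of the window.
        while self.back > self.front && bytes[self.back - 1] == b'/' {
            self.back -= 1;
        }
        if self.front == self.back {
            return None;
        }
        // Move the back index to the start of this component.
        let end = self.back;
        while self.back > self.front && bytes[self.back - 1] != b'/' {
            self.back -= 1;
        }
        Some(&path[self.back..end])
    }
}
```

Because `'/'` is ASCII, every index the loops stop at is a valid `str` boundary, so the slicing cannot panic. On a 64-bit target this sketch is 32 bytes (a 16-byte `&str` plus two `usize` indices); the real `Components<'_>` also carries parsing state, so it is larger (the PR reports 40 bytes).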