Description
I've benched both v1 and v2, and it seems there are some performance regressions in v2.
v1 performance:
```
test bench_extend                  ... bench:  33.43 ns/iter (+/- 1.79)
test bench_extend_from_slice       ... bench:  34.40 ns/iter (+/- 6.53)
test bench_extend_from_slice_small ... bench:   5.70 ns/iter (+/- 0.09)
test bench_push                    ... bench: 262.09 ns/iter (+/- 2.49)
test bench_push_small              ... bench:  16.84 ns/iter (+/- 0.16)
test bench_insert_push             ... bench: 276.11 ns/iter (+/- 3.33)
test bench_insert_push_small       ... bench:  20.81 ns/iter (+/- 0.21)
```

but on v2:

```
test bench_extend                  ... bench:  81.64 ns/iter (+/- 2.10)
test bench_extend_from_slice       ... bench:  83.78 ns/iter (+/- 1.58)
test bench_extend_from_slice_small ... bench:   9.72 ns/iter (+/- 0.25)
test bench_push                    ... bench: 275.09 ns/iter (+/- 3.12)
test bench_push_small              ... bench:  42.53 ns/iter (+/- 1.75)
test bench_insert_push             ... bench: 305.75 ns/iter (+/- 10.22)
test bench_insert_push_small       ... bench:  45.65 ns/iter (+/- 9.09)
```

The most critical performance regressions are in the extend methods, `push_small`, and `insert_push_small`.
It may be that the implementation of the `push` method changed from v1:

```rust
pub fn push(&mut self, value: A::Item) {
    unsafe {
        let (mut ptr, mut len, cap) = self.triple_mut();
        if *len == cap {
            self.reserve_one_unchecked();
            let (heap_ptr, heap_len) = self.data.heap_mut();
            ptr = heap_ptr;
            len = heap_len;
        }
        ptr::write(ptr.as_ptr().add(*len), value);
        *len += 1;
    }
}
```

to v2:

```rust
pub fn push(&mut self, value: T) {
    let len = self.len();
    if len == self.capacity() {
        self.reserve(1);
    }
    // SAFETY: both the input and output are within the allocation
    let ptr = unsafe { self.as_mut_ptr().add(len) };
    // SAFETY: we allocated enough space in case it wasn't enough, so the address is valid for
    // writes.
    unsafe { ptr.write(value) };
    unsafe { self.set_len(len + 1) }
}
```

and it seems to be cascading to all other methods that depend on it. However, performance only seems to degrade meaningfully when we don't push past the SmallVec's preallocated N, so nothing has to spill to the heap. So it's probably slower just when `push` is called while the vector still fits in its inline capacity N.

On the benches that have slowed massively, v2 is around 130% slower (roughly 2.3x the v1 time), and that factor is pretty constant across those benches, so it all points towards a shared cause.
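To make the size of the slowdown explicit, the figures from the two benchmark runs can be turned into ratios. The following is only a small sketch of that arithmetic; `Bench`, `BENCHES`, `slowdown_ratio`, `percent_slower`, and `report` are hypothetical names used for illustration and are not part of smallvec.

```rust
/// One benchmark's mean time per iteration, copied from the output quoted in this issue.
struct Bench {
    name: &'static str,
    v1_ns: f64,
    v2_ns: f64,
}

/// Mean ns/iter values as reported for v1 and v2.
const BENCHES: [Bench; 7] = [
    Bench { name: "bench_extend", v1_ns: 33.43, v2_ns: 81.64 },
    Bench { name: "bench_extend_from_slice", v1_ns: 34.40, v2_ns: 83.78 },
    Bench { name: "bench_extend_from_slice_small", v1_ns: 5.70, v2_ns: 9.72 },
    Bench { name: "bench_push", v1_ns: 262.09, v2_ns: 275.09 },
    Bench { name: "bench_push_small", v1_ns: 16.84, v2_ns: 42.53 },
    Bench { name: "bench_insert_push", v1_ns: 276.11, v2_ns: 305.75 },
    Bench { name: "bench_insert_push_small", v1_ns: 20.81, v2_ns: 45.65 },
];

/// How many times longer v2 takes than v1 (1.0 means no change).
fn slowdown_ratio(b: &Bench) -> f64 {
    b.v2_ns / b.v1_ns
}

/// Percentage increase in time per iteration going from v1 to v2.
fn percent_slower(b: &Bench) -> f64 {
    (slowdown_ratio(b) - 1.0) * 100.0
}

/// One formatted line per benchmark: name, ratio, and percentage change.
fn report() -> Vec<String> {
    BENCHES
        .iter()
        .map(|b| {
            format!(
                "{:<30} {:.2}x ({:+.0}%)",
                b.name,
                slowdown_ratio(b),
                percent_slower(b)
            )
        })
        .collect()
}
```

Printing each line of `report()` gives ratios of roughly 2.4x for the two extend benches, 2.5x for `bench_push_small`, and 2.2x for `bench_insert_push_small`, while `bench_push` and `bench_insert_push` stay within about 5% to 11% of v1.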
It might also be that `push` is not what's causing the bottleneck, but something else downstream.
Originally posted by @alejandro-vaz in #395