@thevilledev
Contributor

Motivation

Iteration over slices previously used reflection to access elements, which was slow and allocated unnecessarily.

Changes

This change adds type-specialized fast paths for common slice types ([]int, []float64, []string, []any) that bypass reflection entirely. Scope objects are now pooled and reused across loop iterations. The current scope pointer is cached to avoid repeated slice lookups.
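As a rough illustration of the fast-path idea described above, the sketch below uses a type switch to index common slice types directly and falls back to reflection for everything else. The helper name `elemAt` and its signature are hypothetical and chosen for this example only; the actual code in this PR may be organized differently.

```go
package main

import (
	"fmt"
	"reflect"
)

// elemAt returns element i of a slice value.
// Hypothetical helper for illustration; not the PR's actual code.
func elemAt(array any, i int) any {
	switch s := array.(type) {
	case []int:
		return s[i] // fast path: direct indexing, no reflect.Value
	case []float64:
		return s[i]
	case []string:
		return s[i]
	case []any:
		return s[i]
	default:
		// Slow path: works for any slice or array type.
		return reflect.ValueOf(array).Index(i).Interface()
	}
}

func main() {
	fmt.Println(elemAt([]int{10, 20, 30}, 1)) // fast path, prints 20
	fmt.Println(elemAt([]uint8{7, 8, 9}, 2))  // reflection fallback, prints 9
}
```

The type switch only covers the listed element types; any other slice type still goes through `reflect`, so behavior stays the same and only the common cases get cheaper.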

Bench run:

go test -bench='Benchmark_filter' -benchmem -count=10 -run=^$ ./...

Results:

cpu: Apple M1 Pro
               │   old.txt    │               new.txt               │
               │    sec/op    │   sec/op     vs base                │
_filter-8        48.99µ ±  4%   40.79µ ± 6%  -16.73% (p=0.000 n=10)
_filterLen-8     44.64µ ± 14%   37.64µ ± 6%  -15.67% (p=0.000 n=10)
_filterFirst-8   430.4n ±  4%   362.0n ± 2%  -15.88% (p=0.000 n=10)
_filterLast-8    798.2n ± 12%   766.6n ± 3%   -3.96% (p=0.009 n=10)
_filterMap-8     5.231µ ±  1%   3.742µ ± 4%  -28.47% (p=0.000 n=10)
geomean          5.234µ         4.370µ       -16.51%
               │   old.txt    │               new.txt                │
               │     B/op     │     B/op      vs base                │
_filter-8        20.38Ki ± 0%   18.20Ki ± 0%  -10.69% (p=0.000 n=10)
_filterLen-8     7.938Ki ± 0%   6.039Ki ± 0%  -23.92% (p=0.000 n=10)
_filterFirst-8     192.0 ± 0%     224.0 ± 0%  +16.67% (p=0.000 n=10)
_filterLast-8      712.0 ± 0%     808.0 ± 0%  +13.48% (p=0.000 n=10)
_filterMap-8      1736.0 ± 0%     920.0 ± 0%  -47.00% (p=0.000 n=10)
geomean          2.045Ki        1.763Ki       -13.77%
               │   old.txt    │               new.txt                │
               │  allocs/op   │ allocs/op   vs base                  │
_filter-8         1155.0 ± 0%   864.0 ± 0%  -25.19% (p=0.000 n=10)
_filterLen-8      1004.0 ± 0%   749.0 ± 0%  -25.40% (p=0.000 n=10)
_filterFirst-8    12.000 ± 0%   4.000 ± 0%  -66.67% (p=0.000 n=10)
_filterLast-8      24.00 ± 0%   24.00 ± 0%        ~ (p=1.000 n=10) ¹
_filterMap-8     123.000 ± 0%   9.000 ± 0%  -92.68% (p=0.000 n=10)
geomean            132.7        56.17       -57.66%
¹ all samples are equal 

Further comments

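Because the Scope struct is now larger, B/op increases for filterFirst (+16.67%) and filterLast (+13.48%). Both are still faster in sec/op (-15.88% and -3.96% respectively), and allocs/op does not increase for either.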
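The scope pooling and pointer caching mentioned under Changes can be sketched roughly as follows. This is a minimal sketch under assumptions: the `scope` layout, the `vm` type, and the `begin`/`end`/`countPositive` names are all hypothetical and do not reflect the PR's actual fields. It shows the general trade-off: a scope that carries extra typed fields costs more bytes each time one is allocated, while reuse reduces how often an allocation happens.

```go
package main

import "fmt"

// scope holds per-loop state. Hypothetical layout for illustration.
type scope struct {
	ints   []int // typed view of the slice, used by the fast path
	index  int
	length int
}

// vm keeps a stack of active scopes, a free list of reusable scopes,
// and a cached pointer to the innermost active scope.
type vm struct {
	scopes []*scope
	pool   []*scope
	curr   *scope
	allocs int // counts fresh scope allocations, for demonstration
}

// begin pushes a scope, reusing a pooled one when available.
func (v *vm) begin(ints []int) {
	var s *scope
	if n := len(v.pool); n > 0 {
		s = v.pool[n-1]
		v.pool = v.pool[:n-1]
	} else {
		s = &scope{}
		v.allocs++
	}
	*s = scope{ints: ints, length: len(ints)} // reset reused state
	v.scopes = append(v.scopes, s)
	v.curr = s
}

// end pops the innermost scope and returns it to the pool.
func (v *vm) end() {
	n := len(v.scopes)
	s := v.scopes[n-1]
	v.scopes = v.scopes[:n-1]
	s.ints = nil // do not keep the caller's slice alive from the pool
	v.pool = append(v.pool, s)
	if n > 1 {
		v.curr = v.scopes[n-2]
	} else {
		v.curr = nil
	}
}

// countPositive iterates through the cached scope pointer instead of
// looking up the top of the scope stack on every step.
func countPositive(v *vm, xs []int) int {
	v.begin(xs)
	defer v.end()
	count := 0
	for s := v.curr; s.index < s.length; s.index++ {
		if s.ints[s.index] > 0 {
			count++
		}
	}
	return count
}

func main() {
	v := &vm{}
	fmt.Println(countPositive(v, []int{-1, 2, 3})) // prints 2
	fmt.Println(countPositive(v, []int{4, -5}))    // prints 1
	fmt.Println(v.allocs)                          // prints 1: second call reused the scope
}
```

Resetting the scope in `begin` and clearing the slice reference in `end` keeps a pooled scope from leaking state or memory between loops.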

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>