Open
Labels
bug (Something isn't working)
Description
Describe the bug
The gather operation is about 5 times slower when AVX2 instructions are enabled with -mavx2.
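For reference, here is a minimal scalar sketch of what a gather computes: lane i of the result is the element of the source array at index i of the index vector. The helper name gather_scalar is hypothetical and is not part of EVE; it only illustrates the expected result of the operation being benchmarked, not how EVE implements it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical reference helper (not an EVE API):
// lane i of the result is data[idx[i]].
template <typename T, std::size_t M, typename I, std::size_t N>
std::array<T, N> gather_scalar(const std::array<T, M>& data,
                               const std::array<I, N>& idx) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t k = static_cast<std::size_t>(idx[i]);
    assert(k < M);  // every index must point inside the source array
    out[i] = data[k];
  }
  return out;
}
```

With data = {1, 2, 3, 4} and indices {2, 3, 0, 1}, as in the reproducer, this yields {3, 4, 1, 2}.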
To Reproduce
- Set up a Google Benchmark project
- Disable CPU frequency scaling with sudo cpupower frequency-set --governor performance
- Test the following code with -O3 and with -O3 -mavx2:
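// Headers are assumed; the original report does not list them.
#include <benchmark/benchmark.h>
#include <eve/module/core.hpp>
#include <eve/wide.hpp>

void run(benchmark::State& state) {
  float data[4] = {1, 2, 3, 4};
  for (auto _ : state) {
    // Gather data[2], data[3], data[0], data[1] into a 4-lane float register.
    eve::wide<float, eve::fixed<4>> vec = eve::gather(data, eve::wide<unsigned char, eve::fixed<4>>{2, 3, 0, 1});
    benchmark::DoNotOptimize(vec);
  }
}
BENCHMARK(run);
BENCHMARK_MAIN();

Without -mavx2: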
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
run 0.198 ns 0.198 ns 3315565533
With -mavx2:
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
run 1.01 ns 1.01 ns 648857193
Setup:
- Compiler: g++ 14.2.1, clang++ 19.1.7
- OS: Gentoo Linux
- CPU: Ryzen 9 7940HS
- Instruction sets used: SSE, AVX2
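When comparing the two builds, it can help to confirm which instruction sets each binary was actually compiled for. The sketch below relies on the __SSE2__, __AVX__ and __AVX2__ macros that GCC and Clang predefine according to the -m flags in effect; the function name enabled_simd is hypothetical and only serves this illustration.

```cpp
#include <string>

// Hypothetical helper: lists the x86 SIMD extensions the compiler
// enabled for this translation unit, based on predefined macros.
std::string enabled_simd() {
  std::string s;
#if defined(__SSE2__)
  s += "SSE2 ";
#endif
#if defined(__AVX__)
  s += "AVX ";
#endif
#if defined(__AVX2__)
  s += "AVX2 ";
#endif
  // Fall back to a marker when none of the macros above is defined.
  return s.empty() ? "none" : s;
}
```

Printing the result once at the start of each benchmark binary makes it explicit whether a given measurement came from the -O3 build or the -O3 -mavx2 build.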