Add OddEvenSort to SVE microbenchmark by ylpoonlg · Pull Request #5046 · dotnet/performance

ylpoonlg · 2025-11-14T14:46:49Z

Performance Results

Run on Neoverse-V2

Method	Size	Mean	Error	StdDev	Median	Min	Max	Allocated
Scalar	15	6.604 ns	0.1605 ns	0.1577 ns	6.707 ns	6.397 ns	6.748 ns	-
Vector128OddEvenSort	15	13.196 ns	0.0052 ns	0.0048 ns	13.195 ns	13.190 ns	13.205 ns	-
SveOddEvenSort	15	19.118 ns	0.0149 ns	0.0139 ns	19.120 ns	19.091 ns	19.138 ns	-
SveTail	15	17.710 ns	0.0153 ns	0.0136 ns	17.708 ns	17.693 ns	17.736 ns	-
Scalar	127	42.785 ns	0.0486 ns	0.0379 ns	42.771 ns	42.738 ns	42.844 ns	-
Vector128OddEvenSort	127	35.148 ns	0.0383 ns	0.0358 ns	35.155 ns	35.056 ns	35.183 ns	-
SveOddEvenSort	127	90.251 ns	0.1244 ns	0.1164 ns	90.283 ns	90.023 ns	90.436 ns	-
SveTail	127	32.287 ns	0.0128 ns	0.0113 ns	32.285 ns	32.275 ns	32.315 ns	-
Scalar	527	180.766 ns	0.2361 ns	0.2093 ns	180.670 ns	180.580 ns	181.265 ns	-
Vector128OddEvenSort	527	149.811 ns	0.0485 ns	0.0405 ns	149.810 ns	149.771 ns	149.920 ns	-
SveOddEvenSort	527	376.324 ns	0.5892 ns	0.4920 ns	376.425 ns	375.506 ns	377.375 ns	-
SveTail	527	130.668 ns	0.1057 ns	0.0937 ns	130.643 ns	130.566 ns	130.867 ns	-
Scalar	10015	3,774.353 ns	17.7558 ns	13.8626 ns	3,770.468 ns	3,758.883 ns	3,809.131 ns	-
Vector128OddEvenSort	10015	2,813.042 ns	0.9627 ns	0.8534 ns	2,813.069 ns	2,812.061 ns	2,815.025 ns	-
SveOddEvenSort	10015	7,162.929 ns	5.7581 ns	5.3861 ns	7,161.363 ns	7,156.213 ns	7,173.464 ns	-
SveTail	10015	2,468.773 ns	1.2800 ns	1.1346 ns	2,468.508 ns	2,467.575 ns	2,470.949 ns	-

cc @dotnet/arm64-contrib @SwapnilGaikwad @LoopedBard3

tannergooding · 2025-11-14T15:17:25Z

I have a general concern over all the SVE specific microbenchmarks being added.

Benchmarks are fairly expensive in terms of runtime and even a small number of them can have significant cost to CI and our tracking. We don't really have any platform specific intrinsic benchmarks correspondingly (i.e. you don't see explicit benchmarks covering AdvSimd or Avx512).

Rather instead we typically have our normal benchmarks like for Span, Linq, Tensors and other areas which internally accelerate using SIMD and we measure those across a range of hardware. We have some machines that support newer ISAs and can then correlate and see the more real world improvements based on that.

I would expect here that we aren't directly testing SVE either. But rather would be testing with SVE enabled and comparing that against a run with it disabled. This will require a bit more work in the JIT to enable first, but significantly reduces cost and gives better metrics as to the benefit customers will see.

CC. @DrewScoggins

Copilot

Pull request overview

Adds a new SVE-focused microbenchmark (OddEvenSort) to measure scalar, AdvSimd (Vector128), and SVE implementations of odd-even sort on Arm64/SVE-capable platforms, following the existing SveBenchmarks patterns.

Changes:

Introduces OddEvenSort benchmark class with four benchmark methods: Scalar, Vector128OddEvenSort, SveOddEvenSort, and SveTail.
Adds per-benchmark SVE support filtering via a local ManualConfig + SimpleFilter.
Adds GlobalSetup input generation and GlobalCleanup verification against the scalar reference implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.