Add floating-point routines to SVE microbenchmark by ylpoonlg · Pull Request #5043 · dotnet/performance

ylpoonlg · 2025-11-14T14:34:30Z

Adds four floating-point math routines to the SVE microbenchmarks.

Performance Results

Run on Neoverse-V2

FastDivision

Method	Size	Mean	Error	StdDev	Median	Min	Max	Allocated
Scalar	15	9.825 ns	0.0037 ns	0.0031 ns	9.825 ns	9.821 ns	9.832 ns	-
Vector128FastDivision	15	7.824 ns	0.0092 ns	0.0081 ns	7.824 ns	7.811 ns	7.839 ns	-
SveFastDivision	15	9.219 ns	0.0172 ns	0.0161 ns	9.221 ns	9.182 ns	9.244 ns	-
Scalar	127	94.367 ns	0.0301 ns	0.0251 ns	94.365 ns	94.329 ns	94.422 ns	-
Vector128FastDivision	127	77.715 ns	0.0960 ns	0.0898 ns	77.762 ns	77.590 ns	77.858 ns	-
SveFastDivision	127	79.684 ns	0.0204 ns	0.0171 ns	79.682 ns	79.657 ns	79.716 ns	-
Scalar	527	397.337 ns	0.1914 ns	0.1790 ns	397.322 ns	397.022 ns	397.759 ns	-
Vector128FastDivision	527	328.083 ns	0.3779 ns	0.3535 ns	328.128 ns	327.514 ns	328.745 ns	-
SveFastDivision	527	331.842 ns	0.3459 ns	0.2888 ns	331.897 ns	331.362 ns	332.280 ns	-
Scalar	10015	7,596.185 ns	7.9107 ns	7.3997 ns	7,595.238 ns	7,585.420 ns	7,609.528 ns	-
Vector128FastDivision	10015	6,270.509 ns	6.7028 ns	5.5972 ns	6,269.118 ns	6,262.699 ns	6,281.886 ns	-
SveFastDivision	10015	6,355.315 ns	9.5852 ns	8.9660 ns	6,358.352 ns	6,343.016 ns	6,369.762 ns	-

MultiplyPow2

Method	Size	Mean	Error	StdDev	Median	Min	Max	Allocated
Scalar	15	10.068 ns	0.0067 ns	0.0052 ns	10.066 ns	10.063 ns	10.078 ns	-
Vector128MultiplyPow2	15	5.770 ns	0.0045 ns	0.0042 ns	5.769 ns	5.764 ns	5.778 ns	-
SveMultiplyPow2	15	5.651 ns	0.0015 ns	0.0012 ns	5.651 ns	5.649 ns	5.653 ns	-
SveTail	15	5.721 ns	0.0136 ns	0.0106 ns	5.718 ns	5.714 ns	5.754 ns	-
Scalar	127	88.694 ns	0.0922 ns	0.0862 ns	88.680 ns	88.570 ns	88.855 ns	-
Vector128MultiplyPow2	127	29.605 ns	0.0284 ns	0.0252 ns	29.605 ns	29.565 ns	29.649 ns	-
SveMultiplyPow2	127	40.121 ns	0.3650 ns	0.3414 ns	40.308 ns	39.560 ns	40.363 ns	-
SveTail	127	39.014 ns	0.0256 ns	0.0214 ns	39.010 ns	38.997 ns	39.082 ns	-
Scalar	527	349.414 ns	0.1524 ns	0.1351 ns	349.381 ns	349.260 ns	349.670 ns	-
Vector128MultiplyPow2	527	120.288 ns	0.0819 ns	0.0726 ns	120.312 ns	120.120 ns	120.362 ns	-
SveMultiplyPow2	527	168.595 ns	0.0843 ns	0.0747 ns	168.618 ns	168.467 ns	168.723 ns	-
SveTail	527	161.587 ns	0.0874 ns	0.0818 ns	161.559 ns	161.505 ns	161.748 ns	-
Scalar	10015	6,573.934 ns	17.9883 ns	15.0211 ns	6,579.797 ns	6,547.476 ns	6,586.919 ns	-
Vector128MultiplyPow2	10015	2,587.071 ns	0.9635 ns	0.8541 ns	2,586.873 ns	2,586.034 ns	2,588.692 ns	-
SveMultiplyPow2	10015	3,158.436 ns	24.2145 ns	22.6503 ns	3,164.490 ns	3,096.195 ns	3,180.436 ns	-
SveTail	10015	3,039.080 ns	1.4834 ns	1.2387 ns	3,039.081 ns	3,037.630 ns	3,041.798 ns	-

FP64Overflow

Method	Size	Mean	Error	StdDev	Median	Min	Max	Allocated
Scalar	15	19.399 ns	0.0096 ns	0.0090 ns	19.396 ns	19.387 ns	19.417 ns	-
Vector128FP64Overflow	15	8.756 ns	0.0063 ns	0.0056 ns	8.755 ns	8.748 ns	8.766 ns	-
SveFP64Overflow	15	8.147 ns	0.0066 ns	0.0062 ns	8.145 ns	8.139 ns	8.159 ns	-
Sve2FP64Overflow	15	7.619 ns	0.0123 ns	0.0109 ns	7.616 ns	7.608 ns	7.644 ns	-
Scalar	127	164.961 ns	0.2280 ns	0.2021 ns	164.916 ns	164.576 ns	165.303 ns	-
Vector128FP64Overflow	127	66.245 ns	0.0188 ns	0.0157 ns	66.245 ns	66.220 ns	66.275 ns	-
SveFP64Overflow	127	64.313 ns	0.8416 ns	0.7872 ns	64.410 ns	62.749 ns	65.260 ns	-
Sve2FP64Overflow	127	77.453 ns	0.0143 ns	0.0127 ns	77.456 ns	77.437 ns	77.479 ns	-
Scalar	527	663.966 ns	0.9327 ns	0.8724 ns	663.922 ns	662.648 ns	665.514 ns	-
Vector128FP64Overflow	527	271.640 ns	0.0580 ns	0.0484 ns	271.639 ns	271.543 ns	271.743 ns	-
SveFP64Overflow	527	254.105 ns	0.0560 ns	0.0497 ns	254.092 ns	254.048 ns	254.190 ns	-
Sve2FP64Overflow	527	327.546 ns	0.0940 ns	0.0785 ns	327.538 ns	327.438 ns	327.724 ns	-
Scalar	10015	12,511.545 ns	37.5932 ns	35.1647 ns	12,500.521 ns	12,474.141 ns	12,588.912 ns	-
Vector128FP64Overflow	10015	5,216.800 ns	13.8224 ns	12.9294 ns	5,215.184 ns	5,188.232 ns	5,232.808 ns	-
SveFP64Overflow	10015	5,068.319 ns	16.6997 ns	15.6210 ns	5,074.020 ns	5,042.855 ns	5,089.636 ns	-
Sve2FP64Overflow	10015	6,263.925 ns	5.0281 ns	4.4572 ns	6,263.132 ns	6,257.081 ns	6,273.254 ns	-

Exponent

Method	Size	Mean	Error	StdDev	Median	Min	Max	Allocated
Scalar	15	49.909 ns	0.0501 ns	0.0418 ns	49.912 ns	49.858 ns	49.994 ns	-
Vector128Exponent	15	16.389 ns	0.0150 ns	0.0125 ns	16.385 ns	16.376 ns	16.418 ns	-
SveExponent	15	6.889 ns	0.0074 ns	0.0070 ns	6.889 ns	6.877 ns	6.903 ns	-
Scalar	127	432.017 ns	0.2920 ns	0.2588 ns	431.993 ns	431.621 ns	432.565 ns	-
Vector128Exponent	127	73.328 ns	0.0658 ns	0.0549 ns	73.319 ns	73.248 ns	73.465 ns	-
SveExponent	127	55.834 ns	0.1106 ns	0.1035 ns	55.779 ns	55.687 ns	55.970 ns	-
Scalar	527	1,784.395 ns	0.5001 ns	0.4176 ns	1,784.531 ns	1,783.495 ns	1,784.844 ns	-
Vector128Exponent	527	277.828 ns	0.1142 ns	0.0954 ns	277.825 ns	277.660 ns	277.971 ns	-
SveExponent	527	230.463 ns	0.5259 ns	0.4919 ns	230.461 ns	229.614 ns	231.321 ns	-
Scalar	10015	33,839.271 ns	26.9486 ns	22.5033 ns	33,833.955 ns	33,818.360 ns	33,897.961 ns	-
Vector128Exponent	10015	5,141.024 ns	6.0805 ns	5.3902 ns	5,142.541 ns	5,129.059 ns	5,146.362 ns	-
SveExponent	10015	4,373.064 ns	10.5089 ns	9.3159 ns	4,368.532 ns	4,364.087 ns	4,387.380 ns	-

cc @dotnet/arm64-contrib @SwapnilGaikwad @LoopedBard3

* FastDivision * MultiplyPow2 * FP64Overflow * Exponent

Copilot

Pull request overview

Adds four new floating-point focused SVE microbenchmarks to expand Arm64 SVE coverage in the microbench suite, with project-file gating to avoid compiling these benchmarks on older target frameworks.

Changes:

* Introduce new SVE benchmarks: FastDivision, MultiplyPow2, Exponent, and FP64Overflow (including Vector128 baselines and SVE/SVE2 variants where applicable).
* Add net10.0 target-framework gating to exclude these new benchmarks when building microbenchmarks for TFMs older than net10.0.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

File	Description
src/benchmarks/micro/sve/MultiplyPow2.cs	New benchmark for power-of-two scaling of FP64 (Scalar/Vector128/SVE variants).
src/benchmarks/micro/sve/FastDivision.cs	New benchmark for reciprocal-estimate based FP64 division (Scalar/Vector128/SVE variants).
src/benchmarks/micro/sve/FP64Overflow.cs	New benchmark for FP64 “overflow” routine including a SVE2 path and exponent side-output.
src/benchmarks/micro/sve/Exponent.cs	New benchmark for `expf`-style routine (Scalar/Vector128/SVE variants).
src/benchmarks/micro/MicroBenchmarks.csproj	Excludes these new benchmarks on TFMs < net10.0.

src/benchmarks/micro/sve/Exponent.cs

+                Vector<float> invln2Vec = new Vector<float>(d[2]);
+                Vector<float> shiftVec = new Vector<float>(d[8]);
+                Vector<float> ln2hiVec = new Vector<float>(d[7]);
+                Vector<float> constVec = Sve.LoadVector(Sve.CreateTrueMaskSingle(), &d[3]);


src/benchmarks/micro/sve/Exponent.cs

+                Vector128<float> c3Vec = Vector128.Create(d[7]);
+                Vector128<float> c4Vec = Vector128.Create(d[8]);
+
+                for (; i < Size - 4; i += 4)


src/benchmarks/micro/sve/FastDivision.cs

+                    // Iteratively refine the estimate by multiplying by the reciprocal step.
+                    Vector128<double> stp2;


src/benchmarks/micro/sve/FastDivision.cs

+                    // Estimate the reciprocal of input2Vec (i.e. 1/input2Vec).
+                    Vector<double> input2VecInv = Sve.ReciprocalEstimate(input2Vec);
+
+                    // Iteratively refine the estimate by multiplying by the reciprocal step.
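
The excerpt stops before the refinement loop, so the exact steps used in the PR are not visible here. As a sketch only, assuming the comments refer to the usual Newton-Raphson scheme for a reciprocal, the iteration looks like the following. The symbols $d$ (divisor), $a$ (dividend), $x_n$ (current estimate of $1/d$) and $e_n$ (relative error) are introduced here for illustration and do not appear in the PR.

```latex
% Newton-Raphson refinement of a reciprocal estimate x_n \approx 1/d.
% The "step" is the factor (2 - d x_n), which is what the Arm
% reciprocal-step instruction (FRECPS) computes as 2 - a b.
\begin{align*}
  x_{n+1} &= x_n \,(2 - d\,x_n) \\[4pt]
  e_n &:= 1 - d\,x_n \\
  e_{n+1} &= 1 - d\,x_n\,(2 - d\,x_n)
           = 1 - 2\,d\,x_n + d^2 x_n^2
           = (1 - d\,x_n)^2
           = e_n^{2} \\[4pt]
  \frac{a}{d} &\approx a \cdot x_N \quad \text{after } N \text{ refinement steps}
\end{align*}
```

Because the relative error squares on every step, the number of correct bits roughly doubles per iteration. If the initial estimate is good to about 8 bits, as is commonly cited for the Arm reciprocal-estimate instructions, about three steps (8, 16, 32, 64 bits) would be needed to cover the 53-bit FP64 significand. That fixed cost may be part of why the FastDivision results above show a smaller gain over scalar than the other routines, though the PR does not say so.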


src/benchmarks/micro/MicroBenchmarks.csproj

+  <!-- Remove Sve2 microbenchmarks when running on net versions < 10.0 -->
+  <ItemGroup Condition="!$([MSBuild]::IsTargetFrameworkCompatible('$(TargetFramework)', 'net10.0'))">
+    <Compile Remove="sve\Exponent.cs" />
+    <Compile Remove="sve\FP64Overflow.cs" />
+    <Compile Remove="sve\FastDivision.cs" />
+    <Compile Remove="sve\MultiplyPow2.cs" />
+  </ItemGroup>


ylpoonlg added 4 commits November 13, 2025 14:31

Add floating-point routines to SVE microbenchmark

f49c88d

* FastDivision * MultiplyPow2 * FP64Overflow * Exponent

Merge branch 'main' into github-svefloatingpoint

7e016ec

Add SVE Category

3fd57b3

Disable when version < net10.0

7cbf5ef

LoopedBard3 requested a review from Copilot March 17, 2026 21:58

Copilot started reviewing on behalf of LoopedBard3 March 17, 2026 21:59

Copilot AI reviewed Mar 17, 2026

LoopedBard3 mentioned this pull request Mar 18, 2026

Add OddEvenSort to SVE microbenchmark #5046

Open

ylpoonlg wants to merge 4 commits into dotnet:main from ylpoonlg:github-svefloatingpoint
