Performance testing Java 24’s Vector API.
JavaVectorAPI-1: https://github.com/tomerr90/JavaVectorAPI-1.git
Using JDK 24, we will run all five test suites from JavaVectorAPI-1 on CPUs from four different vendors:
- Intel Xeon E-2378
- AMD Ryzen 9 9950X
- Apple M4 Base
- Qualcomm Snapdragon X Elite X1E80100
Charts (one set per CPU): SimpleSum, SimpleSumNoSuperWord, ComplexExpression, ComplexExpressionNoSuperWord, ArrayStats.
SimpleSum/SimpleSumNoSuperWord & ComplexExpression/ComplexExpressionNoSuperWord:
In most cases, the JIT's automatic vectorization matches the fine-tuned Vector API implementation.
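To make the comparison concrete, here is a minimal sketch of the kind of loop the JIT can vectorize on its own. It is not taken from JavaVectorAPI-1: the class name, method name, and `int` element type are assumptions for illustration. The suite names suggest that the NoSuperWord variants run with HotSpot's SuperWord auto-vectorization turned off (the `-XX:-UseSuperWord` flag), which is also an assumption about how the repository is set up.

```java
public class SimpleSumSketch {
    // Hypothetical stand-in for a "simple sum" kernel.
    // A plain counted loop with a single reduction is the shape that
    // HotSpot's C2 compiler can auto-vectorize via its SuperWord pass.
    static int scalarSum(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Fill an array with small values so the int sum cannot overflow.
        int[] data = new int[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = i & 7;
        }
        System.out.println(scalarSum(data));
    }
}
```

Running the same class once normally and once with `java -XX:-UseSuperWord SimpleSumSketch` under a proper benchmark harness would be one way to see how much of the scalar loop's speed comes from auto-vectorization.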
ArrayStats - Branchless code:
There are a few stories that fall out of this test suite.
x64:
There is a band of data set sizes that elicits the largest boost from the Vector API.
Intel’s band is wider than AMD’s; however, when AMD is processing a data set in its optimal size range, it gains a very large boost.
ARM:
Both Qualcomm and Apple tend to gain more boost as data sets increase in size. In Qualcomm’s case the larger boost starts at a smaller data set size; however, as data set size grows, the Apple system achieves more boost.
For these specific processors, the Ryzen CPU gains the largest performance boost from the Vector API, if and only if the data set size is in its Goldilocks zone. Both ARM CPUs appear to gain more boost from the Vector API as data sets get larger.
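For readers unfamiliar with the term used in the ArrayStats heading, the following is a rough sketch of what branchless scalar code can look like. The repository's actual ArrayStats code is not reproduced here; this sketch assumes the suite computes aggregate statistics such as minimum, maximum, and sum, and the class and method names are hypothetical.

```java
public class ArrayStatsSketch {
    // Hypothetical kernel: returns {min, max, sum} for a non-empty array.
    // The loop body has no if/else; selection is expressed through
    // Math.min and Math.max, which gives the JIT the opportunity to emit
    // conditional moves or SIMD min/max instead of branches.
    static long[] compute(int[] data) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        long sum = 0;
        for (int v : data) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new long[] { min, max, sum };
    }

    public static void main(String[] args) {
        long[] stats = compute(new int[] { 3, -1, 7, 2 });
        System.out.println("min=" + stats[0] + " max=" + stats[1] + " sum=" + stats[2]);
    }
}
```

Whether the compiled code is actually branch-free depends on the JIT and the target CPU, so this only shows the source-level style, not a guarantee about the generated instructions.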


















