Fix SGEMM returning wrong results in multithreading on NeoverseV2 #5643
martin-frbg wants to merge 1 commit into OpenMathLib:develop
Conversation
Is this ready to go?
It definitely fixes the problem on NeoverseV2, but (a) there may be other arm64 CPUs similarly affected, and (b) I haven't fully understood the underlying issue with that specific parameter, which was introduced fairly recently. The fix may be papering over a missing tail call in the gemm kernel it rode in on, or something else entirely.
@Mousius I'm a bit confused now, as I notice NeoverseV2 was already using the SVE SGEMM kernel (via ARMV8SVE) until you switched its KERNEL file to be based on NEOVERSEN2 rather than ARMV8SVE in order to reuse N2's sbgemm kernel in the otherwise unrelated #5399. Was that a conscious decision to return the V2 to the basic Neon kernel due to its shorter vector register size compared to V1 and A64FX, or collateral damage that I missed at the time?
Fixes numpy/numpy#30816