Split VORTEXM4 from VORTEX target and fix SGEMM_DIRECT support for SME-capable targets #5423
Merged
102 commits
ca22e28
Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S
martin-frbg 22c6607
Use ASMNAME to get symbol name from build system; leave x18 unused as…
martin-frbg 89898fc
Add sgemm_direct_performant for switching between direct and regular …
martin-frbg 08a0032
Build symbol name from build system variables
martin-frbg 53d3bb5
Get symbol name from build system; change b.first to b.mi for AppleCl…
martin-frbg 731f4dd
Add VORTEXM4 settings
martin-frbg e82bcd2
Update ARM64 sgemm_direct object generation
martin-frbg 0203657
Add sgemm_direct_performant for ARM64
martin-frbg de91afd
Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direc…
martin-frbg 202a7a0
Separate VORTEXM4 from VORTEX and ARMV9SME
martin-frbg e76c390
Add sgemm_direct_performant for ARM64
martin-frbg ef0b883
Add sgemm_direct_performant for ARM64
martin-frbg ccfd017
Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list
martin-frbg b0a00fb
Add minimal compiler flags for VORTEXM4
martin-frbg 3097046
Add VORTEXM4 target
martin-frbg 4e2a8c1
Split VORTEXM4 from VORTEX target due to SME support
martin-frbg 18f9582
Add VORTEXM4
martin-frbg ca542f3
Add VORTEXM4
martin-frbg a4f5fec
Add compiler options for VORTEXM4
martin-frbg c794d0a
Add VORTEXM4
martin-frbg 4328c91
relax requirements in compiler SME capability check
martin-frbg 426b5f2
Add compiler options for VORTEXM4
martin-frbg 0bc19a1
Update SME kernel details
martin-frbg bf98e44
Add VORTEXM4 to DYNAMIC_ARCH list
martin-frbg 4609732
Relax version number requirement for AppleClang
martin-frbg 05dbb54
Delete misplaced file
martin-frbg 107c883
Update SME-related kernels
martin-frbg 501728a
adjust register 20 accesses to 21 after moving x18
martin-frbg edaa73f
Hide the local 2VLx2VL symbol as static is insufficient for this with…
martin-frbg 1ee8879
Add VORTEXM4
martin-frbg 7f89c6f
smh-based direct sgemm currently requires leading dimensions to be sa…
martin-frbg 8e50b8d
Add d8 to d15 to clobber lists as the code does not expressly save them
martin-frbg b4fc09e
Add registers d8 to d15 to clobber lists as the code does not express…
martin-frbg 1b88c9c
remove debugging printouts
martin-frbg 2b5d8c7
remove debugging printout
martin-frbg fc516af
Merge branch 'develop' into issue5414
martin-frbg ba9d2d2
remove sme from M4 Fortran flags as gfortran couples it with sve
martin-frbg b3d0bc4
Update Makefile.L3
martin-frbg 4ae3e37
restore 2VLx2VL naming
martin-frbg c889558
Rework for DYNAMIC_ARCH use and use of SGEMM functions by SSYMM
martin-frbg 20f5ed1
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg 47a66ae
Update limits based on benchmarking the SME code on Apple M4
martin-frbg 9bfc361
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg 8211db6
Don't enable SME for VortexM4 when the compiler is gcc (which does no…
martin-frbg 2346d0b
Add HAVE_SME for VortexM4 only with non-gcc compilers
martin-frbg d7b0fcc
Enable SME-based kernels for VortexM4 with clang-based compilers only
martin-frbg 643a0b5
Allow VortexM4 on the direct_SME fast path only for clang-based compi…
martin-frbg e01b109
Allow VortexM4 on the same fast path only with non-gcc compilers
martin-frbg f4ee3ae
Allow VortexM4 on the SME fast path only with non-gcc compilers
martin-frbg 1b591ea
export HAVE_SME setting and exclude VortexM4 from DYNAMIC_ARCH if gcc…
martin-frbg 83d3e0e
fix copy/paste
martin-frbg 682f61e
Add prototype for gotoblas_corename
martin-frbg ea85b66
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg 9c0965b
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg 8c0b13c
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg 7d35bf6
Add cpuid for Apple M5 (from a PR to the archspec project)
martin-frbg 7e44f62
fix sequence of arm64 sgemm_direct_performance and sgemm_direct_ab
martin-frbg b0bd49a
Add compiler guard around the M4 HAVE_SME property
martin-frbg 4af1870
Only add dedicated VORTEXM4 if building with LLVM
martin-frbg b185c9a
small fixes for separating sme and dummy parts
martin-frbg a683287
rework for dynamic_arch
martin-frbg 705259c
remove redundant HAVE_SME
martin-frbg 7ab8dc1
rework ARM64 SME dependency handling
martin-frbg c3c857c
fix sequence
martin-frbg 825d3ad
AppleClang does not define feature local_streaming
martin-frbg e85efb8
remove za from clobber lists
martin-frbg 275eb6f
Add workaround for current LLVM SME bug on Windows
martin-frbg 5c8cf37
Add workaround for current LLVM SME bug on Windows
martin-frbg b183182
Add workaround for current LLVM SME bug on Windows
martin-frbg 7beba94
Add workaround for current LLVM SME bug on Windows
martin-frbg f4383d0
syntax fix
martin-frbg 67fd33e
syntax fix
martin-frbg 618bcbd
adjust M4 options to avoid undefined references with non-Apple LLVM
martin-frbg a18a536
Adjust M4 options to avoid unresolved reference with non-Apple LLVM
martin-frbg 02bc005
reset SVE and SME capabilities between targets
martin-frbg e384396
Use the armv9 capability set in the compiler test for SME
martin-frbg 2d46f1e
Merge branch 'develop' into issue5414
martin-frbg a9a6eda
Adapt for DYNAMIC_ARCH with multiple ...preprocess symbols
martin-frbg 6de062c
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg aafd3cb
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg 0a53d91
Move early exit up; don't rely on support_sme() for now
martin-frbg 31150eb
Move early exit up; don't rely on support_sme() for now
martin-frbg 3149408
Merge branch 'OpenMathLib:develop' into issue5414
martin-frbg 10ba0e6
fix missing parentheses on endif
martin-frbg 770ad68
Distinguish AppleClang from LLVM on ARM64
martin-frbg 5e5f9a3
Apple Clang absolutely needs the +sme in the arch string
martin-frbg 31bb6ca
Apple Clang requires +sme in the arch string for M4
martin-frbg 533cab2
add prototype
martin-frbg bdcb9b7
add prototype
martin-frbg fa021e1
fix missing endif() and add AppleClang options for M4
martin-frbg 6137236
fix os variable reference
martin-frbg 6735872
drop the cpu=apple-m4 part as nonessential
martin-frbg d3e4b41
remove cpu=apple-m4 as not required and less portable
martin-frbg 88c583e
Update Makefile
martin-frbg 7ffce1c
fix spurious change of (S)BGEMM parameters for NeoverseV1
martin-frbg d49df4c
force linking to clang_rt_builtins when using LLVM for AppleM4
martin-frbg 93cd7b9
Force linking to clang_rt_builtins when using LLVM for AppleM4
martin-frbg 7acf919
typo
martin-frbg faa1875
typo fix
martin-frbg 5133aac
Make VORTEXM4 available in DYNAMIC_ARCH on Apple
martin-frbg 55a10c7
Make VortexM4 available in DYNAMIC_ARCH on MacOS only
martin-frbg 6f225da
make VORTEXM4 MacOS-only for now
martin-frbg
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -111,6 +111,7 @@ THUNDERX2T99 |
| 111 | 111 | TSV110 |
| 112 | 112 | THUNDERX3T110 |
| 113 | 113 | VORTEX |
| | 114 | + VORTEXM4 |
| 114 | 115 | A64FX |
| 115 | 116 | ARMV8SVE |
| 116 | 117 | ARMV9SME |
For RowMajor, shouldn't the leading dimension check be (lda==k && ldb==n && ldc==n)?
Normally yes, but the arguments have already been reshuffled at this point (I think; I'll recheck when I get back to this later this week).
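To make the point about reshuffled arguments concrete, here is a minimal C sketch. It assumes the usual CBLAS convention, in which a row-major C = A*B is computed as the column-major product C^T = B^T * A^T by swapping m with n and the A operand with the B operand. The `gemm_dims` struct, the function names, and the exact form of the column-major check are hypothetical illustrations, not code from this PR.

```c
#include <stdbool.h>

/* Hypothetical argument bundle; the names are illustrative only. */
typedef struct {
    int m, n, k;        /* C is m x n, inner dimension is k */
    int lda, ldb, ldc;  /* leading dimensions of A, B and C */
} gemm_dims;

/* Row-major C = A*B treated as column-major C^T = B^T * A^T:
 * m and n trade places, and so do the A and B operands. */
static gemm_dims row_to_col_major(gemm_dims d)
{
    gemm_dims s = d;
    s.m = d.n;
    s.n = d.m;
    s.lda = d.ldb;
    s.ldb = d.lda;
    return s;
}

/* Assumed column-major check: every matrix is stored without padding. */
static bool tight_col_major(gemm_dims d)
{
    return d.lda == d.m && d.ldb == d.k && d.ldc == d.m;
}

/* The row-major check as written in the review comment. */
static bool tight_row_major(gemm_dims d)
{
    return d.lda == d.k && d.ldb == d.n && d.ldc == d.n;
}
```

Under these assumptions, `tight_col_major(row_to_col_major(d))` expands to `d.ldb == d.n && d.lda == d.k && d.ldc == d.n`, which is the reviewer's row-major expression. If the swap really does happen before the check, a single column-major test would cover both layouts. Whether the PR's code performs the swap at that point is exactly what the reply says still needs rechecking.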