Commit 532eab1
Bug Fixes in LPGEMM for AVX512(SkyLake) machine (#24)
* Bug Fixes in LPGEMM for AVX512(SkyLake) machine
- B-matrix in bf16bf16f32obf16/f32 API is re-ordered. For machines that
doesn't support BF16 instructions, the BF16 input is unre-ordered and
converted to FP32 to use FP32 kernels.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 is copying the
matrix to the re-ordered buffer array. But the un-reordering to FP32
requires the matrix to have size multiple of 16 along n and multiple
of 2 along k dimension.
- The entry condition to the above has been modified for AVX512 configuration.
- In bf16 API, the tiny path entry check has been modified to prevent
seg fault while AOCL_ENABLE_INSTRUCTIONS=AVX2 is set in BF16 supporting
machines.
- Modified existing store instructions in FP32 AVX512 kernels to support
execution in machines that has AVX512 support but not BF16/VNNI(SkyLake).
- Added Bf16 beta and store types in FP32 avx512_256 kernels
AMD Internal: [SWLCSG-3552]
* Bug Fixes in LPGEMM for AVX512(SkyLake) machine
- B-matrix in bf16bf16f32obf16/f32 API is re-ordered. For machines that
doesn't support BF16 instructions, the BF16 input is unre-ordered and
converted to FP32 to use FP32 kernels.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 is copying the
matrix to the re-ordered buffer array. But the un-reordering to FP32
requires the matrix to have size multiple of 16 along n and multiple
of 2 along k dimension.
- The entry condition to the above has been modified for AVX512 configuration.
- In bf16 API, the tiny path entry check has been modified to prevent
seg fault while AOCL_ENABLE_INSTRUCTIONS=AVX2 is set in BF16 supporting
machines.
- Modified existing store instructions in FP32 AVX512 kernels to support
execution in machines that has AVX512 support but not BF16/VNNI(SkyLake).
- Added Bf16 beta and store types, along with BIAS and ZP in FP32 avx512_256
kernels
AMD Internal: [SWLCSG-3552]
* Bug Fixes in LPGEMM for AVX512(SkyLake) machine
- Support added in FP32 512_256 kerenls for : Beta, BIAS, Zero-point and
BF16 store types for bf16bf16f32obf16 API execution in AVX2 mode.
- B-matrix in bf16bf16f32obf16/f32 API is re-ordered. For machines that
doesn't support BF16 instructions, the BF16 input is unre-ordered and
converted to FP32 type to use FP32 kernels.
- For n = 1 and k = 1 sized matrices, re-ordering in BF16 is copying the
matrix to the re-ordered buffer array. But the un-reordering to FP32
requires the matrix to have size multiple of 16 along n and multiple
of 2 along k dimension. The entry condition here has been modified for
AVX512 configuration.
- Fix for seg fault with AOCL_ENABLE_INSTRUCTIONS=AVX2 mode in BF16/VNNI
ISA supporting configruations:
- BF16 tiny path entry check has been modified to take into account arch_id
to ensure improper entry into the tiny kernel.
- The store in BF16->FP32 col-major for m = 1 conditions were updated to
correct storage pattern,
- BF16 beta load macro was modified to account for data in unaligned memory.
- Modified existing store instructions in FP32 AVX512 kernels to support
execution in machines that has AVX512 support but not BF16/VNNI(SkyLake)
AMD Internal: [SWLCSG-3552]
---------
Co-authored-by: VarshaV <varshav2@amd.com>1 parent 62d4fcb commit 532eab1
12 files changed
Lines changed: 1403 additions & 751 deletions
File tree
- addon/aocl_gemm
- frame/bf16bf16f32
- kernels
- zen4/lpgemm/f32f32f32
- zen/lpgemm/f32f32f32
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
100 | 100 | | |
101 | 101 | | |
102 | 102 | | |
103 | | - | |
| 103 | + | |
104 | 104 | | |
105 | 105 | | |
106 | 106 | | |
| |||
260 | 260 | | |
261 | 261 | | |
262 | 262 | | |
263 | | - | |
| 263 | + | |
264 | 264 | | |
265 | 265 | | |
266 | 266 | | |
| |||
271 | 271 | | |
272 | 272 | | |
273 | 273 | | |
274 | | - | |
| 274 | + | |
275 | 275 | | |
276 | 276 | | |
277 | 277 | | |
| |||
342 | 342 | | |
343 | 343 | | |
344 | 344 | | |
345 | | - | |
346 | 345 | | |
347 | 346 | | |
348 | 347 | | |
| |||
755 | 754 | | |
756 | 755 | | |
757 | 756 | | |
758 | | - | |
759 | | - | |
| 757 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
229 | 229 | | |
230 | 230 | | |
231 | 231 | | |
232 | | - | |
233 | 232 | | |
234 | | - | |
235 | | - | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
236 | 240 | | |
237 | 241 | | |
238 | 242 | | |
| |||
326 | 330 | | |
327 | 331 | | |
328 | 332 | | |
329 | | - | |
| 333 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
235 | 235 | | |
236 | 236 | | |
237 | 237 | | |
238 | | - | |
239 | | - | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
240 | 245 | | |
241 | 246 | | |
242 | 247 | | |
| |||
315 | 320 | | |
316 | 321 | | |
317 | 322 | | |
318 | | - | |
| 323 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
799 | 799 | | |
800 | 800 | | |
801 | 801 | | |
| 802 | + | |
802 | 803 | | |
803 | 804 | | |
804 | 805 | | |
| |||
898 | 899 | | |
899 | 900 | | |
900 | 901 | | |
901 | | - | |
902 | 902 | | |
903 | 903 | | |
904 | 904 | | |
| |||
914 | 914 | | |
915 | 915 | | |
916 | 916 | | |
| 917 | + | |
917 | 918 | | |
918 | 919 | | |
919 | 920 | | |
| |||
1014 | 1015 | | |
1015 | 1016 | | |
1016 | 1017 | | |
1017 | | - | |
1018 | 1018 | | |
1019 | 1019 | | |
1020 | 1020 | | |
| |||
Lines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
669 | 669 | | |
670 | 670 | | |
671 | 671 | | |
672 | | - | |
| 672 | + | |
673 | 673 | | |
674 | 674 | | |
675 | 675 | | |
| |||
683 | 683 | | |
684 | 684 | | |
685 | 685 | | |
686 | | - | |
| 686 | + | |
687 | 687 | | |
688 | 688 | | |
689 | 689 | | |
| |||
695 | 695 | | |
696 | 696 | | |
697 | 697 | | |
698 | | - | |
| 698 | + | |
699 | 699 | | |
700 | 700 | | |
701 | 701 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
195 | 195 | | |
196 | 196 | | |
197 | 197 | | |
198 | | - | |
| 198 | + | |
199 | 199 | | |
200 | 200 | | |
201 | 201 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
76 | 76 | | |
77 | 77 | | |
78 | 78 | | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | | - | |
85 | 79 | | |
86 | 80 | | |
87 | 81 | | |
| |||
0 commit comments