Commit 1c219af
CANN: Support more functions in FA operator
Port commit fc86b5e from glitter4/master to current master.
- Support logitSoftcap in flash attention
- Support head_dim not aligned to 16 (padding)
- Support K/V head count mismatch (GQA/MQA)
- Support attention sinks (src4) via boolean mask
- Fallback path for head_dim > 512 using BatchMatMul+Softmax
- Adapted to current master RAII smart pointer API (acl_tensor_ptr)
- Adapted to BSND layout with transpose12 pattern
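
For illustration only, the following is a minimal reference sketch of three of the behaviors listed above (logit softcap, zero-padding head_dim to a multiple of 16, and mapping query heads to shared K/V heads). It is not the CANN implementation. The function names are hypothetical. The softcap form `softcap * tanh(scale * qk / softcap)` and the consecutive head grouping are assumptions based on common practice, not taken from this commit.

```python
import math

def pad_to_multiple(vec, multiple=16):
    """Zero-pad a vector so its length is a multiple of `multiple`.
    Zero padding leaves dot products unchanged."""
    pad = (-len(vec)) % multiple
    return list(vec) + [0.0] * pad

def kv_head_for(q_head, n_head_q, n_head_kv):
    """Map a query head to the K/V head it shares (GQA/MQA).
    Assumes consecutive query heads share one K/V head."""
    assert n_head_q % n_head_kv == 0
    return q_head // (n_head_q // n_head_kv)

def softcap_logit(score, scale, softcap):
    """Scale a raw QK score and, if softcap != 0, squash it into (-softcap, softcap)."""
    if softcap == 0.0:
        return score * scale
    return softcap * math.tanh(score * scale / softcap)

def attend(q, keys, values, scale, softcap=0.0):
    """Single-query attention over lists of key and value vectors (no mask, no sinks)."""
    qp = pad_to_multiple(q)
    logits = []
    for k in keys:
        kp = pad_to_multiple(k)
        dot = sum(a * b for a, b in zip(qp, kp))
        logits.append(softcap_logit(dot, scale, softcap))
    # Numerically stable softmax over the logits.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    dim = len(values[0])
    return [sum(weights[i] * values[i][d] for i in range(len(values))) / z
            for d in range(dim)]
```

With 8 query heads and 2 K/V heads, `kv_head_for(5, 8, 2)` selects K/V head 1, and a head_dim of 72 is padded to 80 before the dot product.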
Test results (ACL_GRAPH=OFF):
- 640 OK (all non-sinks, non-alibi cases pass)
- 656 FAIL (sinks=1 cases, needs further debugging)
- 16 FAIL (alibi/max_bias=8 cases, pre-existing issue)
1 parent 66d403c
2 files changed
Lines changed: 594 additions & 115 deletions