
Commit 1c219af

CANN: Support more functions in FA operator
Port commit fc86b5e from glitter4/master to current master.

- Support logitSoftcap in flash attention
- Support head_dim not aligned to 16 (padding)
- Support K/V head count mismatch (GQA/MQA)
- Support attention sinks (src4) via boolean mask
- Fallback path for head_dim > 512 using BatchMatMul+Softmax
- Adapted to current master RAII smart pointer API (acl_tensor_ptr)
- Adapted to BSND layout with transpose12 pattern

Test results (ACL_GRAPH=OFF):
- 640 OK (all non-sinks, non-alibi cases pass)
- 656 FAIL (sinks=1 cases, need further debugging)
- 16 FAIL (alibi/max_bias=8 cases, pre-existing issue)
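
The kernel code itself is not shown on this page. As a rough correctness oracle for the features listed above, here is a naive, pure-Python reference attention. It is a sketch under stated assumptions, not the CANN implementation: it assumes ggml `flash_attn_ext`-style semantics, namely softcap applied as `softcap * tanh(s / softcap)` to the scaled score before the additive mask, query head `h` reading KV head `h // (n_head / n_head_kv)` for GQA/MQA, and each per-head sink acting as an extra logit that enters the softmax denominator with no value row. The helper names `ref_attention` and `pad_head_dim` are hypothetical. The commit's own kernel realizes sinks through a boolean mask and uses BatchMatMul+Softmax for head_dim > 512; neither detail is modeled here.

```python
import math


def ref_attention(q, k, v, scale, softcap=0.0, mask=None, sinks=None):
    """Naive reference attention (hypothetical helper, not the CANN kernel).

    q:     [n_head][n_q][d]        query heads
    k:     [n_head_kv][n_kv][d]    key heads (n_head_kv may be < n_head)
    v:     [n_head_kv][n_kv][dv]   value heads
    mask:  optional [n_q][n_kv] additive mask (-inf blocks a position)
    sinks: optional [n_head] per-head sink logits
    Returns [n_head][n_q][dv].
    """
    n_head, n_head_kv = len(q), len(k)
    assert n_head % n_head_kv == 0, "GQA/MQA requires n_head % n_head_kv == 0"
    group = n_head // n_head_kv
    out = []
    for h in range(n_head):
        hk = h // group  # GQA/MQA: query head h reads KV head h // group
        dv = len(v[hk][0])
        head_out = []
        for i, qi in enumerate(q[h]):
            scores = []
            for j, kj in enumerate(k[hk]):
                s = scale * sum(a * b for a, b in zip(qi, kj))
                if softcap != 0.0:
                    # Logit softcap: squash the scaled score into (-softcap, softcap).
                    s = softcap * math.tanh(s / softcap)
                if mask is not None:
                    s += mask[i][j]
                scores.append(s)
            # Numerically stable softmax; the sink takes part in the max shift.
            m = max(scores)
            if sinks is not None:
                m = max(m, sinks[h])
            w = [math.exp(s - m) for s in scores]
            denom = sum(w)
            if sinks is not None:
                # The sink adds probability mass but contributes no value row.
                denom += math.exp(sinks[h] - m)
            head_out.append(
                [sum(w[j] * v[hk][j][c] for j in range(len(w))) / denom for c in range(dv)]
            )
        out.append(head_out)
    return out


def pad_head_dim(x, multiple=16):
    """Zero-pad the last axis of a [heads][tokens][d] tensor to a multiple of `multiple`."""
    d = len(x[0][0])
    padded = -(-d // multiple) * multiple  # ceil(d / multiple) * multiple
    return [[row + [0.0] * (padded - d) for row in head] for head in x]


if __name__ == "__main__":
    # 2 query heads sharing 1 KV head (MQA), head_dim 3 (not a multiple of 16).
    q = [[[1.0, 0.0, 2.0]], [[0.5, 1.0, -1.0]]]
    k = [[[1.0, 1.0, 0.0], [0.0, 2.0, 1.0]]]
    v = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
    scale = 1.0 / math.sqrt(3)  # scale stays tied to the unpadded head_dim
    base = ref_attention(q, k, v, scale, softcap=30.0, sinks=[0.0, -1.0])
    padded = ref_attention(pad_head_dim(q), pad_head_dim(k), pad_head_dim(v),
                           scale, softcap=30.0, sinks=[0.0, -1.0])
    # Zero padding leaves Q.K unchanged; the extra V columns are sliced off.
    for hb, hp in zip(base, padded):
        for rb, rp in zip(hb, hp):
            assert all(abs(a - b) < 1e-12 for a, b in zip(rb, rp[:3]))
    print("padded result matches unpadded result")
```

The padding check illustrates why head_dim padding is safe in principle. Zero columns in Q and K add nothing to the dot products, and zero columns in V only produce extra output columns to discard. The softmax scale, however, must still come from the original head_dim. The triple loop is deliberately naive (O(n_head · n_q · n_kv · d)) so that it can serve as a readable comparison target, for instance when narrowing down the sinks=1 failures, rather than as a performance model.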
1 parent: 66d403c

2 files changed: 594 additions, 115 deletions
