feat(skip-softmax): Add skip-softmax support for KV-blocked attention by vbaddi · Pull Request #1009 · quic/efficient-transformers

vbaddi · 2026-05-25T09:25:43Z

Summary

Adds BLASS-style skip-softmax support on top of KV-blocked attention.

Changes

Added skip_softmax knobs to the attention blocking config.
Wired skip-softmax through the KV/QKV/HQKV/BHQKV and KV-MLA blocked paths.
Added an online-softmax block-skip predicate with effective threshold lambda_eff = skip_softmax_scale / ctx_len.
Added tests for threshold selection, skip mask behavior, and accumulator preservation.
Added an example script supporting:
- plain non-blocked baseline
- KV-blocked baseline
- KV-blocked + skip-softmax
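
For reviewers, a minimal pure-Python sketch of how a block-skip test can sit inside an online-softmax loop. This is not the code in this PR: the function name `blocked_attention_with_skip`, the single-query scalar-value setup, and the exact form of the test (skip a block when its max logit falls more than ln(1/lambda_eff) below the running max) are assumptions for illustration. Only lambda_eff = skip_softmax_scale / ctx_len and the min-keep-blocks knob come from the description above.

```python
import math

def blocked_attention_with_skip(scores, values, num_kv_blocks,
                                skip_softmax_scale, ctx_len, min_keep_blocks=1):
    """Single-query online softmax over KV blocks with an assumed skip test.

    scores: q.k logits, one float per key; values: one float per key.
    Returns (attention_output, indices_of_skipped_blocks).
    """
    lambda_eff = skip_softmax_scale / ctx_len
    # A non-positive threshold disables skipping entirely.
    log_lambda = math.log(lambda_eff) if lambda_eff > 0 else -math.inf
    block = math.ceil(len(scores) / num_kv_blocks)

    m = -math.inf  # running max logit
    l = 0.0        # running softmax denominator
    acc = 0.0      # running weighted sum of values
    kept, skipped = 0, []

    for b in range(num_kv_blocks):
        s = scores[b * block:(b + 1) * block]
        v = values[b * block:(b + 1) * block]
        if not s:
            continue
        m_blk = max(s)
        # Assumed predicate: every weight in this block is below lambda_eff
        # relative to the running max, so its contribution is dropped.
        if kept >= min_keep_blocks and m_blk - m < log_lambda:
            skipped.append(b)  # m, l, acc are left untouched
            continue
        m_new = max(m, m_blk)
        alpha = math.exp(m - m_new)  # rescales the old accumulators
        p = [math.exp(x - m_new) for x in s]
        l = l * alpha + sum(p)
        acc = acc * alpha + sum(pi * vi for pi, vi in zip(p, v))
        m = m_new
        kept += 1
    return acc / l, skipped

scores = [0.0, 0.0, -30.0, -30.0, 0.0, 0.0]
values = [1.0, 1.0, 5.0, 5.0, 3.0, 3.0]
out, skipped = blocked_attention_with_skip(scores, values, 3, 16.0, 65536)
print(out, skipped)
```

In this toy input the middle block sits 30 logits below the running max, so it is skipped and the accumulators pass through unchanged, which is the property the accumulator-preservation tests are described as covering.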

Example

KV-Blocking baseline:

python examples/text_generation/skip_softmax_kv_blocking_inference.py \
    --model-name meta-llama/Llama-3.2-1B \
    --prompt "Can you generate a detailed case study for Mumbai city?" \
    --prefill-seq-len 1 \
    --ctx-len 65536 \
    --generation-len 64000 \
    --num-kv-blocks 8 \
    --no-skip-softmax \
    --num-cores 16 \
    --aic-hw-version ai100 \
    --device-group '[0,1,2,3]'

KV-Blocking + skip-softmax:

python examples/text_generation/skip_softmax_kv_blocking_inference.py \
    --model-name meta-llama/Llama-3.2-1B \
    --prompt "Can you generate a detailed case study for Mumbai city?" \
    --prefill-seq-len 1 \
    --ctx-len 65536 \
    --generation-len 64000 \
    --num-kv-blocks 8 \
    --skip-softmax \
    --skip-softmax-decode-scale 16.0 \
    --skip-softmax-prefill-scale 1.0 \
    --skip-softmax-min-keep-blocks 1 \
    --num-cores 16 \
    --aic-hw-version ai100 \
    --device-group '[0,1,2,3]'
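
As a worked example of the threshold formula with the values from this command, the sketch below computes lambda_eff for each scale. It assumes, from the flag names only, that the decode scale applies to decode steps and the prefill scale to prefill. The helper name `effective_threshold` is hypothetical, and the logit-gap figure assumes a skip test that compares a block's max logit with the running max.

```python
import math

def effective_threshold(scale: float, ctx_len: int) -> float:
    """lambda_eff = skip_softmax_scale / ctx_len, as stated in the PR description."""
    return scale / ctx_len

ctx_len = 65536
decode_lambda = effective_threshold(16.0, ctx_len)   # --skip-softmax-decode-scale
prefill_lambda = effective_threshold(1.0, ctx_len)   # --skip-softmax-prefill-scale

# Under the assumed test, a block is skipped once its max logit is more than
# ln(1 / lambda_eff) below the running max.
decode_gap = math.log(1.0 / decode_lambda)
prefill_gap = math.log(1.0 / prefill_lambda)

print(f"decode:  lambda_eff = {decode_lambda:.3e}, logit gap = {decode_gap:.3f}")
print(f"prefill: lambda_eff = {prefill_lambda:.3e}, logit gap = {prefill_gap:.3f}")
```

With these values the decode threshold is 16 times larger than the prefill one, so under this reading decode skips blocks more aggressively than prefill.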

Validation

python -m pytest -q tests/test_skip_softmax_kv_blocking.py
Result:
7 passed

cc: @anujgupt-github @kdulla

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

feat(0525): add skip-softmax for KV-blocked attention

f385d86

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

vbaddi assigned vbaddi and quic-amitraj May 25, 2026

vbaddi added the enhancement label (New feature or request) May 25, 2026

vbaddi assigned kdulla May 25, 2026

vbaddi marked this pull request as draft May 25, 2026 09:27

vbaddi wants to merge 1 commit into quic:main from vbaddi:feat/enable-skip-softmax
