Enable ffn blocking for dense models with automatic blocking configurator by kdulla · Pull Request #958 · quic/efficient-transformers

kdulla · 2026-05-04T10:23:48Z

Summary

This PR introduces FFN blocking in QEfficient for dense causal language models, adding token and weight blocking strategies for MLPs both with and without up projections.

Key Features

Token Blocking: Blocked compute over hidden_states sequence length
Weight Blocking: Blocked compute over MLP weights
Currently configurable via qaic_config (explicit or auto); the relevant compiler flags are also adjusted based on the FFN blocking strategy.
Updated the performance team's FFN blocking configurator to be calculation-based instead of requiring the creation of dummy MLPs.
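
For readers unfamiliar with the two strategies, the following is a minimal NumPy sketch of what token blocking and weight blocking mean for a single MLP. It is not the code from this PR: it assumes a gated SiLU MLP (gate, up, and down projections), and every function and parameter name here (`mlp_reference`, `mlp_token_blocked`, `mlp_weight_blocked`, `num_token_blocks`, `num_weight_blocks`) is hypothetical and chosen only for illustration.

```python
import numpy as np


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def mlp_reference(x, w_gate, w_up, w_down):
    # Unblocked gated MLP: down(act(gate(x)) * up(x))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def mlp_token_blocked(x, w_gate, w_up, w_down, num_token_blocks):
    # Token blocking (illustrative): split hidden_states along the sequence
    # axis, run the full MLP on each chunk, then stitch the chunks together.
    chunks = np.array_split(x, num_token_blocks, axis=-2)
    outputs = [mlp_reference(c, w_gate, w_up, w_down) for c in chunks]
    return np.concatenate(outputs, axis=-2)


def mlp_weight_blocked(x, w_gate, w_up, w_down, num_weight_blocks):
    # Weight blocking (illustrative): split the intermediate dimension of the
    # weights into blocks and accumulate each block's partial down projection.
    intermediate_size = w_gate.shape[1]
    out = np.zeros(x.shape[:-1] + (w_down.shape[1],), dtype=x.dtype)
    for idx in np.array_split(np.arange(intermediate_size), num_weight_blocks):
        hidden = silu(x @ w_gate[:, idx]) * (x @ w_up[:, idx])
        out += hidden @ w_down[idx, :]
    return out


# Small demo with assumed shapes: (batch, seq_len, hidden_size)
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 16))
w_gate = rng.standard_normal((16, 32))
w_up = rng.standard_normal((16, 32))
w_down = rng.standard_normal((32, 16))

ref = mlp_reference(x, w_gate, w_up, w_down)
tok = mlp_token_blocked(x, w_gate, w_up, w_down, num_token_blocks=4)
wgt = mlp_weight_blocked(x, w_gate, w_up, w_down, num_weight_blocks=4)
```

Both blocked variants are mathematically equivalent to the unblocked MLP, so `tok` and `wgt` should match `ref` up to floating-point rounding. The point of blocking is to shrink the intermediate tensors live at any one time, not to change the result.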

Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>

kdulla added 2 commits May 4, 2026 14:48

Added dense FFN blocking support with blocking configurator

89223c0

Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>

fixed minor qwen3vl bug and added ffn blocking tests

bbdf300

Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>

kdulla requested review from ochougul, quic-hemagnih, quic-rishinr and vbaddi May 4, 2026 10:23

kdulla self-assigned this May 4, 2026

kdulla added the enhancement and qeff.blocking labels May 4, 2026

kdulla added 2 commits May 5, 2026 12:00

changed variable names to be consistent and minor fixes for base tests

bdc1ecb

Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>

minor typo fix

6c9e203

Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>
