
Enable ffn blocking for dense models with automatic blocking configurator #958

Open
kdulla wants to merge 4 commits into quic:main from kdulla:enable_ffn_blocking


Contributor

@kdulla kdulla commented May 4, 2026

Summary

  • This PR introduces FFN blocking in QEfficient for dense causal language models, adding token and weight blocking strategies for MLPs both with and without up projections.

Key Features

  • Token Blocking: blocked compute over the sequence-length dimension of hidden_states
  • Weight Blocking: blocked compute over the MLP weights
  • Configurable via qaic_config (explicit or auto); the relevant compiler flags are also adjusted based on the chosen FFN blocking strategy.
  • Updated the performance team's FFN blocking configurator to be calculation-based instead of requiring the creation of dummy MLPs.
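
For readers unfamiliar with the two strategies, here is a minimal, dependency-free sketch of the idea. It is not the code in this PR: the function names, the SiLU-gated MLP layout (gate, up, and down projections), and the block sizes are all assumptions chosen for illustration. It only shows that splitting the computation over tokens, or over the intermediate weight dimension, reproduces the unblocked result.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of row-major nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def silu(v):
    return v / (1.0 + math.exp(-v))


def mlp(x, w_gate, w_up, w_down):
    """Unblocked reference: (silu(x @ w_gate) * (x @ w_up)) @ w_down."""
    gate = matmul(x, w_gate)
    up = matmul(x, w_up)
    hidden = [[silu(g) * u for g, u in zip(g_row, u_row)]
              for g_row, u_row in zip(gate, up)]
    return matmul(hidden, w_down)


def mlp_token_blocked(x, w_gate, w_up, w_down, block):
    """Token blocking (illustrative): run the MLP on `block` tokens (rows of x) at a time."""
    out = []
    for start in range(0, len(x), block):
        out.extend(mlp(x[start:start + block], w_gate, w_up, w_down))
    return out


def mlp_weight_blocked(x, w_gate, w_up, w_down, block):
    """Weight blocking (illustrative): use `block` intermediate columns at a time.

    Each slice of gate/up columns pairs with the matching rows of w_down,
    and the partial outputs are summed.
    """
    out = [[0.0] * len(w_down[0]) for _ in x]
    for start in range(0, len(w_down), block):
        cols = slice(start, start + block)
        partial = mlp(x,
                      [row[cols] for row in w_gate],
                      [row[cols] for row in w_up],
                      w_down[cols])
        out = [[a + b for a, b in zip(o_row, p_row)]
               for o_row, p_row in zip(out, partial)]
    return out


def all_close(a, b, tol=1e-9):
    return all(abs(p - q) <= tol
               for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


# Toy sizes (hypothetical): 6 tokens, hidden size 4, intermediate size 8.
rng = random.Random(0)


def rand(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


x = rand(6, 4)
w_gate, w_up, w_down = rand(4, 8), rand(4, 8), rand(8, 4)
reference = mlp(x, w_gate, w_up, w_down)

print(all_close(mlp_token_blocked(x, w_gate, w_up, w_down, block=4), reference))
print(all_close(mlp_weight_blocked(x, w_gate, w_up, w_down, block=3), reference))
```

The block sizes above deliberately do not divide the token count or the intermediate size evenly, so the last block is smaller. Token blocking bounds the size of the intermediate activation per step, while weight blocking bounds how much of the projection weights is touched per step; how this PR picks between them is decided by the configurator, not by this sketch.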

kdulla added 2 commits May 4, 2026 14:48
Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>
Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>
@kdulla kdulla self-assigned this May 4, 2026
@kdulla kdulla added enhancement New feature or request qeff.blocking labels May 4, 2026
kdulla added 2 commits May 5, 2026 12:00
Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>
Signed-off-by: Kushal Dulla <kdulla@qti.qualcomm.com>

Labels

enhancement New feature or request qeff.blocking
