add smooth K for sagev1 by luoyu-intel · Pull Request #1806 · intel/auto-round

luoyu-intel · 2026-05-13T01:49:57Z

Description

Support smoothing K before dynamic quantization.
Add an SDPA or SageAttentionV1 patch:

patch_torch_sdpa(
    strict=True,
    backend=backend,  # sdpa or sagev1
    quant_block_size=quant_block_size,  # 64, 128, or 256
)
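
For readers unfamiliar with global patching, the following is a minimal sketch of the patch/unpatch pattern, not the code in this PR. It swaps a function on a stand-in namespace so it runs without PyTorch. The names `patch_sdpa`, `unpatch_sdpa`, and `reference_sdpa`, and the fallback meaning given to `strict`, are assumptions made for illustration.

```python
import types

# Stand-in for torch.nn.functional so the sketch runs without PyTorch.
F = types.SimpleNamespace()


def reference_sdpa(q, k, v):
    """Placeholder for the stock attention implementation."""
    return ("reference", q, k, v)


F.scaled_dot_product_attention = reference_sdpa

_original = None  # holds the unpatched function while a patch is active


def patch_sdpa(custom_impl, strict=True):
    """Replace F.scaled_dot_product_attention with custom_impl.

    Hypothetical semantics: with strict=True, NotImplementedError from
    custom_impl propagates; with strict=False, the wrapper falls back to
    the original function.
    """
    global _original
    if _original is not None:
        return  # already patched; keep the first saved original
    _original = F.scaled_dot_product_attention
    original = _original

    def wrapper(q, k, v):
        try:
            return custom_impl(q, k, v)
        except NotImplementedError:
            if strict:
                raise
            return original(q, k, v)

    F.scaled_dot_product_attention = wrapper


def unpatch_sdpa():
    """Restore the function saved by patch_sdpa, if any."""
    global _original
    if _original is not None:
        F.scaled_dot_product_attention = _original
        _original = None
```

Saving the original once and restoring it in the unpatch helper keeps repeated patch calls from stacking wrappers on top of each other.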

Copilot

Pull request overview

This PR adds a “smooth K” path for SAGEv1 by optionally subtracting a per-(row, head_dim) mean bias from K before INT8 dynamic quantization, and introduces a Torch SDPA patch + lm-eval launcher so models can be evaluated with ARK attention without editing model code.
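
As background on why subtracting a mean from K can be safe, here is a short derivation. It assumes the bias is the mean of K over sequence positions for each head_dim channel (as the name compute_seq_mean_bias suggests) and that no extra per-key term is added to the scores; the symbols $N$, $d$, $q_i$, $k_j$, and $\bar{k}$ are introduced here for illustration.

```latex
\bar{k} = \frac{1}{N}\sum_{j=1}^{N} k_j ,
\qquad
q_i^{\top}\,(k_j - \bar{k}) = q_i^{\top} k_j - q_i^{\top}\bar{k} .

% The term q_i^T \bar{k} does not depend on j, so it shifts every score in
% row i by the same constant, and softmax is invariant to such a shift:
\operatorname{softmax}_j\!\left(\frac{q_i^{\top} k_j}{\sqrt{d}}\right)
=
\operatorname{softmax}_j\!\left(\frac{q_i^{\top}\,(k_j - \bar{k})}{\sqrt{d}}\right).
```

Under these assumptions the attention weights are unchanged, while the values fed to the INT8 quantizer have a smaller dynamic range.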

Changes:

Add mean-bias computation (compute_seq_mean_bias) and bias-aware dynamic quantization for SAGEv1’s K path (env-controlled).
Add a Torch SDPA global patch (patch_torch_sdpa_with_ark) and a helper launcher to run lm-eval with the patch enabled.
Add/adjust ARK UT scaffolding and CMake wiring for SDPA/SAGE-related tests and benchmarks.
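
To illustrate the first item above, here is a pure-Python sketch of mean-bias smoothing followed by per-block symmetric INT8 dynamic quantization. It is not the kernel from this PR: the helper names (`seq_mean`, `quant_int8_blocks`, `dequant`, `max_abs_err`), the one-scale-per-row-block layout, and the synthetic data are assumptions chosen to show the effect.

```python
import random


def seq_mean(k):
    """Per-channel mean of K over the sequence axis; k is [seq_len][head_dim]."""
    n = len(k)
    return [sum(row[c] for row in k) / n for c in range(len(k[0]))]


def quant_int8_blocks(k, block_size):
    """Symmetric INT8 dynamic quantization with one scale per block of rows."""
    q, scales = [], []
    for start in range(0, len(k), block_size):
        block = k[start:start + block_size]
        amax = max(abs(x) for row in block for x in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        scales.append(scale)
        q.extend([[max(-127, min(127, round(x / scale))) for x in row]
                  for row in block])
    return q, scales


def dequant(q, scales, block_size):
    """Map INT8 codes back to floats using the scale of each row's block."""
    return [[x * scales[i // block_size] for x in row]
            for i, row in enumerate(q)]


def max_abs_err(a, b):
    """Largest elementwise absolute difference between two 2-D lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
seq_len, head_dim, block = 128, 8, 64
# Synthetic K: a large offset shared by all tokens plus small per-token noise.
k = [[20.0 + random.gauss(0, 1) for _ in range(head_dim)]
     for _ in range(seq_len)]

# Plain path: quantize K directly.
q_plain, s_plain = quant_int8_blocks(k, block)
err_plain = max_abs_err(k, dequant(q_plain, s_plain, block))

# Smooth-K path: subtract the sequence mean, then quantize the residual.
bias = seq_mean(k)
k_smooth = [[x - b for x, b in zip(row, bias)] for row in k]
q_smooth, s_smooth = quant_int8_blocks(k_smooth, block)
err_smooth = max_abs_err(k_smooth, dequant(q_smooth, s_smooth, block))

print(f"max abs error without smoothing: {err_plain:.4f}")
print(f"max abs error with smoothing:    {err_smooth:.4f}")
```

Because the shared offset no longer consumes most of the INT8 range, the scales in the smoothed path are smaller and the rounding error shrinks with them.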

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

File	Description
auto_round_extension/ark/tools/lm_eval_with_ark_sdpa.py	Helper launcher that patches Torch SDPA then runs `lm_eval`.
auto_round_extension/ark/README.md	Documents using the SDPA patch + lm-eval helper.
auto_round_extension/ark/auto_round_kernel/wrapper/test/test_sdpa.hpp	Adds SAGEv1 SDPA-focused UT/benchmark code.
auto_round_extension/ark/auto_round_kernel/wrapper/test/test_quant.hpp	Adds UT/bench coverage for mean-bias + dynamic quantization.
auto_round_extension/ark/auto_round_kernel/wrapper/test/test_main.cpp	Switches UT entrypoint to construct selected test suites.
auto_round_extension/ark/auto_round_kernel/wrapper/test/test_gemm.hpp	Removes static test auto-run behavior.
auto_round_extension/ark/auto_round_kernel/wrapper/include/xpu_wrapper.hpp	Implements mean-bias and bias-aware K quantization in SAGEv1.
auto_round_extension/ark/auto_round_kernel/wrapper/include/utils.hpp	Adds env toggles for mean-bias and optional bias distribution logging.
auto_round_extension/ark/auto_round_kernel/torch_sdpa_patch.py	Implements global patching of `torch.nn.functional.scaled_dot_product_attention`.
auto_round_extension/ark/auto_round_kernel/CMakeLists.txt	Updates UT build sources/includes/options for SYCL/TLA SDPA.
auto_round_extension/ark/auto_round_kernel/ark.cpp	Extends `sage_dynamic_quant` pybind API to accept an optional bias buffer.
auto_round_extension/ark/auto_round_kernel/__init__.py	Exposes patch/unpatch helpers; updates `sage_dynamic_quant` calls for new signature.
auto_round_extension/ark/.gitignore	Ignores local build artifacts and CSV outputs in the ARK subdir.

a32543254

LGTM

add smooth K for sagev1

dafadad

luoyu-intel marked this pull request as ready for review May 13, 2026 03:25

Copilot AI review requested due to automatic review settings May 13, 2026 03:25

Copilot started reviewing on behalf of luoyu-intel May 13, 2026 03:25

Copilot AI reviewed May 13, 2026

Comment thread auto_round_extension/ark/auto_round_kernel/torch_sdpa_patch.py

Comment thread auto_round_extension/ark/auto_round_kernel/wrapper/test/test_sdpa.hpp

Comment thread auto_round_extension/ark/auto_round_kernel/wrapper/test/test_main.cpp

Comment thread auto_round_extension/ark/README.md

luoyu-intel requested a review from a32543254 May 13, 2026 06:22

luoyu-intel and others added 2 commits May 13, 2026 06:24

update lm-eval usage

307be41

[pre-commit.ci] auto fixes from pre-commit.com hooks

8e4d397

for more information, see https://pre-commit.ci

a32543254 approved these changes May 13, 2026

luoyu-intel and others added 4 commits May 13, 2026 07:40

add print log

01ae508

Merge branch 'main' into sage_v1

255afb4

Merge branch 'main' into sage_v1

cccd153

Merge branch 'main' into sage_v1

41634b3

luoyu-intel wants to merge 7 commits into main from sage_v1

4 participants