
add smooth K for sagev1 #1806

Open
luoyu-intel wants to merge 7 commits into main from sage_v1

Conversation

@luoyu-intel
Contributor

@luoyu-intel commented May 13, 2026

Description

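  1. Support smoothing K (subtracting its mean bias) before INT8 dynamic quantization.
  2. Add a Torch SDPA patch with a selectable backend, sdpa or SageAttentionV1 (sagev1):
patch_torch_sdpa(
    strict=True,
    backend=backend,  # "sdpa" or "sagev1"
    quant_block_size=quant_block_size,  # 64, 128, or 256
)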
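Smoothing K is safe because, in exact arithmetic, it does not change the attention output: subtracting the same vector μ from every key row shifts each query's scores by the constant q·μ, and softmax is invariant to a per-row constant shift. A minimal NumPy sketch for a single head follows; the helper names `smooth_k` and `attention_probs` are hypothetical and not part of the ARK API, and the real kernel applies the same idea per (batch, head).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the key axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_probs(q, k):
    # q: [seq_q, d], k: [seq_k, d]; scaled dot-product scores -> probabilities.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores)

def smooth_k(k):
    # Hypothetical helper: subtract the mean over the sequence axis for each
    # head_dim channel. Returns the smoothed K and the bias (shape [1, d]).
    bias = k.mean(axis=0, keepdims=True)
    return k - bias, bias

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((6, 8)) + 5.0  # large offset shared by all keys
k_smooth, bias = smooth_k(k)

p_ref = attention_probs(q, k)
p_smooth = attention_probs(q, k_smooth)
print(np.allclose(p_ref, p_smooth))  # True
```

The smoothed K is centered around zero, so a quantizer no longer spends its range on the shared offset.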

@luoyu-intel marked this pull request as ready for review May 13, 2026 03:25
Copilot AI review requested due to automatic review settings May 13, 2026 03:25
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a “smooth K” path for SAGEv1 by optionally subtracting a per-(row, head_dim) mean bias from K before INT8 dynamic quantization, and introduces a Torch SDPA patch + lm-eval launcher so models can be evaluated with ARK attention without editing model code.
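To see why a bias-aware K path helps, consider symmetric per-block INT8 dynamic quantization, where each block of `quant_block_size` tokens gets one scale derived from its largest absolute value. A single channel with a large shared offset then inflates the scale for every element in the block. The NumPy sketch below is an assumption-laden illustration in the spirit of SageAttention-style per-block quantization, not the ARK kernel: the block granularity, rounding, and the helpers `dynamic_quant_int8` and `dequantize` are hypothetical.

```python
import numpy as np

def dynamic_quant_int8(x, block_size, bias=None):
    # Hypothetical helper: optionally remove the bias first (the "smooth K" step).
    if bias is not None:
        x = x - bias
    q = np.empty(x.shape, dtype=np.int8)
    scales = []
    for start in range(0, x.shape[0], block_size):
        blk = x[start:start + block_size]
        # One symmetric scale per block of rows: max |value| maps to 127.
        scale = max(float(np.abs(blk).max()) / 127.0, 1e-12)
        q[start:start + block_size] = np.clip(np.rint(blk / scale), -127, 127).astype(np.int8)
        scales.append(scale)
    return q, np.array(scales)

def dequantize(q, scales, block_size, bias=None):
    # Undo the per-block scaling, then add the bias back if one was removed.
    x = q.astype(np.float64)
    for i, scale in enumerate(scales):
        x[i * block_size:(i + 1) * block_size] *= scale
    return x if bias is None else x + bias

rng = np.random.default_rng(0)
block = 64  # one of the quant_block_size values listed in the PR description
k = rng.standard_normal((256, 128))
k[:, 3] += 50.0  # one channel with a large offset shared by every token

q_plain, s_plain = dynamic_quant_int8(k, block)
err_plain = np.abs(dequantize(q_plain, s_plain, block) - k).mean()

bias = k.mean(axis=0, keepdims=True)  # mean over the sequence axis, shape [1, 128]
q_smooth, s_smooth = dynamic_quant_int8(k, block, bias)
err_smooth = np.abs(dequantize(q_smooth, s_smooth, block, bias) - k).mean()

print(err_smooth < err_plain)  # True
```

The bias is added back here only to measure reconstruction error. Inside attention it can be left out of QKᵀ altogether, because subtracting the same vector from every key shifts each query's scores by a constant that softmax cancels.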

Changes:

  • Add mean-bias computation (compute_seq_mean_bias) and bias-aware dynamic quantization for SAGEv1’s K path (env-controlled).
  • Add a Torch SDPA global patch (patch_torch_sdpa_with_ark) and a helper launcher to run lm-eval with the patch enabled.
  • Add/adjust ARK UT scaffolding and CMake wiring for SDPA/SAGE-related tests and benchmarks.
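A global SDPA patch like the one listed above amounts to replacing the `scaled_dot_product_attention` attribute on the `torch.nn.functional` module at runtime and restoring it afterwards. The framework-free sketch below shows that pattern with a stand-in module so it runs without PyTorch; `patch_attr`, `make_custom_sdpa`, and the rule that explicit masks fall back to the original function are hypothetical and do not describe `patch_torch_sdpa_with_ark`.

```python
import contextlib
import types

@contextlib.contextmanager
def patch_attr(module, name, make_replacement):
    """Temporarily replace module.<name>; restore the original on exit.

    make_replacement(original) builds the new function, so the patched
    version can fall back to the original for cases it does not handle.
    """
    original = getattr(module, name)
    setattr(module, name, make_replacement(original))
    try:
        yield original
    finally:
        setattr(module, name, original)

# Stand-in for torch.nn.functional so the sketch runs without PyTorch.
F = types.ModuleType("fake_functional")

def reference_sdpa(q, k, v, attn_mask=None, is_causal=False):
    return "reference"

F.scaled_dot_product_attention = reference_sdpa

def make_custom_sdpa(original):
    def custom_sdpa(q, k, v, attn_mask=None, is_causal=False):
        # Hypothetical policy: calls with an explicit mask go to the original.
        if attn_mask is not None:
            return original(q, k, v, attn_mask=attn_mask, is_causal=is_causal)
        return "custom"
    return custom_sdpa

with patch_attr(F, "scaled_dot_product_attention", make_custom_sdpa):
    print(F.scaled_dot_product_attention(1, 2, 3))               # custom
    print(F.scaled_dot_product_attention(1, 2, 3, attn_mask=0))  # reference
print(F.scaled_dot_product_attention(1, 2, 3))                   # reference
```

Patching the module attribute only affects callers that look the function up at call time, such as `F.scaled_dot_product_attention(...)`. Code that bound the function earlier with `from torch.nn.functional import scaled_dot_product_attention` keeps the original, which is why such a patch is typically applied before the model code is imported.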

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| auto_round_extension/ark/tools/lm_eval_with_ark_sdpa.py | Helper launcher that patches Torch SDPA, then runs lm_eval. |
| auto_round_extension/ark/README.md | Documents using the SDPA patch and the lm-eval helper. |
| auto_round_extension/ark/auto_round_kernel/wrapper/test/test_sdpa.hpp | Adds SAGEv1 SDPA-focused UT/benchmark code. |
| auto_round_extension/ark/auto_round_kernel/wrapper/test/test_quant.hpp | Adds UT/benchmark coverage for mean-bias + dynamic quantization. |
| auto_round_extension/ark/auto_round_kernel/wrapper/test/test_main.cpp | Switches the UT entry point to construct selected test suites. |
| auto_round_extension/ark/auto_round_kernel/wrapper/test/test_gemm.hpp | Removes static test auto-run behavior. |
| auto_round_extension/ark/auto_round_kernel/wrapper/include/xpu_wrapper.hpp | Implements mean-bias and bias-aware K quantization in SAGEv1. |
| auto_round_extension/ark/auto_round_kernel/wrapper/include/utils.hpp | Adds env toggles for mean-bias and optional bias distribution logging. |
| auto_round_extension/ark/auto_round_kernel/torch_sdpa_patch.py | Implements global patching of torch.nn.functional.scaled_dot_product_attention. |
| auto_round_extension/ark/auto_round_kernel/CMakeLists.txt | Updates UT build sources/includes/options for SYCL/TLA SDPA. |
| auto_round_extension/ark/auto_round_kernel/ark.cpp | Extends the sage_dynamic_quant pybind API to accept an optional bias buffer. |
| auto_round_extension/ark/auto_round_kernel/__init__.py | Exposes patch/unpatch helpers; updates sage_dynamic_quant calls for the new signature. |
| auto_round_extension/ark/.gitignore | Ignores local build artifacts and CSV outputs in the ARK subdir. |

Comment thread auto_round_extension/ark/auto_round_kernel/torch_sdpa_patch.py
Comment thread auto_round_extension/ark/README.md
@luoyu-intel requested a review from a32543254 May 13, 2026 06:22

@a32543254 left a comment

LGTM

Labels: None yet
Projects: None yet
4 participants