
Sage fast sfm#1843

Merged: luoyu-intel merged 15 commits into main from sage_fast_sfm on May 25, 2026
@luoyu-intel (Contributor) commented May 21, 2026

~5% performance speedup: 95 TOPS -> 100 TOPS
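
As a quick check of the figures quoted above, the relative gain from 95 to 100 TOPS comes out slightly above 5%. The helper name `relative_speedup` is illustrative only and does not come from the PR.

```python
def relative_speedup(before_tops: float, after_tops: float) -> float:
    """Return the fractional throughput gain when going from before to after."""
    return after_tops / before_tops - 1.0


# Figures taken from the PR description: 95 TOPS before, 100 TOPS after.
gain = relative_speedup(95.0, 100.0)
print(f"{gain:.2%}")  # 5.26%
```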

luoyu-intel force-pushed the sage_fast_sfm branch 2 times, most recently from d8644ba to 370a957, May 21, 2026 09:13
luoyu-intel marked this pull request as ready for review May 21, 2026 09:18
Copilot AI review requested due to automatic review settings May 21, 2026 09:18
Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates SAGEv1 SDPA benchmarking and optimizes parts of the SAGEv1 forward softmax path using SIMD32-packed inline vISA assembly helpers.

Changes:

  • Added new SAGEv1 accuracy case and expanded benchmarks, including an int8(Q/K)+pvhalf(V) kernel benchmark path.
  • Added dynamic quantization + scale buffers to support benchmarking sdpa_impl_qks8_pvhalf.
  • Introduced SIMD32 “pairwise” inline-assembly helpers and used them in the softmax step to reduce instruction count.
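
The "dynamic quantization + scale buffers" item above can be illustrated with a minimal sketch. It assumes symmetric per-row int8 quantization with scale = max(|x|) / 127; the conversation does not show the granularity or rounding that the PR's test code actually uses, and the function names here are hypothetical.

```python
from typing import List, Tuple


def dynamic_quantize_int8(rows: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row to int8 values plus one float scale per row.

    Assumption (not from the PR): symmetric per-row scaling with
    scale = max(|x|) / 127.
    """
    q_rows: List[List[int]] = []
    scales: List[float] = []  # plays the role of the "scale buffer"
    for row in rows:
        amax = max((abs(x) for x in row), default=0.0)
        scale = amax / 127.0 if amax > 0.0 else 1.0  # all-zero row: any scale works
        q_rows.append([max(-127, min(127, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover approximate float values from int8 values and per-row scales."""
    return [[v * s for v in row] for row, s in zip(q_rows, scales)]


x = [[0.3, -1.27, 0.64], [0.0, 0.0, 0.0]]
q, s = dynamic_quantize_int8(x)
x_hat = dequantize(q, s)
```

Each reconstructed value differs from the original by at most half a quantization step, that is, `scale / 2` for its row.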

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| auto_round_extension/ark/auto_round_kernel/wrapper/test/test_sdpa.hpp | Extends tests/benchmarks to include causal accuracy coverage and an additional int8(Q/K) kernel benchmark path (with dynamic quant + scale buffers). |
| auto_round_extension/ark/auto_round_kernel/wrapper/include/stla/xe_sagev1_fwd_mainloop.hpp | Adds SIMD32-packed inline assembly helpers (exp/max/mul/add) and integrates them into softmax to speed up per-row max/rescale and exponentiation. |
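
For readers unfamiliar with the "per-row max/rescale and exponentiation" step, the following is a scalar sketch of the standard online-softmax recurrence used by blockwise attention kernels. It is not the PR's SIMD32 vISA code; the function names are hypothetical, and it assumes every block contains at least one finite score.

```python
import math
from typing import List


def online_softmax_weighted_sum(score_blocks: List[List[float]],
                                value_blocks: List[List[float]]) -> float:
    """Compute sum_j softmax(s)_j * v_j for one row, one block at a time."""
    row_max = -math.inf  # running maximum of the scores seen so far
    row_sum = 0.0        # running sum of exp(score - row_max)
    acc = 0.0            # running weighted sum of the values
    for scores, values in zip(score_blocks, value_blocks):
        new_max = max(row_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        # On the first block this is exp(-inf) == 0.0, which is harmless.
        correction = math.exp(row_max - new_max)
        row_sum *= correction
        acc *= correction
        for s, v in zip(scores, values):
            p = math.exp(s - new_max)
            row_sum += p
            acc += p * v
        row_max = new_max
    return acc / row_sum


def reference_softmax_weighted_sum(scores: List[float], values: List[float]) -> float:
    """Single-pass reference: softmax over the whole row, then a weighted sum."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


scores = [0.1, 2.0, -1.5, 3.2]
values = [1.0, 2.0, 3.0, 4.0]
blocked = online_softmax_weighted_sum([scores[:2], scores[2:]],
                                      [values[:2], values[2:]])
full = reference_softmax_weighted_sum(scores, values)
```

The blocked and single-pass results agree up to floating-point rounding. The max, multiply, add, and exp operations in the inner loop are the kind of per-element work that the PR's packed helpers (exp/max/mul/add) are described as accelerating.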

@luoyu-intel (Contributor, Author)

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

luoyu-intel requested a review from a32543254 May 25, 2026 05:59

a32543254 left a comment

LGTM

luoyu-intel merged commit d52c2a1 into main May 25, 2026
38 checks passed
luoyu-intel deleted the sage_fast_sfm branch May 25, 2026 07:20
4 participants