mobcat40/sageattention-blackwell
SageAttention 2.2.0 for RTX 50-Series (Blackwell) + PyTorch 2.11 Nightly

Prebuilt wheels and build instructions for SageAttention 2.2.0 on Blackwell GPUs (sm_120).

Last updated: January 28, 2026
Built against: PyTorch 2.11.0.dev20260127

Compatibility Matrix

| CUDA Version | PyTorch | Wheel | Availability |
|---|---|---|---|
| cu128 (12.8) | 2.11.x | sageattention-2.2.0+cu128.torch2.11-cp311-cp311-win_amd64.whl | Included in this repo |
| cu130 (13.x) | 2.11.x | sageattention-2.2.0+cu130.torch2.11-cp311-cp311-win_amd64.whl | Included in this repo |

Why cu130? ComfyUI's comfy-kitchen package (required for NVFP4/FP8 model support) requires CUDA 13+. If you want to run FP4-quantized models like qwen_image_nvfp4.safetensors, you need cu130.
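Every compatibility constraint from the table above is encoded in the wheel filename itself (package version, CUDA build, torch version, CPython tag, platform). A minimal sketch of parsing it, so a script can verify it grabbed the right file before installing — the helper name and regex are illustrative, not part of this repo:

```python
import re

def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename like
    sageattention-2.2.0+cu128.torch2.11-cp311-cp311-win_amd64.whl
    into its compatibility fields."""
    m = re.match(
        r"(?P<name>[^-]+)-(?P<version>[^+]+)\+(?P<cuda>cu\d+)\."
        r"torch(?P<torch>[\d.]+)-(?P<python>cp\d+)-cp\d+-(?P<platform>.+)\.whl",
        filename,
    )
    if m is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return m.groupdict()

info = parse_wheel_name(
    "sageattention-2.2.0+cu128.torch2.11-cp311-cp311-win_amd64.whl"
)
print(info["cuda"], info["torch"], info["python"])  # cu128 2.11 cp311
```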


Quick Start - cu128 (prebuilt wheel)

If you don't need FP4 model support:

# For venv installations:
path\to\venv\Scripts\python.exe -m pip install sageattention-2.2.0+cu128.torch2.11-cp311-cp311-win_amd64.whl

# For ComfyUI portable:
.\python_embeded\python.exe -m pip install sageattention-2.2.0+cu128.torch2.11-cp311-cp311-win_amd64.whl

Quick Start - cu130 (prebuilt wheel)

If you need FP4/FP8 model support with comfy-kitchen:

# For venv installations:
path\to\venv\Scripts\python.exe -m pip install sageattention-2.2.0+cu130.torch2.11-cp311-cp311-win_amd64.whl

# For ComfyUI portable:
.\python_embeded\python.exe -m pip install sageattention-2.2.0+cu130.torch2.11-cp311-cp311-win_amd64.whl

Building from Source (CUDA 13.x)

If the prebuilt wheel doesn't work, you can build from source.

Prerequisites

  1. CUDA Toolkit 13.x - Download from NVIDIA
  2. VS 2022 Build Tools - CUDA 13 doesn't support VS 2025 yet
  3. PyTorch 2.11 nightly cu130:
    pip install torch torchvision torchaudio --pre --index-url https://download.pytorch.org/whl/nightly/cu130

Step 1: Patch PyTorch Header

PyTorch 2.11 nightly has a bug that causes an MSVC C2872 ('std': ambiguous symbol) error when compiling extensions.

Edit venv\Lib\site-packages\torch\include\torch\csrc\dynamo\compiled_autograd.h

Find lines ~1135-1136:

    } else if constexpr (::std::is_same_v<T, ::std::string>) {
      return at::StringType::get();

Comment them out:

    // PATCHED: commented out to fix MSVC C2872 ambiguous symbol error
    // } else if constexpr (::std::is_same_v<T, ::std::string>) {
    //   return at::StringType::get();
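Hand-editing the header works, but the same patch can be applied with a short script, which is handy after each nightly upgrade re-installs the unpatched file. This is a sketch that assumes the two lines appear verbatim and adjacent, as shown above; the function name is illustrative:

```python
from pathlib import Path

IF_LINE = "} else if constexpr (::std::is_same_v<T, ::std::string>) {"
RET_LINE = "return at::StringType::get();"

def patch_compiled_autograd(header: Path) -> bool:
    """Comment out the adjacent pair of lines that triggers MSVC C2872.
    Returns True if the patch was applied, False otherwise (already
    patched, or a future nightly no longer contains the lines)."""
    lines = header.read_text().splitlines(keepends=True)
    for i in range(len(lines) - 1):
        if lines[i].strip() == IF_LINE and lines[i + 1].strip() == RET_LINE:
            for j in (i, i + 1):
                # Preserve indentation, prefix the line with a comment marker.
                indent = lines[j][: len(lines[j]) - len(lines[j].lstrip())]
                lines[j] = indent + "// PATCHED: " + lines[j].lstrip()
            header.write_text("".join(lines))
            return True
    return False
```

Point it at `venv\Lib\site-packages\torch\include\torch\csrc\dynamo\compiled_autograd.h`; a False return on an already-patched file makes it safe to run repeatedly.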

Step 2: Clone SageAttention

git clone https://github.com/thu-ml/SageAttention.git

Step 3: Build

Create build_sage.bat:

@echo off
cd /d "%~dp0"

REM Use VS 2022 Build Tools (not VS 2025)
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"

REM Set CUDA - change version as needed
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1
set PATH=%CUDA_HOME%\bin;%PATH%

REM Fix for VC environment
set DISTUTILS_USE_SDK=1

REM Build SageAttention
cd SageAttention
D:\ComfyUI\venv\Scripts\python.exe -m pip install . --no-build-isolation

pause

Run it from a regular command prompt (not admin).

Step 4: Install comfy-kitchen

pip install comfy-kitchen

Step 5: Verify

python -c "import sageattention; print('SageAttention OK')"
python -c "import comfy_kitchen; print('comfy_kitchen OK')"

Using in ComfyUI

Option A - Global (all workflows): Add --use-sage-attention to your ComfyUI launch command.

Warning: This uses the Triton backend, which produces black output with some models (Qwen, Wan).

Option B - Per-workflow (recommended):

  1. Install ComfyUI-KJNodes
  2. Add "Patch Sage Attention" node to your workflow
  3. Set backend to sageattn_qk_int8_pv_fp16_cuda
  4. Connect it before your sampler

Performance

Tested on an RTX 5090 Laptop GPU (24GB): diffusion sampling runs roughly 35% faster with SageAttention enabled.

Troubleshooting

"DLL load failed" error

The wheel must match your exact PyTorch nightly version. These wheels were built against 2.11.0.dev20260127. If you're on a different nightly date, you'll need to rebuild from source (see instructions above).

A cu128 wheel won't work on cu130 PyTorch and vice versa.
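The nightly date is embedded in the version string (e.g. 2.11.0.dev20260127), so checking whether your installed torch matches the wheel's build is a one-liner away. The helper below only does the string comparison; in practice the version string would come from `torch.__version__`:

```python
import re

# Nightly date these wheels were built against (from the note at the top).
WHEEL_TORCH_DATE = "20260127"

def nightly_date(torch_version: str) -> "str | None":
    """Extract the YYYYMMDD nightly date from a PyTorch version string,
    or None for a stable (non-dev) release."""
    m = re.search(r"\.dev(\d{8})", torch_version)
    return m.group(1) if m else None

print(nightly_date("2.11.0.dev20260127") == WHEEL_TORCH_DATE)  # True
print(nightly_date("2.11.0.dev20260203") == WHEEL_TORCH_DATE)  # False
```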

Black/corrupted output with Qwen or Wan models

Don't use --use-sage-attention flag. Use KJNodes "Patch Sage Attention" node with sageattn_qk_int8_pv_fp16_cuda backend instead.

MSVC "ambiguous symbol 'std'" during build

Apply the PyTorch header patch described above.

nvcc not found during build

Make sure CUDA_HOME in your build script points to the correct CUDA version directory.
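A quick way to confirm CUDA_HOME points at a real toolkit before running the build script — the path in the example is the one from the batch file above and may differ on your machine:

```python
import os
from pathlib import Path

def nvcc_path(cuda_home: str) -> "Path | None":
    """Return the path to nvcc under CUDA_HOME, or None if it's missing."""
    exe = "nvcc.exe" if os.name == "nt" else "nvcc"
    candidate = Path(cuda_home) / "bin" / exe
    return candidate if candidate.is_file() else None

# Example check against the CUDA_HOME used in build_sage.bat:
print(nvcc_path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1"))
```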


Why This Exists

RTX 50-series (Blackwell, sm_120) requires PyTorch 2.11 nightly. The official SageAttention wheels are built against older PyTorch versions and fail with DLL load errors on 2.11.

Additionally, NVFP4 model support in ComfyUI requires comfy-kitchen, which requires CUDA 13+. This repo provides prebuilt wheels for both cu128 and cu130 configurations.


Credits

License

SageAttention is licensed under Apache 2.0.
