Releases: dougeeai/llama-cpp-python-wheels

v0.3.20 — Ada Lovelace (sm_89), CUDA 13.0, Python 3.10–3.13

18 Apr 19:40
806790a

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — Ada Lovelace (sm_89).

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.10, 3.11, 3.12, or 3.13 (single wheel works across all — py3-none ABI, loaded via ctypes)
  • CUDA: CUDA 13.0 Toolkit installed (required for cublas64_13.dll — the NVIDIA driver alone does not ship this DLL)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA RTX 4060, RTX 4060 Ti, RTX 4070, RTX 4070 Ti, RTX 4070 Ti Super, RTX 4080, RTX 4080 Super, RTX 4090, RTX 6000 Ada, RTX 5000 Ada, RTX 4500 Ada, RTX 4000 Ada, RTX 4000 SFF Ada, L40, L40S, L4
  • Architecture: Ada Lovelace — sm_89
  • VRAM: 8GB+ recommended
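
The Toolkit requirement above can be checked up front. The sketch below is an illustrative helper (not part of the wheel): it scans a list of directories for cublas64_13.dll, and pointing it at PATH shows whether the CUDA 13.0 Toolkit's bin directory is visible to the DLL loader.

```python
# Sketch: confirm cublas64_13.dll is reachable before installing the wheel.
# The DLL ships with the CUDA 13.0 Toolkit, not with the driver, so a miss
# here usually means the Toolkit is absent or its bin dir is not on PATH.
import os

def find_dll(name, search_dirs):
    """Return the full path of `name` in the first matching dir, else None."""
    for d in search_dirs:
        if d and os.path.isfile(os.path.join(d, name)):
            return os.path.join(d, name)
    return None

if __name__ == "__main__":
    dirs = os.environ.get("PATH", "").split(os.pathsep)
    hit = find_dll("cublas64_13.dll", dirs)
    print(hit or "cublas64_13.dll not found -- install the CUDA 13.0 Toolkit")
```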

Installation

pip install llama_cpp_python-0.3.20+cuda13.0.sm89.ada-py3-none-win_amd64.whl
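
A minimal post-install smoke test, sketched below. The model path is a placeholder (substitute any local GGUF file); `n_gpu_layers=-1` offloads all layers to the GPU, which is what this CUDA build exists for, and `verbose=True` prints the device llama.cpp selected so a silent CPU fallback is easy to spot.

```python
# Smoke test after installing the wheel. MODEL is a placeholder path --
# point it at any GGUF file you have locally before running.
import importlib.util
import os

MODEL = "models/your-model.gguf"  # placeholder, not a real file

def ready():
    """True once the wheel is installed and MODEL points at a real file."""
    return importlib.util.find_spec("llama_cpp") is not None and os.path.exists(MODEL)

if ready():
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL, n_gpu_layers=-1, verbose=True)
    out = llm("Q: What is the capital of France? A:", max_tokens=8)
    print(out["choices"][0]["text"])
else:
    print("install the wheel and set MODEL to a local GGUF file first")
```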

What This Solves

  • No Visual Studio required
  • No compilation errors
  • No "No CUDA toolset found" issues
  • No need to pick a wheel per Python version (3.10 → 3.13 all use this one)
  • Works immediately with GGUF models

Keywords

llama-cpp-python, 0.3.20, CUDA 13.0, Python 3.10, 3.11, 3.12, 3.13, Windows, RTX 4060, RTX 4060 Ti, RTX 4070, RTX 4070 Ti, RTX 4080, RTX 4090, RTX 6000 Ada, RTX 5000 Ada, RTX 4500 Ada, RTX 4000 Ada, L40, L40S, L4, Ada Lovelace, Ada, sm89, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10

v0.3.20 — Ampere (sm_86), CUDA 13.0, Python 3.10–3.13

18 Apr 19:39
806790a

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — Ampere (sm_86).

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.10, 3.11, 3.12, or 3.13 (single wheel works across all — py3-none ABI, loaded via ctypes)
  • CUDA: CUDA 13.0 Toolkit installed (required for cublas64_13.dll — the NVIDIA driver alone does not ship this DLL)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA RTX 3060, RTX 3060 Ti, RTX 3070, RTX 3070 Ti, RTX 3080, RTX 3080 Ti, RTX 3090, RTX 3090 Ti, RTX A2000, RTX A4000, RTX A4500, RTX A5000, RTX A5500, RTX A6000
  • Architecture: Ampere — sm_86
  • VRAM: 8GB+ recommended

Installation

pip install llama_cpp_python-0.3.20+cuda13.0.sm86.ampere-py3-none-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No compilation errors
  • No "No CUDA toolset found" issues
  • No need to pick a wheel per Python version (3.10 → 3.13 all use this one)
  • Works immediately with GGUF models

Keywords

llama-cpp-python, 0.3.20, CUDA 13.0, Python 3.10, 3.11, 3.12, 3.13, Windows, RTX 3060, RTX 3060 Ti, RTX 3070, RTX 3070 Ti, RTX 3080, RTX 3080 Ti, RTX 3090, RTX 3090 Ti, RTX A2000, RTX A4000, RTX A4500, RTX A5000, RTX A5500, RTX A6000, Ampere, sm86, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10

v0.3.20 — Turing (sm_75), CUDA 13.0, Python 3.10–3.13

18 Apr 19:38
806790a

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — Turing (sm_75).

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.10, 3.11, 3.12, or 3.13 (single wheel works across all — py3-none ABI, loaded via ctypes)
  • CUDA: CUDA 13.0 Toolkit installed (required for cublas64_13.dll — the NVIDIA driver alone does not ship this DLL)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, Tesla T4
  • Architecture: Turing — sm_75
  • VRAM: 8GB+ recommended

Installation

pip install llama_cpp_python-0.3.20+cuda13.0.sm75.turing-py3-none-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No compilation errors
  • No "No CUDA toolset found" issues
  • No need to pick a wheel per Python version (3.10 → 3.13 all use this one)
  • Works immediately with GGUF models

Keywords

llama-cpp-python, 0.3.20, CUDA 13.0, Python 3.10, 3.11, 3.12, 3.13, Windows, RTX 2080 Ti, RTX 2080, RTX 2070, RTX 2060, TITAN RTX, GTX 1660, GTX 1650, Quadro RTX 8000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10

v0.3.20 — Blackwell (sm_100 + sm_120), CUDA 13.0, Python 3.10–3.13

18 Apr 19:43
806790a

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — all Blackwell GPUs in a single wheel (datacenter sm_100 + consumer/workstation sm_120).

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.10, 3.11, 3.12, or 3.13 (single wheel works across all — py3-none ABI, loaded via ctypes)
  • CUDA: CUDA 13.0 Toolkit installed (required for cublas64_13.dll — the NVIDIA driver alone does not ship this DLL)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA RTX 5090, 5080, 5070 Ti, 5070, 5060 Ti, 5060, 5050, RTX 5090 Laptop, RTX 5080 Laptop, RTX 5070 Ti Laptop, RTX 5070 Laptop, RTX 5060 Laptop, RTX 5050 Laptop, RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q, RTX PRO 6000 Blackwell Server Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 4000 SFF Blackwell, RTX PRO 2000 Blackwell, RTX PRO 5000 Blackwell Laptop, RTX PRO 4000 Blackwell Laptop, RTX PRO 3000 Blackwell Laptop, RTX PRO 2000 Blackwell Laptop, RTX PRO 1000 Blackwell Laptop, RTX PRO 500 Blackwell Laptop, B100, B200, B300 (Blackwell Ultra), GB200, GB300
  • Architecture: Blackwell — sm_100 (datacenter) + sm_120 (consumer/workstation) compiled into one wheel
  • VRAM: 8GB+ recommended

Installation

pip install llama_cpp_python-0.3.20+cuda13.0.sm100.sm120.blackwell-py3-none-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No compilation errors
  • No "No CUDA toolset found" issues
  • No need to pick a wheel per Python version (3.10 → 3.13 all use this one)
  • No need to pick a wheel per Blackwell product line (consumer, workstation, datacenter all supported)
  • Works immediately with GGUF models

Keywords

llama-cpp-python, 0.3.20, CUDA 13.0, Python 3.10, 3.11, 3.12, 3.13, Windows, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060, RTX 5050, RTX PRO 6000 Blackwell, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 2000 Blackwell, B100, B200, B300, Blackwell Ultra, GB200, GB300, Blackwell, sm100, sm120, consumer Blackwell, datacenter Blackwell, workstation Blackwell, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10

v0.3.20 — Ampere (sm_86), CUDA 12.1, Python 3.10–3.13

18 Apr 21:00
e94ee47

Pre-built llama-cpp-python wheel for Windows with CUDA 12.1 support — Ampere (sm_86). Replaces the earlier 0.3.16 cuda12.1.sm86.ampere wheels (cp310/cp311/cp312) that were accidentally linked against cublas64_13.dll.

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.10, 3.11, 3.12, or 3.13 (single wheel works across all — py3-none ABI, loaded via ctypes)
  • CUDA: any CUDA 12.x Toolkit installed (12.1, 12.4, 12.6, 12.8, or 12.9 — all ship the compatible cublas64_12.dll)
  • Driver: NVIDIA Driver 525.60.13 or higher
  • GPU: NVIDIA RTX 3060, RTX 3060 Ti, RTX 3070, RTX 3070 Ti, RTX 3080, RTX 3080 Ti, RTX 3090, RTX 3090 Ti, RTX A2000, RTX A4000, RTX A4500, RTX A5000, RTX A5500, RTX A6000
  • Architecture: Ampere — sm_86
  • VRAM: 8GB+ recommended

Installation

pip install llama_cpp_python-0.3.20+cuda12.1.sm86.ampere-py3-none-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No compilation errors
  • No "No CUDA toolset found" issues
  • No need to pick a wheel per Python version (3.10 → 3.13 all use this one)
  • Works immediately with GGUF models
  • Correctly links cublas64_12.dll (the prior 0.3.16 cuda12.1 wheels mistakenly linked cublas64_13.dll — this wheel fixes that)

Keywords

llama-cpp-python, 0.3.20, CUDA 12.1, CUDA 12.x, Python 3.10, 3.11, 3.12, 3.13, Windows, RTX 3060, RTX 3060 Ti, RTX 3070, RTX 3070 Ti, RTX 3080, RTX 3080 Ti, RTX 3090, RTX 3090 Ti, RTX A2000, RTX A4000, RTX A4500, RTX A5000, RTX A5500, RTX A6000, Ampere, sm86, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10


Build note

Compiled with the CUDA 12.8 Toolkit (rather than 12.1) because VS 2022's current MSVC STL (14.44+) requires CUDA 12.4 or newer and rejects 12.1 at compile time with STL1002. The resulting binary links cublas64_12.dll, whose SONAME is stable across the entire CUDA 12.x line — so this wheel loads correctly on any CUDA 12.x install, including 12.1. Labeled as cuda12.1 to match the wheel it replaces.
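
The SONAME claim above is easy to verify on a given machine: if cublas64_12.dll loads at all, any CUDA 12.x install satisfies this wheel. A minimal probe, sketched below; the check is Windows-only by nature, so it returns None elsewhere.

```python
# Probe whether the cublas64_12.dll this wheel links against is loadable.
# The file keeps the same name across the CUDA 12.x line, so success with
# any CUDA 12.x Toolkit on PATH means the wheel will resolve its import.
import ctypes
import platform

def can_load(dll_name):
    """True/False on Windows; None on other OSes where the check is moot."""
    if platform.system() != "Windows":
        return None
    try:
        ctypes.WinDLL(dll_name)
        return True
    except OSError:
        return False

print(can_load("cublas64_12.dll"))
```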

llama-cpp-python 0.3.16 + CUDA 13.0 sm75 Turing - Python 3.13 - Windows x64

09 Nov 20:36
8ad6f64

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — Turing (sm_75).

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.13.x (exact version required)
  • CUDA: 13.0 (Toolkit not needed, just driver)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
  • Architecture: Turing (sm_75)
  • VRAM: 4GB+ recommended
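
The "exact version required" constraint follows from the cp313 wheel tag: pip refuses the file on any other interpreter. A quick check of the running interpreter before downloading, as a sketch:

```python
# cp313 wheels install only on Python 3.13.x. Confirm the interpreter
# matches the wheel's cpXY tag before downloading.
import sys

def matches_abi_tag(major, minor):
    """True when the running interpreter matches a cpXY wheel tag."""
    return sys.version_info[:2] == (major, minor)

print(matches_abi_tag(3, 13))  # must be True for a cp313 wheel
```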

Installation

pip install llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp313-cp313-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No CUDA Toolkit installation needed
  • No compilation errors
  • No "No CUDA toolset found" issues
  • Works immediately with GGUF models

Keywords

llama-cpp-python, CUDA 13.0, Python 3.13, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10

llama-cpp-python 0.3.16 + CUDA 13.0 sm75 Turing - Python 3.12 - Windows x64

09 Nov 20:35
8ad6f64

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — Turing (sm_75).

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.12.x (exact version required)
  • CUDA: 13.0 (Toolkit not needed, just driver)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
  • Architecture: Turing (sm_75)
  • VRAM: 4GB+ recommended

Installation

pip install llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp312-cp312-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No CUDA Toolkit installation needed
  • No compilation errors
  • No "No CUDA toolset found" issues
  • Works immediately with GGUF models

Keywords

llama-cpp-python, CUDA 13.0, Python 3.12, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10

llama-cpp-python 0.3.16 + CUDA 13.0 sm75 Turing - Python 3.11 - Windows x64

09 Nov 20:34
8ad6f64

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — Turing (sm_75).

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.11.x (exact version required)
  • CUDA: 13.0 (Toolkit not needed, just driver)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
  • Architecture: Turing (sm_75)
  • VRAM: 4GB+ recommended

Installation

pip install llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp311-cp311-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No CUDA Toolkit installation needed
  • No compilation errors
  • No "No CUDA toolset found" issues
  • Works immediately with GGUF models

Keywords

llama-cpp-python, CUDA 13.0, Python 3.11, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10

llama-cpp-python 0.3.16 + CUDA 13.0 sm75 Turing - Python 3.10 - Windows x64

09 Nov 20:33
8ad6f64

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — Turing (sm_75).

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.10.x (exact version required)
  • CUDA: 13.0 (Toolkit not needed, just driver)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
  • Architecture: Turing (sm_75)
  • VRAM: 4GB+ recommended

Installation

pip install llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp310-cp310-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No CUDA Toolkit installation needed
  • No compilation errors
  • No "No CUDA toolset found" issues
  • Works immediately with GGUF models

Keywords

llama-cpp-python, CUDA 13.0, Python 3.10, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10

llama-cpp-python 0.3.16 + CUDA 13.0 sm100 Blackwell - Python 3.13 - Windows x64

09 Nov 20:24
8ad6f64

Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support — Datacenter Blackwell (sm_100) only.

Skip the build process entirely. This wheel is compiled and ready to install.

Requirements

  • OS: Windows 10/11 64-bit
  • Python: 3.13.x (exact version required)
  • CUDA: CUDA 13.0 Toolkit installed (required for cublas64_13.dll — the NVIDIA driver alone does not ship this DLL)
  • Driver: NVIDIA Driver 580 or higher
  • GPU: NVIDIA B100, B200, B300 (Blackwell Ultra), GB200, GB300
  • Architecture: Blackwell — sm_100 (datacenter only)
  • VRAM: 8GB+ recommended

Note: Consumer and workstation Blackwell cards (RTX 50 series, RTX PRO 6000/5000/4500/4000 Blackwell) use sm_120, not sm_100, and will not run natively on this wheel. A separate sm_120 wheel is provided in its own release.
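
One way to tell the two Blackwell variants apart without consulting product tables is to ask the driver directly. Recent nvidia-smi builds expose the compute capability via the `compute_cap` query field (an assumption worth noting: older driver releases may not support this field). A sketch:

```python
# Sketch: query each GPU's compute capability to pick the right wheel.
# '10.0' means sm_100 (this release); '12.0' means sm_120 (separate release).
# Assumes a recent nvidia-smi that supports the compute_cap query field.
import shutil
import subprocess

def compute_caps():
    """Return capabilities like ['10.0'], or [] when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

print(compute_caps() or "no NVIDIA GPU / nvidia-smi detected")
```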

Installation

pip install llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp313-cp313-win_amd64.whl

What This Solves

  • No Visual Studio required
  • No compilation errors
  • No "No CUDA toolset found" issues
  • Works immediately with GGUF models

Keywords

llama-cpp-python, CUDA 13.0, Python 3.13, Windows, B100, B200, B300, Blackwell Ultra, GB200, GB300, datacenter Blackwell, Blackwell, sm100, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10