llama.cpp CUDA Builds

This repository automatically builds llama.cpp with CUDA support for multiple NVIDIA GPU architectures and CUDA versions.

Why This Repository?

The official llama.cpp repository does not provide pre-built CUDA binaries. This repository fills that gap by:

Building llama.cpp with CUDA support for multiple CUDA toolkit versions
Supporting a wide range of NVIDIA GPU architectures (compute capability 7.5+)
Automatically tracking upstream llama.cpp releases
Providing ready-to-use binaries via GitHub releases

Supported Configurations

CUDA Versions

CUDA 12.8

GPU Architectures

| Compute Capability | GPU Examples |
|--------------------|--------------|
| 6.1 | Titan XP, Tesla P40, GTX 10xx |
| 7.0 | Tesla V100 |
| 7.5 | Tesla T4, RTX 2000 series, Quadro RTX |
| 8.0 | A100 |
| 8.6 | RTX 3000 series |
| 8.9 | RTX 4000 series, L4, L40 |
| 9.0 | H100, H200 |
| 10.0 | B200 |
| 12.0 | RTX Pro series, RTX 5000 series |
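To see where your own GPU falls in the table, the sketch below asks nvidia-smi for each GPU's compute capability and checks it against the rows above. It assumes a driver recent enough to support the compute_cap query field. cc_in_table and SUPPORTED_CCS are hypothetical names, and the exact-match check is a simplification: whether an unlisted capability can still run depends on how the binaries were compiled.

```shell
# Compute capabilities copied from the table above (assumption: kept in sync by hand).
SUPPORTED_CCS="6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0"

# cc_in_table CC: succeeds if CC exactly matches a row in the table.
cc_in_table() {
  local supported
  for supported in $SUPPORTED_CCS; do
    [ "$1" = "$supported" ] && return 0
  done
  return 1
}

# Only query hardware when nvidia-smi is available; compute_cap needs a recent driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cc; do
      cc=$(printf '%s' "$cc" | tr -d ' ')
      if cc_in_table "$cc"; then
        echo "$name (compute capability $cc): listed"
      else
        echo "$name (compute capability $cc): not listed"
      fi
    done
fi
```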

Usage

Download

Go to the Releases page
Download the tarball (e.g., llama.cpp-bXXXX-cuda-12.8.tar.gz)
Extract the archive:

tar -xzf llama.cpp-bXXXX-cuda-12.8.tar.gz
cd cuda-12.8
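The download and extract steps can also be scripted. This is a minimal sketch, assuming the GitHub CLI (gh) is installed and authenticated and that release assets follow the llama.cpp-bXXXX-cuda-12.8.tar.gz naming shown above. Rather than hard-coding the extracted directory name, it reads the archive's top-level directory from its listing. fetch_release, archive_top_dir, and extract_and_enter are hypothetical helper names.

```shell
# Assumption: asset names end in -cuda-12.8.tar.gz, as in the example above.
fetch_release() {
  # With no tag argument, gh downloads assets from the latest release.
  gh release download ${1:+"$1"} --repo ai-dock/llama.cpp-cuda --pattern '*-cuda-12.8.tar.gz'
}

# Print the first path component of the archive's first entry (e.g. "cuda-12.8").
# awk reads the whole listing, so tar is never cut off mid-stream.
archive_top_dir() {
  tar -tzf "$1" | awk 'NR == 1 { sub(/^\.\//, ""); split($0, parts, "/"); print parts[1] }'
}

# Extract into the current directory, then cd into the archive's top-level directory.
extract_and_enter() {
  local top
  top=$(archive_top_dir "$1")
  [ -n "$top" ] || return 1
  tar -xzf "$1" && cd "$top"
}

# Example (not run here; bXXXX is the build-number placeholder used above):
#   fetch_release
#   extract_and_enter llama.cpp-bXXXX-cuda-12.8.tar.gz
```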

Run

The extracted directory contains all llama.cpp binaries:

# Run the main CLI
./llama-cli --help

# Run the server
./llama-server --help

# Other utilities
./llama-bench
./llama-quantize
./llama-embedding
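Beyond --help, a typical run loads a GGUF model and offloads its layers to the GPU. The helper below is a hypothetical wrapper (start_server is not part of llama.cpp) around llama-server's -m, -ngl, --host, and --port options; the model path is a placeholder you supply.

```shell
# start_server BINDIR MODEL [PORT]: hypothetical wrapper around llama-server.
start_server() {
  local bindir="$1" model="$2" port="${3:-8080}"
  if [ ! -x "$bindir/llama-server" ]; then
    echo "llama-server not found in $bindir" >&2
    return 1
  fi
  # -ngl sets how many layers to offload to the GPU; 99 offloads every layer
  # of any model with fewer than 99 layers.
  "$bindir/llama-server" -m "$model" -ngl 99 --host 127.0.0.1 --port "$port"
}

# Example (not run here; the model path is a placeholder):
#   start_server . ./models/your-model.gguf
#   curl http://127.0.0.1:8080/health    # from another terminal
```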

Check Version

Each release includes a VERSION.txt file with build information:

cat VERSION.txt

System Requirements

NVIDIA GPU with compute capability 7.5 or higher
Appropriate NVIDIA driver for your CUDA version:
- CUDA 12.8+: Driver >= 570.15
Linux x86_64 (Ubuntu 22.04 compatible)
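A quick way to check the driver requirement is to compare the driver version reported by nvidia-smi with the minimum above. This sketch assumes GNU sort (for -V version ordering) and takes the 570.15 threshold directly from this list; version_ge and MIN_DRIVER are hypothetical names.

```shell
# Minimum driver for the CUDA 12.8 build, as stated in this README.
MIN_DRIVER="570.15"

# version_ge A B: succeeds if version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if version_ge "$driver" "$MIN_DRIVER"; then
    echo "Driver $driver meets the $MIN_DRIVER minimum"
  else
    echo "Driver $driver is older than $MIN_DRIVER" >&2
  fi
fi
```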

Build Process

Builds are triggered in two ways:

Automatically: a scheduled check runs daily at 00:00 UTC and builds only if a new llama.cpp release is detected
Manually: via a GitHub Actions workflow run
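The "only if a new release is detected" rule amounts to comparing upstream's latest release tag with the last one built. A minimal sketch, assuming curl and jq are available, that upstream lives at github.com/ggml-org/llama.cpp, and that the last built tag is kept in a plain text file (a hypothetical mechanism; the actual workflow may track it differently, for example by looking at existing releases):

```shell
# Query GitHub's REST API for upstream's latest release tag (e.g. "bXXXX").
latest_upstream_tag() {
  curl -fsSL https://api.github.com/repos/ggml-org/llama.cpp/releases/latest | jq -r '.tag_name'
}

# needs_build TAG FILE: succeeds if FILE is missing or records a different tag.
needs_build() {
  [ ! -f "$2" ] || [ "$(cat "$2")" != "$1" ]
}

# Example (not run here; last_built_tag.txt is a hypothetical file name):
#   tag=$(latest_upstream_tag)
#   if needs_build "$tag" last_built_tag.txt; then echo "new release: $tag"; fi
```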

Each build:

Checks for new llama.cpp releases
Clones llama.cpp at the exact release commit
Builds with CMake using CUDA Docker images
Packages binaries for each CUDA version
Creates a GitHub release with all build artifacts
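The clone and CMake steps might look roughly like this for a single release. It is a sketch, not the workflow's actual script: it assumes git, CMake, and the CUDA toolkit are already present (as they would be inside a CUDA development container, which is not shown), uses llama.cpp's GGML_CUDA CMake option, and derives CMAKE_CUDA_ARCHITECTURES from the compute capabilities in the GPU table. cc_to_cmake_archs and build_llama_cpp are hypothetical names.

```shell
# Convert table-style compute capabilities ("8.6") into CMake's
# CUDA architecture form ("86"), joined with semicolons.
cc_to_cmake_archs() {
  local archs="" cc
  for cc in "$@"; do
    archs="${archs:+$archs;}$(printf '%s' "$cc" | tr -d '.')"
  done
  printf '%s\n' "$archs"
}

# build_llama_cpp TAG ARCHS: shallow-clone upstream at TAG and build with CUDA.
build_llama_cpp() {
  local tag="$1" archs="$2" src="llama.cpp-$1"
  # git accepts a tag name for --branch, which pins the clone to that release.
  git clone --depth 1 --branch "$tag" https://github.com/ggml-org/llama.cpp "$src" || return 1
  cmake -S "$src" -B "$src/build" -DGGML_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES="$archs" -DCMAKE_BUILD_TYPE=Release || return 1
  cmake --build "$src/build" --config Release -j "$(nproc)"
}

# Example (not run here; bXXXX is a placeholder tag):
#   build_llama_cpp bXXXX "$(cc_to_cmake_archs 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0)"
```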

Choosing Your CUDA Version

Select based on:

Your GPU architecture - Blackwell GPUs require CUDA 12.8+
Your installed CUDA toolkit - Match the version if possible
Your NVIDIA driver - Ensure your driver supports the CUDA version

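If unsure, use the CUDA 12.8 build: it is currently the only version listed under Supported Configurations and covers every architecture in the GPU table, including Blackwell, provided your driver meets the minimum under System Requirements.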
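To check the last two criteria, the nvidia-smi banner reports the highest CUDA version the installed driver supports, and nvcc --version reports the installed toolkit release. The helpers below (hypothetical names) extract those numbers from each tool's output; the parsing assumes the current output formats, which could change between versions.

```shell
# Read nvidia-smi output on stdin; print the number after "CUDA Version:".
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Read `nvcc --version` output on stdin; print the number after "release".
toolkit_cuda_version() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  echo "Driver supports up to CUDA $(nvidia-smi | driver_cuda_version)"
fi
if command -v nvcc >/dev/null 2>&1; then
  echo "Installed toolkit: CUDA $(nvcc --version | toolkit_cuda_version)"
fi
```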

Manual Building

If you need a custom build:

git clone https://github.com/ai-dock/llama.cpp-cuda
cd llama.cpp-cuda

# Edit .github/workflows/build-cuda.yml to customize architectures or CUDA versions
# Then trigger a manual workflow run
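If you work in a fork, the manual run can be started from the command line. A minimal sketch, assuming the GitHub CLI is installed and authenticated, Actions is enabled on the fork, and build-cuda.yml keeps the manual trigger mentioned under Build Process; trigger_build is a hypothetical helper and the repository and branch names are placeholders.

```shell
# trigger_build OWNER/REPO BRANCH: start build-cuda.yml on BRANCH of your fork.
trigger_build() {
  gh workflow run build-cuda.yml --repo "$1" --ref "$2"
}

# Example (not run here; your-user is a placeholder):
#   trigger_build your-user/llama.cpp-cuda main
#   gh run list --repo your-user/llama.cpp-cuda --workflow build-cuda.yml
```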

License

This repository contains build scripts only. The llama.cpp binaries are subject to the llama.cpp MIT License.

Support

For issues with:

Build process or binaries: Open an issue in this repository
llama.cpp functionality: Open an issue in the upstream repository

Credits

llama.cpp by Georgi Gerganov and contributors
Built and maintained by ai-dock
