A GitHub-hosted PyPI index for pre-compiled Python wheels of CUDA-enabled packages. Build once, install anywhere - no local compilation required.
Building packages like flash-attn and xformers from source takes 30+ minutes and requires the CUDA toolkit, build tools, and significant CPU/memory. This repository:
- ✅ Pre-builds wheels for multiple Python and CUDA versions
- ✅ Hosts a PEP 503 index on GitHub Pages
- ✅ Eliminates local compilation - just `pip install`
- ✅ Supports CUDA 12.x and 13.x via PyTorch nightly builds
Current packages available:
- flash-attn 2.8.3
GPU Compatibility: Wheels are compiled for CUDA compute capability 8.6 ONLY (RTX 3080/3090/3090 Ti and other Ampere sm_86 GPUs; note the A100 is sm_80 and is NOT covered). This limitation is necessary to fit compilation within GitHub Actions runner memory constraints (7 GB RAM). Other GPUs will NOT work.
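As a quick way to check whether your GPU qualifies before installing, the helper below (illustrative only, not shipped with this repo) encodes the sm_86-only constraint; at runtime you can feed it `torch.cuda.get_device_capability(0)` once PyTorch is installed.

```python
# Illustrative pre-install check (not part of this repo): these wheels only
# target compute capability 8.6 (sm_86).

def is_supported(capability: tuple[int, int]) -> bool:
    """Return True only for sm_86, the sole architecture the wheels are built for."""
    return capability == (8, 6)

print(is_supported((8, 6)))  # e.g. RTX 3090 (sm_86) -> True
print(is_supported((8, 9)))  # e.g. RTX 4090 (sm_89) -> False
```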
See config/packages.yml for full configuration.
Install pre-built wheels using pip's --extra-index-url:
# Install PyTorch first (required for CUDA packages)
# For CUDA 12.9:
pip install torch --index-url https://download.pytorch.org/whl/cu129
# For CUDA 13.0:
pip install torch --index-url https://download.pytorch.org/whl/cu130
# Then install the package from this index
pip install flash-attn --extra-index-url https://DEVtheOPS.github.io/python-wheels/simple/

| Package | Version | Python | CUDA |
|---|---|---|---|
| flash-attn | 2.8.3 | 3.12, 3.13 | 12.9.1, 13.0.2 |
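The CUDA-version-to-index-URL mapping in the commands above is mechanical; a small helper (hypothetical, simply mirroring those commands) makes it explicit:

```python
# Hypothetical helper mirroring the install commands above: derive the PyTorch
# index URL ("cu129", "cu130", ...) from a CUDA version string.

def torch_index_url(cuda_version: str) -> str:
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"

print(torch_index_url("12.9.1"))  # https://download.pytorch.org/whl/cu129
print(torch_index_url("13.0.2"))  # https://download.pytorch.org/whl/cu130
```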
- Go to the Actions tab
- Click "Build Wheels" workflow
- Click "Run workflow" button
- (Optional) Customize parameters:
  - Package: Leave empty to build all, or specify one (e.g., `flash-attn`)
  - Python versions: Comma-separated (default: `3.12,3.13`)
  - CUDA versions: Comma-separated (default: `12.9.1,13.0.2`)
- Click "Run workflow"
Builds trigger automatically when you:
- Push changes to `config/packages.yml` on the `main` branch
# Build all packages with defaults
gh workflow run build-wheels.yml
# Build specific package
gh workflow run build-wheels.yml -f package=flash-attn
# Build for specific Python version
gh workflow run build-wheels.yml -f python_versions=3.12
# Build for specific CUDA version
gh workflow run build-wheels.yml -f cuda_versions=13.0.2
# Combine options
gh workflow run build-wheels.yml \
-f package=flash-attn \
-f python_versions=3.12 \
-f cuda_versions=13.0.2

- Edit `config/packages.yml`:
packages:
my-package:
versions: ["1.0.0"]
build_args: "--no-build-isolation" # Optional
extra_deps: ["torch", "ninja"] # Build dependencies
test_import: "my_package" # Module name for import test
description: "My CUDA package" # Human-readable description

- Commit and push to the `main` branch (triggers an automatic build)
- Or manually trigger the workflow via the Actions UI
python-wheels/
├── .github/workflows/
│ └── build-wheels.yml # CI/CD workflow
├── config/
│ └── packages.yml # Package definitions
├── scripts/
│ ├── build_in_docker.sh # Docker build script
│ ├── generate_index.py # PyPI index generator
│ └── test_build_local.sh # Local testing script
├── tests/
│ └── test_generate_index.py # Unit tests
├── AGENTS.md # Technical documentation for AI assistants
├── CONTRIBUTING.md # Development guide
└── README.md # This file
- Matrix Generation: Workflow reads `config/packages.yml` and generates the build matrix
- Docker Build: Each combination builds in a CUDA Docker container
- Wheel Creation: `pip wheel` compiles the package with CUDA support
- Import Test: Verifies the wheel loads successfully in a clean environment
- Release: Creates GitHub release with wheels attached
- Index Generation: Generates PEP 503 index HTML
- GitHub Pages: Deploys index for pip consumption
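The matrix-generation step can be sketched as a simple product over the config, assuming `config/packages.yml` parses to a dict shaped like the example below (the real logic lives in `.github/workflows/build-wheels.yml` and may differ):

```python
# Sketch of the Matrix Generation step, under an assumed config shape.
from itertools import product

config = {"packages": {"flash-attn": {"versions": ["2.8.3"]}}}
python_versions = ["3.12", "3.13"]
cuda_versions = ["12.9.1", "13.0.2"]

# One build job per (package version, Python, CUDA) combination.
matrix = [
    {"package": name, "version": ver, "python": py, "cuda": cu}
    for name, spec in config["packages"].items()
    for ver, py, cu in product(spec["versions"], python_versions, cuda_versions)
]
print(len(matrix))  # 1 package x 1 version x 2 Pythons x 2 CUDAs = 4 jobs
```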
- Build Environment: `nvidia/cuda:{VERSION}-devel-ubuntu22.04`
- Test Environment: `nvidia/cuda:{VERSION}-runtime-ubuntu22.04`
- Python: Installed from the deadsnakes PPA
- PyTorch: Version-matched to CUDA (12.x stable, 13.x nightly)
- Index Format: PEP 503 compliant, hosted on GitHub Pages
GitHub Actions runners have limited space. The workflow includes cleanup steps, but you can:
- Build fewer packages at once using the `package` parameter
- Reduce Python/CUDA version combinations
flash-attn compilation is extremely CPU/memory intensive. The workflow runs builds sequentially (max-parallel: 1) to prevent overwhelming runners. This means:
- ⏱️ Each build takes 60-90 minutes
- ⏱️ All 4 combinations take ~4-6 hours total
- ✅ Builds complete successfully without timeouts
To speed up for testing:
- Build one package at a time using the `package` parameter
- Build one Python version at a time
- Build one CUDA version at a time
Common causes:
- Missing runtime dependencies: Package needs PyTorch + numpy installed
- CUDA version mismatch: Ensure PyTorch CUDA version matches wheel's CUDA version
- Wrong PyTorch version: CUDA 13.x requires PyTorch nightly from the `cu130` index
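For the version-mismatch case, the rule is that the wheel's CUDA major.minor must match the CUDA build of your installed PyTorch (readable at runtime as `torch.version.cuda`). A hypothetical checker for that rule:

```python
# Hypothetical compatibility check (not part of this repo): a wheel built for
# CUDA X.Y needs a PyTorch build whose CUDA major.minor matches. Compare the
# wheel's CUDA version with torch.version.cuda.

def cuda_match(torch_cuda: str, wheel_cuda: str) -> bool:
    return torch_cuda.split(".")[:2] == wheel_cuda.split(".")[:2]

print(cuda_match("12.9", "12.9.1"))  # True  - compatible
print(cuda_match("12.9", "13.0.2"))  # False - reinstall PyTorch for cu130
```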
See AGENTS.md for detailed troubleshooting.
You can test wheel builds locally before pushing to CI:
- Docker or Podman installed
- Python 3.12+ with PyYAML (`pip install pyyaml`)
# Using Docker (default)
./scripts/test_build_local.sh docker flash-attn 2.8.3 3.12 12.9.1
# Using Podman
./scripts/test_build_local.sh podman flash-attn 2.8.3 3.12 13.0.2
# With defaults (flash-attn 2.8.3, Python 3.12, CUDA 12.9.1)
./scripts/test_build_local.sh

The script will:
- ✅ Read package config from `config/packages.yml`
- ✅ Pull the CUDA Docker image
- ✅ Build the wheel in Docker
- ✅ Run import test in clean container
- ✅ Report success/failure
Wheels are written to the `./wheels/` directory.
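Downstream, the index generator has to recover the package name and tags from each wheel filename. A sketch of that PEP 427 parsing (illustrative only; the repo's actual logic is in `scripts/generate_index.py` and is covered by `tests/test_generate_index.py`):

```python
# Sketch of PEP 427 wheel-filename parsing: the kind of logic an index
# generator needs to group wheels by package. Illustrative, not the repo's
# actual implementation.

def parse_wheel_filename(filename: str) -> dict[str, str]:
    stem = filename.removesuffix(".whl")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    parts = stem.split("-")
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }

print(parse_wheel_filename("flash_attn-2.8.3-cp312-cp312-linux_x86_64.whl"))
```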
# Generate index HTML from wheels directory
python scripts/generate_index.py \
--wheels-dir wheels \
--output-dir index/simple \
--base-url "https://github.com/USER/REPO/releases/download/TAG"
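A PEP 503 index is just static HTML: a root page listing normalized package names, and one page per package with an anchor per file. A minimal sketch of what `generate_index.py` has to emit (the real script's output may differ):

```python
# Minimal sketch of PEP 503 index pieces: name normalization and a per-package
# project page with one anchor per wheel. Illustrative only.
import re

def normalize(name: str) -> str:
    # PEP 503: lowercase, with runs of "-", "_", "." collapsed to "-"
    return re.sub(r"[-_.]+", "-", name).lower()

def project_page(wheel_urls: list[str]) -> str:
    anchors = "\n".join(
        f'    <a href="{url}">{url.rsplit("/", 1)[-1]}</a><br/>'
        for url in wheel_urls
    )
    return f"<!DOCTYPE html>\n<html>\n  <body>\n{anchors}\n  </body>\n</html>\n"

print(normalize("flash_attn"))  # flash-attn
```

pip requests `simple/<normalized-name>/`, which is why `flash_attn` wheels live under a `flash-attn` directory in the generated index.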
# View generated index
ls -R index/simple/

# Run all tests
python -m unittest discover -s tests -p "test_*.py"
# Run specific test
python -m unittest tests.test_generate_index.ParseWheelFilenameTests

See CONTRIBUTING.md for development guidelines.
See LICENSE file for details.
- flash-attention - The original flash-attn implementation
- xformers - Memory-efficient transformers
- pytorch - The foundation for all CUDA packages
- Issues: GitHub Issues
- Releases: GitHub Releases
- Index: GitHub Pages