python-wheels

A GitHub-hosted PyPI index for pre-compiled Python wheels of CUDA-enabled packages. Build once in CI, install with pip; no local compilation required.

Why?

Building packages like flash-attn and xformers from source takes 30+ minutes and requires the CUDA toolkit, build tools, and significant CPU and memory. This repository:

✅ Pre-builds wheels for multiple Python and CUDA versions
✅ Hosts a PEP 503 index on GitHub Pages
✅ Eliminates local compilation - just pip install
✅ Supports CUDA 12.x (PyTorch stable) and 13.x (PyTorch nightly)

Supported Packages

Current packages available:

flash-attn 2.8.3

GPU Compatibility: Wheels are compiled for CUDA compute capability 8.6 ONLY (e.g. RTX 3080/3090/3090 Ti, RTX A6000, A10, A40). The A100 is compute capability 8.0 and is not covered. This limitation is necessary to fit compilation within GitHub Actions runner memory constraints (7GB RAM). Other GPUs are not supported.
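Before installing, it can help to confirm that the local GPU actually reports compute capability 8.6. The sketch below is a hypothetical helper, not part of this repository; it assumes PyTorch is already installed and relies on `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple.

```python
from __future__ import annotations

# Assumption: the published wheels contain kernels for compute capability 8.6 only.
SUPPORTED_CAPABILITIES = {(8, 6)}


def is_supported(capability: tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability matches the wheels."""
    return tuple(capability) in SUPPORTED_CAPABILITIES


def check_local_gpu() -> None:
    """Report whether GPU 0 matches; hypothetical helper, not part of this repo."""
    try:
        import torch  # imported lazily so is_supported() works without PyTorch
    except ImportError:
        print("PyTorch is not installed; install it first")
        return
    if not torch.cuda.is_available():
        print("PyTorch cannot see a CUDA device")
        return
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    verdict = "supported" if is_supported((major, minor)) else "NOT supported"
    print(f"{name}: compute capability {major}.{minor} is {verdict}")


if __name__ == "__main__":
    check_local_gpu()
```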

See config/packages.yml for full configuration.

Installation

Install pre-built wheels using pip's --extra-index-url:

# Install PyTorch first (required for CUDA packages)
# For CUDA 12.9:
pip install torch --index-url https://download.pytorch.org/whl/cu129

# For CUDA 13.0:
pip install torch --index-url https://download.pytorch.org/whl/cu130

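# Then install the package from this index (pin the version so pip does not
# select a newer release from PyPI and try to build it from source)
pip install flash-attn==2.8.3 --extra-index-url https://DEVtheOPS.github.io/python-wheels/simple/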
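The two PyTorch commands above follow one URL pattern: the CUDA major and minor version concatenated after `cu`. A small sketch that derives both install commands from a CUDA version string; the helper names are hypothetical, and it only reproduces the URLs shown above.

```python
from __future__ import annotations

PYTORCH_INDEX_BASE = "https://download.pytorch.org/whl"
WHEELS_INDEX = "https://DEVtheOPS.github.io/python-wheels/simple/"


def torch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as "12.9.1" to ".../whl/cu129"."""
    major, minor = cuda_version.split(".")[:2]
    return f"{PYTORCH_INDEX_BASE}/cu{major}{minor}"


def install_commands(cuda_version: str, requirement: str = "flash-attn==2.8.3") -> list[str]:
    """Return the two pip commands, PyTorch first, for one CUDA version."""
    return [
        f"pip install torch --index-url {torch_index_url(cuda_version)}",
        f"pip install {requirement} --extra-index-url {WHEELS_INDEX}",
    ]


if __name__ == "__main__":
    for cuda in ("12.9.1", "13.0.2"):
        print(f"# CUDA {cuda}")
        for command in install_commands(cuda):
            print(command)
```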

Available Configurations

Package	Version	Python	CUDA
flash-attn	2.8.3	3.12, 3.13	12.9.1, 13.0.2

Triggering a Build

Via GitHub Actions UI

1. Go to the Actions tab
2. Click the "Build Wheels" workflow
3. Click the "Run workflow" button to open the parameter form
4. (Optional) Customize parameters:
   - Package: Leave empty to build all, or specify one (e.g., flash-attn)
   - Python versions: Comma-separated (default: 3.12,3.13)
   - CUDA versions: Comma-separated (default: 12.9.1,13.0.2)
5. Click "Run workflow" in the form to start the build

Automatic Builds

Builds trigger automatically when changes to config/packages.yml are pushed to the main branch.

Via GitHub CLI

# Build all packages with defaults
gh workflow run build-wheels.yml

# Build specific package
gh workflow run build-wheels.yml -f package=flash-attn

# Build for specific Python version
gh workflow run build-wheels.yml -f python_versions=3.12

# Build for specific CUDA version
gh workflow run build-wheels.yml -f cuda_versions=13.0.2

# Combine options
gh workflow run build-wheels.yml \
  -f package=flash-attn \
  -f python_versions=3.12 \
  -f cuda_versions=13.0.2
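The UI form and the `-f` flags feed the same three inputs (`package`, `python_versions`, `cuda_versions`). A hypothetical sketch of how those string inputs could be resolved into lists, with empty values falling back to the documented defaults; the actual workflow logic may differ.

```python
from __future__ import annotations

# Defaults copied from the documented workflow parameters.
DEFAULT_PYTHON_VERSIONS = "3.12,3.13"
DEFAULT_CUDA_VERSIONS = "12.9.1,13.0.2"


def split_csv(value: str | None, default: str) -> list[str]:
    """Split a comma-separated input, using the default when it is empty."""
    raw = value if value and value.strip() else default
    return [item.strip() for item in raw.split(",") if item.strip()]


def resolve_inputs(
    package: str | None,
    python_versions: str | None,
    cuda_versions: str | None,
    configured_packages: list[str],
) -> dict[str, list[str]]:
    """Resolve workflow inputs; an empty package means every configured package."""
    if package and package.strip():
        name = package.strip()
        if name not in configured_packages:
            raise ValueError(f"unknown package: {name}")
        packages = [name]
    else:
        packages = sorted(configured_packages)
    return {
        "packages": packages,
        "python_versions": split_csv(python_versions, DEFAULT_PYTHON_VERSIONS),
        "cuda_versions": split_csv(cuda_versions, DEFAULT_CUDA_VERSIONS),
    }


if __name__ == "__main__":
    # Roughly what `-f python_versions=3.12` would resolve to under these assumptions.
    print(resolve_inputs(None, "3.12", None, ["flash-attn"]))
```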

Adding New Packages

Edit config/packages.yml:

packages:
  my-package:
    versions: ["1.0.0"]
    build_args: "--no-build-isolation"  # Optional
    extra_deps: ["torch", "ninja"]       # Build dependencies
    test_import: "my_package"            # Module name for import test
    description: "My CUDA package"       # Human-readable description

Then commit and push to the main branch (this triggers an automatic build), or trigger the workflow manually via the Actions UI.
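A quick local check can catch mistakes in a new entry before pushing. This is a hypothetical validator, not something the repository ships; it assumes the file has already been parsed (for example with PyYAML's `yaml.safe_load`) and treats `versions` and `test_import` as required, which is an assumption inferred from the example entry above.

```python
from __future__ import annotations

# Assumption: required vs optional keys are inferred from the example entry.
REQUIRED_KEYS = {"versions", "test_import"}
OPTIONAL_KEYS = {"build_args", "extra_deps", "description"}


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def validate_packages(config: dict) -> list[str]:
    """Return a list of problems found in a parsed packages.yml mapping."""
    packages = config.get("packages")
    if not isinstance(packages, dict) or not packages:
        return ["top-level 'packages' mapping is missing or empty"]
    errors: list[str] = []
    for name, spec in packages.items():
        if not isinstance(spec, dict):
            errors.append(f"{name}: entry must be a mapping")
            continue
        keys = set(spec)
        for key in sorted(REQUIRED_KEYS - keys):
            errors.append(f"{name}: missing required key '{key}'")
        for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS):
            errors.append(f"{name}: unknown key '{key}'")
        # Unquoted YAML versions such as 1.0 parse as floats, so require strings.
        if "versions" in spec and not _is_str_list(spec["versions"]):
            errors.append(f"{name}: 'versions' must be a list of strings")
        if "extra_deps" in spec and not _is_str_list(spec["extra_deps"]):
            errors.append(f"{name}: 'extra_deps' must be a list of strings")
    return errors


if __name__ == "__main__":
    example = {
        "packages": {
            "my-package": {
                "versions": ["1.0.0"],
                "build_args": "--no-build-isolation",
                "extra_deps": ["torch", "ninja"],
                "test_import": "my_package",
                "description": "My CUDA package",
            }
        }
    }
    print(validate_packages(example) or "OK")
```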

Project Structure

python-wheels/
├── .github/workflows/
│   └── build-wheels.yml      # CI/CD workflow
├── config/
│   └── packages.yml           # Package definitions
├── scripts/
│   ├── build_in_docker.sh     # Docker build script
│   ├── generate_index.py      # PyPI index generator
│   └── test_build_local.sh    # Local testing script
├── tests/
│   └── test_generate_index.py # Unit tests
├── AGENTS.md                  # Technical documentation for AI assistants
├── CONTRIBUTING.md            # Development guide
└── README.md                  # This file

How It Works

1. Matrix Generation: The workflow reads config/packages.yml and generates the build matrix
2. Docker Build: Each combination builds in a CUDA Docker container
3. Wheel Creation: pip wheel compiles the package with CUDA support
4. Import Test: Verifies that the wheel loads successfully in a clean environment
5. Release: Creates a GitHub release with the wheels attached
6. Index Generation: Generates the PEP 503 index HTML
7. GitHub Pages: Deploys the index for pip to consume
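The matrix-generation step is essentially a Cartesian product. A minimal sketch, assuming package versions come from `packages.yml` and the Python/CUDA lists from the workflow inputs; the JSON `include` shape is one common way to feed a dynamic matrix to GitHub Actions, not necessarily what build-wheels.yml does.

```python
from __future__ import annotations

import json
from itertools import product


def build_matrix(
    packages: dict[str, dict],
    python_versions: list[str],
    cuda_versions: list[str],
) -> list[dict[str, str]]:
    """Expand every package version x Python version x CUDA version."""
    matrix = []
    for name, spec in sorted(packages.items()):
        for version, python, cuda in product(spec["versions"], python_versions, cuda_versions):
            matrix.append({"package": name, "version": version, "python": python, "cuda": cuda})
    return matrix


if __name__ == "__main__":
    entries = build_matrix(
        {"flash-attn": {"versions": ["2.8.3"]}},
        python_versions=["3.12", "3.13"],
        cuda_versions=["12.9.1", "13.0.2"],
    )
    # A workflow could consume this JSON with fromJSON() in strategy.matrix.
    print(json.dumps({"include": entries}, indent=2))
```

For the single flash-attn entry this yields the four combinations listed under Available Configurations.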

Architecture

Build Environment: nvidia/cuda:{VERSION}-devel-ubuntu22.04
Test Environment: nvidia/cuda:{VERSION}-runtime-ubuntu22.04
Python: Installed from deadsnakes PPA
PyTorch: Version-matched to CUDA (12.x stable, 13.x nightly)
Index Format: PEP 503 compliant, hosted on GitHub Pages
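The image names and PyTorch channel above follow directly from the CUDA version. A hypothetical helper that encodes those rules as described (Ubuntu 22.04 images, stable PyTorch for 12.x, nightly for 13.x); it does not query Docker Hub or PyTorch.

```python
from __future__ import annotations

# PyTorch channel per CUDA major version, as described in the Architecture list.
TORCH_CHANNEL = {12: "stable", 13: "nightly"}


def environment_for(cuda_version: str, ubuntu: str = "ubuntu22.04") -> dict[str, str]:
    """Return build/test image names and the PyTorch channel for a CUDA version."""
    major = int(cuda_version.split(".")[0])
    if major not in TORCH_CHANNEL:
        raise ValueError(f"unsupported CUDA major version: {cuda_version}")
    return {
        "build_image": f"nvidia/cuda:{cuda_version}-devel-{ubuntu}",
        "test_image": f"nvidia/cuda:{cuda_version}-runtime-{ubuntu}",
        "torch_channel": TORCH_CHANNEL[major],
    }


if __name__ == "__main__":
    for cuda in ("12.9.1", "13.0.2"):
        print(cuda, environment_for(cuda))
```

The devel images ship the compiler toolchain (nvcc, headers) while the runtime images carry only the CUDA libraries, so testing in a runtime image is a reasonable way to check that the wheel does not quietly depend on build-time tools.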

Troubleshooting

Build fails with disk space error

GitHub-hosted runners have limited disk space. The workflow includes cleanup steps, but you can also:

Build fewer packages at once using the package parameter
Reduce Python/CUDA version combinations

Build timeout / runner lost communication

flash-attn compilation is extremely CPU/memory intensive. The workflow runs builds sequentially (max-parallel: 1) to prevent overwhelming runners. This means:

⏱️ Each build takes 60-90 minutes
⏱️ All 4 combinations take ~4-6 hours total
✅ Builds complete successfully without timeouts

To get faster feedback while testing:

Build one package at a time using the package parameter
Build one Python version at a time
Build one CUDA version at a time
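Because builds run one at a time, total wall-clock time is roughly the number of combinations multiplied by the per-build time. A back-of-the-envelope sketch using the 60-90 minute range quoted above (queue and setup time ignored):

```python
from __future__ import annotations


def sequential_hours(n_builds: int, minutes_per_build: tuple[int, int] = (60, 90)) -> tuple[float, float]:
    """Return (low, high) wall-clock hours for n builds run with max-parallel: 1."""
    low, high = minutes_per_build
    return n_builds * low / 60, n_builds * high / 60


if __name__ == "__main__":
    full = 2 * 2  # 2 Python versions x 2 CUDA versions
    print("full matrix:", sequential_hours(full))    # (4.0, 6.0)
    print("one CUDA version:", sequential_hours(2))  # (2.0, 3.0)
    print("single build:", sequential_hours(1))      # (1.0, 1.5)
```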

Import test fails

Common causes:

Missing runtime dependencies: Package needs PyTorch + numpy installed
CUDA version mismatch: Ensure PyTorch's CUDA version matches the wheel's CUDA version
Wrong PyTorch version: CUDA 13.x requires PyTorch nightly from cu130 index
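For the version-mismatch case, compare the CUDA version the wheel was built for with `torch.version.cuda`, which is a string such as "12.9" (or `None` on CPU-only PyTorch builds). A minimal sketch; treating an equal major.minor pair as a match is an assumption, and the wheel's CUDA version must be supplied by you (for example from the release the wheel came from).

```python
from __future__ import annotations


def cuda_major_minor(version: str) -> tuple[int, int]:
    """Parse "12.9.1" or "12.9" into (12, 9)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuda_matches(wheel_cuda: str, torch_cuda: str | None) -> bool:
    """True when PyTorch was built for the same CUDA major.minor as the wheel."""
    if torch_cuda is None:  # CPU-only PyTorch build
        return False
    return cuda_major_minor(wheel_cuda) == cuda_major_minor(torch_cuda)


def diagnose(wheel_cuda: str) -> str:
    """Hypothetical helper: check the installed PyTorch against a wheel's CUDA version."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if cuda_matches(wheel_cuda, torch.version.cuda):
        return f"OK: torch {torch.__version__} uses CUDA {torch.version.cuda}"
    return f"Mismatch: wheel built for CUDA {wheel_cuda}, torch reports {torch.version.cuda}"


if __name__ == "__main__":
    print(diagnose("12.9.1"))
```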

See AGENTS.md for detailed troubleshooting.

Local Testing

You can test wheel builds locally before pushing to CI:

Prerequisites

Docker or Podman installed
Python 3.12+ with PyYAML (pip install pyyaml)

Build and Test a Wheel

# Using Docker (default)
./scripts/test_build_local.sh docker flash-attn 2.8.3 3.12 12.9.1

# Using Podman
./scripts/test_build_local.sh podman flash-attn 2.8.3 3.12 13.0.2

# With defaults (flash-attn 2.8.3, Python 3.12, CUDA 12.9.1)
./scripts/test_build_local.sh

The script will:

✅ Read package config from config/packages.yml
✅ Pull CUDA Docker image
✅ Build the wheel in Docker
✅ Run import test in clean container
✅ Report success/failure

Wheels are written to the ./wheels/ directory.

Test Index Generation

# Generate index HTML from wheels directory
python scripts/generate_index.py \
  --wheels-dir wheels \
  --output-dir index/simple \
  --base-url "https://github.com/USER/REPO/releases/download/TAG"

# View generated index
ls -R index/simple/
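PEP 503 asks for a root page with one link per project, project names normalized (runs of `-`, `_` and `.` collapsed to `-`, then lowercased), and one page per project linking to its files with the filename as the link text. A simplified sketch of that layout; it is not the repository's scripts/generate_index.py, and the example base URL is a placeholder.

```python
from __future__ import annotations

import html
import re
from collections import defaultdict
from pathlib import Path


def normalize(name: str) -> str:
    """PEP 503 name normalization: 'Flash_Attn' -> 'flash-attn'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _page(title: str, links: list[str]) -> str:
    body = "\n".join(links)
    return f"<!DOCTYPE html>\n<html>\n<head><title>{html.escape(title)}</title></head>\n<body>\n{body}\n</body>\n</html>\n"


def render_pages(wheel_filenames: list[str], base_url: str) -> dict[str, str]:
    """Map relative output paths to HTML for the root page and each project page."""
    projects: dict[str, list[str]] = defaultdict(list)
    for filename in sorted(wheel_filenames):
        # The distribution name is everything before the first '-' in a wheel filename.
        projects[normalize(filename.split("-", 1)[0])].append(filename)

    root_links = [f'<a href="{p}/">{p}</a><br/>' for p in sorted(projects)]
    pages = {"index.html": _page("Simple index", root_links)}
    for project, files in sorted(projects.items()):
        links = [
            f'<a href="{html.escape(base_url.rstrip("/") + "/" + filename)}">{html.escape(filename)}</a><br/>'
            for filename in files
        ]
        pages[f"{project}/index.html"] = _page(f"Links for {project}", links)
    return pages


def write_index(wheels_dir: Path, output_dir: Path, base_url: str) -> None:
    """Write the pages for every *.whl in wheels_dir under output_dir."""
    names = [path.name for path in Path(wheels_dir).glob("*.whl")]
    for relative, content in render_pages(names, base_url).items():
        target = Path(output_dir) / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")


if __name__ == "__main__":
    demo = render_pages(["flash_attn-2.8.3-cp312-cp312-linux_x86_64.whl"], "https://example.invalid/download")
    for path, content in demo.items():
        print(f"== {path}\n{content}")
```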

Run Unit Tests

# Run all tests
python -m unittest discover -s tests -p "test_*.py"

# Run specific test
python -m unittest tests.test_generate_index.ParseWheelFilenameTests
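Judging by its name, `ParseWheelFilenameTests` covers wheel filename parsing. For reference, PEP 427 defines the layout `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. A minimal parser for that layout, written as an illustration rather than the function those tests actually exercise:

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a PEP 427 wheel filename into its components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
        if not build[:1].isdigit():  # PEP 427: a build tag must start with a digit
            raise ValueError(f"invalid build tag in: {filename}")
    else:
        raise ValueError(f"unexpected number of components in: {filename}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    print(parse_wheel_filename("flash_attn-2.8.3-cp312-cp312-linux_x86_64.whl"))
```

Local version labels such as `+cu129` contain no hyphens, so they stay inside the version component.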

Contributing

See CONTRIBUTING.md for development guidelines.

License

See LICENSE file for details.

Related Projects

flash-attention - The original flash-attn implementation
xformers - Memory-efficient transformers
pytorch - The foundation for all CUDA packages

Support

Issues: GitHub Issues
Releases: GitHub Releases
Index: GitHub Pages
