
gpt.rs

Rust-first research project for portable tensor programs (PTIR) and model runtimes.

PTIR lets you keep model math portable while backends compete on execution:

  • Layers call portable functionals (no open-coded kernels).
  • Functionals capture PTIR graphs and can be overridden at runtime (a minimal sketch follows the list).
  • Backends execute PTIR and can rewrite/fuse graphs into custom kernels.
  • Parameters have stable u128 ids and can be streamed lazily from checkpoints.
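
The sketch below illustrates the override idea in plain Rust. It is only a minimal, self-contained illustration: the types and signatures are invented stand-ins, not the crate's actual FunctionalRegistry / FunctionalOverrides API.

// Minimal sketch of the override mechanism, assuming a registry keyed by
// functional name. Types and signatures here are illustrative stand-ins,
// not the crate's real FunctionalRegistry / FunctionalOverrides API.
use std::collections::HashMap;

type Kernel = fn(&[f32], &[f32]) -> Vec<f32>;

// Portable fallback: in the real crate this path would capture a PTIR graph.
fn portable_matmul(_a: &[f32], _b: &[f32]) -> Vec<f32> {
    Vec::new()
}

// Backend-provided replacement (e.g. a fused or vendor-specific kernel).
fn fused_matmul(a: &[f32], b: &[f32]) -> Vec<f32> {
    portable_matmul(a, b)
}

struct FunctionalOverrides {
    kernels: HashMap<&'static str, Kernel>,
}

impl FunctionalOverrides {
    fn new() -> Self {
        Self { kernels: HashMap::new() }
    }
    // A backend registers its kernel at runtime.
    fn register(&mut self, name: &'static str, kernel: Kernel) {
        self.kernels.insert(name, kernel);
    }
    // A layer resolves by name; the override wins, otherwise the portable path runs.
    fn resolve(&self, name: &str) -> Kernel {
        self.kernels.get(name).copied().unwrap_or(portable_matmul)
    }
}

fn main() {
    let mut overrides = FunctionalOverrides::new();
    overrides.register("matmul", fused_matmul);
    let matmul = overrides.resolve("matmul");
    let _y = matmul(&[1.0, 2.0], &[3.0, 4.0]);
}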

Warning: expect churn. APIs, formats, and model coverage change quickly.

Demo

demo.mp4

Benchmarks (early)

Single-thread CPU numbers from scripts/eval.py --workload bench --backend c on AMD Ryzen 9 9950X3D (Linux):

Workload                        gpt.rs          torch           Speedup
ResNet34 (batch=1, image=224)   29.93 images/s  28.20 images/s  1.06x
GPT-2 (generate 64 tokens)      74.71 tokens/s  46.16 tokens/s  1.62x

Reproduce (C backend):

# Build the Python extension with the C backend enabled:
uv sync
uv pip install maturin
uv --project "$(pwd)" --directory "$(pwd)/crates/gpt-rs-py" run \
  maturin develop --release --features conversion-c

# Export fresh v2 checkpoints:
uv run python scripts/export_gpt2.py --model gpt2 --checkpoint-out checkpoints/gpt2.bin \
  --config-out configs/gpt2_model.json --tokenizer-out configs/gpt2_tokenizer.json
uv run python scripts/export_model_weights.py --model resnet34 --out checkpoints/resnet34.bin

# Bench:
GPTRS_C_CACHE_DIR=./.cache/gpt_rs_c_backend uv run python scripts/eval.py \
  --model resnet34 --workload bench --backend c --threads 1 --warmup 1 --iters 2 --batch 1 --image-size 224
GPTRS_C_CACHE_DIR=./.cache/gpt_rs_c_backend uv run python scripts/eval.py \
  --model gpt2 --workload bench --backend c --threads 1 --warmup 1 --iters 2 --bench-tokens 64

Highlights

  • Capability-based runtime: runtime::load_model returns a dynamic model handle; the CLI runs generate / forward without hardcoding model kinds (sketched after this list). See docs/runtime.md.
  • Parameter streaming + stable ids: checkpoint-backed ParamSource loads weights on demand; backends can memoize derived parameter formats by stable id. See docs/howto.md and docs/formats.md.
  • Backend rewrites: pattern-driven PTIR rewrites via #[ptir_pattern] views and backend optimizer passes. See docs/backend_optimizer.md (and crates/gpt-rs-backend-c/src/optimizer/conv2d.rs for a real example).
  • Correctness tooling: Torch parity at the kernel level and end-to-end model baselines via Python runners. See docs/testing.md.
  • Debuggability: PTIR dumps (--dump-dir), profiling (--profile with -F gpt-rs/profiler), and eager debugging (GPTRS_EAGER=1).
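
To make the capability-based runtime concrete, here is a hedged sketch of the dispatch idea. The trait, the DummyModel stand-in, and the load_model signature are all hypothetical; the real runtime::load_model and its handle type are documented in docs/runtime.md.

// Hypothetical sketch of capability-based dispatch: the caller asks the handle
// what it can do instead of matching on concrete model types. None of these
// signatures are the crate's real API.
trait ModelHandle {
    fn supports_generate(&self) -> bool;
    fn generate(&self, prompt: &str, max_tokens: usize) -> String;
    fn forward(&self) -> Vec<f32>;
}

// Stand-in model so the sketch runs; a real handle would wrap a loaded checkpoint.
struct DummyModel;

impl ModelHandle for DummyModel {
    fn supports_generate(&self) -> bool {
        true
    }
    fn generate(&self, prompt: &str, _max_tokens: usize) -> String {
        prompt.to_string()
    }
    fn forward(&self) -> Vec<f32> {
        Vec::new()
    }
}

fn load_model(_checkpoint: &str) -> Box<dyn ModelHandle> {
    Box::new(DummyModel)
}

fn main() {
    let model = load_model("checkpoints/gpt2.bin");
    if model.supports_generate() {
        println!("{}", model.generate("Hello", 64));
    } else {
        let _logits = model.forward();
    }
}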

Docs

Start here: docs/runtime.md, docs/howto.md

Reference: docs/formats.md, docs/backend_optimizer.md, docs/testing.md

Scripts: scripts/eval.py, scripts/export_gpt2.py, scripts/export_model_weights.py
Roadmap (TODO)

Backends:

  • Triton backend
  • NVIDIA GPU (CUDA + cuDNN) backend
  • XLA backend
  • IREE backend

Training:

  • Full training loop
  • Autograd
  • Distributed training

Inference:

  • Quantized models (end-to-end)
  • Paged attention
  • Speculative decoding

Tooling / interop:

  • Chrome trace export (profiler visualization)
  • Full Python interop (define/train models in Python)
  • PyTorch importer

Models:

  • Qwen
  • Llama
  • DeepSeek
  • GPT-OSS
  • Diffusion models
  • Speech models

Platforms:

  • Windows
  • macOS (partially works)

Misc

  • WebUI

Repository layout

  • crates/gpt-rs: core library (tensors, PTIR capture, layers, models, tokenizer, checkpoints, runtime).
  • crates/gpt-rs-cli: model runner CLI (generate / forward) with dump/profile hooks.
  • crates/gpt-rs-backend-*: backend implementations (faer, ref-cpu, optional C backend).
  • crates/gpt-rs-backend-tests: backend suite + Torch parity harness (via tch / libtorch).
  • scripts/: Python baselines and exporters (scripts/eval.py, scripts/export_gpt2.py, ...).

Architecture (high level)

inputs (tokens / images)
        |
        v
  model::* (Gpt, ResNet34, MobileNetV2, ...)
        |
        v
   nn::layers::* (Linear, LayerNorm, Attention, Conv2d, ...)
        |
        v
ops::functional::* (matmul, layer_norm, conv2d_nhwc, ...)
        |                  \
        |                   +-- runtime overrides (FunctionalRegistry / FunctionalOverrides)
        v
backend::spec::PortableBackend (dot_general, reduce_window, gather, elementwise, ...)
        |
        v
backend impl (faer, ref-cpu, ...)
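
The op names in the backend box come from the diagram above; the following is a deliberately simplified Rust sketch of what such a backend surface looks like. It is not the actual backend::spec::PortableBackend trait, whose signatures carry shapes, dtypes, and dimension specs.

// Simplified illustration only: the real backend::spec::PortableBackend trait
// has different, richer signatures.
struct Tensor; // stand-in for the crate's device tensor type

trait PortableBackend {
    fn dot_general(&self, lhs: &Tensor, rhs: &Tensor) -> Tensor;
    fn reduce_window(&self, input: &Tensor, window: &[usize]) -> Tensor;
    fn gather(&self, input: &Tensor, indices: &Tensor) -> Tensor;
    fn elementwise(&self, input: &Tensor, op: fn(f32) -> f32) -> Tensor;
}

// A backend crate provides an implementation of this surface.
struct RefCpuBackend;

impl PortableBackend for RefCpuBackend {
    fn dot_general(&self, _lhs: &Tensor, _rhs: &Tensor) -> Tensor { Tensor }
    fn reduce_window(&self, _input: &Tensor, _window: &[usize]) -> Tensor { Tensor }
    fn gather(&self, _input: &Tensor, _indices: &Tensor) -> Tensor { Tensor }
    fn elementwise(&self, _input: &Tensor, _op: fn(f32) -> f32) -> Tensor { Tensor }
}

fn main() {
    let backend = RefCpuBackend;
    let _y = backend.elementwise(&Tensor, |x| x.max(0.0));
}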

Quick Start

# build the workspace (requires Rust toolchain)
cargo build

# explore the CLI surface
cargo run -p gpt-rs-cli -- --help

# run GPT-2 generation (checkpoint + tokenizer)
# export the checkpoint/tokenizer with: `uv run python scripts/export_gpt2.py --help`
cargo run --release -p gpt-rs-cli -- generate --prompt "Hello" --max-tokens 64 \
  --checkpoint checkpoints/gpt2.bin --tokenizer configs/gpt2_tokenizer.json

# export torchvision weights (gpt.rs checkpoint)
uv sync
uv run python scripts/export_model_weights.py --model resnet34 --out checkpoints/resnet34.bin

# run an image model (deterministic random input by default)
cargo run --release -p gpt-rs-cli -- forward --checkpoint checkpoints/resnet34.bin

Torch Baselines (Python)

The canonical entrypoint is scripts/eval.py (validate / bench), which runs both the gpt.rs and Torch implementations:

uv sync
uv pip install maturin
cd crates/gpt-rs-py && uv run maturin develop --release --features faer && cd ../..

uv run python scripts/eval.py --model resnet34 --workload validate
uv run python scripts/eval.py --model gpt2 --workload bench --threads 1 4 --bench-tokens 1 64

Notes:

  • Select a backend with --backend (or GPTRS_BACKEND, default: faer).
  • --profile prints tables only when built with -F gpt-rs/profiler.
  • Use --dump-dir / --dump-mode to capture PTIR programs (Rust CLI or Python runner).

Testing

cargo test  # workspace unit + integration tests (no Torch)

cargo test -p gpt-rs-backend-faer --test backend_suite  # backend smoke tests
cargo test -p gpt-rs-backend-faer --features torch --test backend_suite -- --nocapture  # Torch parity + timings (libtorch via tch)
cargo test -p gpt-rs --features torch --test torch_parity  # smaller parity set (ref-cpu backend)

Torch parity tests live under crates/gpt-rs-backend-tests/src/torch_parity/ and are wired into each backend via define_backend_tests! (behind the torch feature). See docs/testing.md.
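
As a rough illustration of that wiring (pattern only; the helper and test below are invented, and the actual define_backend_tests! macro with its case list lives in crates/gpt-rs-backend-tests):

// Pattern sketch: a declarative macro stamps the same parity tests out once
// per backend. The helper and test names below are invented for illustration.
fn run_parity_case(backend_name: &str, case: &str) -> bool {
    // The real harness runs the case on the backend and compares against libtorch.
    !backend_name.is_empty() && !case.is_empty()
}

macro_rules! define_backend_tests {
    ($backend_name:expr) => {
        #[test]
        fn matmul_matches_torch() {
            assert!(run_parity_case($backend_name, "matmul"));
        }
    };
}

// Each backend test crate invokes the macro once to get the shared test set.
define_backend_tests!("faer");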

Status

Forward inference is implemented for GPT-2 text generation and for image classification models (ResNet34, MobileNetV2), with portable PTIR kernels and Torch baselines for correctness.

The functional layer exposes portable math (elementwise ops, matmul, normalization, attention, conv/pool) via the DeviceTensorOps extension trait while still delegating to backend PTIR execution.
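
A hedged sketch of that extension-trait shape, assuming invented method signatures (the crate's actual DeviceTensorOps trait differs):

// Hedged sketch of the extension-trait pattern; method names follow the
// description above, but signatures are invented, not the crate's real API.
struct DeviceTensor; // stand-in for a backend-resident tensor

trait DeviceTensorOps {
    fn matmul(&self, rhs: &Self) -> Self;
    fn layer_norm(&self, eps: f32) -> Self;
}

impl DeviceTensorOps for DeviceTensor {
    fn matmul(&self, _rhs: &Self) -> Self {
        // In the real crate this records a PTIR op and hands it to the
        // active backend for execution (or fusion).
        DeviceTensor
    }
    fn layer_norm(&self, _eps: f32) -> Self {
        DeviceTensor
    }
}

fn main() {
    let x = DeviceTensor;
    let w = DeviceTensor;
    let _y = x.matmul(&w).layer_norm(1e-5);
}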

License

Apache License 2.0. See LICENSE.
