Rust-first research project for portable tensor programs (PTIR) and model runtimes.
PTIR lets you keep model math portable while backends compete on execution:
- Layers call portable functionals (no open-coded kernels); see the sketch after this list.
- Functionals capture PTIR graphs and can be overridden at runtime.
- Backends execute PTIR and can rewrite/fuse graphs into custom kernels.
- Parameters have stable `u128` ids and can be streamed lazily from checkpoints.
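The sketch below is a minimal, self-contained illustration of that layering; it is not the real gpt-rs API (the `PtirGraph`/`Tensor` types and the capture-by-mutable-reference style are stand-ins for exposition). The point it shows: layers only compose portable functionals, each functional records a PTIR node, and the backend later decides how to execute or fuse the recorded graph.

```rust
// Minimal sketch (NOT the real gpt-rs API): layers call portable functionals,
// functionals record PTIR nodes, and the active backend later executes or fuses
// the recorded graph.

#[derive(Debug)]
enum PtirOp {
    MatMul,
    Add,
}

#[derive(Debug, Default)]
struct PtirGraph {
    ops: Vec<PtirOp>,
}

// Stand-in for a device tensor handle; the real type carries shape/dtype/backend state.
struct Tensor;

// "Functionals": portable math entry points that capture graph nodes instead of
// open-coding kernels.
fn matmul(g: &mut PtirGraph, _a: &Tensor, _b: &Tensor) -> Tensor {
    g.ops.push(PtirOp::MatMul);
    Tensor
}

fn add(g: &mut PtirGraph, _a: &Tensor, _b: &Tensor) -> Tensor {
    g.ops.push(PtirOp::Add);
    Tensor
}

// A layer only composes functionals, so any backend that understands the captured
// ops can run it (or rewrite the graph into its own kernels).
struct Linear {
    weight: Tensor,
    bias: Tensor,
}

impl Linear {
    fn forward(&self, g: &mut PtirGraph, x: &Tensor) -> Tensor {
        let y = matmul(g, x, &self.weight);
        add(g, &y, &self.bias)
    }
}

fn main() {
    let mut g = PtirGraph::default();
    let layer = Linear { weight: Tensor, bias: Tensor };
    let _out = layer.forward(&mut g, &Tensor);
    println!("captured PTIR ops: {:?}", g.ops); // [MatMul, Add]
}
```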
Warning: expect churn. APIs, formats, and model coverage change quickly.
(Demo video: demo.mp4)
Single-thread CPU numbers from `scripts/eval.py --workload bench --backend c` on an AMD Ryzen 9 9950X3D (Linux):
| Workload | gpt.rs | torch | Speedup |
|---|---|---|---|
| ResNet34 (batch=1, image=224) | 29.93 images/s | 28.20 images/s | 1.06x |
| GPT-2 (generate 64 tokens) | 74.71 tokens/s | 46.16 tokens/s | 1.62x |
Reproduce (C backend):
```bash
# Build the Python extension with the C backend enabled:
uv sync
uv pip install maturin
uv --project "$(pwd)" --directory "$(pwd)/crates/gpt-rs-py" run \
  maturin develop --release --features conversion-c

# Export fresh v2 checkpoints:
uv run python scripts/export_gpt2.py --model gpt2 --checkpoint-out checkpoints/gpt2.bin \
  --config-out configs/gpt2_model.json --tokenizer-out configs/gpt2_tokenizer.json
uv run python scripts/export_model_weights.py --model resnet34 --out checkpoints/resnet34.bin

# Bench:
GPTRS_C_CACHE_DIR=./.cache/gpt_rs_c_backend uv run python scripts/eval.py \
  --model resnet34 --workload bench --backend c --threads 1 --warmup 1 --iters 2 --batch 1 --image-size 224
GPTRS_C_CACHE_DIR=./.cache/gpt_rs_c_backend uv run python scripts/eval.py \
  --model gpt2 --workload bench --backend c --threads 1 --warmup 1 --iters 2 --bench-tokens 64
```
- Capability-based runtime: `runtime::load_model` returns a dynamic model handle; the CLI runs `generate`/`forward` without hardcoding model kinds. See docs/runtime.md.
- Parameter streaming + stable ids: checkpoint-backed `ParamSource` loads weights on demand; backends can memoize derived parameter formats by stable id (see the sketch after this list). See docs/howto.md and docs/formats.md.
- Backend rewrites: pattern-driven PTIR rewrites via `#[ptir_pattern]` views and backend optimizer passes. See docs/backend_optimizer.md (and crates/gpt-rs-backend-c/src/optimizer/conv2d.rs for a real example).
- Correctness tooling: Torch parity at the kernel level and end-to-end model baselines via Python runners. See docs/testing.md.
- Debuggability: PTIR dumps (`--dump-dir`), profiling (`--profile` with `-F gpt-rs/profiler`), and eager debugging (`GPTRS_EAGER=1`).
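To make the parameter-streaming item concrete, here is a small, self-contained sketch of the idea; the real `ParamSource` trait and checkpoint layout are documented in docs/howto.md and docs/formats.md, and every name and signature below is an illustrative stand-in, not the crate's API. Weights are addressed by stable `u128` ids, loaded on demand, and a backend can memoize a derived (repacked) format keyed by the same id.

```rust
// Sketch only: stable u128 ids + lazy loading + backend-side memoization of a
// derived parameter format. Not the real gpt-rs ParamSource API.
use std::collections::HashMap;

type ParamId = u128;

trait ParamSource {
    /// Load the raw values for one parameter, identified by its stable id.
    fn load(&self, id: ParamId) -> Vec<f32>;
}

struct CheckpointSource {
    // The real implementation would read lazily from a checkpoint file.
    params: HashMap<ParamId, Vec<f32>>,
}

impl ParamSource for CheckpointSource {
    fn load(&self, id: ParamId) -> Vec<f32> {
        self.params.get(&id).cloned().unwrap_or_default()
    }
}

/// Toy backend-side cache: derived formats are memoized by the stable id, so a
/// repack (e.g. into a blocked layout) happens once per parameter.
struct DerivedParamCache {
    cache: HashMap<ParamId, Vec<f32>>,
}

impl DerivedParamCache {
    fn get_or_pack(&mut self, id: ParamId, src: &dyn ParamSource) -> &Vec<f32> {
        self.cache.entry(id).or_insert_with(|| {
            let raw = src.load(id);
            raw.iter().rev().cloned().collect() // stand-in for a real repack
        })
    }
}

fn main() {
    let src = CheckpointSource {
        params: HashMap::from([(42u128, vec![1.0, 2.0, 3.0])]),
    };
    let mut cache = DerivedParamCache { cache: HashMap::new() };
    println!("{:?}", cache.get_or_pack(42, &src)); // packed once, reused afterwards
}
```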
Start here:
- docs/README.md (doc index + policy)
- docs/howto.md (add models/layers/functionals/backends)
- docs/runtime.md (loader, capability dispatch, overrides)
- docs/testing.md (Torch parity + dumps/profiling + Python baselines)
- docs/formats.md (checkpoint + tensor archive formats)
Reference:
- docs/backends/README.md (how each backend works: ref-cpu, faer, c)
- docs/backend.md (PTIR backend contract, ptir.v0.4)
- docs/backend_optimizer.md (optimizer pipeline + patterns)
- docs/ops.md (PTIR capture/graphs/execution)
- docs/frontend.md (frontend layering + runtime overrides)
Scripts:
- scripts/README.md (Python utilities: export + eval)
Backends:
- Triton backend
- NVIDIA GPU (CUDA + cuDNN) backend
- XLA backend
- IREE backend
Training:
- Full training loop
- Autograd
- Distributed training
Inference:
- Quantized models (end-to-end)
- Paged attention
- Speculative decoding
Tooling / interop:
- Chrome trace export (profiler visualization)
- Full Python interop (define/train models in Python)
- PyTorch importer
Models:
- Qwen
- Llama
- DeepSeek
- GPT-OSS
- Diffusion models
- Speech models
Platforms:
- Windows
- macOS (partially works)
Misc:
- WebUI
- `crates/gpt-rs`: core library (tensors, PTIR capture, layers, models, tokenizer, checkpoints, runtime).
- `crates/gpt-rs-cli`: model runner CLI (generate/forward) with dump/profile hooks.
- `crates/gpt-rs-backend-*`: backend implementations (faer, ref-cpu, optional C backend).
- `crates/gpt-rs-backend-tests`: backend suite + Torch parity harness (via `tch` / libtorch).
- `scripts/`: Python baselines and exporters (`scripts/eval.py`, `scripts/export_gpt2.py`, ...).
```text
inputs (tokens / images)
          |
          v
model::* (Gpt, ResNet34, MobileNetV2, ...)
          |
          v
nn::layers::* (Linear, LayerNorm, Attention, Conv2d, ...)
          |
          v
ops::functional::* (matmul, layer_norm, conv2d_nhwc, ...)
          |  \
          |   +-- runtime overrides (FunctionalRegistry / FunctionalOverrides)
          v
backend::spec::PortableBackend (dot_general, reduce_window, gather, elementwise, ...)
          |
          v
backend impl (faer, ref-cpu, ...)
```
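For intuition about the bottom of that diagram, here is a self-contained sketch of the backend-contract idea. It is not the real `backend::spec::PortableBackend` trait (see docs/backend.md for the actual ptir.v0.4 contract); the names, primitives, and signatures are simplified stand-ins. The point: every backend implements the same small set of portable primitives, and model code written against that contract runs unchanged on any of them.

```rust
// Sketch of the "portable primitives" contract (not the real PortableBackend trait).
struct Buf(Vec<f32>);

trait PortablePrimitives {
    /// Elementwise add: the portable contract fixes semantics, not implementation.
    fn elementwise_add(&self, a: &Buf, b: &Buf) -> Buf;
    /// Stand-in for a dot_general-style contraction (here: a plain dot product).
    fn dot(&self, a: &Buf, b: &Buf) -> f32;
}

/// A reference backend: straightforward loops, useful as a correctness baseline.
struct RefCpu;

impl PortablePrimitives for RefCpu {
    fn elementwise_add(&self, a: &Buf, b: &Buf) -> Buf {
        Buf(a.0.iter().zip(&b.0).map(|(x, y)| x + y).collect())
    }
    fn dot(&self, a: &Buf, b: &Buf) -> f32 {
        a.0.iter().zip(&b.0).map(|(x, y)| x * y).sum()
    }
}

/// Model code is written once against the trait; swapping the backend changes how
/// the math executes, never what it computes.
fn add_then_dot<B: PortablePrimitives>(backend: &B, a: &Buf, b: &Buf) -> f32 {
    let s = backend.elementwise_add(a, b);
    backend.dot(&s, &s)
}

fn main() {
    let (a, b) = (Buf(vec![1.0, 2.0]), Buf(vec![3.0, 4.0]));
    println!("{}", add_then_dot(&RefCpu, &a, &b)); // 52
}
```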
```bash
# build the workspace (requires Rust toolchain)
cargo build

# explore the CLI surface
cargo run -p gpt-rs-cli -- --help

# run GPT-2 generation (checkpoint + tokenizer)
# export the checkpoint/tokenizer with: `uv run python scripts/export_gpt2.py --help`
cargo run --release -p gpt-rs-cli -- generate --prompt "Hello" --max-tokens 64 \
  --checkpoint checkpoints/gpt2.bin --tokenizer configs/gpt2_tokenizer.json

# export torchvision weights (gpt.rs checkpoint)
uv sync
uv run python scripts/export_model_weights.py --model resnet34 --out checkpoints/resnet34.bin

# run an image model (deterministic random input by default)
cargo run --release -p gpt-rs-cli -- forward --checkpoint checkpoints/resnet34.bin
```

The canonical entrypoint is `scripts/eval.py`, which runs both workloads (validate / bench):
```bash
uv sync
uv pip install maturin
cd crates/gpt-rs-py && uv run maturin develop --release --features faer && cd ../..

uv run python scripts/eval.py --model resnet34 --workload validate
uv run python scripts/eval.py --model gpt2 --workload bench --threads 1 4 --bench-tokens 1 64
```

Notes:
- Select a backend with `--backend` (or `GPTRS_BACKEND`; default: `faer`).
- `--profile` prints tables only when built with `-F gpt-rs/profiler`.
- Use `--dump-dir` / `--dump-mode` to capture PTIR programs (Rust CLI or Python runner).
```bash
cargo test                                              # workspace unit + integration tests (no Torch)
cargo test -p gpt-rs-backend-faer --test backend_suite  # backend smoke tests
cargo test -p gpt-rs-backend-faer --features torch --test backend_suite -- --nocapture  # Torch parity + timings (libtorch via tch)
cargo test -p gpt-rs --features torch --test torch_parity  # smaller parity set (ref-cpu backend)
```

Torch parity tests live under crates/gpt-rs-backend-tests/src/torch_parity/ and are wired into each backend via `define_backend_tests!` (behind the `torch` feature). See docs/testing.md.
Forward inference for GPT-2 generation and image classification models (ResNet34, MobileNetV2) is implemented, with portable PTIR kernels and Torch baselines for correctness.
The functional layer exposes portable math (elementwise ops, matmul, normalization, attention, conv/pool) via the `DeviceTensorOps` extension trait while still delegating to backend PTIR execution.
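As a rough illustration of the extension-trait pattern this refers to (the names and signatures below are invented for exposition, not the real `DeviceTensorOps` methods): the trait adds method-call ergonomics on the tensor handle while the actual math stays behind a separate execution path.

```rust
// Sketch of the extension-trait pattern (illustrative, not the gpt-rs API).
#[derive(Debug, Clone)]
struct DeviceTensor(Vec<f32>);

// Stand-in for "delegate to backend PTIR execution".
fn execute_scale(input: &DeviceTensor, factor: f32) -> DeviceTensor {
    DeviceTensor(input.0.iter().map(|x| x * factor).collect())
}

// Extension trait: adds ergonomic methods without baking math into the tensor type.
trait TensorOpsExt {
    fn scale(&self, factor: f32) -> DeviceTensor;
}

impl TensorOpsExt for DeviceTensor {
    fn scale(&self, factor: f32) -> DeviceTensor {
        execute_scale(self, factor) // delegation point to the execution layer
    }
}

fn main() {
    let x = DeviceTensor(vec![1.0, 2.0, 3.0]);
    println!("{:?}", x.scale(2.0)); // DeviceTensor([2.0, 4.0, 6.0])
}
```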
Apache License 2.0. See LICENSE.