
gpt.rs

Rust-first research project for portable tensor programs (PTIR) and model runtimes.

PTIR lets you keep model math portable while backends compete on execution:

  • Layers call portable functionals (no open-coded kernels).
  • Functionals capture PTIR graphs and can be overridden at runtime (a minimal sketch follows the list).
  • Backends execute PTIR and can rewrite/fuse graphs into custom kernels.
  • Parameters have stable u128 ids and can be streamed lazily from checkpoints.
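
The sketch below illustrates the override idea in plain Rust. It is only a minimal, self-contained illustration: the types and signatures are invented stand-ins, not the crate's actual FunctionalRegistry / FunctionalOverrides API.

// Minimal sketch of the override mechanism, assuming a registry keyed by
// functional name. Types and signatures here are illustrative stand-ins,
// not the crate's real FunctionalRegistry / FunctionalOverrides API.
use std::collections::HashMap;

type Kernel = fn(&[f32], &[f32]) -> Vec<f32>;

// Portable fallback: in the real crate this path would capture a PTIR graph.
fn portable_matmul(_a: &[f32], _b: &[f32]) -> Vec<f32> {
    Vec::new()
}

// Backend-provided replacement (e.g. a fused or vendor-specific kernel).
fn fused_matmul(a: &[f32], b: &[f32]) -> Vec<f32> {
    portable_matmul(a, b)
}

struct FunctionalOverrides {
    kernels: HashMap<&'static str, Kernel>,
}

impl FunctionalOverrides {
    fn new() -> Self {
        Self { kernels: HashMap::new() }
    }
    // A backend registers its kernel at runtime.
    fn register(&mut self, name: &'static str, kernel: Kernel) {
        self.kernels.insert(name, kernel);
    }
    // A layer resolves by name; the override wins, otherwise the portable path runs.
    fn resolve(&self, name: &str) -> Kernel {
        self.kernels.get(name).copied().unwrap_or(portable_matmul)
    }
}

fn main() {
    let mut overrides = FunctionalOverrides::new();
    overrides.register("matmul", fused_matmul);
    let matmul = overrides.resolve("matmul");
    let _y = matmul(&[1.0, 2.0], &[3.0, 4.0]);
}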

Warning: expect churn. APIs, formats, and model coverage change quickly.

Demo

demo.mp4

Benchmarks (early)

Single-thread CPU numbers from scripts/eval.py --workload bench --backend c on AMD Ryzen 9 9950X3D (Linux):

Workload                        gpt.rs          torch           Speedup
ResNet34 (batch=1, image=224)   29.93 images/s  28.20 images/s  1.06x
GPT-2 (generate 64 tokens)      74.71 tokens/s  46.16 tokens/s  1.62x

Reproduce (C backend):

# Build the Python extension with the C backend enabled:
uv sync
uv pip install maturin
uv --project "$(pwd)" --directory "$(pwd)/crates/gpt-rs-py" run \
  maturin develop --release --features conversion-c

# Export fresh v2 checkpoints:
uv run python scripts/export_gpt2.py --model gpt2 --checkpoint-out checkpoints/gpt2.bin \
  --config-out configs/gpt2_model.json --tokenizer-out configs/gpt2_tokenizer.json
uv run python scripts/export_model_weights.py --model resnet34 --out checkpoints/resnet34.bin

# Bench:
GPTRS_C_CACHE_DIR=./.cache/gpt_rs_c_backend uv run python scripts/eval.py \
  --model resnet34 --workload bench --backend c --threads 1 --warmup 1 --iters 2 --batch 1 --image-size 224
GPTRS_C_CACHE_DIR=./.cache/gpt_rs_c_backend uv run python scripts/eval.py \
  --model gpt2 --workload bench --backend c --threads 1 --warmup 1 --iters 2 --bench-tokens 64

Highlights

  • Capability-based runtime: runtime::load_model returns a dynamic model handle; the CLI runs generate / forward without hardcoding model kinds (sketched after this list). See docs/runtime.md.
  • Parameter streaming + stable ids: checkpoint-backed ParamSource loads weights on demand; backends can memoize derived parameter formats by stable id. See docs/howto.md and docs/formats.md.
  • Backend rewrites: pattern-driven PTIR rewrites via #[ptir_pattern] views and backend optimizer passes. See docs/backend_optimizer.md (and crates/gpt-rs-backend-c/src/optimizer/conv2d.rs for a real example).
  • Correctness tooling: Torch parity at the kernel level and end-to-end model baselines via Python runners. See docs/testing.md.
  • Debuggability: PTIR dumps (--dump-dir), profiling (--profile with -F gpt-rs/profiler), and eager debugging (GPTRS_EAGER=1).
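
To make the capability-based runtime concrete, here is a hedged sketch of the dispatch idea. The trait, the DummyModel stand-in, and the load_model signature are all hypothetical; the real runtime::load_model and its handle type are documented in docs/runtime.md.

// Hypothetical sketch of capability-based dispatch: the caller asks the handle
// what it can do instead of matching on concrete model types. None of these
// signatures are the crate's real API.
trait ModelHandle {
    fn supports_generate(&self) -> bool;
    fn generate(&self, prompt: &str, max_tokens: usize) -> String;
    fn forward(&self) -> Vec<f32>;
}

// Stand-in model so the sketch runs; a real handle would wrap a loaded checkpoint.
struct DummyModel;

impl ModelHandle for DummyModel {
    fn supports_generate(&self) -> bool {
        true
    }
    fn generate(&self, prompt: &str, _max_tokens: usize) -> String {
        prompt.to_string()
    }
    fn forward(&self) -> Vec<f32> {
        Vec::new()
    }
}

fn load_model(_checkpoint: &str) -> Box<dyn ModelHandle> {
    Box::new(DummyModel)
}

fn main() {
    let model = load_model("checkpoints/gpt2.bin");
    if model.supports_generate() {
        println!("{}", model.generate("Hello", 64));
    } else {
        let _logits = model.forward();
    }
}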

Docs

Start here: docs/runtime.md, docs/howto.md

Reference: docs/formats.md, docs/backend_optimizer.md, docs/testing.md

Scripts: scripts/eval.py, scripts/export_gpt2.py, scripts/export_model_weights.py
Roadmap (TODO)

Backends:

  • Triton backend
  • NVIDIA GPU (CUDA + cuDNN) backend
  • XLA backend
  • IREE backend

Training:

  • Full training loop
  • Autograd
  • Distributed training

Inference:

  • Quantized models (end-to-end)
  • Paged attention
  • Speculative decoding

Tooling / interop:

  • Chrome trace export (profiler visualization)
  • Full Python interop (define/train models in Python)
  • PyTorch importer

Models:

  • Qwen
  • Llama
  • DeepSeek
  • GPT-OSS
  • Diffusion models
  • Speech models

Platforms:

  • Windows
  • macOS (partially works)

Misc

  • WebUI

Repository layout

  • crates/gpt-rs: core library (tensors, PTIR capture, layers, models, tokenizer, checkpoints, runtime).
  • crates/gpt-rs-cli: model runner CLI (generate / forward) with dump/profile hooks.
  • crates/gpt-rs-backend-*: backend implementations (faer, ref-cpu, optional C backend).
  • crates/gpt-rs-backend-tests: backend suite + Torch parity harness (via tch / libtorch).
  • scripts/: Python baselines and exporters (scripts/eval.py, scripts/export_gpt2.py, ...).

Architecture (high level)

inputs (tokens / images)
        |
        v
  model::* (Gpt, ResNet34, MobileNetV2, ...)
        |
        v
   nn::layers::* (Linear, LayerNorm, Attention, Conv2d, ...)
        |
        v
ops::functional::* (matmul, layer_norm, conv2d_nhwc, ...)
        |                  \
        |                   +-- runtime overrides (FunctionalRegistry / FunctionalOverrides)
        v
backend::spec::PortableBackend (dot_general, reduce_window, gather, elementwise, ...)
        |
        v
backend impl (faer, ref-cpu, ...)
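
The op names in the backend box come from the diagram above; the following is a deliberately simplified Rust sketch of what such a backend surface looks like. It is not the actual backend::spec::PortableBackend trait, whose signatures carry shapes, dtypes, and dimension specs.

// Simplified illustration only: the real backend::spec::PortableBackend trait
// has different, richer signatures.
struct Tensor; // stand-in for the crate's device tensor type

trait PortableBackend {
    fn dot_general(&self, lhs: &Tensor, rhs: &Tensor) -> Tensor;
    fn reduce_window(&self, input: &Tensor, window: &[usize]) -> Tensor;
    fn gather(&self, input: &Tensor, indices: &Tensor) -> Tensor;
    fn elementwise(&self, input: &Tensor, op: fn(f32) -> f32) -> Tensor;
}

// A backend crate provides an implementation of this surface.
struct RefCpuBackend;

impl PortableBackend for RefCpuBackend {
    fn dot_general(&self, _lhs: &Tensor, _rhs: &Tensor) -> Tensor { Tensor }
    fn reduce_window(&self, _input: &Tensor, _window: &[usize]) -> Tensor { Tensor }
    fn gather(&self, _input: &Tensor, _indices: &Tensor) -> Tensor { Tensor }
    fn elementwise(&self, _input: &Tensor, _op: fn(f32) -> f32) -> Tensor { Tensor }
}

fn main() {
    let backend = RefCpuBackend;
    let _y = backend.elementwise(&Tensor, |x| x.max(0.0));
}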

Quick Start

# build the workspace (requires Rust toolchain)
cargo build

# explore the CLI surface
cargo run -p gpt-rs-cli -- --help

# run GPT-2 generation (checkpoint + tokenizer)
# export the checkpoint/tokenizer with: `uv run python scripts/export_gpt2.py --help`
cargo run --release -p gpt-rs-cli -- generate --prompt "Hello" --max-tokens 64 \
  --checkpoint checkpoints/gpt2.bin --tokenizer configs/gpt2_tokenizer.json

# export torchvision weights (gpt.rs checkpoint)
uv sync
uv run python scripts/export_model_weights.py --model resnet34 --out checkpoints/resnet34.bin

# run an image model (deterministic random input by default)
cargo run --release -p gpt-rs-cli -- forward --checkpoint checkpoints/resnet34.bin

Torch Baselines (Python)

The canonical entrypoint is scripts/eval.py (validate / bench), which runs both the gpt.rs and Torch implementations:

uv sync
uv pip install maturin
cd crates/gpt-rs-py && uv run maturin develop --release --features faer && cd ../..

uv run python scripts/eval.py --model resnet34 --workload validate
uv run python scripts/eval.py --model gpt2 --workload bench --threads 1 4 --bench-tokens 1 64

Notes:

  • Select a backend with --backend (or GPTRS_BACKEND, default: faer).
  • --profile prints tables only when built with -F gpt-rs/profiler.
  • Use --dump-dir / --dump-mode to capture PTIR programs (Rust CLI or Python runner).

Testing

cargo test  # workspace unit + integration tests (no Torch)

cargo test -p gpt-rs-backend-faer --test backend_suite  # backend smoke tests
cargo test -p gpt-rs-backend-faer --features torch --test backend_suite -- --nocapture  # Torch parity + timings (libtorch via tch)
cargo test -p gpt-rs --features torch --test torch_parity  # smaller parity set (ref-cpu backend)

Torch parity tests live under crates/gpt-rs-backend-tests/src/torch_parity/ and are wired into each backend via define_backend_tests! (behind the torch feature). See docs/testing.md.
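
As a rough illustration of that wiring (pattern only; the helper and test below are invented, and the actual define_backend_tests! macro with its case list lives in crates/gpt-rs-backend-tests):

// Pattern sketch: a declarative macro stamps the same parity tests out once
// per backend. The helper and test names below are invented for illustration.
fn run_parity_case(backend_name: &str, case: &str) -> bool {
    // The real harness runs the case on the backend and compares against libtorch.
    !backend_name.is_empty() && !case.is_empty()
}

macro_rules! define_backend_tests {
    ($backend_name:expr) => {
        #[test]
        fn matmul_matches_torch() {
            assert!(run_parity_case($backend_name, "matmul"));
        }
    };
}

// Each backend test crate invokes the macro once to get the shared test set.
define_backend_tests!("faer");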

Status

Forward inference is implemented for GPT-2 text generation and for image classification models (ResNet34, MobileNetV2), with portable PTIR kernels and Torch baselines for correctness.

The functional layer exposes portable math (elementwise ops, matmul, normalization, attention, conv/pool) via the DeviceTensorOps extension trait while still delegating to backend PTIR execution.
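
A hedged sketch of that extension-trait shape, assuming invented method signatures (the crate's actual DeviceTensorOps trait differs):

// Hedged sketch of the extension-trait pattern; method names follow the
// description above, but signatures are invented, not the crate's real API.
struct DeviceTensor; // stand-in for a backend-resident tensor

trait DeviceTensorOps {
    fn matmul(&self, rhs: &Self) -> Self;
    fn layer_norm(&self, eps: f32) -> Self;
}

impl DeviceTensorOps for DeviceTensor {
    fn matmul(&self, _rhs: &Self) -> Self {
        // In the real crate this records a PTIR op and hands it to the
        // active backend for execution (or fusion).
        DeviceTensor
    }
    fn layer_norm(&self, _eps: f32) -> Self {
        DeviceTensor
    }
}

fn main() {
    let x = DeviceTensor;
    let w = DeviceTensor;
    let _y = x.matmul(&w).layer_norm(1e-5);
}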

License

Apache License 2.0. See LICENSE.
