Commit 83c30d0

docs: add project README and language reference
1 parent 7f82e8f commit 83c30d0

4 files changed, +428 -3 lines changed

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ uv run ruff format gpu_test/
 
 - **Stack Type**: `!forth.stack` - untyped stack, programmer ensures type safety
 - **Operations**: All take stack as input and produce stack as output (except `forth.stack`)
-- **Supported Words**: literals (integer `42` and float `3.14`), `DUP DROP SWAP OVER ROT NIP TUCK PICK ROLL`, `+ - * / MOD`, `F+ F- F* F/` (float arithmetic), `FEXP FSQRT FLOG FABS FNEG` (float math intrinsics), `FMAX FMIN` (float min/max), `AND OR XOR NOT LSHIFT RSHIFT`, `= < > <> <= >= 0=`, `F= F< F> F<> F<= F>=` (float comparison), `S>F F>S` (int/float conversion), `@ !` (global memory), `F@ F!` (float global memory), `S@ S!` (shared memory), `SF@ SF!` (float shared memory), `I8@ I8! SI8@ SI8!` (i8 memory), `I16@ I16! SI16@ SI16!` (i16 memory), `I32@ I32! SI32@ SI32!` (i32 memory), `HF@ HF! SHF@ SHF!` (f16 memory), `BF@ BF! SBF@ SBF!` (bf16 memory), `F32@ F32! SF32@ SF32!` (f32 memory), `CELLS`, `IF ELSE THEN`, `BEGIN UNTIL`, `BEGIN WHILE REPEAT`, `DO LOOP +LOOP I J K`, `LEAVE UNLOOP EXIT`, `{ a b -- }` (local variables in word definitions), `TID-X/Y/Z BID-X/Y/Z BDIM-X/Y/Z GDIM-X/Y/Z GLOBAL-ID` (GPU indexing).
+- **Supported Words**: literals (integer `42` and float `3.14`), `DUP DROP SWAP OVER ROT NIP TUCK PICK ROLL`, `+ - * / MOD`, `F+ F- F* F/` (float arithmetic), `FEXP FSQRT FLOG FABS FNEG` (float math intrinsics), `FMAX FMIN` (float min/max), `AND OR XOR NOT LSHIFT RSHIFT`, `= < > <> <= >= 0=`, `F= F< F> F<> F<= F>=` (float comparison), `S>F F>S` (int/float conversion), `@ !` (global memory), `F@ F!` (float global memory), `S@ S!` (shared memory), `SF@ SF!` (float shared memory), `I8@ I8! SI8@ SI8!` (i8 memory), `I16@ I16! SI16@ SI16!` (i16 memory), `I32@ I32! SI32@ SI32!` (i32 memory), `HF@ HF! SHF@ SHF!` (f16 memory), `BF@ BF! SBF@ SBF!` (bf16 memory), `F32@ F32! SF32@ SF32!` (f32 memory), `CELLS`, `IF ELSE THEN`, `BEGIN UNTIL`, `BEGIN WHILE REPEAT`, `DO LOOP +LOOP I J K`, `LEAVE UNLOOP EXIT`, `{ a b -- }` (local variables in word definitions), `TID-X/Y/Z BID-X/Y/Z BDIM-X/Y/Z GDIM-X/Y/Z GLOBAL-ID` (GPU indexing), `BARRIER` (thread block synchronization).
 - **Float Literals**: Numbers containing `.` or `e`/`E` are parsed as f64 (e.g. `3.14`, `-2.0`, `1.0e-5`, `1e3`). Stored on the stack as i64 bit patterns; F-prefixed words perform bitcast before/after operations.
 - **Kernel Parameters**: Declared in the `\!` header. `\! kernel <name>` is required and must appear first. `\! param <name> i64[<N>]` becomes a `memref<Nxi64>` argument; `\! param <name> i64` becomes an `i64` argument. `\! param <name> f64[<N>]` becomes a `memref<Nxf64>` argument; `\! param <name> f64` becomes an `f64` argument (bitcast to i64 when pushed to stack). Using a param name in code emits `forth.param_ref` (arrays push address; scalars push value).
 - **Shared Memory**: `\! shared <name> i64[<N>]` or `\! shared <name> f64[<N>]` declares GPU shared (workgroup) memory. Emits a tagged `memref.alloca` at kernel entry; ForthToGPU converts it to a `gpu.func` workgroup attribution. Using the shared name in code pushes its base address onto the stack. Use `S@`/`S!` for i64 or `SF@`/`SF!` for f64 shared accesses. Cannot be referenced inside word definitions.
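
The float-literal rule above (f64 values carried on the stack as i64 bit patterns, with F-prefixed words bitcasting around each operation) can be sketched in host-side Python with the standard `struct` module. This is an illustration, not WarpForth code: the helper names are hypothetical, and the sketch assumes `F+` pops its operands in the usual Forth order and that `S>F` is a value conversion whose result is then stored as a bit pattern.

```python
import struct


def f64_to_bits(x: float) -> int:
    """Reinterpret an f64 as a signed i64 bit pattern (how it sits on the stack)."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def bits_to_f64(bits: int) -> float:
    """Reinterpret a signed i64 stack cell as an f64."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]


def f_plus(stack: list[int]) -> None:
    """Hypothetical model of F+: pop two cells, bitcast to f64, add, bitcast back."""
    b = bits_to_f64(stack.pop())
    a = bits_to_f64(stack.pop())
    stack.append(f64_to_bits(a + b))


def s_to_f(stack: list[int]) -> None:
    """Hypothetical model of S>F: convert the integer value, then store its bit pattern."""
    stack.append(f64_to_bits(float(stack.pop())))


# The literals 3.14 and -2.0 are pushed as bit patterns; F+ bitcasts, adds, and bitcasts back.
stack = [f64_to_bits(3.14), f64_to_bits(-2.0)]
f_plus(stack)
print(bits_to_f64(stack[0]))
print(hex(f64_to_bits(1.0)))  # 0x3ff0000000000000
```

Applying plain integer `+` to the two cells instead would add the raw bit patterns, which is why the float words need their own bitcasting variants.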

README.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# WarpForth

An MLIR-based Forth compiler for programming GPU kernels. WarpForth defines a custom MLIR dialect for Forth stack operations and lowers programs through a pipeline of passes to PTX assembly.

## Dependencies

- LLVM/MLIR
- CMake
- C++17 compiler
- CUDA toolkit (for GPU execution)
- [uv](https://github.com/astral-sh/uv) (for Python test tooling)

## Building

```bash
# Configure
cmake -B build -G Ninja \
  -DMLIR_DIR=/path/to/llvm/lib/cmake/mlir \
  -DLLVM_DIR=/path/to/llvm/lib/cmake/llvm

# Build
cmake --build build
```

## Quick Start

Write a naive integer matrix multiply kernel (M=2, N=3, K=4, one thread per output element) and save it as `matmul.forth`:

```forth
\! kernel main
\! param A i64[8]
\! param B i64[12]
\! param C i64[6]

\ One thread computes C[row, col] where gid = row*N + col.
GLOBAL-ID
\ Split gid into row = gid / N and col = gid MOD N, leaving ( row col ).
DUP 3 /
SWAP 3 MOD
\ Push the dot-product accumulator, leaving ( row col acc ).
0
\ For k = 0 .. K-1, add A[row*K + k] * B[k*N + col] to acc.
4 0 DO
  2 PICK
  I SWAP 4 * +
  CELLS A + @
  I 3 * 3 PICK + CELLS B + @
  * +
LOOP
\ Store acc at C[row*N + col].
2 PICK 3 * 2 PICK +
CELLS C + !
```

Compile to PTX:

```bash
./build/bin/warpforthc matmul.forth -o matmul.ptx
```

Test on a GPU (A is 2x4 row-major, B is 4x3 row-major, and C is the 2x3 output; the grid launches six single-thread blocks, one per output element):

```bash
./build/bin/warpforth-runner matmul.ptx \
  --param 'i64[]:1,2,3,4,5,6,7,8' \
  --param 'i64[]:1,2,3,4,5,6,7,8,9,10,11,12' \
  --param 'i64[]:0,0,0,0,0,0' \
  --grid 6,1,1 --block 1,1,1 \
  --output-param 2 --output-count 6
```
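
As a host-side cross-check, the expected contents of C can be computed with a few lines of plain Python that mirror the kernel's indexing (gid split into row and col, a K-step dot product, a row-major store). This is a sketch, not part of WarpForth: `matmul_reference` is a hypothetical helper, and comparing it with the runner assumes the runner reports the six i64 values of C in row-major order, which this README does not specify.

```python
# Dimensions of the Quick Start kernel: C (M x N) = A (M x K) * B (K x N).
M, N, K = 2, 3, 4


def matmul_reference(a: list[int], b: list[int]) -> list[int]:
    """Hypothetical reference: row-major flat lists in, row-major flat C out."""
    c = [0] * (M * N)
    for gid in range(M * N):          # one iteration per launched thread (grid 6,1,1)
        row, col = gid // N, gid % N  # DUP 3 /  and  SWAP 3 MOD
        acc = 0
        for k in range(K):            # 4 0 DO ... LOOP
            acc += a[row * K + k] * b[k * N + col]
        c[row * N + col] = acc        # CELLS C + !
    return c


A = list(range(1, 9))    # 1..8, 2x4 row-major
B = list(range(1, 13))   # 1..12, 4x3 row-major
print(matmul_reference(A, B))  # [70, 80, 90, 158, 184, 210]
```
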

## Toolchain

| Tool | Description |
|------|-------------|
| `warpforthc` | Compiles Forth source to PTX |
| `warpforth-translate` | Translates Forth source to MLIR, and MLIR to PTX assembly |
| `warpforth-opt` | Runs individual MLIR passes or the entire pipeline |
| `warpforth-runner` | Executes PTX kernels on a GPU for testing |

These tools can be composed for debugging or inspecting intermediate stages:

```bash
./build/bin/warpforth-translate --forth-to-mlir kernel.forth | \
  ./build/bin/warpforth-opt --warpforth-pipeline | \
  ./build/bin/warpforth-translate --mlir-to-ptx
```

## Language Reference

WarpForth supports stack operations, integer and float arithmetic, control flow, global and shared memory access, reduced-width memory types, user-defined words with local variables, and GPU-specific operations.

See [docs/language.md](docs/language.md) for the full language reference.

## Architecture

WarpForth compiles Forth through a series of MLIR dialect lowerings, each replacing higher-level abstractions with lower-level ones until the program is expressed entirely in LLVM IR and can be handed to the NVPTX backend.

| Stage | Pass | Description |
|-------|-------------|-------------|
| **Parsing** | `warpforth-translate --forth-to-mlir` | Parses Forth source into the `forth` dialect. The kernel is represented as a series of stack ops on an abstract `!forth.stack` type. |
| **Stack lowering** | `warpforth-opt --convert-forth-to-memref` | The abstract `!forth.stack` type is materialized as a `memref<256xi64>` buffer and `index` pair. Stack ops become explicit loads, stores, and pointer arithmetic. |
| **GPU wrapping** | `warpforth-opt --convert-forth-to-gpu` | Functions are wrapped in a `gpu.module`, the kernel entry point is marked as a `gpu.kernel` and GPU intrinsic words are lowered to `gpu` ops. |
| **NVVM/LLVM lowering** | Standard MLIR passes | GPU→NVVM, math→LLVM intrinsics and NVVM→LLVM. |
| **Code generation** | `warpforth-translate --mlir-to-ptx` | The GPU module is serialized to PTX assembly via LLVM's NVPTX backend. |

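
The stack-lowering stage can be pictured with a small Python model: one fixed array of 256 i64 cells standing in for the `memref<256xi64>` buffer, plus an integer standing in for the `index`. This is a hedged illustration, not the pass's actual output: the class and method names are hypothetical, and the sketch assumes the index points at the current top element (-1 when empty) and that overflow and underflow raise errors. The real lowering may use a different index convention and may not check bounds at all.

```python
STACK_SIZE = 256  # matches memref<256xi64>


class LoweredStack:
    """Hypothetical model of a lowered !forth.stack: a fixed i64 buffer plus an index."""

    def __init__(self) -> None:
        self.buf = [0] * STACK_SIZE  # stands in for the memref<256xi64> buffer
        self.sp = -1                 # assumption: index of the top element, -1 when empty

    def _need(self, n: int) -> None:
        # Assumption: this sketch raises on underflow; generated code may not check.
        if self.sp + 1 < n:
            raise IndexError("stack underflow")

    def push(self, value: int) -> None:
        if self.sp + 1 >= STACK_SIZE:
            raise OverflowError("stack overflow")
        self.sp += 1                 # index arithmetic
        self.buf[self.sp] = value    # one explicit store

    def dup(self) -> None:           # ( a -- a a )
        self._need(1)
        self.push(self.buf[self.sp])  # one load, then a store

    def swap(self) -> None:          # ( a b -- b a )
        self._need(2)
        a, b = self.buf[self.sp - 1], self.buf[self.sp]
        self.buf[self.sp - 1], self.buf[self.sp] = b, a  # two loads, two stores

    def add(self) -> None:           # ( a b -- a+b ); i64 wraparound not modeled
        self._need(2)
        b = self.buf[self.sp]
        self.sp -= 1
        self.buf[self.sp] += b

    def contents(self) -> list[int]:
        return self.buf[: self.sp + 1]


s = LoweredStack()
s.push(2)
s.push(3)
s.swap()
s.dup()
s.add()
print(s.contents())  # [3, 4]
```

The point of the model is that nothing abstract survives this stage: every Forth word becomes a handful of loads, stores, and index updates on one buffer, which later passes can lower like any other memory code.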
## Demo

The `demo/` directory contains a GPT-2 text generation demo that routes scaled dot-product attention through a WarpForth-compiled kernel. See [demo/README.md](demo/README.md) for setup instructions.

## Testing

```bash
# Run the LIT test suite
cmake --build build --target check-warpforth

# Run end-to-end GPU tests (requires Vast.ai API key)
VASTAI_API_KEY=xxx uv run pytest -v -m gpu
```

demo/README.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ A pre-compiled `attention.ptx` is included in this directory.
 ## Step 2: Upload to GPU Instance
 
 ```bash
-scp -r demo/ demo/gpt2_generate.py root@HOST:/workspace
+scp -r demo/ root@HOST:/workspace
 ```
 
 ## Step 3: Install Dependencies (Remote)
@@ -30,7 +30,7 @@ pip install pycuda transformers
 ## Step 4: Generate Text (Remote)
 
 ```bash
-python /workspace/gpt2_generate.py --ptx /workspace/attention.ptx --prompt "The meaning of life is"
+python /workspace/demo/gpt2_generate.py --ptx /workspace/demo/attention.ptx --prompt "The meaning of life is"
 ```
 
 | Flag | Default | Description |
