# Bonsai 8B on AMD Strix Halo — ROCm/HIP Build & Benchmark
Successfully built and ran Bonsai 8B on AMD Strix Halo (gfx1151) using the PrismML llama.cpp fork with ROCm/HIP. Sharing the build fixes and benchmark results for anyone else on AMD hardware.
## Hardware
| Component | Detail |
|---|---|
| APU | AMD Strix Halo (Ryzen AI Max+ 395) |
| GPU | Radeon 8060S (gfx1151) |
| Memory | 128GB unified (GPU/system shared) |
| ROCm | 7.2 (AMD clang 22.0.0) |
| PyTorch | 2.9.1+rocm7.11.0 |
| OS | Arch Linux (kernel 6.19.9) |
## Benchmark Results
| Metric | CPU-only | ROCm/HIP (GPU) |
|---|---|---|
| Prompt | 3.3 t/s | 81.4 t/s |
| Generation | 2.3 t/s | 76.7 t/s |
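Working those numbers out, the HIP build lands at roughly 25x CPU prompt throughput and 33x generation throughput. A quick arithmetic check:

```python
# Throughput figures (t/s) from the benchmark table above
cpu = {"prompt": 3.3, "generation": 2.3}
gpu = {"prompt": 81.4, "generation": 76.7}

for metric in cpu:
    speedup = gpu[metric] / cpu[metric]
    print(f"{metric}: {speedup:.1f}x")  # prompt: 24.7x, generation: 33.3x
```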
Model loads entirely into GPU memory at 1.15 GB — leaves plenty of headroom on the 128GB unified pool.
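That 1.15 GB figure squares with a back-of-envelope estimate. Assuming a nominal 8B parameter count and roughly one bit per weight for Q1_0 (both assumptions for illustration; group scales, metadata, and higher-precision embedding/output tensors account for the remainder):

```python
params = 8e9          # nominal parameter count for an "8B" model (assumption)
bits_per_weight = 1   # Q1_0 packs roughly one bit per weight (assumption)

gib = params * bits_per_weight / 8 / 2**30
print(f"~{gib:.2f} GiB for the quantized weights alone")  # ~0.93 GiB
```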
## Build Fix for ROCm/HIP
The PrismML llama.cpp fork doesn't officially support ROCm yet, but the CUDA sources hipify cleanly with two fixes:
### Problem 1: Missing `hip_fp16.h`

```
fatal error: 'hip/hip_fp16.h' file not found
```

**Fix:** Add `-I/opt/rocm/include` to the HIP compiler flags.

### Problem 2: Missing `libamdhip64`

```
ld.lld: error: unable to find library -lamdhip64
```

**Fix:** Add `-L/opt/rocm/lib` to the linker flags.
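Both fixes reduce to three extra CMake cache entries. As a purely illustrative helper for scripting the configure step (`rocm_flags` is hypothetical, not part of the fork; it assumes the standard `/opt/rocm` layout):

```python
def rocm_flags(rocm_root="/opt/rocm"):
    """Return the extra CMake cache entries the HIP build needs.

    rocm_root is an assumption; point it at your actual ROCm
    install prefix if it lives elsewhere.
    """
    return {
        "CMAKE_HIP_FLAGS": f"-I{rocm_root}/include",         # fix 1: hip_fp16.h
        "CMAKE_EXE_LINKER_FLAGS": f"-L{rocm_root}/lib",      # fix 2: libamdhip64
        "CMAKE_SHARED_LINKER_FLAGS": f"-L{rocm_root}/lib",   # same fix for shared libs
    }

print(rocm_flags())
```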
### Full CMake Command
```bash
CMAKE_PREFIX_PATH=/opt/rocm HIP_PATH=/opt/rocm cmake -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1151 \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \
  -DCMAKE_HIP_FLAGS="-I/opt/rocm/include" \
  -DCMAKE_EXE_LINKER_FLAGS="-L/opt/rocm/lib" \
  -DCMAKE_SHARED_LINKER_FLAGS="-L/opt/rocm/lib"

cmake --build build -j$(nproc)
```
## Run Inference
```bash
./build/bin/llama-cli \
  -m Bonsai-8B-Q1_0_g128.gguf \
  -p "Your prompt here" \
  -n 256 \
  --temp 0.5 \
  --top-p 0.85 \
  --top-k 20 \
  -ngl 99
```
## Notes
- All Q1_0_g128 CUDA kernels compiled cleanly through HIP without modification
- AOTriton is used for attention (confirmed in logs)
- The 1-bit model at 1.15 GB is an ideal fit for unified memory architectures like Strix Halo
- NPU (XDNA 2, 50 TOPS) could be an interesting future target for 1-bit inference
## Feature Request
Consider adding official ROCm/HIP support to the build scripts. The two fixes above are trivial and would open Bonsai to the entire AMD GPU ecosystem. Happy to contribute a PR if interested.
Designed and built by the architect