# Bonsai 8B on AMD Strix Halo — ROCm/HIP Build & Benchmark
Successfully built and ran Bonsai 8B on AMD Strix Halo (gfx1151) using the PrismML llama.cpp fork with ROCm/HIP. Sharing the build fixes and benchmark results for anyone else on AMD hardware.
## Hardware
| Component | Detail |
|---|---|
| APU | AMD Strix Halo (Ryzen AI Max+ 395) |
| GPU | Radeon 8060S (gfx1151) |
| Memory | 128GB unified (GPU/system shared) |
| ROCm | 7.2 (AMD clang 22.0.0) |
| PyTorch | 2.9.1+rocm7.11.0 |
| OS | Arch Linux (kernel 6.19.9) |
## Benchmark Results
| Metric | CPU-only | ROCm/HIP (GPU) |
|---|---|---|
| Prompt | 3.3 t/s | 81.4 t/s |
| Generation | 2.3 t/s | 76.7 t/s |
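Working those numbers out, the HIP build lands at roughly 25x CPU prompt throughput and 33x generation throughput. A quick arithmetic check:

```python
# Throughput figures (t/s) from the benchmark table above
cpu = {"prompt": 3.3, "generation": 2.3}
gpu = {"prompt": 81.4, "generation": 76.7}

for metric in cpu:
    speedup = gpu[metric] / cpu[metric]
    print(f"{metric}: {speedup:.1f}x")  # prompt: 24.7x, generation: 33.3x
```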
Model loads entirely into GPU memory at 1.15 GB — leaves plenty of headroom on the 128GB unified pool.
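That 1.15 GB figure squares with a back-of-envelope estimate. Assuming a nominal 8B parameter count and roughly one bit per weight for Q1_0 (both assumptions for illustration; group scales, metadata, and higher-precision embedding/output tensors account for the remainder):

```python
params = 8e9          # nominal parameter count for an "8B" model (assumption)
bits_per_weight = 1   # Q1_0 packs roughly one bit per weight (assumption)

gib = params * bits_per_weight / 8 / 2**30
print(f"~{gib:.2f} GiB for the quantized weights alone")  # ~0.93 GiB
```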
## Build Fix for ROCm/HIP
The PrismML llama.cpp fork doesn't officially support ROCm yet, but the CUDA sources hipify cleanly with two fixes:
### Problem 1: Missing `hip_fp16.h`

```
fatal error: 'hip/hip_fp16.h' file not found
```

**Fix:** Add `-I/opt/rocm/include` to the HIP compiler flags.

### Problem 2: Missing `libamdhip64`

```
ld.lld: error: unable to find library -lamdhip64
```

**Fix:** Add `-L/opt/rocm/lib` to the linker flags.
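Both fixes reduce to three extra CMake cache entries. As a purely illustrative helper for scripting the configure step (`rocm_flags` is hypothetical, not part of the fork; it assumes the standard `/opt/rocm` layout):

```python
def rocm_flags(rocm_root="/opt/rocm"):
    """Return the extra CMake cache entries the HIP build needs.

    rocm_root is an assumption; point it at your actual ROCm
    install prefix if it lives elsewhere.
    """
    return {
        "CMAKE_HIP_FLAGS": f"-I{rocm_root}/include",         # fix 1: hip_fp16.h
        "CMAKE_EXE_LINKER_FLAGS": f"-L{rocm_root}/lib",      # fix 2: libamdhip64
        "CMAKE_SHARED_LINKER_FLAGS": f"-L{rocm_root}/lib",   # same fix for shared libs
    }

print(rocm_flags())
```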
### Full CMake Command
```bash
CMAKE_PREFIX_PATH=/opt/rocm HIP_PATH=/opt/rocm cmake -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1151 \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \
  -DCMAKE_HIP_FLAGS="-I/opt/rocm/include" \
  -DCMAKE_EXE_LINKER_FLAGS="-L/opt/rocm/lib" \
  -DCMAKE_SHARED_LINKER_FLAGS="-L/opt/rocm/lib"

cmake --build build -j$(nproc)
```
## Run Inference
```bash
./build/bin/llama-cli \
  -m Bonsai-8B-Q1_0_g128.gguf \
  -p "Your prompt here" \
  -n 256 \
  --temp 0.5 \
  --top-p 0.85 \
  --top-k 20 \
  -ngl 99
```
## Notes
- All Q1_0_g128 CUDA kernels compiled cleanly through HIP without modification
- AOTriton is used for attention (confirmed in logs)
- The 1-bit model at 1.15 GB is an ideal fit for unified memory architectures like Strix Halo
- NPU (XDNA 2, 50 TOPS) could be an interesting future target for 1-bit inference
## Feature Request
Consider adding official ROCm/HIP support to the build scripts. The two fixes above are trivial and would open Bonsai to the entire AMD GPU ecosystem. Happy to contribute a PR if interested.
Designed and built by the architect