
Bonsai 8B running on AMD Strix Halo (gfx1151) via ROCm/HIP — 76.7 t/s #19

@ghost

Description

Bonsai 8B on AMD Strix Halo — ROCm/HIP Build & Benchmark

Successfully built and ran Bonsai 8B on AMD Strix Halo (gfx1151) using the PrismML llama.cpp fork with ROCm/HIP. Sharing the build fixes and benchmark results for anyone else on AMD hardware.

Hardware

| Component | Detail |
|---|---|
| APU | AMD Strix Halo (Ryzen AI Max+ 395) |
| GPU | Radeon 8060S (gfx1151) |
| Memory | 128GB unified (GPU/system shared) |
| ROCm | 7.2 (AMD clang 22.0.0) |
| PyTorch | 2.9.1+rocm7.11.0 |
| OS | Arch Linux (kernel 6.19.9) |
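
To double-check that ROCm actually sees the gfx1151 target before building, a quick sanity check (a defensive sketch using `rocminfo`, which ships with ROCm; it falls back to a note when the tool isn't installed):

```shell
# Sanity check: ask ROCm which GPU target it sees.
# On Strix Halo the expected target string is gfx1151.
if command -v rocminfo >/dev/null 2>&1; then
  target=$(rocminfo | grep -o 'gfx[0-9a-f]*' | head -n 1)
else
  target="rocminfo unavailable (expected gfx1151 on Strix Halo)"
fi
echo "$target"
```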

Benchmark Results

| Metric | CPU-only | ROCm/HIP (GPU) |
|---|---|---|
| Prompt processing | 3.3 t/s | 81.4 t/s |
| Token generation | 2.3 t/s | 76.7 t/s |

Model loads entirely into GPU memory at 1.15 GB — leaves plenty of headroom on the 128GB unified pool.
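
One rough way to confirm that headroom while the model is resident (a sketch using `rocm-smi`, which ships with ROCm; `--showmeminfo vram` reports current VRAM allocation):

```shell
# Watch unified-memory headroom while the model is loaded.
if command -v rocm-smi >/dev/null 2>&1; then
  meminfo=$(rocm-smi --showmeminfo vram)
else
  meminfo="rocm-smi not on PATH; install ROCm tools to inspect GPU memory"
fi
echo "$meminfo"
```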

Build Fixes for ROCm/HIP

The PrismML llama.cpp fork doesn't officially support ROCm yet, but the CUDA sources hipify cleanly with two fixes:

Problem 1: Missing hip_fp16.h

```
fatal error: 'hip/hip_fp16.h' file not found
```

Fix: Add -I/opt/rocm/include to HIP compiler flags.
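
A quick way to verify the header is where that flag expects it (a defensive sketch; `/opt/rocm` is the default ROCm install prefix and may differ on your system):

```shell
# Verify the HIP fp16 header is under the include path passed via -I.
ROCM_PATH="${ROCM_PATH:-/opt/rocm}"
if [ -f "$ROCM_PATH/include/hip/hip_fp16.h" ]; then
  msg="hip_fp16.h found under $ROCM_PATH/include"
else
  msg="hip_fp16.h missing: pass -I$ROCM_PATH/include via CMAKE_HIP_FLAGS"
fi
echo "$msg"
```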

Problem 2: Missing libamdhip64

```
ld.lld: error: unable to find library -lamdhip64
```

Fix: Add -L/opt/rocm/lib to linker flags.
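
And the matching check for the HIP runtime library (again a defensive sketch; `/opt/rocm/lib` is the default location):

```shell
# Verify the HIP runtime library is under the path passed via -L.
ROCM_LIB="${ROCM_LIB:-/opt/rocm/lib}"
if ls "$ROCM_LIB"/libamdhip64.so* >/dev/null 2>&1; then
  msg="libamdhip64 found in $ROCM_LIB"
else
  msg="libamdhip64 missing: add -L$ROCM_LIB to the linker flags"
fi
echo "$msg"
```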

Full CMake Command

```shell
CMAKE_PREFIX_PATH=/opt/rocm HIP_PATH=/opt/rocm cmake -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1151 \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \
  -DCMAKE_HIP_FLAGS="-I/opt/rocm/include" \
  -DCMAKE_EXE_LINKER_FLAGS="-L/opt/rocm/lib" \
  -DCMAKE_SHARED_LINKER_FLAGS="-L/opt/rocm/lib"

cmake --build build -j$(nproc)
```

Run Inference

```shell
./build/bin/llama-cli \
  -m Bonsai-8B-Q1_0_g128.gguf \
  -p "Your prompt here" \
  -n 256 \
  --temp 0.5 \
  --top-p 0.85 \
  --top-k 20 \
  -ngl 99
```
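
For throughput numbers like the table above, `llama-bench` (built next to `llama-cli` in upstream llama.cpp, and assumed present in this fork too) is more reliable than eyeballing `llama-cli` timings. A defensive sketch, with the `-p`/`-n` token counts as illustrative choices:

```shell
# Benchmark prompt processing and generation with llama-bench.
# Paths assume the build tree and model file from this post.
MODEL="Bonsai-8B-Q1_0_g128.gguf"
BENCH="./build/bin/llama-bench"
if [ -x "$BENCH" ]; then
  out=$("$BENCH" -m "$MODEL" -ngl 99 -p 512 -n 128)
else
  out="llama-bench not built yet: run the CMake build above first"
fi
echo "$out"
```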

Notes

  • All Q1_0_g128 CUDA kernels compiled cleanly through HIP without modification
  • AOTriton is used for attention (confirmed in logs)
  • The 1-bit model at 1.15 GB is an ideal fit for unified memory architectures like Strix Halo
  • NPU (XDNA 2, 50 TOPS) could be an interesting future target for 1-bit inference

Feature Request

Consider adding official ROCm/HIP support to the build scripts. The two fixes above are trivial and would open Bonsai to the entire AMD GPU ecosystem. Happy to contribute a PR if interested.

