This guide explains how to benchmark LLMs with Local-Bench on AMD Ryzen AI Max "Strix Halo" GPUs using the STRIX Halo toolbox integration.
Local-Bench supports benchmarking LLMs on AMD STRIX Halo GPUs through integration with the AMD STRIX Halo Llama.cpp Toolboxes, which provide optimized llama.cpp builds with both ROCm and Vulkan backends tuned specifically for STRIX Halo hardware.
- AMD Ryzen AI Max "Strix Halo" APU with integrated GPU
- Operating System: Fedora 42/43 or Ubuntu 24.04 (Debian-based systems supported)
- Linux Kernel: 6.18.3-200 or compatible (recommended for stability)
- Linux Firmware: 20251111 or newer (avoid 20251125 - known to cause issues)
- Toolbox: Container management system (podman-toolbox)
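Before installing anything, you can sanity-check the kernel, firmware, and toolbox CLI against the recommendations above. This is a rough sketch aimed at Fedora; the package queries are guarded so it degrades gracefully on Debian-based systems:

```shell
# Kernel version (6.18.x recommended)
uname -r

# Installed linux-firmware version (avoid 20251125)
rpm -q linux-firmware 2>/dev/null \
  || dpkg-query -W linux-firmware 2>/dev/null \
  || echo "linux-firmware package not found"

# Confirm the toolbox CLI is available
toolbox --version 2>/dev/null || echo "toolbox not installed yet"
```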
Install the toolbox tooling for your distribution:

```bash
# Fedora
sudo dnf install toolbox

# Ubuntu/Debian
sudo apt install podman-toolbox
```

First, verify that your STRIX Halo GPU is detected:
```bash
npm run strix-halo detect
```

This command will:
- Detect AMD STRIX Halo GPU
- Check for ROCm installation
- Check for Vulkan support
- Display system specifications
View all available STRIX Halo toolboxes:
```bash
npm run strix-halo list-toolboxes
```

Available toolboxes include:
ROCm Backends:
- `llama-rocm-6.4.4` - ROCm 6.4.4
- `llama-rocm-7.1.1` - ROCm 7.1.1
- `llama-rocm-7.2` - ROCm 7.2 (recommended)
- `llama-rocm7-nightlies` - ROCm 7 nightly builds
Vulkan Backends:
- `llama-vulkan-radv` - RADV Vulkan driver
- `llama-vulkan-amdvlk` - AMDVLK Vulkan driver
Create and configure a toolbox (recommended: ROCm 7.2):
```bash
npm run strix-halo setup llama-rocm-7.2
```

Or set up all toolboxes at once:
```bash
npm run strix-halo setup-all
```

Note: Setting up toolboxes requires downloading container images, which may take several minutes depending on your internet connection.
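Once setup finishes, you can confirm the container actually exists. A quick check (assuming, as in the examples here, that the container is named after the toolbox), with a podman fallback in case the toolbox CLI is unavailable:

```shell
# List toolbox containers; fall back to podman if the toolbox CLI is missing
toolbox list 2>/dev/null \
  || podman ps --all --format "{{.Names}}" 2>/dev/null \
  || echo "toolbox/podman not available"
```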
Download a GGUF model from HuggingFace. Example using a Qwen model:
```bash
# Create models directory
mkdir -p models/qwen3-coder-30B

# Download model (requires huggingface-cli)
pip install huggingface-hub hf-transfer

# Download with accelerated transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
  unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF \
  BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf \
  BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00002-of-00002.gguf \
  --local-dir models/qwen3-coder-30B/
```

Benchmark the model with STRIX Halo optimizations:

```bash
npm run strix-halo benchmark models/qwen3-coder-30B/BF16/Qwen3-Coder-30B-A3B-Instruct-BF16-00001-of-00002.gguf \
  --toolbox llama-rocm-7.2 \
  --flash-attention
```

```bash
npm run strix-halo detect
```

Detects the AMD STRIX Halo GPU and displays system information, including:
- GPU model
- ROCm availability
- Vulkan support
- CPU, memory, and OS details
```bash
npm run strix-halo list-toolboxes
```

Shows all available STRIX Halo toolboxes and their installation status.
```bash
npm run strix-halo setup <toolbox-name>
```

Creates and configures a specific toolbox. Example:

```bash
npm run strix-halo setup llama-rocm-7.2
```

```bash
npm run strix-halo setup-all
```

Creates all STRIX Halo toolboxes. Useful for testing across different backends.
```bash
npm run strix-halo benchmark <model-path> [options]
```

Options:

- `--toolbox <name>` - Specify which toolbox to use (default: `llama-rocm-7.2`)
- `--context <size>` - Context size (default: 8192)
- `--flash-attention` - Enable flash attention (recommended, enabled by default)
- `--no-mmap` - Disable memory mapping (recommended, enabled by default)
Examples:

```bash
# Basic benchmark with default settings
npm run strix-halo benchmark models/model.gguf

# Benchmark with a specific toolbox
npm run strix-halo benchmark models/model.gguf --toolbox llama-rocm-7.1.1

# Benchmark with a custom context size
npm run strix-halo benchmark models/model.gguf --context 16384

# Benchmark with the Vulkan backend
npm run strix-halo benchmark models/model.gguf --toolbox llama-vulkan-radv
```

STRIX Halo requires specific settings for optimal performance and stability:
- Flash Attention: Must be enabled (`-fa 1`)
- No Memory Mapping: Memory mapping must be disabled (`--no-mmap`)
- GPU Layers: Offload the maximum number of layers to the GPU (`-ngl 99` or `-ngl 999`)
These settings are enabled by default in the STRIX Halo benchmark commands.
ROCm Backends (recommended):

- Better performance for most workloads
- Supports more advanced features
- Use `llama-rocm-7.2` for best stability
Vulkan Backends:

- Better compatibility
- Easier setup (no ROCm driver required)
- Use `llama-vulkan-radv` for best performance
- 8K context: Good balance for most tasks
- 16K context: For longer conversations
- 32K+ context: For very long documents (requires more VRAM)
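The VRAM cost of larger contexts comes mostly from the KV cache, which grows linearly with context size. A rough back-of-envelope sketch (the layer and head counts below are illustrative, not the configuration of any particular model):

```shell
# KV-cache bytes ~= 2 (K and V) * layers * ctx * kv_heads * head_dim * bytes/element
layers=48 ctx=32768 kv_heads=8 head_dim=128 bytes_per_elem=2   # f16 cache
kv_bytes=$((2 * layers * ctx * kv_heads * head_dim * bytes_per_elem))
echo "KV cache at ${ctx} context: $((kv_bytes / 1024 / 1024 / 1024)) GiB"
```

With these made-up numbers, the cache is 6 GiB at 32K context; halving the context to 16K halves that to 3 GiB, which is why reducing context size is an effective way to fit within VRAM.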
If the STRIX Halo GPU is not detected:

- Check the GPU is properly installed: `lspci | grep -i vga`
- Ensure the drivers are loaded: `lsmod | grep amdgpu`
- Update the firmware if needed (avoid 20251125): `sudo dnf downgrade linux-firmware`
If toolbox creation fails:

- Check podman is running: `systemctl --user status podman`
- Ensure you have network connectivity to pull images
- Check disk space for container images
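For the disk-space check: with rootless podman, container images live under `~/.local/share/containers` by default. A quick check (guarded in case no container storage exists yet):

```shell
# How much the existing container storage already uses, if any
du -sh "$HOME/.local/share/containers" 2>/dev/null || echo "no container storage yet"

# Free space on the filesystem that will hold the images
df -h "$HOME"
```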
If benchmarks crash or hang:

- Ensure flash attention is enabled
- Enable no-mmap mode
- Reduce the context size
- Try a different toolbox/backend
- Check system logs: `journalctl -xe`
If the ROCm backend doesn't work:

- Check the ROCm installation: `rocm-smi`
- Verify kernel version compatibility (6.18.3-200 recommended)
- Try the Vulkan backend as an alternative: `npm run strix-halo setup llama-vulkan-radv`
Critical: Do not use linux-firmware-20251125 - it breaks ROCm support.
To downgrade the firmware on Fedora:

```bash
sudo dnf downgrade linux-firmware
# Select version 20251111 or earlier
```

STRIX Halo benchmarks are automatically integrated with Local-Bench:
- Results are saved to the same database as Ollama benchmarks
- System specifications include STRIX Halo detection
- Web interface displays all benchmark results together
To view results:
```bash
npm start
# Open http://localhost:3000
```

You can run commands directly in toolboxes:
```bash
# Enter toolbox shell
toolbox enter llama-rocm-7.2

# Inside the toolbox, you can run llama.cpp commands directly
llama-cli --list-devices
llama-server -m /path/to/model.gguf -c 8192 -ngl 999 -fa 1 --no-mmap
```

You can create custom benchmark scripts using the STRIX Halo module:
```typescript
import { benchmarkWithToolbox } from './strixHalo';

const result = await benchmarkWithToolbox({
  modelPath: '/path/to/model.gguf',
  toolboxName: 'llama-rocm-7.2',
  contextSize: 8192,
  flashAttention: true,
  noMmap: true
});

console.log(`Tokens/sec: ${result.tokensPerSecond}`);
```

- AMD STRIX Halo Toolboxes GitHub
- AMD STRIX Halo Toolboxes Documentation
- llama.cpp GitHub
- ROCm Documentation
- OS: Fedora 42/43 (recommended) or Ubuntu 24.04
- Kernel: 6.18.3-200
- Firmware: 20251111 (avoid 20251125)
- Toolbox: Latest version via dnf/apt
- AMD Ryzen AI Max (STRIX Halo) APU
- Integrated RDNA GPU (16GB+ unified memory recommended)
For issues specific to:
- STRIX Halo toolboxes: See kyuz0/amd-strix-halo-toolboxes
- Local-Bench integration: Create an issue in this repository
- llama.cpp: See ggerganov/llama.cpp