gfx1151
Here are 17 public repositories matching this topic...
The definitive Strix Halo LLM guide — 65 t/s on a $2,999 mini PC. Live benchmarks, tested optimizations, and everything that doesn't work.
Updated Apr 26, 2026 - Shell
vLLM Qwen 3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm 7.13). 24.8 t/s single-stream, vision, tool calling, 256K context, OpenAI-compatible, Docker. Matches DGX Spark FP8+DFlash+MTP at a third of the cost. No CUDA.
Updated Apr 27, 2026 - Python
vLLM + Qwen3.6-27B (BF16) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). Vision input, 256K context, /v1/responses with separated reasoning, via TheRock ROCm.
Updated Apr 26, 2026 - Python
llama.cpp + Qwen3.6-27B (Q8_0 GGUF) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). 256K context, ~7.5 t/s decode via TheRock ROCm Docker.
Updated Apr 26, 2026 - Python
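Since the server above exposes the standard OpenAI-compatible chat route, it can be queried with nothing but the Python standard library. A minimal sketch, assuming the server listens on localhost:8080 and registers the model under a hypothetical alias (both are assumptions; adjust to your deployment):

```python
import json
import urllib.request

# Assumed endpoint: wherever the llama.cpp server is listening.
BASE_URL = "http://localhost:8080/v1"

def build_chat_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style /v1/chat/completions request body."""
    return {
        "model": "qwen3.6-27b-q8_0",  # hypothetical model alias
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return choices[0].message.content.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(json.dumps(build_chat_payload("Hello"), indent=2))
```

Because the wire format matches OpenAI's, the official `openai` client pointed at `BASE_URL` would work just as well; the stdlib version above only avoids the extra dependency.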
Local, ternary-weight LLM inference on AMD Strix Halo. Rust above the kernels, HIP below, zero Python at runtime. https://discord.gg/EhQgmNePg
Updated Apr 28, 2026 - HTML
Claude Code skill for AMD Strix Halo (Ryzen AI MAX+ 395) ML setup. Handles PyTorch installation (official wheels don't work with gfx1151), GTT memory config, and environment setup. Enables 30B parameter models.
Updated Jan 23, 2026 - Python
ComfyUI on AMD Strix Halo (RDNA 3.5 / gfx1151) via Docker. Ubuntu Rolling + UV-managed Python 3.12 + ROCm preview wheels. Solves the silent CPU fallback Debian/Python 3.13 images hit on gfx1151.
Updated Apr 26, 2026 - Dockerfile
Production-oriented Docker Compose stack serving openai/gpt-oss-20b via vLLM on AMD Strix Halo (gfx1151, ROCm 7.2). OpenAI Responses API, host-mounted weights, hard-capped KV cache. Verified, no source build.
Updated Apr 26, 2026
Native ROCm C++ kernels for Strix Halo (gfx1151): ternary BitNet GEMV, RMSNorm, RoPE, split-KV Flash-Decoding attention. Zero hipBLAS, zero Python.
Updated Apr 27, 2026 - C++
Docker infrastructure for AMD Strix Halo (RDNA 3.5 / gfx1151): PyTorch + ROCm base container and a separate Ollama LLM service. Two folders, two Compose files, one Strix Halo box.
Updated Apr 26, 2026 - Shell
OpenAI-compatible /v1/embeddings server (BAAI/bge-m3, 1024 dims, 100+ langs) on AMD Strix Halo via ROCm. Drop-in replacement for OpenAI text-embedding-3, Docker, no API keys, ~47ms single-text latency.
Updated Apr 26, 2026 - Python
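An embeddings server advertised as a drop-in replacement for OpenAI's text-embedding-3 accepts the same /v1/embeddings request shape. A hedged sketch of a client; the port is an assumption, and the "bge-m3" model string is an illustrative guess at how the server maps the OpenAI `model` field onto BAAI/bge-m3:

```python
import json
import urllib.request

# Assumed endpoint for the local embeddings service.
EMBED_URL = "http://localhost:8000/v1/embeddings"

def build_embedding_payload(texts) -> dict:
    """OpenAI-style embeddings request: a model name plus a list of inputs."""
    return {"model": "bge-m3", "input": list(texts)}

def embed(texts) -> list:
    """Return one 1024-dim vector per input text, OpenAI response shape."""
    req = urllib.request.Request(
        EMBED_URL,
        data=json.dumps(build_embedding_payload(texts)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    # Each item carries its vector under "embedding", as in OpenAI's API.
    return [item["embedding"] for item in data]
```

Existing code that calls OpenAI embeddings should only need the base URL swapped, which is the point of the drop-in claim.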
Docker stack: Ollama v0.21.0 built from source against ROCm 7.2.2 with native gfx1151 (Strix Halo) — serves Gemma 4 up to 256K context on AMD Ryzen AI MAX+ 395 / Radeon 8060S. Includes a 9-layer make validate ladder for the host firmware, ROCm runtime, container, and long-context inference.
Updated Apr 22, 2026 - Shell
Experimental local LLM API for AMD Strix Halo (gfx1151) on ROCm 7.10 (TheRock). Two-service split: vLLM inference engine + FastAPI gateway with OpenAI protocol normalization, auth, management. Docker Compose.
Updated Apr 26, 2026 - Python
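The "OpenAI protocol normalization" such a gateway performs typically means coercing whatever clients send into one canonical request shape before handing it to the inference engine. A minimal illustrative sketch (the specific fields and defaults are assumptions, not this repo's actual logic):

```python
def normalize_request(body: dict) -> dict:
    """Coerce a legacy completion-style request into chat-style form.

    Illustrative only: real gateways also handle auth headers, model
    aliasing, and streaming flags. Default values here are assumptions.
    """
    out = dict(body)  # never mutate the caller's payload
    # Legacy clients send a bare "prompt"; wrap it as a user message.
    if "messages" not in out and "prompt" in out:
        out["messages"] = [{"role": "user", "content": out.pop("prompt")}]
    # Cap generation length if the client left it unspecified.
    out.setdefault("max_tokens", 512)
    return out
```

Keeping normalization in a pure function like this (dict in, dict out) makes the gateway's translation layer trivially unit-testable, independent of FastAPI routing or the vLLM backend.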
Unlock fast, local LLM inference on AMD-powered mini PCs, delivering 65-87 t/s for large models with no cloud or subscription costs.
Updated Apr 28, 2026 - Shell