Releases: sumitchatterjee13/uno.cpp
Uno.cpp v1.1.2
What's Changed
Fixed: App no longer requires CUDA to launch — Previously, llama-server.exe would fail to start without cublas64_13.dll and cublasLt64_13.dll present, even when running in CPU-only mode (-ngl 0). The CUDA backend is now loaded dynamically at runtime, so the app launches cleanly on systems without NVIDIA GPUs or CUDA installed.
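For reference, a typical CPU-only invocation looks like this (the model filename and port here are illustrative, not values shipped by Uno.cpp):

```shell
# CPU-only launch: -ngl 0 offloads zero layers to the GPU, so the CUDA
# backend is never loaded and no cuBLAS DLLs are required.
llama-server.exe -m model.gguf -ngl 0 --port 8080
```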
Improved: Automatic CPU optimization — The build now ships multiple CPU backend variants that auto-detect your processor's best instruction set at runtime:
| CPU Feature | Example CPUs |
|---|---|
| SSE 4.2 | Core 2nd/3rd gen |
| AVX | Sandy Bridge+ |
| AVX2 + FMA | Haswell, Ryzen+ |
| AVX-512 | Skylake-X, EPYC |
| AVX-VNNI | Alder Lake+ |
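Curious which variant you'll get? One quick way to see which of these instruction sets your CPU reports (a sketch for Linux/WSL; on Windows, a tool such as Coreinfo or CPU-Z shows the same flags):

```shell
# Print only the SIMD-related flags from the CPU's feature list.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' \
  | grep -xE 'sse4_2|avx|avx2|fma|avx512f|avx_vnni' | sort -u
```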
Download
Grab Unocpp-Setup-v1.1.2.exe below and run the installer.
First time? You'll also need a GGUF model file — download one from HuggingFace.
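If you have Python available, one convenient way to fetch a GGUF is the Hugging Face CLI (the repo and file names below are placeholders — substitute the model you actually want):

```shell
# Install the Hugging Face CLI, then pull a single GGUF file into the
# current directory (repo/file names are placeholders).
pip install -U "huggingface_hub[cli]"
huggingface-cli download SomeOrg/some-model-GGUF some-model-Q4_K_M.gguf --local-dir .
```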
Build Info
- Built with `GGML_BACKEND_DL=ON` and `GGML_CPU_ALL_VARIANTS=ON`
- CUDA 13.0 / MSVC 14.44
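For anyone building from source, the configure step corresponds roughly to the following (a sketch; the generator, CUDA toolkit setup, and the `GGML_CUDA=ON` flag are assumptions based on the build info above, and details will vary by machine):

```shell
# Enable dynamic backend loading and all runtime CPU variants,
# plus the CUDA backend.
cmake -B build -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_CUDA=ON
cmake --build build --config Release
```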
Uno.cpp v1.0.1
First release of Uno.cpp — an unofficial llama.cpp build with Sarvam-30B (sarvam_moe) architecture support.
How to use
- Download and run `Unocpp-Setup-v1.0.1.exe` below
- Download a model from https://huggingface.co/Sumitc13/sarvam-30b-GGUF
- Launch Uno.cpp → pick the .gguf file → chat!
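Once llama-server is running, it also exposes an OpenAI-compatible HTTP API, so you can chat without the GUI (port 8080 is llama-server's default; adjust if you changed it):

```shell
# Send a single chat message to the local server.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```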
What's included
- llama-server.exe built with CUDA 13.0 (RTX 50-series native support)
- GUI launcher with model picker and settings
- Sarvam MoE architecture support (PR #20275)