
Releases: sumitchatterjee13/uno.cpp

Uno.cpp v1.1.2

23 Mar 04:42


What's Changed

Fixed: App no longer requires CUDA to launch — Previously, llama-server.exe would fail to start without cublas64_13.dll and cublasLt64_13.dll present, even when running in CPU-only mode (-ngl 0). The CUDA backend is now loaded dynamically at runtime, so the app launches cleanly on systems without NVIDIA GPUs or CUDA installed.

Improved: Automatic CPU optimization — The build now ships multiple CPU backend variants that auto-detect your processor's best instruction set at runtime:

CPU Feature    Example CPUs
SSE 4.2        Core 2nd/3rd gen
AVX            Sandy Bridge+
AVX2 + FMA     Haswell, Ryzen+
AVX-512        Skylake-X, EPYC
AVX-VNNI       Alder Lake+

Download

Grab Unocpp-Setup-v1.1.2.exe below and run the installer.

First time? You'll also need a GGUF model file — download one from HuggingFace.

Build Info

  • Built with GGML_BACKEND_DL=ON and GGML_CPU_ALL_VARIANTS=ON
  • CUDA 13.0 / MSVC 14.44
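For reference, the two GGML options listed above map onto a configure step roughly like the following (build directory and the extra CUDA flag are assumptions for this sketch, not the project's exact build script):

```shell
# Hedged sketch: configure with dynamically loadable backends and all
# CPU variants, matching the flags in the build info above.
cmake -B build -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_CUDA=ON
cmake --build build --config Release
```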

Uno.cpp v1.0.1

15 Mar 11:52


First release of Uno.cpp — an unofficial build of llama.cpp with Sarvam-30B (sarvam_moe) architecture support.

How to use

  1. Download and run Unocpp-Setup-v1.0.1.exe below
  2. Download a model from https://huggingface.co/Sumitc13/sarvam-30b-GGUF
  3. Launch Uno.cpp → pick the .gguf file → chat!

What's included

  • llama-server.exe built with CUDA 13.0 (RTX 50-series native support)
  • GUI launcher with model picker and settings
  • Sarvam MoE architecture support (PR #20275)