
feat: quant-server-unified — server built directly on quant.h #79

Merged
unamedkr merged 1 commit into main from feat/unified-server
Apr 12, 2026

Conversation

@unamedkr (Collaborator)

Summary

New server binary that compiles against quant.h directly, eliminating the sync-divergence bug between quant.h and libturboquant (#77, #78).

Problem

quant-server (built from libturboquant) produces garbage output for SmolLM2-1.7B due to numerical instability at layer 7 (max=18,359). The same model works correctly when run via quant.h directly. Root cause: the split libturboquant sources have diverged from quant.h.

Solution

A single-file server (tools/quant_server_unified.c) that #includes "quant.h" directly:

cc -O2 -o quant-server-unified tools/quant_server_unified.c -lm -lpthread
./quant-server-unified model.gguf -p 8080 -j 8
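Once running, the server accepts standard OpenAI-style request bodies on /v1/chat/completions. A minimal chat request might look like this (field values are illustrative, not taken from the PR):

```json
{
  "model": "SmolLM2-1.7B",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "max_tokens": 64
}
```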

Benchmark (Apple M3, 16GB)

| Model        | libturboquant server        | unified server |
|--------------|-----------------------------|----------------|
| SmolLM2-1.7B | GARBAGE (layer 7 explosion) | 23 tok/s       |
| Phi-3.5-mini | CRASH / garbage             | 6.5 tok/s      |

Features

  • /v1/chat/completions (OpenAI compatible)
  • /v1/models, /health
  • SSE streaming (stream: true)
  • CORS headers
  • Auto-detect Phi-3 chat template vs ChatML
  • Template token filtering
  • Mutex-serialized inference
  • Graceful port-in-use error

Test plan

  • SmolLM2-1.7B: 56 tokens, coherent output, 23 tok/s
  • Phi-3.5-mini: 60 tokens, coherent output, 6.5 tok/s
  • SSE streaming: correct chunk format, [DONE] signal
  • Health check: returns version
  • CORS: preflight OPTIONS returns 204
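For reference, the "correct chunk format" being checked is the OpenAI-style SSE wire format: each chunk is a `data:` line carrying a `chat.completion.chunk` object, and the stream ends with a `[DONE]` sentinel. The JSON payloads below are illustrative:

```
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```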

Fixes #77
Refs #78

🤖 Generated with Claude Code

Commit message:

New server binary that compiles against quant.h instead of
libturboquant, eliminating the sync-divergence bug (#77, #78).

Key results (Apple M3, 16GB):
  SmolLM2-1.7B: 23 tok/s (was: garbage via libturboquant)
  Phi-3.5-mini:  6.5 tok/s (was: crash or garbage via libturboquant)

Build:
  cc -O2 -o quant-server-unified tools/quant_server_unified.c -lm -lpthread

Features:
- OpenAI-compatible API (/v1/chat/completions, /v1/models, /health)
- SSE streaming (stream: true)
- CORS headers
- Auto-detect Phi-3 chat template vs ChatML
- Template token filtering (<|im_end|>, <|end|>, etc.)
- Mutex-serialized inference (safe for concurrent HTTP clients)
- Graceful port-in-use error

No libturboquant dependency. No Metal/CUDA (pure CPU NEON).
Single file, zero external dependencies beyond libc.

Fixes #77 (SmolLM2 numerical instability in libturboquant)
Refs #78 (quant.h as single source of truth)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
unamedkr merged commit 27671f5 into main on Apr 12, 2026. 2 of 3 checks passed.


Development

Successfully merging this pull request may close these issues:

  • SmolLM2-1.7B server inference regression after 91814d4 (Phi-3 CPU fallback)
