Problem
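quant.h (single-header) and src/engine/*.c (split sources / libturboquant) implement the same inference logic independently. They have diverged, causing:
- a failure in quant-server (libturboquant) for a case that works perfectly via quant.h
- a regression introduced when a change went from quant.h to the split sources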
Root cause of #77: tq_forward() in src/engine/tq_transformer.c has numerical instability at layer 7 (hidden state max=18,359) that does not occur in the corresponding function in quant.h. Since both are meant to implement the same logic, this shows the two implementations have diverged.
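A divergence like this can be localized by comparing per-layer hidden states from both builds. The following is a minimal sketch, assuming each implementation can dump its hidden state after every layer as a list of floats; the function names and the tolerances are illustrative and not part of either code base.

```python
import math


def max_abs(values):
    """Largest absolute value in one layer's hidden state."""
    return max(abs(v) for v in values)


def first_divergent_layer(reference_layers, candidate_layers,
                          rel_tol=1e-3, abs_tol=1e-5):
    """Return the index of the first layer whose hidden states disagree.

    reference_layers: per-layer dumps from the quant.h build (assumed input)
    candidate_layers: per-layer dumps from the split-source build
    Returns None when every compared layer agrees within tolerance.
    """
    for index, (ref, cand) in enumerate(zip(reference_layers, candidate_layers)):
        if len(ref) != len(cand):
            return index
        for r, c in zip(ref, cand):
            # math.isclose is False for NaN, so NaNs are reported as divergence.
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                return index
    return None
```

Reporting max_abs() for the flagged layer on both sides gives the kind of figure quoted above (max=18,359).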
Proposal: SQLite-style amalgamation
Make quant.h the single source of truth. Auto-generate split sources from it.
Current (broken)
quant.h ←── manual edits
↕ manual sync (humans port changes)
src/engine/*.c ←── manual edits, diverged
Proposed
quant.h ←── single source of truth
↓ tools/split_header.py (automated)
src/engine/tq_transformer.c ←── auto-generated
src/engine/tq_generate.c ←── auto-generated
src/engine/tq_model.c ←── auto-generated
...
Implementation sketch
# tools/split_header.py
# 1. Parse quant.h sections by marker comments:
# // --- SECTION: transformer ---
# // --- END SECTION ---
# 2. Extract each section into corresponding .c file
# 3. Generate appropriate #include headers
# 4. Add "DO NOT EDIT — auto-generated from quant.h" header
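The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the marker syntax is taken from the comments above, but the tq_<section>.c naming rule is inferred from the diagram, the include name tq_engine.h is a placeholder, and the banner text is a plain-ASCII variant of the one proposed. The real script would also need to write the files to disk.

```python
import re

# Marker comments as described in the sketch above.
SECTION_START = re.compile(r"^\s*//\s*---\s*SECTION:\s*(\w+)\s*---\s*$")
SECTION_END = re.compile(r"^\s*//\s*---\s*END SECTION\s*---\s*$")
BANNER = "/* DO NOT EDIT - auto-generated from quant.h */"


def split_sections(header_text):
    """Map each section name to the source text between its markers."""
    sections = {}
    name, lines = None, []
    for number, line in enumerate(header_text.splitlines(), start=1):
        start = SECTION_START.match(line)
        if start:
            if name is not None:
                raise ValueError(f"line {number}: section {name!r} is still open")
            name, lines = start.group(1), []
            if name in sections:
                raise ValueError(f"line {number}: duplicate section {name!r}")
        elif SECTION_END.match(line):
            if name is None:
                raise ValueError(f"line {number}: END SECTION without a start")
            sections[name] = "\n".join(lines) + "\n"
            name = None
        elif name is not None:
            lines.append(line)
    if name is not None:
        raise ValueError(f"section {name!r} is never closed")
    return sections


def render_sources(header_text, includes=("tq_engine.h",)):
    """Return {filename: text} for every section (hypothetical naming rule)."""
    include_lines = "".join(f'#include "{h}"\n' for h in includes)
    return {
        f"tq_{name}.c": f"{BANNER}\n{include_lines}\n{body}"
        for name, body in split_sections(header_text).items()
    }
```

Keeping generation as a pure function from header text to a filename-to-text mapping makes the output deterministic and easy to compare against what is on disk.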
Benefits
- Zero sync bugs — split sources are always identical to quant.h
- Single place to fix bugs — fix in quant.h, regenerate
- GPU backends unaffected — Metal/CUDA kernels stay separate (they call into the generated code)
- CI can verify: python tools/split_header.py && git diff --exit-code src/engine/
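The git-based check works as long as the generated files are tracked. As a complement, here is a small sketch of a staleness check that compares freshly generated text against the files on disk without involving git. The function name and the filename-to-text mapping are assumptions for illustration.

```python
from pathlib import Path


def stale_files(expected, output_dir):
    """Return names of generated files that are missing or out of date.

    expected: mapping of filename -> freshly generated source text
    output_dir: directory holding the generated files, e.g. src/engine
    """
    stale = []
    for filename, text in sorted(expected.items()):
        path = Path(output_dir) / filename
        # Compare bytes so platform newline translation cannot hide a difference.
        if not path.is_file() or path.read_bytes() != text.encode("utf-8"):
            stale.append(filename)
    return stale
```

A CI job could exit non-zero whenever this list is non-empty and print the names so the author knows which files to regenerate.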
Migration path
- Add section markers to quant.h (non-breaking)
- Write split_header.py
- Regenerate src/engine/*.c from quant.h
- Verify: same binary output for SmolLM2, Phi-3.5
- Add CI step: fail if generated files are stale
- Delete manual src/engine/*.c from version control (generated only)
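For the verification step, one possible reading of "same binary output" is byte-identical generation output for a fixed prompt with deterministic decoding. Under that assumption, a comparison could look like the sketch below. How the output bytes are captured from each build is left to the caller, and the function names are illustrative.

```python
import hashlib


def output_digest(output: bytes) -> str:
    """SHA-256 of one captured model output, convenient for CI logs."""
    return hashlib.sha256(output).hexdigest()


def mismatched_models(before, after):
    """Compare outputs captured before and after regeneration.

    before/after: mapping of model name -> captured output bytes.
    Returns {model: (digest_before, digest_after)} for every model whose
    output differs or is missing on one side (digest is None when missing).
    """
    result = {}
    for name in sorted(set(before) | set(after)):
        old = output_digest(before[name]) if name in before else None
        new = output_digest(after[name]) if name in after else None
        if old != new:
            result[name] = (old, new)
    return result
```

An empty result for SmolLM2 and Phi-3.5 would indicate that the regenerated sources behave the same as the build they replace, at least on the captured prompts.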
Precedent
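- SQLite: the sqlite3.c amalgamation is the recommended form for building and distribution. Note that SQLite generates the amalgamation from its split source files, so this proposal generates in the opposite direction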
- stb libraries: single-header is the distribution format
- Dear ImGui: ships as a small set of source files that are compiled directly into the application
Impact
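This would fix #77 by replacing the diverged tq_forward() with the quant.h version, and it would prevent future sync-divergence bugs because the split sources could no longer be edited independently. The quant.h path passes 35/35 tests and produces coherent output for SmolLM2, Phi-3.5, and Llama. Making it authoritative eliminates an entire class of bugs.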
Proposed by ClawTeam based on root-cause analysis of #77