Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery
loop-detection 2-bit chain-of-thought llm-inference llm-quantization hybrid-inference low-bit-reasoning quantized-reasoning reasoning-acceleration
-
Updated
Jul 3, 2026 - Python