Qwen3.5-4B: quant_generate works but quant_ask produces empty/garbage output #69

@unamedkr

Description

Qwen3.5-4B (DeltaNet hybrid) loads successfully and produces coherent output via quant_generate, but quant_ask returns empty or garbage tokens for the same model. This breaks the server API, since quant-server uses quant_ask for non-streaming requests.

Evidence

quant_generate (CLI) — WORKS

$ ./qwen35_test Qwen3.5-4B-Q4_K_M.gguf "Hello, who are you?"

--- response ---
I am Qwen3.5, a large language model developed by Alibaba Cloud.
I can help you with various tasks such as answering questions,
solving problems, and generating content.

quant_ask (server) — BROKEN

$ curl localhost:8080/v1/chat/completions \
  -d '{"messages":[{"role":"user","content":"What is gravity?"}]}'

{"content": "  -\n.\n- \n1. -  "}   # empty/whitespace tokens

Streaming via server — ALSO BROKEN

data: {"delta":{"content":"!"}}
data: {"delta":{"content":"!"}}
data: {"delta":{"content":"!"}}
# Repeating "!" tokens
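Both failure modes above (whitespace and punctuation only, or a single repeated "!") contain no actual words, so a regression test could flag them with a simple heuristic. The following is a minimal sketch, not part of quant.cpp: `looks_degenerate` is a hypothetical helper name, and the "no letters" rule is an assumption that would also flag legitimately numeric-only answers.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a quant.cpp API): returns true when the text
 * contains no letters at all, i.e. it is empty or made up only of
 * whitespace, digits, and punctuation. Bytes >= 0x80 are treated as
 * letters so that UTF-8 encoded non-English output is not flagged. */
static bool looks_degenerate(const char *text) {
    if (text == NULL) {
        return true;
    }
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; ++p) {
        if (*p >= 0x80 || isalpha(*p)) {
            return false;
        }
    }
    return true;
}
```

Applied to the outputs in this report, the check flags the quant_ask response and the repeated "!" stream, but not the CLI response.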

Root Cause Hypothesis

quant_ask and quant_generate may handle the prompt/tokenization differently. Possible causes:

  1. quant_ask may apply its own chat template that conflicts with the ChatML template already in the prompt
  2. quant_ask may not properly initialize DeltaNet state (conv buffer, delta state) between calls
  3. The BOS token handling may differ between the two paths
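If hypothesis 1 holds, the prompt would be wrapped in ChatML twice. One way to rule that out is to apply the template only when the prompt does not already contain ChatML markers. The sketch below illustrates that guard; `prompt_has_chatml` and `build_chat_prompt` are hypothetical names introduced here, and nothing in this report confirms how quant_ask actually builds its prompt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if the prompt already carries a ChatML marker. */
static bool prompt_has_chatml(const char *prompt) {
    return prompt != NULL && strstr(prompt, "<|im_start|>") != NULL;
}

/* Hypothetical helper: copy an already-templated prompt unchanged, or wrap
 * a bare user message in a single ChatML user/assistant turn.
 * Returns the snprintf result (length the full string needs, excluding the
 * terminating NUL), so callers can detect truncation. */
static int build_chat_prompt(const char *msg, char *out, size_t out_size) {
    if (msg == NULL) {
        msg = "";
    }
    if (prompt_has_chatml(msg)) {
        return snprintf(out, out_size, "%s", msg);
    }
    return snprintf(out, out_size,
                    "<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n",
                    msg);
}
```

Running the helper on its own output leaves the prompt unchanged, which is the property a double-templating fix would need.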

Steps to Reproduce

// This works:
quant_generate(ctx, "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n", cb, NULL);

// This produces garbage:
char* result = quant_ask(ctx, "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n");
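To compare the two paths on identical input, the streamed tokens can be collected into a buffer and diffed against the string returned by quant_ask. The sketch below shows only the collecting side. The callback signature `void (*)(const char *token, void *user_data)` is an assumption inferred from the `cb, NULL` arguments above, and `token_sink` and `collect_token` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for streamed tokens. */
typedef struct {
    char   buf[4096];
    size_t len;
} token_sink;

/* Assumed callback shape: (token text, user data pointer). Appends the
 * token to the sink, truncating silently once the buffer is full, and
 * keeps the buffer NUL-terminated. */
static void collect_token(const char *token, void *user_data) {
    token_sink *sink = (token_sink *)user_data;
    if (sink == NULL || token == NULL) {
        return;
    }
    size_t n = strlen(token);
    size_t room = sizeof sink->buf - 1 - sink->len;
    if (n > room) {
        n = room;
    }
    memcpy(sink->buf + sink->len, token, n);
    sink->len += n;
    sink->buf[sink->len] = '\0';
}
```

If the real callback type matches, passing `collect_token` and a zero-initialized `token_sink` in place of `cb` and `NULL` would let the collected text be compared directly with the quant_ask result for the same prompt.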

Environment

  • quant.cpp: latest main (1e1ea2c)
  • Model: unsloth/Qwen3.5-4B-GGUF (Q4_K_M, 2.6GB)
  • Architecture: qwen35 (DeltaNet hybrid, 8 attn + 24 DeltaNet layers)
  • OS: macOS 15 (Apple M3, 16GB)

Reported by ClawTeam Claw-4 (Optimizer) + Claw-5 (Researcher)

Labels: bug