EOS is triggered after only a few output tokens for a run of Qwen3.5 - possible bug #2094

**Describe the Issue**
I'm posting this here because it might be a bug. I currently have a Qwen3.5 model running that outputs only one or a few words (tokens) per generation, and in the terminal I see, for example:

"Generating (11 / 4086 tokens) " "EOS token triggered!" "CtxLimit 5606/20480". I then unchecked 'Trim Sentences' in the web GUI and got 2 tokens instead of 11 for the same prompt.

I opened the same model file in another window (at another network address) and it outputs long text (ok).

I recall changing some parameters in the GUI but not which ones. So I copied the terminal initialization output (up to "Please connect to custom...") from both the problematic run and a run with default settings into text files and ran `git diff` on them. The only meaningful differences are the available memory (more for the problematic run, which was started earlier), a larger context size for the problematic run (I recall increasing it), and start_cache=5 in the problematic run vs. =0 in the default one. (Other differences were the folder name in /tmp/, the lines with the port number (obviously), the "sched_reserve took xxx ms" timing, and the number of CPU threads.)
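For anyone who wants to repeat the comparison, here is a rough sketch of the same idea in Python instead of `git diff`. It drops the lines that differ between any two runs before diffing. The patterns in `NOISE_PATTERNS` and the function names are my own guesses for illustration, not anything KoboldCpp defines, so adjust them to whatever your logs actually contain.

```python
import difflib
import re
import sys
from pathlib import Path

# Hypothetical patterns for lines that differ between any two runs
# (assumptions for illustration; tune them to the real log text).
NOISE_PATTERNS = [
    re.compile(r"/tmp/\S+"),              # temporary folder name
    re.compile(r"\bport\b\D*\d+", re.I),  # port number
    re.compile(r"sched_reserve"),         # timing line
    re.compile(r"threads", re.I),         # CPU thread count
]


def is_noise(line):
    """True if the line matches one of the expected-to-differ patterns."""
    return any(p.search(line) for p in NOISE_PATTERNS)


def meaningful_diff(log_a, log_b):
    """Return only the removed ('- ') and added ('+ ') lines between two
    logs, after filtering out the noise lines from both sides."""
    a = [line for line in log_a.splitlines() if not is_noise(line)]
    b = [line for line in log_b.splitlines() if not is_noise(line)]
    return [d for d in difflib.ndiff(a, b) if d.startswith(("- ", "+ "))]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python logdiff.py problematic.txt default.txt
    first = Path(sys.argv[1]).read_text()
    second = Path(sys.argv[2]).read_text()
    for entry in meaningful_diff(first, second):
        print(entry)
```

Lines prefixed with `- ` come from the first file and lines prefixed with `+ ` from the second, so whatever remains is a setting worth checking by hand.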

I ran another instance with start cache enabled and the context set to the same value of 20480, and it produced long output (ok). I don't know what else to try to pinpoint a possible bug. I think I can keep the "buggy" run open for a couple more days to investigate. What do you advise?

**Additional Information:**
Linux Mint 21
KoboldCpp no-cuda v1.110
unsloth-Qwen3.5-9B-UD-Q5_K_XL
Hardware: CPU only

