
Decode-quality primitives: scaffolding-token mask and argmax-is-EOG stop signal#10

Merged: FuJacob merged 1 commit into main from engine/decode-quality-primitives on Jun 12, 2026.

FuJacob (Owner) commented on Jun 12, 2026

Summary

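Two additions for autocomplete decode quality. Both are cheap on the hot sampling path: the mask is built once at load time, and the new flag adds a single O(vocab) pass per sampled token. Both are also source-compatible with the app's existing call sites: SamplingConfig is untouched, and SampleResult only gains an appended field, so existing call sites, which only read its members, keep compiling.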
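To make the compatibility claim concrete, here is a minimal C++ sketch. It assumes SampleResult is a plain aggregate struct; its token and logprob members and the helpers isLowConfidence and makeLegacyResult are hypothetical stand-ins, since the PR does not show the struct. Appending argmax_is_eog last, with a default initializer, keeps member reads compiling and, from C++14 on, also keeps older positional brace-initializers valid.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: the PR only confirms that argmax_is_eog was appended.
struct SampleResult {
    int32_t token;                   // sampled token id (assumed member)
    float   logprob;                 // log-probability of that token (assumed member)
    bool    argmax_is_eog = false;   // new trailing field with a default
};

// A pre-existing reader: unaffected by the new field.
inline bool isLowConfidence(const SampleResult& r, float threshold) {
    return r.logprob < threshold;
}

// A pre-existing positional initializer: still valid in C++14 and later,
// because the trailing member falls back to its default initializer (false).
inline SampleResult makeLegacyResult(int32_t token, float logprob) {
    assert(logprob <= 0.0f);         // log-probabilities are never positive
    return SampleResult{token, logprob};
}
```

On the Swift side, an imported struct's memberwise initializer would likely gain the new parameter, so "callers only read" is the guarantee that matters there.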

Scaffolding-token mask (load time). buildTokenMasks now probes each token's special-rendered piece and adds single-token chat/instruct/FIM scaffolding (<|im_end|>, <start_of_turn>, [INST], and the FIM marker families) to the existing -inf logit-bias table whenever the GGUF ships such a token without the control attribute. A BOS token that lacks the control flag is masked the same way. EOG tokens stay exempt so natural stops keep firing. Well-formed GGUFs already flag these tokens as control, so the expected count on the catalog models is 0; the rule is insurance against vocabularies that do not. The number of tokens it masks is exposed via getMaskedScaffoldingTokenCount for tests and diagnostics.
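A minimal sketch of the load-time rule over a simplified vocabulary view. TokenInfo, looksLikeScaffolding, maskScaffoldingTokens and the exact marker list are hypothetical; the real buildTokenMasks reads token attributes through llama.cpp, which this sketch does not call. It shows the two exemptions (EOG tokens, and control tokens that the existing table is assumed to mask already) and the count a getter like getMaskedScaffoldingTokenCount would report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical per-token view; the real engine gets these from the GGUF.
struct TokenInfo {
    std::string piece;   // special-rendered text of the token
    bool is_control;     // GGUF marked it with the control attribute
    bool is_eog;         // end-of-generation token
};

// Illustrative subset of marker families; not the engine's actual list.
bool looksLikeScaffolding(const std::string& piece) {
    static const std::unordered_set<std::string> kMarkers = {
        "<|im_start|>", "<|im_end|>", "<start_of_turn>", "<end_of_turn>",
        "[INST]", "[/INST]",
        "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>",
        "<fim_prefix>", "<fim_suffix>", "<fim_middle>",
    };
    return kMarkers.count(piece) != 0;
}

// Sets -inf in an existing bias table for unflagged scaffolding tokens and an
// unflagged BOS. Returns how many entries this rule newly masked.
std::size_t maskScaffoldingTokens(const std::vector<TokenInfo>& vocab,
                                  int32_t bos_id,
                                  std::vector<float>& logit_bias) {
    assert(logit_bias.size() == vocab.size());
    const float kNegInf = -std::numeric_limits<float>::infinity();
    std::size_t masked = 0;
    for (std::size_t id = 0; id < vocab.size(); ++id) {
        const TokenInfo& t = vocab[id];
        if (t.is_eog) continue;      // keep natural stops samplable
        if (t.is_control) continue;  // assumed: already masked by the control rule
        const bool is_bos = static_cast<int32_t>(id) == bos_id;
        if (!is_bos && !looksLikeScaffolding(t.piece)) continue;
        if (logit_bias[id] != kNegInf) {
            logit_bias[id] = kNegInf;
            ++masked;
        }
    }
    return masked;
}
```

Matching the whole piece, rather than a substring, keeps the rule limited to single-token markers, so ordinary text tokens that merely contain a bracket or pipe are never touched.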

argmax_is_eog on SampleResult. True when the single most-likely token of the raw distribution at this position is an end-of-generation token. Stochastic sampling can draw past the point where the model wants to stop; this flag lets the caller detect that stop intent on the exact step it appears and finalize the completion cleanly. It is computed in C++ while the logits row is still hot: one O(vocab) pass per sampled token, costing tens of microseconds. The row is still unmodified at that point because llama_sampler_sample works on a copied candidate array. The seed token's verdict is captured at decodePrompt while its logits row is still resident, mirroring seed_logprob.
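The flag only needs the position of the largest raw logit, because softmax preserves ordering, so no probabilities have to be computed. A minimal sketch with hypothetical names (argmaxIsEog, SampledStep, sampleStep) and a caller-supplied sampler standing in for llama_sampler_sample: the verdict is read from the raw row first, and the sampler gets its own copy, which is the property the PR relies on to read the row unmutated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// True when the largest raw logit belongs to an EOG token. Softmax keeps the
// ordering of logits, so no probabilities are needed. Single O(vocab)
// read-only pass; ties go to the lowest token id.
bool argmaxIsEog(const std::vector<float>& logits,
                 const std::unordered_set<int32_t>& eog_ids) {
    assert(!logits.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) best = i;
    }
    return eog_ids.count(static_cast<int32_t>(best)) != 0;
}

struct SampledStep {
    int32_t token;        // token the sampler drew
    bool argmax_is_eog;   // verdict on the raw row at this position
};

// Hypothetical step: take the verdict from the raw row, then let the sampler
// work on its own copy so the row stays unmodified.
template <typename Sampler>
SampledStep sampleStep(const std::vector<float>& logits,
                       const std::unordered_set<int32_t>& eog_ids,
                       Sampler&& sample) {
    const bool verdict = argmaxIsEog(logits, eog_ids);
    std::vector<float> candidates = logits;  // sampler may mutate this copy
    const int32_t token = sample(candidates);
    return SampledStep{token, verdict};
}
```

The interesting case is the one the PR targets: the sampler draws a non-EOG token while the verdict reports that EOG was the model's top choice at that position.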

Validation

```
swift build                       # clean
COTABBY_TEST_MODEL_PATH=<SmolLM2-135M> swift test
# 25 tests, 0 failures (includes 3 new: masked-count default,
# scaffolding-marker pieces never sampled at temp 1.8,
# argmax_is_eog == isEndOfGenerationToken(token) under greedy)
COTABBY_TEST_MODEL_PATH=<Qwen3.5-0.8B-Base> swift test --filter 'Argmax|Scaffolding'
# both new model-bound tests pass on a catalog base model
```

Note: testEndToEndWithModel has a pre-existing, model-specific failure on Qwen3.5-0.8B-Base: a partial trimKV returns false on that model's attention layout, and the KV-count assertion that follows then fails. The same failure reproduces on unmodified main, so it is unrelated to this change; the app already falls back to a fresh prompt build when trimKV reports failure.
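The fallback mentioned in the note can be sketched as follows. KVCache, trim, decode and preparePrompt are hypothetical stand-ins for the engine's trimKV path, not its real API: reuse the longest cached prefix when a partial trim succeeds, otherwise clear the cache and decode the whole prompt again. For simplicity the sketch ignores the need to re-decode at least one token to obtain fresh logits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical KV cache model; not the engine's real API.
struct KVCache {
    std::vector<int32_t> tokens;        // prompt tokens currently cached
    bool supports_partial_trim = true;  // false models layouts where trim fails

    // Keep only the first `keep` tokens; can report failure, like trimKV.
    bool trim(std::size_t keep) {
        if (!supports_partial_trim) return false;
        if (keep < tokens.size()) tokens.resize(keep);
        return true;
    }
    void clear() { tokens.clear(); }
    // Stand-in for decoding prompt[from..] into the cache.
    void decode(const std::vector<int32_t>& prompt, std::size_t from) {
        tokens.insert(tokens.end(),
                      prompt.begin() + static_cast<std::ptrdiff_t>(from),
                      prompt.end());
    }
};

// Reuses the longest cached prefix when a partial trim succeeds, otherwise
// falls back to a fresh prompt build. Returns how many tokens were decoded.
std::size_t preparePrompt(KVCache& kv, const std::vector<int32_t>& prompt) {
    std::size_t shared = 0;
    while (shared < kv.tokens.size() && shared < prompt.size() &&
           kv.tokens[shared] == prompt[shared]) {
        ++shared;
    }
    // Only trim when the cache holds tokens past the shared prefix.
    if (shared < kv.tokens.size() && !kv.trim(shared)) {
        kv.clear();   // fresh prompt build
        shared = 0;
    }
    kv.decode(prompt, shared);
    assert(kv.tokens == prompt);
    return prompt.size() - shared;
}
```

Treating a failed trim as "rebuild" rather than as an error keeps the result correct on every layout; the cost is a full prompt decode on models where partial trimming is unsupported.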

Risk / rollout

Mask additions only ever remove tokens that should never appear as autocomplete text, and the EOG exemption is covered by tests. The argmax flag is purely informational until a consumer opts in on the app side.
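How a consumer might opt in, as a sketch: Step, collect and the honor_argmax_eog switch are hypothetical, and dropping the token drawn on the flagged step is one possible policy rather than anything the PR specifies. With the switch off, the loop behaves like a caller that ignores the new field, which is why the flag is inert until used.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical per-step record as a consumer would see it.
struct Step {
    int32_t token;
    bool argmax_is_eog;
};

// Accumulates completion tokens until a stop. With honor_argmax_eog == false
// this matches a caller that ignores the flag (stop only on a sampled EOG).
// With it on, generation ends on the first step whose top choice was EOG, and
// the token drawn at that step is dropped (one possible policy).
std::vector<int32_t> collect(const std::vector<Step>& steps,
                             const std::unordered_set<int32_t>& eog_ids,
                             bool honor_argmax_eog) {
    std::vector<int32_t> out;
    for (const Step& s : steps) {
        if (eog_ids.count(s.token) != 0) break;
        if (honor_argmax_eog && s.argmax_is_eog) break;
        out.push_back(s.token);
    }
    assert(out.size() <= steps.size());
    return out;
}
```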


Two zero-hot-loop-cost additions for autocomplete decode quality:

- buildTokenMasks now probes each token's special-rendered piece and hard-masks
  single-token chat/instruct/FIM scaffolding (<|im_end|>, <start_of_turn>, [INST],
  FIM families) that the GGUF did not flag as a control token, plus an unflagged
  BOS. EOG tokens stay exempt so natural stops keep firing. Well-formed GGUFs
  already flag these as control, so the common count is 0; the rule is insurance
  against vocabularies that ship them unflagged. Exposed via
  getMaskedScaffoldingTokenCount for tests/diagnostics.

- SampleResult gains argmax_is_eog: whether the raw distribution's single
  most-likely token at this position is an end-of-generation token. Stochastic
  sampling can draw past the point where the model wants to stop; this lets
  callers detect that stop intent on the exact step it appears. Computed in C++
  while the logits row is hot (one O(vocab) pass per token, tens of microseconds);
  the seed token's verdict is captured at decodePrompt while its row is still
  resident. Field is appended, so existing Swift call sites that only read
  members keep compiling; SamplingConfig is untouched.
FuJacob merged commit ee7a496 into main on Jun 12, 2026.
1 check passed