fix(decode): ship the confidence floor OFF by default (#688 regression) by FuJacob · Pull Request #693 · FuJacob/cotabby

FuJacob · 2026-06-12T05:37:15Z

Summary

#688 turned on a -1.5 mean-per-token-logprob confidence floor by default, which silently suppresses completions the model was "unsure" about. It won an offline eval sweep, but that golden set badly under-represented real free-form typing. In production logs the same floor withheld ~56% of completions the moment a build with it started running:

| window | generations | suppressed `lowConfidence` |
| --- | --- | --- |
| 06-11 (build without it active) | 434 | 0% |
| 06-12 (rebuilt off latest main) | 97 | 56% |

Most suppressed completions were perfectly usable (passed and dropped completions sat on either side of -1.5 with no quality difference). Under streaming the gate also paints a partial and then clears it, which reads as suggestions flickering away. The floor additionally forces per-token logprob computation, adding latency.
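For readers unfamiliar with the gate, here is a minimal sketch of what a mean-per-token-logprob floor does. The function names, the `>=` boundary behavior, and the handling of empty completions are assumptions made for illustration, not the code in `LlamaSuggestionEngine.swift`. If the logprobs are natural logs (also an assumption), a mean of -1.5 corresponds to a geometric-mean per-token probability of about exp(-1.5) ≈ 0.22.

```swift
import Foundation

/// Hypothetical helper: arithmetic mean of a completion's per-token log-probabilities.
/// Returns nil for an empty completion so the caller can decide how to treat it.
func meanLogprob(_ tokenLogprobs: [Double]) -> Double? {
    guard !tokenLogprobs.isEmpty else { return nil }
    return tokenLogprobs.reduce(0, +) / Double(tokenLogprobs.count)
}

/// Hypothetical gate: a completion passes when its mean logprob is at or above the floor.
/// Assumption: empty completions are passed through rather than suppressed.
func passesConfidenceFloor(_ tokenLogprobs: [Double], confidenceFloor: Double) -> Bool {
    guard let mean = meanLogprob(tokenLogprobs) else { return true }
    return mean >= confidenceFloor
}

// Illustrative values only, not taken from the production logs.
let confidentCompletion = [-0.2, -0.4, -0.3]   // mean is about -0.3
let hesitantCompletion = [-1.2, -2.1, -1.8]    // mean is about -1.7

let confidentPasses = passesConfidenceFloor(confidentCompletion, confidenceFloor: -1.5)
let hesitantPasses = passesConfidenceFloor(hesitantCompletion, confidenceFloor: -1.5)
let hesitantPassesWhenOff = passesConfidenceFloor(hesitantCompletion, confidenceFloor: -.infinity)
```

With a floor of `-.infinity` the comparison can never fail for a finite mean, which is why that value acts as "gate off".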

This sets defaultConfidenceFloor to -.infinity, so the gate and its logprob cost are off unless explicitly opted into via the existing cotabbyConfidenceFloorOverride default. The eval harness still sets that key to measure with the gate on. The floor should be recalibrated against a representative real-usage distribution before being re-enabled by default.
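The opt-in path described above might look roughly like the following. This is a sketch under assumptions: the enum, the exact key string, and the function signature are illustrative stand-ins, and only the names `defaultConfidenceFloor`, `resolvedConfidenceFloor`, and `cotabbyConfidenceFloorOverride` come from this PR.

```swift
import Foundation

// Hypothetical container; the real constants live in LlamaSuggestionEngine.swift.
enum ConfidenceFloorSketch {
    // Assumption: the UserDefaults key string matches the identifier named in the PR.
    static let overrideKey = "cotabbyConfidenceFloorOverride"
    static let defaultConfidenceFloor: Double = -.infinity

    /// Returns the override when one is stored, otherwise the shipped default.
    static func resolvedConfidenceFloor(defaults: UserDefaults) -> Double {
        // object(forKey:) is nil when the key is unset; double(forKey:) would
        // return 0 instead, which would silently turn the gate on.
        guard let overrideValue = defaults.object(forKey: overrideKey) as? Double else {
            return defaultConfidenceFloor
        }
        return overrideValue
    }
}

// Use a throwaway suite so the sketch does not touch the app's real defaults.
// UserDefaults(suiteName:) only returns nil for reserved domain names.
let sketchSuite = UserDefaults(suiteName: "confidence-floor-sketch")!
sketchSuite.removeObject(forKey: ConfidenceFloorSketch.overrideKey)
let unsetFloor = ConfidenceFloorSketch.resolvedConfidenceFloor(defaults: sketchSuite)

sketchSuite.set(-1.5, forKey: ConfidenceFloorSketch.overrideKey)
let overriddenFloor = ConfidenceFloorSketch.resolvedConfidenceFloor(defaults: sketchSuite)

// Clean up so repeated runs start from an unset key.
sketchSuite.removeObject(forKey: ConfidenceFloorSketch.overrideKey)
```

An eval harness would set the key before generating, as in the `set(-1.5, ...)` line, and leave it unset for ordinary builds.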

Validation

xcodebuild ... build -derivedDataPath build/DerivedData      # ** BUILD SUCCEEDED **
xcodebuild ... test -only-testing:CotabbyTests/LlamaDecodeGateDefaultsTests \
  -only-testing:CotabbyTests/LlamaSuggestionEngineCancellationTests \
  -only-testing:CotabbyTests/ModelAndPresentationValueTests \
  -only-testing:CotabbyTests/SuggestionQualityMetricsStoreTests \
  CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO          # ** TEST SUCCEEDED **
swiftlint lint --quiet                                       # exit 0

Added test_confidenceFloor_shippedOff_byDefault to lock the off-by-default decision (asserts both the constant and the resolved value are -.infinity).

Linked issues

Risk / rollout notes

Pure default change: defaultConfidenceFloor -1.5 → -.infinity. No schema/settings migration. The lowConfidence suppression path, the override key, and the eval harness wiring are all untouched and still function when the override is set.
Behavior change vs. current main: the confidence gate stops firing for everyone by default, so coverage goes back up (no more vanishing/flickering suggestions) and the per-token logprob cost is removed.

Greptile Summary

Reverts the confidence gate from an aggressive on-by-default -1.5 floor (introduced in #688) back to disabled (-.infinity) after production data showed it suppressing ~56% of completions with no measurable quality benefit.

LlamaSuggestionEngine.swift: defaultConfidenceFloor changed from -1.5 to -.infinity; the doc comment is updated with the production evidence that motivated the revert, the eval context, and the opt-in path via cotabbyConfidenceFloorOverride.
LlamaDecodeGateDefaultsTests.swift: New test_confidenceFloor_shippedOff_byDefault regression-locks both the raw constant and resolvedConfidenceFloor to -.infinity, so any future accidental re-enable of the default will surface immediately.

Confidence Score: 5/5

Safe to merge — one constant flipped to -.infinity, all other paths untouched.

The change is a single-constant revert backed by production log evidence and a dedicated regression-lock test. The suppression path, the override key, and the eval harness wiring are all unchanged. The new test correctly uses XCTAssertEqual with -.infinity, which compares exactly under IEEE 754. No logic paths were added or removed.
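The exact-comparison point can be checked in isolation. The helper name below is hypothetical. The sketch only demonstrates standard `Double` behavior: infinities compare equal to themselves, every finite value is greater than `-.infinity`, and NaN equals nothing.

```swift
/// Hypothetical predicate: the gate counts as disabled only for exactly -infinity.
func isGateDisabled(_ confidenceFloor: Double) -> Bool {
    confidenceFloor == -.infinity
}

let offByDefault = isGateDisabled(-.infinity)
let oldDefault = isGateDisabled(-1.5)
let lowestFinite = isGateDisabled(-Double.greatestFiniteMagnitude)
// NaN never compares equal, so a NaN floor would not read as "off".
let nanFloor = isGateDisabled(.nan)
```

Exact equality is safe here because `-.infinity` is assigned as a constant rather than produced by arithmetic that could round.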

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/Services/Runtime/LlamaSuggestionEngine.swift | Changed `defaultConfidenceFloor` from -1.5 to -.infinity (disabling the gate by default), with updated doc comment explaining the production regression. No logic changes beyond the constant. |
| CotabbyTests/LlamaDecodeGateDefaultsTests.swift | Added `test_confidenceFloor_shippedOff_byDefault` to lock both the constant and the resolved value at -.infinity, preventing silent reversion. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[makeGenerationOptions] --> B[resolvedConfidenceFloor]
    B --> C{cotabbyConfidenceFloorOverride set in UserDefaults?}
    C -- No --> D["defaultConfidenceFloor\n(-.infinity  ← this PR)\nwas: -1.5"]
    C -- Yes --> E[Use override value\ne.g. -0.8 for eval harness]
    D --> F{floor == -.infinity?}
    E --> F
    F -- Yes --> G[Gate OFF: no logprob computation, all completions passed through]
    F -- No --> H[Gate ON: per-token logprob computed, completions below floor suppressed]
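The two branches at the bottom of the flowchart can be sketched as a small options type. Everything here is an assumption for illustration: `GenerationOptions`, `needsTokenLogprobs`, and the `overrideValue` parameter are stand-ins, and the real `makeGenerationOptions` may take different inputs.

```swift
// Hypothetical stand-in for the engine's options type.
struct GenerationOptions {
    let confidenceFloor: Double

    /// Per-token logprobs are only requested when the floor could reject something,
    /// mirroring the "floor == -.infinity?" decision in the flowchart.
    var needsTokenLogprobs: Bool { confidenceFloor != -.infinity }
}

// The shipped default after this PR.
let shippedDefaultFloor: Double = -.infinity

/// Sketch of the resolution step: an explicit override wins, otherwise the default applies.
func makeGenerationOptions(overrideValue: Double?) -> GenerationOptions {
    GenerationOptions(confidenceFloor: overrideValue ?? shippedDefaultFloor)
}

let defaultOptions = makeGenerationOptions(overrideValue: nil)   // gate off
let evalOptions = makeGenerationOptions(overrideValue: -0.8)     // gate on, as in the eval example
```

Deriving the logprob switch from the floor keeps the latency cost tied to the gate, so turning the gate off cannot leave the extra computation running.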


#688 enabled a -1.5 mean-logprob confidence floor by default. It won an offline eval sweep, but that golden set under-represented real typing: production logs show the same floor withholding ~56% of completions, most of them perfectly usable, and under streaming it paints a partial then clears it (suggestions appear to flicker away). It also forces per-token logprob computation, adding latency. Set defaultConfidenceFloor to -.infinity so the gate (and its logprob cost) are off unless opted into via cotabbyConfidenceFloorOverride. The eval harness sets that key to measure with the gate on. Recalibrate against a representative real-usage distribution before re-enabling by default.

FuJacob merged commit f4d2db1 into main Jun 12, 2026
4 checks passed

FuJacob deleted the fix/confidence-floor-off-by-default branch June 12, 2026 05:43
