Responsive lifecycle: anchor reuse cache + speculative post-acceptance prefetch by FuJacob · Pull Request #689 · FuJacob/cotabby

FuJacob · 2026-06-12T02:58:27Z

Summary

Stacked on #688; the final part of the quality stack. Two lifecycle changes that remove model round-trips from the moments users feel most, both behind kill switches:

Anchor reuse cache. A bounded, string-only memory of recent suggestions (16 entries, 180s expiry, SuggestionAnchorCache). When the live preceding-text tail equals a cached anchor's tail plus the first k characters of its suggestion (k strictly short of the whole text, so an accepted suggestion never re-offers its own tail), the prediction cycle re-shows the remainder instantly: no debounce, no generation. One match rule covers backspace rollback onto a covered position, retyping suggested characters after an unrelated invalidation, and returning to an unchanged field. Restores re-check only the guards that depend on current field state (trailing duplication, stale-accept echo, selection, secure); the text itself is a suffix of a suggestion that already passed the normalizer and seam guard on this exact path. Suggestions are recorded at display time and again at invalidation (the dying session is exactly what a rollback wants back); corrections are never cached. Kill switch: cotabbyAnchorReuseDisabled.
Speculative post-acceptance prefetch. Accepting the final chunk previously idled through the host-publish poll (10-400ms) before the next generation could even start, which is why the 0.8s post-exhaustion Tab window had to paper over a stall. Cotabby knows exactly what it just typed, so it now starts the continuation immediately against the expected post-insert snapshot (SpeculativeAcceptanceContext). The existing publish poll becomes the validator: a publish whose content signature matches the speculation stands down and lets the in-flight result land (apply admits it through the stale-generation guard via the same signature, single-use); any divergence (autocorrect, IME transformation, a sliding context window) schedules the normal regeneration, whose newer work id retires the speculation automatically. Wrong speculation costs one discarded background generation; right speculation removes the publish wait plus a debounce from the visible gap. Kill switch: cotabbySpeculativePrefetchDisabled.
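
The anchor match rule described above can be sketched as follows. This is a minimal illustration, not the code in SuggestionAnchorCache: `AnchorEntry`, `remainder(for:in:tailLength:)`, and the `tailLength` parameter are hypothetical names, and preferring the largest k is an assumption about how ties are resolved.

```swift
/// Hypothetical stand-in for one cached suggestion (not the real cache entry type).
struct AnchorEntry {
    let anchorTail: String  // tail of the preceding text when the suggestion was shown
    let fullText: String    // the complete suggestion text
}

/// Returns the not-yet-typed remainder when the live tail equals
/// tail(anchorTail + fullText.prefix(k)) for some k < fullText.count.
/// Assumption: the deepest match (largest k) wins.
func remainder(for liveTail: String, in entry: AnchorEntry, tailLength: Int) -> String? {
    // Build the character buffers once and compare slices afterwards.
    let suggestion = Array(entry.fullText)
    let combined = Array(entry.anchorTail) + suggestion
    let live = Array(liveTail)
    let anchorCount = combined.count - suggestion.count

    // k stops at count - 1, so a fully consumed suggestion never re-offers its own tail.
    for k in stride(from: suggestion.count - 1, through: 0, by: -1) {
        let end = anchorCount + k
        let start = max(0, end - tailLength)
        if combined[start..<end].elementsEqual(live) {
            return String(suggestion[k...])
        }
    }
    return nil
}

let example = AnchorEntry(anchorTail: "Hello, ", fullText: "world today")
// The user typed "wor" of the suggestion, so the live 8-character tail is "llo, wor".
let rest = remainder(for: "llo, wor", in: example, tailLength: 8)
```

Under these assumptions `rest` is the untyped remainder of the suggestion, while a tail that has consumed the whole suggestion, or one that diverges from it, yields `nil`.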

Validation

xcodebuild build-for-testing ... CODE_SIGNING_ALLOWED=NO   # ** TEST BUILD SUCCEEDED **
xcodebuild test-without-building ...                       # FULL suite: 1146 tests, 1141 passed, 0 failed, 5 skipped (gated evals)
swiftlint lint --quiet <changed files>                     # exit 0
xcodegen generate                                          # pbxproj additions only

New coverage: 11 cache-rule tests (type-through, rollback, never-reoffer-consumed, divergence, cross-field isolation, deepest-match precedence, expiry, capacity, bounded-tail matching); 4 optimistic-snapshot tests (UTF-16 caret advance incl. surrogate pairs, signature equality with an identical real publish, signature divergence under host transformation); 3 coordinator tests (restore reaches .ready with the engine stub that throws on any generation, proving zero model work; a speculative result with a stale generation applies exactly when the live content matches its signature and the exemption is single-use; stale results without the exemption still drop). One existing test updated to the new contract: a final-chunk accept now leaves the coordinator generating (speculatively) rather than idle.
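
As an illustration of the optimistic-snapshot cases listed above, the sketch below advances the caret by UTF-16 length. `FieldSnapshot` and `optimisticSnapshot(after:from:)` are hypothetical, simplified stand-ins; the real SpeculativeAcceptanceContext types are not shown in this PR text.

```swift
/// Hypothetical, simplified field snapshot.
struct FieldSnapshot: Equatable {
    var precedingText: String
    var trailingText: String
    var caretUTF16Offset: Int
    var isSecure: Bool
}

/// Builds the snapshot expected once the host publishes the accepted insertion.
func optimisticSnapshot(after inserted: String, from base: FieldSnapshot) -> FieldSnapshot {
    var next = base
    next.precedingText += inserted
    // Advance by UTF-16 code units, not Characters, so an emoji stored as a
    // surrogate pair moves the caret by 2.
    next.caretUTF16Offset += inserted.utf16.count
    return next
}

let base = FieldSnapshot(precedingText: "Hello", trailingText: "",
                         caretUTF16Offset: 5, isSecure: false)
let speculated = optimisticSnapshot(after: " 👋", from: base)

// A real publish with identical content compares equal; a host that
// transformed the insert (for example autocorrect) does not.
let identicalPublish = FieldSnapshot(precedingText: "Hello 👋", trailingText: "",
                                     caretUTF16Offset: 8, isSecure: false)
let transformedPublish = FieldSnapshot(precedingText: "Hello :wave:", trailingText: "",
                                       caretUTF16Offset: 12, isSecure: false)
```

Comparing whole snapshots with `Equatable` stands in here for the content signature; the actual signature type may differ.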

The suggestion eval is unaffected by construction (it drives the engine directly, not the coordinator lifecycle); end-to-end behavior needs dogfooding, which the kill switches and the anchor-restore / speculative-generating / speculation-validated JSONL stages make directly observable.

Linked issues

Refs #546 (perceived latency: both changes remove round-trips from the felt path).

Risk / rollout notes

The speculation changes the stale-result guard in apply from generation-equality to generation-equality OR single-use signature match; the signature covers selection, preceding text, trailing text, and secure state, so an admitted result is current by content even when the generation counter moved underneath it.
A speculative generation that loses its bet is cancelled-or-discarded work the old code did not do; on slow machines this is bounded by one in-flight generation and the same work-id supersession as any keystroke.
Restores log anchor-restore with the shown text; a bad restore is diagnosable from one JSONL line and disappears with defaults write com.jacobfu.tabby cotabbyAnchorReuseDisabled -bool true.
Dogfood before release: watch speculative-generating vs speculation-validated ratio in the debug JSONL; persistent mismatches in a host mean that app transforms inserts and speculation should be disabled there (follow-up if observed).
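
The guard change in the first note can be pictured with the sketch below. It assumes a simplified model: `ContentSignature`, `ResultGuard`, and `admits(...)` are hypothetical names, and the real apply path has further guards that are omitted here.

```swift
/// Hypothetical content signature covering the fields the notes above list.
struct ContentSignature: Hashable {
    let selectionLocation: Int
    let selectionLength: Int
    let precedingText: String
    let trailingText: String
    let isSecure: Bool
}

struct ResultGuard {
    var currentGeneration: Int
    var pendingSpeculativeSignature: ContentSignature?

    /// Admits a result when its generation is current, OR when it is the pending
    /// speculation and the live content still matches its signature.
    /// The signature exemption is consumed on first use.
    mutating func admits(resultGeneration: Int,
                         resultSignature: ContentSignature,
                         liveSignature: ContentSignature) -> Bool {
        if resultGeneration == currentGeneration { return true }
        guard let pending = pendingSpeculativeSignature,
              pending == resultSignature,
              pending == liveSignature else { return false }
        pendingSpeculativeSignature = nil  // single-use
        return true
    }
}

let speculatedSignature = ContentSignature(selectionLocation: 8, selectionLength: 0,
                                           precedingText: "Hello 👋", trailingText: "",
                                           isSecure: false)
// The generation counter moved from 1 to 2 while the speculation was in flight.
var resultGuard = ResultGuard(currentGeneration: 2,
                              pendingSpeculativeSignature: speculatedSignature)
let firstUse = resultGuard.admits(resultGeneration: 1,
                                  resultSignature: speculatedSignature,
                                  liveSignature: speculatedSignature)
let secondUse = resultGuard.admits(resultGeneration: 1,
                                   resultSignature: speculatedSignature,
                                   liveSignature: speculatedSignature)
```

In this model the first stale-generation result is admitted by content and the second is dropped, which is the single-use behavior the coordinator tests describe.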

Greptile Summary

This PR adds two latency-reduction features behind kill switches: a SuggestionAnchorCache that re-shows cached suggestions instantly on backspace rollback, type-through re-entry, and field return; and a speculative post-acceptance prefetch that starts the next generation against an optimistic post-insert snapshot instead of waiting for the host-publish poll. It also ships adaptive debounce, argmax-EOG early stop, per-token confidence gating with LlamaGenerationOutput, layout-estimated overlay drift suppression, and an always-on quality metrics counter store surfaced in the Performance pane.

Anchor reuse cache (SuggestionAnchorCache): 16-entry, 180s TTL, identityKey-isolated; match rule is liveTail == tail(anchorTail + fullText.prefix(k)) for k < fullText.count; invalidated sessions are snapshotted before teardown so backspace rollback recovers them.
Speculative prefetch (SpeculativeAcceptanceContext, dispatchSpeculativePostAcceptanceGeneration): builds an optimistic snapshot from the accepted insertion, fires immediately at delay 0, and uses a content-signature exemption in apply to admit the result when the real publish matches; divergence safely falls through to normal regen with work-ID supersession.
Quality metrics (SuggestionQualityMetricsStore): always-on counters for generated/shown/suppressed-by-reason/accepted, persisted to UserDefaults, displayed in the Performance pane with a Reset button.
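
The bounds on the anchor cache (16 entries, 180s TTL, identityKey isolation) could look roughly like this. It is a sketch under assumptions: `BoundedSuggestionMemory` is a hypothetical type, timestamps are passed in as plain seconds so the example needs no clock, and evicting the oldest entry first is a guess at the eviction order.

```swift
/// Hypothetical bounded, string-only memory; not the real SuggestionAnchorCache.
struct BoundedSuggestionMemory {
    struct Entry {
        let identityKey: String
        let anchorTail: String
        let fullText: String
        let recordedAt: Double  // seconds, supplied by the caller
    }

    let capacity: Int
    let timeToLive: Double
    private(set) var entries: [Entry] = []

    init(capacity: Int = 16, timeToLive: Double = 180) {
        self.capacity = capacity
        self.timeToLive = timeToLive
    }

    mutating func record(identityKey: String, anchorTail: String,
                         fullText: String, now: Double) {
        prune(now: now)
        entries.append(Entry(identityKey: identityKey, anchorTail: anchorTail,
                             fullText: fullText, recordedAt: now))
        // Assumption: drop the oldest entries once capacity is exceeded.
        if entries.count > capacity {
            entries.removeFirst(entries.count - capacity)
        }
    }

    /// Unexpired entries for one field; other fields' entries are never returned.
    mutating func liveEntries(for identityKey: String, now: Double) -> [Entry] {
        prune(now: now)
        return entries.filter { $0.identityKey == identityKey }
    }

    private mutating func prune(now: Double) {
        entries.removeAll { now - $0.recordedAt > timeToLive }
    }
}

var memory = BoundedSuggestionMemory(capacity: 2, timeToLive: 180)
memory.record(identityKey: "field-A", anchorTail: "a", fullText: "one", now: 0)
memory.record(identityKey: "field-A", anchorTail: "b", fullText: "two", now: 10)
memory.record(identityKey: "field-A", anchorTail: "c", fullText: "three", now: 20)
let afterCapacity = memory.entries.map { $0.fullText }
let afterExpiry = memory.liveEntries(for: "field-A", now: 195).map { $0.fullText }
let otherField = memory.liveEntries(for: "field-B", now: 195)
```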

Confidence Score: 4/5

Safe to merge with the quality-metrics counting fix addressed; both new lifecycle features are behind kill switches and the speculative path degrades gracefully on divergence.

The acceptance-rate metric in SuggestionQualityMetricsStore can exceed 1.0 because anchor-restored suggestions increment acceptedSuggestions (via recordSuggestionAcceptedIfFirstChunk) but never increment shown (there is no recordShown() call in restoreSuggestionFromAnchorCache). As anchor hits accumulate, the displayed acceptance rate becomes meaningless, and that rate is precisely the diagnostic signal the quality pane is built around. Everything else (the anchor-match algorithm, the speculative signature lifecycle, the stale-result guard in apply, and the overlay stability gate change) is well-reasoned and well-tested.

Files needing attention: SuggestionCoordinator+Prediction.swift, specifically the restoreSuggestionFromAnchorCache function (missing recordShown call) and the dispatchGeneration error-path cleanup for pendingSpeculativeSignature.
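
The acceptance-rate inflation flagged in this review is simple arithmetic. The sketch below uses a hypothetical `QualityCounters` struct rather than the real SuggestionQualityMetricsStore, and the counts are invented purely for illustration.

```swift
/// Hypothetical two-counter model of the quality metrics.
struct QualityCounters {
    var shown = 0
    var acceptedSuggestions = 0

    var acceptanceRate: Double {
        shown == 0 ? 0 : Double(acceptedSuggestions) / Double(shown)
    }
}

var counters = QualityCounters()

// One generated suggestion is shown and accepted: both counters move.
counters.shown += 1
counters.acceptedSuggestions += 1

// Two anchor-restored suggestions are accepted but never counted as shown.
counters.acceptedSuggestions += 2
let inflated = counters.acceptanceRate

// Counting each restore as shown brings the rate back into [0, 1].
counters.shown += 2
let corrected = counters.acceptanceRate
```

With these made-up counts the uncorrected rate is 3 accepted over 1 shown, and recording the two restores as shown brings it to 3 over 3.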

Important Files Changed

Filename	Overview
Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift	Adds anchor-cache restore path and speculative post-acceptance generation; anchor restore skips recordShown() causing acceptance-rate inflation, and pendingSpeculativeSignature is not cleared in the engine-error branch of dispatchGeneration.
Cotabby/Support/SuggestionAnchorCache.swift	New bounded LRU-style cache (16 entries, 180s TTL) for instant suggestion re-show; match logic is correct for typical inputs but first-match return within an entry is theoretically unsound for prefix-tail-length-width preceding text with repeated patterns.
Cotabby/Models/SuggestionQualityMetricsStore.swift	New always-on quality counters persisted to UserDefaults; counters are consistent across router, coordinator, and engine paths except that anchor-restored suggestions are not counted in shown but their acceptances are counted in acceptedSuggestions.
Cotabby/App/Coordinators/SuggestionCoordinator+Acceptance.swift	Wires dispatchSpeculativePostAcceptanceGeneration into the final-chunk accept path; correctly records invalidated sessions into the anchor cache before teardown; layoutEstimated overlay skip-slide logic is sound.
Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift	Adds speculation-validation early-return in the publish poll; correctly keeps pendingSpeculativeSignature alive for apply() to consume while clearing it on any divergence before rescheduling.
Cotabby/Support/SpeculativeAcceptanceContext.swift	New utility that builds the optimistic post-insertion snapshot; correctly advances precedingText and selection by UTF-16 length; well-tested for surrogate pairs and autocorrect divergence.
Cotabby/Support/DebouncePolicy.swift	New adaptive debounce policy keyed on last-generation latency; three thresholds (15/25/55ms) are well-tested and fall back to the configured value without prior latency data.
Cotabby/Services/Runtime/LlamaRuntimeCore.swift	Adds argmax-EOG early stop and LlamaGenerationOutput struct wrapping text + logprob confidence signals; confidence-suppressed completions now carry the real suppression reason instead of appearing as empty output.
Cotabby/Support/SuggestionOverlayStabilityGate.swift	Skips caret-drift re-anchor check for layout-estimated geometry, preventing the post-accept jerk on TextKit-mirror hosts; frame/text/field changes still force re-anchor correctly.

Sequence Diagram

sequenceDiagram
    participant User
    participant Coordinator
    participant AnchorCache
    participant WorkController
    participant Engine
    participant HostPublishPoll

    Note over User,HostPublishPoll: Anchor Reuse Cache path
    User->>Coordinator: keystroke / focus return
    Coordinator->>AnchorCache: remainder(identityKey, precedingText)
    AnchorCache-->>Coordinator: remainder (cache hit)
    Coordinator->>Coordinator: restoreSuggestionFromAnchorCache()
    Coordinator-->>User: overlay shown instantly (no model round-trip)

    Note over User,HostPublishPoll: Speculative Post-Acceptance Prefetch path
    User->>Coordinator: Tab (accept final chunk)
    Coordinator->>Coordinator: armPostExhaustionAcceptance()
    Coordinator->>Coordinator: dispatchSpeculativePostAcceptanceGeneration()
    Coordinator->>AnchorCache: record(invalidated session)
    Coordinator->>WorkController: replaceDebouncedWork(delay:0)
    WorkController->>Engine: generateSuggestion(optimisticSnapshot)
    Note right of Coordinator: pendingSpeculativeSignature = contentSig

    HostPublishPoll->>Coordinator: reconcileHostPublish(baseline)
    alt Host published exactly the speculated text
        Coordinator->>Coordinator: speculation-validated, stand down
    else Host diverged (autocorrect / IME)
        Coordinator->>Coordinator: "pendingSpeculativeSignature = nil"
        Coordinator->>WorkController: schedulePrediction()
    end

    Engine-->>Coordinator: apply(result)
    alt isPaidOffSpeculation (sig matches)
        Coordinator-->>User: overlay shown (speculation won)
    else stale generation
        Coordinator->>Coordinator: discard result
    end

Reviews (3): Last reviewed commit: "review: keep apply within the complexity..."

…-entry

Bounded, string-only memory of recent suggestions (16 entries, 180s expiry). One match rule covers the three common editing moments that previously paid debounce plus a full model round-trip: deleting a typo back onto a covered position, retyping the suggested characters after an unrelated invalidation, and returning to an unchanged field. The live preceding-text tail must equal a cached anchor's tail plus the first k characters of its suggestion, k strictly short of the whole text so a fully accepted suggestion never re-offers its own tail.

…d start the post-acceptance generation speculatively

Two lifecycle changes that remove model round-trips from the visible path:

Anchor restore: the prediction cycle consults the cache after the typo gate; a hit re-shows instantly with zero engine work, after re-checking only the guards that depend on current field state (trailing duplication, stale-accept echo, selection, secure). The restored text is a suffix of a suggestion that already passed the normalizer and seam guard on this exact text path. cotabbyAnchorReuseDisabled is the kill switch.

Speculative post-acceptance prefetch: accepting the final chunk previously idled through the host-publish poll (10-400ms) before the next generation could even start, papered over by the 0.8s post-exhaustion Tab window. Cotabby knows exactly what it just typed, so it now starts the continuation immediately against the expected post-insert snapshot. The publish poll becomes the validator: a publish matching the speculation's content signature stands down and lets the in-flight result land (apply admits it via the same signature, single-use); any divergence (autocorrect, IME transformation, a sliding context window) schedules the normal regeneration, whose newer work id retires the speculation automatically. Wrong speculation costs one discarded generation; right speculation removes the publish wait plus a debounce from the gap. cotabbySpeculativePrefetchDisabled is the kill switch.

…tion-free anchor matching

The speculative post-acceptance generation now applies the same pre-generation gates as the normal path, without their UI side effects: it declines on too-little text and on a trailing typo when suppression is enabled, leaving correction semantics to the post-publish regeneration. It also consumes the pinned clipboard verdict instead of re-filtering against the optimistic prefix, which could flip the verdict and rewrite the prompt head mid-session against the KV prefix continuity the pin exists to protect.

The anchor match scan now builds its character buffers once per entry and compares slices, instead of allocating a candidate string per position per entry.

greptile-apps Bot reviewed Jun 12, 2026

Comment thread Cotabby/Support/SuggestionAnchorCache.swift

Comment thread Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift

FuJacob force-pushed the quality/responsive-lifecycle branch 2 times, most recently from c177719 to 851b720 Compare June 12, 2026 03:27

FuJacob force-pushed the quality/decode-gates-and-telemetry branch 3 times, most recently from 8f36e25 to 98ca097 Compare June 12, 2026 03:57

FuJacob added 3 commits June 11, 2026 21:59

FuJacob force-pushed the quality/responsive-lifecycle branch from 851b720 to 00f8073 Compare June 12, 2026 05:01

review: keep apply within the complexity budget as its guard chain grew

6823ff7

FuJacob changed the base branch from quality/decode-gates-and-telemetry to main June 12, 2026 05:03

FuJacob merged commit 0d3177f into main Jun 12, 2026

FuJacob mentioned this pull request Jun 12, 2026

Kill the runs-aligned accept bounce at all four layers #695

Merged

FuJacob merged 4 commits into main from quality/responsive-lifecycle
