Responsive lifecycle: anchor reuse cache + speculative post-acceptance prefetch #689

Merged
FuJacob merged 4 commits into main from quality/responsive-lifecycle
Jun 12, 2026
@FuJacob FuJacob commented Jun 12, 2026

Summary

Stacked on #688; the final part of the quality stack. Two lifecycle changes that remove model round-trips from the moments users feel most, both behind kill switches:

  1. Anchor reuse cache. A bounded, string-only memory of recent suggestions (16 entries, 180s expiry, SuggestionAnchorCache). When the live preceding-text tail equals a cached anchor's tail plus the first k characters of its suggestion (k strictly short of the whole text, so an accepted suggestion never re-offers its own tail), the prediction cycle re-shows the remainder instantly: no debounce, no generation. One match rule covers backspace rollback onto a covered position, retyping suggested characters after an unrelated invalidation, and returning to an unchanged field. Restores re-check only the guards that depend on current field state (trailing duplication, stale-accept echo, selection, secure); the text itself is a suffix of a suggestion that already passed the normalizer and seam guard on this exact path. Suggestions are recorded at display time and again at invalidation (the dying session is exactly what a rollback wants back); corrections are never cached. Kill switch: cotabbyAnchorReuseDisabled.
  2. Speculative post-acceptance prefetch. Accepting the final chunk previously idled through the host-publish poll (10-400ms) before the next generation could even start, which is why the 0.8s post-exhaustion Tab window had to paper over a stall. Cotabby knows exactly what it just typed, so it now starts the continuation immediately against the expected post-insert snapshot (SpeculativeAcceptanceContext). The existing publish poll becomes the validator: a publish whose content signature matches the speculation stands down and lets the in-flight result land (apply admits it through the stale-generation guard via the same signature, single-use); any divergence (autocorrect, IME transformation, a sliding context window) schedules the normal regeneration, whose newer work id retires the speculation automatically. Wrong speculation costs one discarded background generation; right speculation removes the publish wait plus a debounce from the visible gap. Kill switch: cotabbySpeculativePrefetchDisabled.
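
The anchor match rule in item 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the code in SuggestionAnchorCache: `AnchorEntry`, `remainder(livePrecedingText:entry:tailLength:)`, the 64-character tail window, and the largest-k-first scan order are all hypothetical.

```swift
/// Hypothetical stand-in for one cached entry; the field names are illustrative.
struct AnchorEntry {
    let anchorTail: String   // bounded tail of the preceding text when the suggestion was shown
    let suggestion: String   // the full suggestion text
}

/// Returns the remainder to re-show, or nil when the match rule fails.
/// `tailLength` is an assumed window size, not a value taken from the PR.
func remainder(livePrecedingText: String, entry: AnchorEntry, tailLength: Int = 64) -> String? {
    let liveTail = String(livePrecedingText.suffix(tailLength))
    let full = Array(entry.suggestion)
    // k is capped at count - 1, so a fully typed suggestion never re-offers its own tail.
    // Scanning from the largest k downward is an assumption of this sketch.
    for k in stride(from: full.count - 1, through: 0, by: -1) {
        let candidate = String((entry.anchorTail + String(full[..<k])).suffix(tailLength))
        if candidate == liveTail {
            return String(full[k...])
        }
    }
    return nil
}

let entry = AnchorEntry(anchorTail: "Hello ", suggestion: "world!")
```

With this entry, a live text of "Hello wor" yields the remainder "ld!", "Hello " yields the whole suggestion, and "Hello world!" yields nil because k stops one short of the full text.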

Validation

xcodebuild build-for-testing ... CODE_SIGNING_ALLOWED=NO   # ** TEST BUILD SUCCEEDED **
xcodebuild test-without-building ...                       # FULL suite: 1146 tests, 1141 passed, 0 failed, 5 skipped (gated evals)
swiftlint lint --quiet <changed files>                     # exit 0
xcodegen generate                                          # pbxproj additions only

New coverage: 11 cache-rule tests (type-through, rollback, never-reoffer-consumed, divergence, cross-field isolation, deepest-match precedence, expiry, capacity, bounded-tail matching); 4 optimistic-snapshot tests (UTF-16 caret advance incl. surrogate pairs, signature equality with an identical real publish, signature divergence under host transformation); 3 coordinator tests (restore reaches .ready with the engine stub that throws on any generation, proving zero model work; a speculative result with a stale generation applies exactly when the live content matches its signature and the exemption is single-use; stale results without the exemption still drop). One existing test updated to the new contract: a final-chunk accept now leaves the coordinator generating (speculatively) rather than idle.
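
The UTF-16 caret advance that the optimistic-snapshot tests exercise can be sketched roughly as follows. `FieldSnapshot` and `optimisticSnapshot(after:from:)` are hypothetical names for illustration; the real SpeculativeAcceptanceContext carries more state, and this sketch assumes a collapsed caret at the end of the preceding text.

```swift
/// Hypothetical, simplified field snapshot; not the type used in the PR.
struct FieldSnapshot: Equatable {
    var precedingText: String
    var trailingText: String
    var caretUTF16Offset: Int   // host text APIs typically report offsets in UTF-16 units
}

/// Builds the snapshot expected after `insertion` is typed at the caret.
func optimisticSnapshot(after insertion: String, from snapshot: FieldSnapshot) -> FieldSnapshot {
    var next = snapshot
    next.precedingText += insertion
    // Advance by UTF-16 length, not Character count: one emoji can be two UTF-16 units.
    next.caretUTF16Offset += insertion.utf16.count
    return next
}

let before = FieldSnapshot(precedingText: "ab", trailingText: "", caretUTF16Offset: 2)
let after = optimisticSnapshot(after: "c👍", from: before)
// after.caretUTF16Offset is 5: "c" is one UTF-16 unit and "👍" is a surrogate pair (two units).
```

Counting by `Character` instead would advance the caret by 2 and leave it inside the surrogate pair from the host's point of view, which is the kind of drift a surrogate-pair test guards against.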

The suggestion eval is unaffected by construction (it drives the engine directly, not the coordinator lifecycle); end-to-end behavior needs dogfooding, which the kill switches and the anchor-restore / speculative-generating / speculation-validated JSONL stages make directly observable.
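
For dogfooding, the two kill switches could be read along these lines. This is a sketch only: the key names come from this PR, but `LifecycleKillSwitches` is a hypothetical wrapper, and reading the flags through `UserDefaults.bool(forKey:)` is an assumption about how the app consults them.

```swift
import Foundation

/// Hypothetical wrapper around the two kill-switch keys named in this PR.
struct LifecycleKillSwitches {
    let defaults: UserDefaults

    // bool(forKey:) returns false for a missing key, so both features default to enabled.
    var anchorReuseEnabled: Bool {
        !defaults.bool(forKey: "cotabbyAnchorReuseDisabled")
    }
    var speculativePrefetchEnabled: Bool {
        !defaults.bool(forKey: "cotabbySpeculativePrefetchDisabled")
    }
}
```

Phrasing the keys as "disabled" flags means a fresh install with no stored value gets both features on, and turning one off is a single `defaults write`.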

Linked issues

Refs #546 (perceived latency: both changes remove round-trips from the felt path).

Risk / rollout notes

  • The speculation changes the stale-result guard in apply from generation-equality to generation-equality OR single-use signature match; the signature covers selection, preceding text, trailing text, and secure state, so an admitted result is current by content even when the generation counter moved underneath it.
  • A speculative generation that loses its bet is cancelled-or-discarded work the old code did not do; on slow machines this is bounded by one in-flight generation and the same work-id supersession as any keystroke.
  • Restores log anchor-restore with the shown text; a bad restore is diagnosable from one JSONL line and disappears with defaults write com.jacobfu.tabby cotabbyAnchorReuseDisabled -bool true.
  • Dogfood before release: watch speculative-generating vs speculation-validated ratio in the debug JSONL; persistent mismatches in a host mean that app transforms inserts and speculation should be disabled there (follow-up if observed).
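
The widened stale-result guard from the first note can be sketched as below. `ContentSignature`, `StaleResultGuard`, and `admits(...)` are hypothetical names chosen for illustration; only the shape of the rule (generation equality OR a single-use signature match) and the four signature fields are taken from the description above.

```swift
/// Hypothetical content signature covering the fields listed in the risk note.
struct ContentSignature: Equatable {
    let precedingText: String
    let trailingText: String
    let selectionLocation: Int
    let selectionLength: Int
    let isSecure: Bool
}

/// Hypothetical guard state; the real coordinator holds this alongside other lifecycle state.
struct StaleResultGuard {
    var currentGeneration = 0
    var pendingSpeculativeSignature: ContentSignature?

    /// Admits a result when its generation is current, or when it is the one
    /// pending speculation and the live field content matches its signature.
    mutating func admits(resultGeneration: Int,
                         resultSignature: ContentSignature,
                         liveSignature: ContentSignature) -> Bool {
        if resultGeneration == currentGeneration { return true }
        guard let pending = pendingSpeculativeSignature,
              pending == resultSignature,
              pending == liveSignature else { return false }
        pendingSpeculativeSignature = nil   // single-use: the exemption is consumed here
        return true
    }
}
```

Clearing the pending signature on admission is what makes the exemption single-use: a second stale result carrying the same signature is dropped like any other stale result.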

Greptile Summary

This PR adds two latency-reduction features behind kill switches: a SuggestionAnchorCache that re-shows cached suggestions instantly on backspace rollback, type-through re-entry, and field return; and a speculative post-acceptance prefetch that starts the next generation against an optimistic post-insert snapshot instead of waiting for the host-publish poll. It also ships adaptive debounce, argmax-EOG early stop, per-token confidence gating with LlamaGenerationOutput, layout-estimated overlay drift suppression, and an always-on quality metrics counter store surfaced in the Performance pane.

  • Anchor reuse cache (SuggestionAnchorCache): 16-entry, 180s TTL, identityKey-isolated; match rule is liveTail == tail(anchorTail + fullText.prefix(k)) for k < fullText.count; invalidated sessions are snapshotted before teardown so backspace rollback recovers them.
  • Speculative prefetch (SpeculativeAcceptanceContext, dispatchSpeculativePostAcceptanceGeneration): builds an optimistic snapshot from the accepted insertion, fires immediately at delay 0, and uses a content-signature exemption in apply to admit the result when the real publish matches; divergence safely falls through to normal regen with work-ID supersession.
  • Quality metrics (SuggestionQualityMetricsStore): always-on counters for generated/shown/suppressed-by-reason/accepted, persisted to UserDefaults, displayed in the Performance pane with a Reset button.
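
The bounded, expiring shape of the cache (16 entries, 180s) can be illustrated with a small generic container. This is a sketch under assumptions, not SuggestionAnchorCache itself: `BoundedExpiringCache` is a hypothetical name, eviction here is oldest-first, and an entry is treated as expired once its age strictly exceeds the time to live.

```swift
import Foundation

/// Hypothetical bounded cache: at most `capacity` entries, each valid for `timeToLive` seconds.
struct BoundedExpiringCache<Value> {
    private var entries: [(value: Value, recordedAt: Date)] = []
    let capacity: Int
    let timeToLive: TimeInterval

    init(capacity: Int = 16, timeToLive: TimeInterval = 180) {
        self.capacity = capacity
        self.timeToLive = timeToLive
    }

    /// Records a value, dropping expired entries first and then the oldest over capacity.
    mutating func record(_ value: Value, now: Date = Date()) {
        prune(now: now)
        entries.append((value: value, recordedAt: now))
        if entries.count > capacity {
            entries.removeFirst(entries.count - capacity)
        }
    }

    /// Returns the unexpired values, oldest first.
    mutating func liveValues(now: Date = Date()) -> [Value] {
        prune(now: now)
        return entries.map { $0.value }
    }

    private mutating func prune(now: Date) {
        entries.removeAll { now.timeIntervalSince($0.recordedAt) > timeToLive }
    }
}
```

Passing `now` explicitly keeps expiry and capacity behavior testable without sleeping, which is one plausible way to write expiry and capacity tests like those listed under New coverage.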

Confidence Score: 4/5

Safe to merge with the quality-metrics counting fix addressed; both new lifecycle features are behind kill switches and the speculative path degrades gracefully on divergence.

The acceptance-rate metric in SuggestionQualityMetricsStore can exceed 1.0 because anchor-restored suggestions increment acceptedSuggestions (via recordSuggestionAcceptedIfFirstChunk) but never increment shown (restoreSuggestionFromAnchorCache makes no recordShown() call). As anchor hits accumulate, the displayed acceptance rate becomes meaningless, even though it is the precise diagnostic signal the quality pane is built around. Everything else (the anchor-match algorithm, the speculative signature lifecycle, the stale-result guard in apply, and the overlay stability gate change) is well-reasoned and well-tested.
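
A worked example of the counting gap described in this review, using hypothetical counter names rather than the real SuggestionQualityMetricsStore API:

```swift
/// Hypothetical counters mirroring the shown / accepted pair discussed in the review.
struct QualityCounters {
    var shown = 0
    var accepted = 0

    /// nil until something has been shown, to avoid dividing by zero.
    var acceptanceRate: Double? {
        shown > 0 ? Double(accepted) / Double(shown) : nil
    }
}

var counters = QualityCounters()

// One generated suggestion is shown and then accepted.
counters.shown += 1
counters.accepted += 1

// One anchor-restored suggestion is accepted without ever being counted as shown.
counters.accepted += 1

// accepted = 2, shown = 1, so acceptanceRate is 2.0, which is not a valid rate.
```

Counting the restore as shown as well would bring the same sequence back to 2 accepted out of 2 shown.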

Needs attention: SuggestionCoordinator+Prediction.swift, specifically the restoreSuggestionFromAnchorCache function (missing recordShown call) and the dispatchGeneration error-path cleanup for pendingSpeculativeSignature.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift | Adds anchor-cache restore path and speculative post-acceptance generation; anchor restore skips recordShown() causing acceptance-rate inflation, and pendingSpeculativeSignature is not cleared in the engine-error branch of dispatchGeneration. |
| Cotabby/Support/SuggestionAnchorCache.swift | New bounded LRU-style cache (16 entries, 180s TTL) for instant suggestion re-show; match logic is correct for typical inputs but first-match return within an entry is theoretically unsound for prefix-tail-length-width preceding text with repeated patterns. |
| Cotabby/Models/SuggestionQualityMetricsStore.swift | New always-on quality counters persisted to UserDefaults; counters are consistent across router, coordinator, and engine paths except that anchor-restored suggestions are not counted in shown but their acceptances are counted in acceptedSuggestions. |
| Cotabby/App/Coordinators/SuggestionCoordinator+Acceptance.swift | Wires dispatchSpeculativePostAcceptanceGeneration into the final-chunk accept path; correctly records invalidated sessions into the anchor cache before teardown; layoutEstimated overlay skip-slide logic is sound. |
| Cotabby/App/Coordinators/SuggestionCoordinator+Input.swift | Adds speculation-validation early-return in the publish poll; correctly keeps pendingSpeculativeSignature alive for apply() to consume while clearing it on any divergence before rescheduling. |
| Cotabby/Support/SpeculativeAcceptanceContext.swift | New utility that builds the optimistic post-insertion snapshot; correctly advances precedingText and selection by UTF-16 length; well-tested for surrogate pairs and autocorrect divergence. |
| Cotabby/Support/DebouncePolicy.swift | New adaptive debounce policy keyed on last-generation latency; three thresholds (15/25/55ms) are well-tested and fall back to the configured value without prior latency data. |
| Cotabby/Services/Runtime/LlamaRuntimeCore.swift | Adds argmax-EOG early stop and LlamaGenerationOutput struct wrapping text + logprob confidence signals; confidence-suppressed completions now carry the real suppression reason instead of appearing as empty output. |
| Cotabby/Support/SuggestionOverlayStabilityGate.swift | Skips caret-drift re-anchor check for layout-estimated geometry, preventing the post-accept jerk on TextKit-mirror hosts; frame/text/field changes still force re-anchor correctly. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Coordinator
    participant AnchorCache
    participant WorkController
    participant Engine
    participant HostPublishPoll

    Note over User,HostPublishPoll: Anchor Reuse Cache path
    User->>Coordinator: keystroke / focus return
    Coordinator->>AnchorCache: remainder(identityKey, precedingText)
    AnchorCache-->>Coordinator: remainder (cache hit)
    Coordinator->>Coordinator: restoreSuggestionFromAnchorCache()
    Coordinator-->>User: overlay shown instantly (no model round-trip)

    Note over User,HostPublishPoll: Speculative Post-Acceptance Prefetch path
    User->>Coordinator: Tab (accept final chunk)
    Coordinator->>Coordinator: armPostExhaustionAcceptance()
    Coordinator->>Coordinator: dispatchSpeculativePostAcceptanceGeneration()
    Coordinator->>AnchorCache: record(invalidated session)
    Coordinator->>WorkController: replaceDebouncedWork(delay:0)
    WorkController->>Engine: generateSuggestion(optimisticSnapshot)
    Note right of Coordinator: pendingSpeculativeSignature = contentSig

    HostPublishPoll->>Coordinator: reconcileHostPublish(baseline)
    alt Host published exactly the speculated text
        Coordinator->>Coordinator: speculation-validated, stand down
    else Host diverged (autocorrect / IME)
        Coordinator->>Coordinator: "pendingSpeculativeSignature = nil"
        Coordinator->>WorkController: schedulePrediction()
    end

    Engine-->>Coordinator: apply(result)
    alt isPaidOffSpeculation (sig matches)
        Coordinator-->>User: overlay shown (speculation won)
    else stale generation
        Coordinator->>Coordinator: discard result
    end

Reviews (3): Last reviewed commit: "review: keep apply within the complexity..."

Comment thread Cotabby/Support/SuggestionAnchorCache.swift
Comment thread Cotabby/App/Coordinators/SuggestionCoordinator+Prediction.swift
@FuJacob FuJacob force-pushed the quality/responsive-lifecycle branch 2 times, most recently from c177719 to 851b720 Compare June 12, 2026 03:27
@FuJacob FuJacob force-pushed the quality/decode-gates-and-telemetry branch 3 times, most recently from 8f36e25 to 98ca097 Compare June 12, 2026 03:57
FuJacob added 3 commits June 11, 2026 21:59
…-entry

Bounded, string-only memory of recent suggestions (16 entries, 180s expiry).
One match rule covers the three common editing moments that previously paid
debounce plus a full model round-trip: deleting a typo back onto a covered
position, retyping the suggested characters after an unrelated invalidation,
and returning to an unchanged field. The live preceding-text tail must equal
a cached anchor's tail plus the first k characters of its suggestion, k
strictly short of the whole text so a fully accepted suggestion never
re-offers its own tail.
…d start the post-acceptance generation speculatively

Two lifecycle changes that remove model round-trips from the visible path:

Anchor restore: the prediction cycle consults the cache after the typo gate;
a hit re-shows instantly with zero engine work, after re-checking only the
guards that depend on current field state (trailing duplication, stale-accept
echo, selection, secure). The restored text is a suffix of a suggestion that
already passed the normalizer and seam guard on this exact text path.
cotabbyAnchorReuseDisabled is the kill switch.

Speculative post-acceptance prefetch: accepting the final chunk previously
idled through the host-publish poll (10-400ms) before the next generation
could even start, papered over by the 0.8s post-exhaustion Tab window.
Cotabby knows exactly what it just typed, so it now starts the continuation
immediately against the expected post-insert snapshot. The publish poll
becomes the validator: a publish matching the speculation's content
signature stands down and lets the in-flight result land (apply admits it
via the same signature, single-use); any divergence (autocorrect, IME
transformation, a sliding context window) schedules the normal regeneration,
whose newer work id retires the speculation automatically. Wrong speculation
costs one discarded generation; right speculation removes the publish wait
plus a debounce from the gap. cotabbySpeculativePrefetchDisabled is the kill
switch.
…tion-free anchor matching

The speculative post-acceptance generation now applies the same
pre-generation gates as the normal path, without their UI side effects:
it declines on too-little text and on a trailing typo when suppression
is enabled, leaving correction semantics to the post-publish
regeneration. It also consumes the pinned clipboard verdict instead of
re-filtering against the optimistic prefix, which could flip the
verdict and rewrite the prompt head mid-session against the KV prefix
continuity the pin exists to protect. The anchor match scan now builds
its character buffers once per entry and compares slices, instead of
allocating a candidate string per position per entry.
@FuJacob FuJacob force-pushed the quality/responsive-lifecycle branch from 851b720 to 00f8073 Compare June 12, 2026 05:01
@FuJacob FuJacob changed the base branch from quality/decode-gates-and-telemetry to main June 12, 2026 05:03
@FuJacob FuJacob merged commit 0d3177f into main Jun 12, 2026