Responsive lifecycle: anchor reuse cache + speculative post-acceptance prefetch #689
Merged
…-entry
Bounded, string-only memory of recent suggestions (16 entries, 180 s expiry). One match rule covers the three common editing moments that previously paid a debounce plus a full model round-trip: deleting a typo back onto a covered position, retyping the suggested characters after an unrelated invalidation, and returning to an unchanged field. The live preceding-text tail must equal a cached anchor's tail plus the first k characters of its suggestion, with k strictly less than the suggestion's length so a fully accepted suggestion never re-offers its own tail.
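A minimal Swift sketch of that match rule, assuming a simple array-backed store. The names (AnchorEntry, AnchorCacheSketch, record, remainder) and the 64-character tail bound are hypothetical; only the 16-entry capacity, the 180 s expiry, and the "k strictly less than the suggestion's length" rule come from the commit message. It also assumes "deepest match" means the largest k, with the most recently recorded entry winning ties. This is the naive form that builds one candidate string per position per entry.

```swift
import Foundation

// Hypothetical sketch: these names and the 64-character tail bound are
// illustrative assumptions, not Cotabby's actual SuggestionAnchorCache API.
struct AnchorEntry {
    let identityKey: String  // field identity; entries never match across fields
    let anchorTail: String   // bounded preceding-text tail at record time
    let suggestion: String   // full suggestion text
    let recordedAt: Date
}

struct AnchorCacheSketch {
    let capacity = 16                // from the commit message
    let ttl: TimeInterval = 180      // from the commit message
    let tailLength = 64              // assumed bound on compared tails
    private(set) var entries: [AnchorEntry] = []

    mutating func record(identityKey: String, precedingText: String,
                         suggestion: String, now: Date = Date()) {
        guard !suggestion.isEmpty else { return }
        entries.append(AnchorEntry(identityKey: identityKey,
                                   anchorTail: String(precedingText.suffix(tailLength)),
                                   suggestion: suggestion,
                                   recordedAt: now))
        // Evict the oldest entries once over capacity.
        if entries.count > capacity {
            entries.removeFirst(entries.count - capacity)
        }
    }

    /// Unconsumed remainder of the deepest live match for this field, or nil.
    func remainder(identityKey: String, precedingText: String,
                   now: Date = Date()) -> String? {
        let liveTail = String(precedingText.suffix(tailLength))
        var best: (k: Int, text: String)? = nil
        for entry in entries {
            guard entry.identityKey == identityKey,
                  now.timeIntervalSince(entry.recordedAt) <= ttl else { continue }
            // Deepest k first; k stops short of suggestion.count, so a fully
            // consumed suggestion is never re-offered.
            for k in stride(from: entry.suggestion.count - 1, through: 0, by: -1) {
                // Naive form: one candidate String per position per entry.
                let joined = entry.anchorTail + String(entry.suggestion.prefix(k))
                if String(joined.suffix(tailLength)) == liveTail {
                    // Ties go to the later (more recently recorded) entry.
                    if k >= (best?.k ?? -1) {
                        best = (k, String(entry.suggestion.dropFirst(k)))
                    }
                    break
                }
            }
        }
        return best?.text
    }
}
```

After recording "ld, how are you?" against the text "Hello wor", a live "Hello worl" (type-through, or backspace rollback from "Hello worlx") yields "d, how are you?", while a fully typed "Hello world, how are you?" yields nil because k never reaches the full length.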
…d start the post-acceptance generation speculatively
Two lifecycle changes that remove model round-trips from the visible path:
Anchor restore: the prediction cycle consults the cache after the typo gate; a hit re-shows instantly with zero engine work, after re-checking only the guards that depend on current field state (trailing duplication, stale-accept echo, selection, secure). The restored text is a suffix of a suggestion that already passed the normalizer and seam guard on this exact text path. cotabbyAnchorReuseDisabled is the kill switch.
Speculative post-acceptance prefetch: accepting the final chunk previously idled through the host-publish poll (10-400 ms) before the next generation could even start, papered over by the 0.8 s post-exhaustion Tab window. Cotabby knows exactly what it just typed, so it now starts the continuation immediately against the expected post-insert snapshot. The publish poll becomes the validator: a publish matching the speculation's content signature stands down and lets the in-flight result land (apply admits it via the same signature, single-use); any divergence (autocorrect, IME transformation, a sliding context window) schedules the normal regeneration, whose newer work id retires the speculation automatically. Wrong speculation costs one discarded generation; right speculation removes the publish wait plus a debounce from the gap. cotabbySpeculativePrefetchDisabled is the kill switch.
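The validation flow described in this commit can be sketched as follows, under stated assumptions. FieldSnapshot, ContentSignature, optimisticSnapshot, PublishOutcome, and SpeculationGate are hypothetical stand-ins (only the pendingSpeculativeSignature name appears in the review). The sketch assumes the caret is collapsed at accept time and that every host publish bumps the generation counter. It shows the optimistic snapshot advancing the caret in UTF-16 units, the publish poll either standing down or forcing regeneration, and apply admitting a stale-generation result exactly once when the live content matches.

```swift
// Hypothetical sketch: every type and method here is illustrative; only the
// pendingSpeculativeSignature name is borrowed from the review of this PR.
struct FieldSnapshot {
    var precedingText: String
    var trailingText: String
    var selectionLocationUTF16: Int  // caret position, in UTF-16 code units
    var selectionLength: Int
    var isSecure: Bool
}

/// "Current by content": selection, both sides of the caret, and secure state.
struct ContentSignature: Equatable {
    let precedingText: String
    let trailingText: String
    let selectionLocationUTF16: Int
    let selectionLength: Int
    let isSecure: Bool

    init(_ s: FieldSnapshot) {
        precedingText = s.precedingText
        trailingText = s.trailingText
        selectionLocationUTF16 = s.selectionLocationUTF16
        selectionLength = s.selectionLength
        isSecure = s.isSecure
    }
}

/// Expected state after the host inserts `accepted` at a collapsed caret.
func optimisticSnapshot(after accepted: String, in s: FieldSnapshot) -> FieldSnapshot {
    var next = s
    next.precedingText += accepted
    next.selectionLocationUTF16 += accepted.utf16.count  // a surrogate pair advances by 2
    next.selectionLength = 0
    return next
}

enum PublishOutcome { case standDown, regenerate }

final class SpeculationGate {
    private(set) var generation = 0
    private var pendingSpeculativeSignature: ContentSignature?

    /// Start the continuation immediately against the expected snapshot.
    func beginSpeculation(expecting snapshot: FieldSnapshot) -> Int {
        generation += 1
        pendingSpeculativeSignature = ContentSignature(snapshot)
        return generation
    }

    /// The publish poll as validator. The counter moves either way (assumed);
    /// on a match the pending signature survives so the in-flight result can land.
    func reconcileHostPublish(_ live: FieldSnapshot) -> PublishOutcome {
        generation += 1
        if pendingSpeculativeSignature == ContentSignature(live) { return .standDown }
        pendingSpeculativeSignature = nil  // divergence: normal regeneration takes over
        return .regenerate
    }

    /// Stale-result guard: current generation, or a single-use signature match.
    func admit(resultGeneration: Int, live: FieldSnapshot) -> Bool {
        if resultGeneration == generation { return true }
        guard let expected = pendingSpeculativeSignature,
              expected == ContentSignature(live) else { return false }
        pendingSpeculativeSignature = nil  // the exemption is consumed on first use
        return true
    }
}
```

Clearing the signature inside admit is what makes the exemption single-use: a second result carrying the same stale generation falls back to the plain generation-equality check and is dropped.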
…tion-free anchor matching
The speculative post-acceptance generation now applies the same pre-generation gates as the normal path, without their UI side effects: it declines on too-little text and, when suppression is enabled, on a trailing typo, leaving correction semantics to the post-publish regeneration. It also consumes the pinned clipboard verdict instead of re-filtering against the optimistic prefix; re-filtering could flip the verdict and rewrite the prompt head mid-session, breaking the KV prefix continuity the pin exists to protect. The anchor match scan now builds its character buffers once per entry and compares slices, instead of allocating a candidate string per position per entry.
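A sketch of the slice-comparison scan for a single cache entry, assuming the bounded-tail match rule described in the anchor-cache commit message. The function name deepestMatch and its parameters are hypothetical. Each entry's characters are flattened into one buffer, and each position k is tested by comparing a slice of that buffer against the live tail instead of materializing a candidate String.

```swift
/// Hypothetical allocation-reduced scan for one entry: returns the deepest k
/// with tail(anchorTail + suggestion.prefix(k)) == tail(precedingText), or nil.
func deepestMatch(precedingText: String, anchorTail: String,
                  suggestion: String, tailLength: Int) -> Int? {
    let live = Array(precedingText.suffix(tailLength))  // built once per lookup
    let anchorChars = Array(anchorTail)                  // built once per entry
    let suggestionChars = Array(suggestion)
    let joined = anchorChars + suggestionChars
    // Deepest k first; k < suggestion length, so a fully consumed suggestion never matches.
    for k in stride(from: suggestionChars.count - 1, through: 0, by: -1) {
        let end = anchorChars.count + k  // joined[..<end] is anchorTail + prefix(k)
        let start = max(0, end - tailLength)
        guard end - start == live.count else { continue }  // cheap length check first
        // Compare a slice of the shared buffer; no per-position String is built.
        if joined[start..<end].elementsEqual(live) { return k }
    }
    return nil
}
```

One caveat of this sketch: concatenating Character arrays never merges grapheme clusters across the anchor/suggestion seam, whereas String concatenation can (for example, when a suggestion starts with a combining mark), so the sliced and string forms can disagree on such edge cases.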
Summary
Stacked on #688; the final part of the quality stack. Two lifecycle changes that remove model round-trips from the moments users feel most, both behind kill switches:
- … (SuggestionAnchorCache). When the live preceding-text tail equals a cached anchor's tail plus the first k characters of its suggestion (k strictly short of the whole text, so an accepted suggestion never re-offers its own tail), the prediction cycle re-shows the remainder instantly: no debounce, no generation. One match rule covers backspace rollback onto a covered position, retyping suggested characters after an unrelated invalidation, and returning to an unchanged field. Restores re-check only the guards that depend on current field state (trailing duplication, stale-accept echo, selection, secure); the text itself is a suffix of a suggestion that already passed the normalizer and seam guard on this exact path. Suggestions are recorded at display time and again at invalidation (the dying session is exactly what a rollback wants back); corrections are never cached. Kill switch: cotabbyAnchorReuseDisabled.
- … (SpeculativeAcceptanceContext). The existing publish poll becomes the validator: a publish whose content signature matches the speculation stands down and lets the in-flight result land (apply admits it through the stale-generation guard via the same signature, single-use); any divergence (autocorrect, IME transformation, a sliding context window) schedules the normal regeneration, whose newer work id retires the speculation automatically. Wrong speculation costs one discarded background generation; right speculation removes the publish wait plus a debounce from the visible gap. Kill switch: cotabbySpeculativePrefetchDisabled.
Validation
New coverage: 11 cache-rule tests (type-through, rollback, never-reoffer-consumed, divergence, cross-field isolation, deepest-match precedence, expiry, capacity, bounded-tail matching); 4 optimistic-snapshot tests (UTF-16 caret advance incl. surrogate pairs, signature equality with an identical real publish, signature divergence under host transformation); 3 coordinator tests (restore reaches .ready with the engine stub that throws on any generation, proving zero model work; a speculative result with a stale generation applies exactly when the live content matches its signature and the exemption is single-use; stale results without the exemption still drop). One existing test updated to the new contract: a final-chunk accept now leaves the coordinator generating (speculatively) rather than idle.
The suggestion eval is unaffected by construction (it drives the engine directly, not the coordinator lifecycle); end-to-end behavior needs dogfooding, which the kill switches and the anchor-restore / speculative-generating / speculation-validated JSONL stages make directly observable.
Linked issues
Refs #546 (perceived latency: both changes remove round-trips from the felt path).
Risk / rollout notes
- … apply from generation-equality to generation-equality OR single-use signature match; the signature covers selection, preceding text, trailing text, and secure state, so an admitted result is current by content even when the generation counter moved underneath it.
- … anchor-restore with the shown text; a bad restore is diagnosable from one JSONL line and disappears with defaults write com.jacobfu.tabby cotabbyAnchorReuseDisabled -bool true.
- … speculative-generating vs speculation-validated ratio in the debug JSONL; persistent mismatches in a host mean that the app transforms inserts and speculation should be disabled there (follow-up if observed).
Greptile Summary
This PR adds two latency-reduction features behind kill switches: a SuggestionAnchorCache that re-shows cached suggestions instantly on backspace rollback, type-through re-entry, and field return; and a speculative post-acceptance prefetch that starts the next generation against an optimistic post-insert snapshot instead of waiting for the host-publish poll. It also ships adaptive debounce, argmax-EOG early stop, per-token confidence gating with LlamaGenerationOutput, layout-estimated overlay drift suppression, and an always-on quality metrics counter store surfaced in the Performance pane.
- … (SuggestionAnchorCache): 16-entry, 180s TTL, identityKey-isolated; match rule is liveTail == tail(anchorTail + fullText.prefix(k)) for k < fullText.count; invalidated sessions are snapshotted before teardown so backspace rollback recovers them.
- … (SpeculativeAcceptanceContext, dispatchSpeculativePostAcceptanceGeneration): builds an optimistic snapshot from the accepted insertion, fires immediately at delay 0, and uses a content-signature exemption in apply to admit the result when the real publish matches; divergence safely falls through to normal regen with work-ID supersession.
- … (SuggestionQualityMetricsStore): always-on counters for generated/shown/suppressed-by-reason/accepted, persisted to UserDefaults, displayed in the Performance pane with a Reset button.
Confidence Score: 4/5
Safe to merge with the quality-metrics counting fix addressed; both new lifecycle features are behind kill switches and the speculative path degrades gracefully on divergence.
The acceptance-rate metric in SuggestionQualityMetricsStore can exceed 1.0 because anchor-restored suggestions increment acceptedSuggestions (via recordSuggestionAcceptedIfFirstChunk) but never increment shown (there is no recordShown() call in restoreSuggestionFromAnchorCache). As anchor hits accumulate, the displayed acceptance rate becomes meaningless, and that rate is precisely the diagnostic signal the quality pane is built around. Everything else (the anchor-match algorithm, the speculative signature lifecycle, the stale-result guard in apply, and the overlay stability gate change) is well-reasoned and well-tested.
SuggestionCoordinator+Prediction.swift: specifically the restoreSuggestionFromAnchorCache function (missing recordShown call) and the dispatchGeneration error-path cleanup for pendingSpeculativeSignature.
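To make the counting issue concrete, here is a hypothetical sketch: QualityCounters and showAndAccept are stand-ins, not SuggestionQualityMetricsStore's real API, and the sketch assumes the acceptance rate is accepted divided by shown. A single display path that skips recordShown() is enough to push the rate above 1.0.

```swift
// Hypothetical stand-in for SuggestionQualityMetricsStore; the real API is not shown here.
struct QualityCounters {
    private(set) var shown = 0
    private(set) var accepted = 0

    mutating func recordShown() { shown += 1 }
    mutating func recordAccepted() { accepted += 1 }

    /// accepted / shown; stays <= 1.0 only if every accepted suggestion was also counted as shown.
    var acceptanceRate: Double? {
        shown == 0 ? nil : Double(accepted) / Double(shown)
    }
}

/// One display-then-accept cycle. `countsAsShown` models whether the display
/// path calls recordShown(); per the review, the anchor-restore path does not.
func showAndAccept(_ counters: inout QualityCounters, countsAsShown: Bool) {
    if countsAsShown { counters.recordShown() }
    counters.recordAccepted()
}

var beforeFix = QualityCounters()
showAndAccept(&beforeFix, countsAsShown: true)   // generated suggestion
showAndAccept(&beforeFix, countsAsShown: false)  // anchor restore without recordShown()

var afterFix = QualityCounters()
showAndAccept(&afterFix, countsAsShown: true)
showAndAccept(&afterFix, countsAsShown: true)    // restore now records shown as well
```

With one generated and one restored suggestion both accepted, the unfixed counters report 2 accepted over 1 shown, while recording the restore as shown brings the rate back to 1.0.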
Sequence Diagram
sequenceDiagram
    participant User
    participant Coordinator
    participant AnchorCache
    participant WorkController
    participant Engine
    participant HostPublishPoll
    Note over User,HostPublishPoll: Anchor Reuse Cache path
    User->>Coordinator: keystroke / focus return
    Coordinator->>AnchorCache: remainder(identityKey, precedingText)
    AnchorCache-->>Coordinator: remainder (cache hit)
    Coordinator->>Coordinator: restoreSuggestionFromAnchorCache()
    Coordinator-->>User: overlay shown instantly (no model round-trip)
    Note over User,HostPublishPoll: Speculative Post-Acceptance Prefetch path
    User->>Coordinator: Tab (accept final chunk)
    Coordinator->>Coordinator: armPostExhaustionAcceptance()
    Coordinator->>Coordinator: dispatchSpeculativePostAcceptanceGeneration()
    Coordinator->>AnchorCache: record(invalidated session)
    Coordinator->>WorkController: replaceDebouncedWork(delay:0)
    WorkController->>Engine: generateSuggestion(optimisticSnapshot)
    Note right of Coordinator: pendingSpeculativeSignature = contentSig
    HostPublishPoll->>Coordinator: reconcileHostPublish(baseline)
    alt Host published exactly the speculated text
        Coordinator->>Coordinator: speculation-validated, stand down
    else Host diverged (autocorrect / IME)
        Coordinator->>Coordinator: "pendingSpeculativeSignature = nil"
        Coordinator->>WorkController: schedulePrediction()
    end
    Engine-->>Coordinator: apply(result)
    alt isPaidOffSpeculation (sig matches)
        Coordinator-->>User: overlay shown (speculation won)
    else stale generation
        Coordinator->>Coordinator: discard result
    end
Reviews (3): Last reviewed commit: "review: keep apply within the complexity..."