feat(settings): make token-by-token streaming reveal opt-in (default off) #692
Merged
…off)

PR #687 added streaming ghost text that reveals a suggestion token-by-token as the model decodes. Some users read the incremental reveal as the suggestion "coming out character by character" and prefer it to appear once, fully formed.

Add a "Stream Suggestions While Generating" toggle (Appearance > Display), defaulting off. When off, the prediction path passes no `onPartial` handler, so the engine skips its per-token main-actor hops and the suggestion is presented once through `apply()`. When on, the existing streamed-partial behavior (each partial an acceptable session you can Tab into early) is preserved.
…-by-word

Address Greptile P2: LLM decoding is token-by-token (sub-word fragments), so the toggle copy should not promise word granularity.
Summary
Suggestions were appearing token-by-token (read as "character by character") because PR #687 streams ghost text live as the model decodes. This adds a "Stream Suggestions While Generating" toggle (Appearance → Display), defaulting off, so suggestions appear once, fully formed, after generation finishes. Power users can opt back into the live streaming reveal.
The gate is at the prediction dispatch: when streaming is off, no `onPartial` handler is passed to the engine, so the engine skips its per-token main-actor hops entirely and the suggestion is presented once through `apply()`. When on, the existing streamed-partial behavior (each partial rendered as an acceptable session you can Tab into early) is preserved unchanged.

Validation
UI: new toggle in Appearance → Display, under "Suggestion Display".
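The dispatch gate described in the Summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchEngine`, `SketchSnapshot`, `SketchCoordinator`, and the `rendered` array are hypothetical stand-ins, and only the names `onPartial`, `apply`, `dispatchGeneration`, `shouldStreamPartials`, and `streamSuggestionsWhileGenerating` come from the PR text. The duplicate check in `apply` is likewise an assumption made to keep the sketch small.

```swift
// Hypothetical stand-ins: the real engine and coordinator types are not shown in this PR text.
typealias PartialHandler = (String) -> Void

struct SketchEngine {
    let tokens: [String]

    // Reports the accumulated text after each token, but only when a handler was supplied.
    func generateSuggestion(onPartial: PartialHandler?) -> String {
        var text = ""
        for token in tokens {
            text += token
            onPartial?(text)
        }
        return text
    }
}

struct SketchSnapshot {
    // Mirrors the production default described in the PR: streaming is off.
    var streamSuggestionsWhileGenerating = false
}

final class SketchCoordinator {
    var snapshot = SketchSnapshot()
    // Records every string that would be shown as ghost text, in order.
    private(set) var rendered: [String] = []

    func dispatchGeneration(using engine: SketchEngine) {
        // Read the flag once, before the work starts, so toggling the
        // setting mid-generation cannot change this run.
        let shouldStreamPartials = snapshot.streamSuggestionsWhileGenerating

        var onPartial: PartialHandler? = nil
        if shouldStreamPartials {
            onPartial = { [weak self] partial in
                self?.rendered.append(partial)
            }
        }

        let result = engine.generateSuggestion(onPartial: onPartial)
        apply(result)
    }

    private func apply(_ result: String) {
        // Present the final suggestion; skip it when the last streamed
        // partial already shows the same text (an assumption of this sketch).
        if rendered.last != result {
            rendered.append(result)
        }
    }
}
```

With the flag off, a two-token generation records a single entry; with it on, each partial is recorded as it arrives. The engine never needs to know about the setting, because the absence of a handler is the whole gate.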
Linked issues
Risk / rollout notes
- UserDefaults key `cotabbyStreamSuggestionsWhileGenerating`, defaults to false. This is a behavior change vs. current `main` (perf(stream): render ghost text while the model is still decoding #687 streamed by default). That is the intent: revert the default to all-at-once and make streaming opt-in.
- The setting is threaded through every layer (`@Published`/setter/snapshot/Combine publisher → snapshot struct → Appearance UI toggle → search index). The new Combine upstream is grouped into the existing acceptance-toggle slot (`CombineLatest3`) to stay under Combine's four-input cap.
- The existing streaming path (`queueStreamedPartial`/`applyStreamedPartial`/`StreamedGhostTextPolicy`) is untouched and fully exercised when the toggle is on; the gate only decides whether `onPartial` is wired up.

Greptile Summary
This PR makes token-by-token suggestion streaming opt-in (defaulting off), reversing the behavior introduced in #687. A new "Stream Suggestions While Generating" toggle is added to Appearance → Display, and the gate is implemented by conditionally omitting `onPartial` from the engine call.

- `streamSuggestionsWhileGenerating` is threaded through all layers: `SuggestionSettingsData`, `SuggestionSettingsStore` (UserDefaults key `cotabbyStreamSuggestionsWhileGenerating`, default `false`), `SuggestionSettingsModel` (`@Published` + setter + Combine publisher via `CombineLatest3`), and `SuggestionSettingsSnapshot`.
- `dispatchGeneration` reads the flag from the snapshot on the main actor before creating the work closure, capturing a plain `Bool` so the streaming decision is stable for the duration of a single generation.
- A new `SettingsIndex` case carries comprehensive search keywords for discoverability.

Confidence Score: 5/5
Safe to merge. The change is purely additive — a new opt-in toggle that defaults off — and the existing streaming code path is untouched.
The gate is read once on the main actor before a generation is dispatched, so the streaming decision is stable for the duration of any single generation and cannot change mid-flight. The new setting travels through every layer (data, store, model, snapshot, UI) in a pattern identical to adjacent settings. CombineLatest3 correctly replaces the prior CombineLatest without exceeding Combine's four-input cap, and the test fixture default matches the production default of false.
No files require special attention.
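As a rough illustration of why the setting resolves to off on a fresh install, here is a minimal sketch of the persistence step. `SketchSettingsStore` and `loadStreamSuggestionsWhileGenerating` are hypothetical names; only the key string, the `streamWhileGeneratingDefaultsKey` and `saveStreamSuggestionsWhileGenerating` names, and the `false` default come from this PR. The sketch assumes the store reads the value with `UserDefaults.bool(forKey:)`, which returns `false` for a key that was never written; the real load/resolve logic may differ.

```swift
import Foundation

// Hypothetical store: only the key name and the false default come from the PR text.
struct SketchSettingsStore {
    static let streamWhileGeneratingDefaultsKey = "cotabbyStreamSuggestionsWhileGenerating"
    let userDefaults: UserDefaults

    func loadStreamSuggestionsWhileGenerating() -> Bool {
        // bool(forKey:) returns false when the key has never been written,
        // so an untouched install resolves to the all-at-once behavior.
        userDefaults.bool(forKey: Self.streamWhileGeneratingDefaultsKey)
    }

    func saveStreamSuggestionsWhileGenerating(_ enabled: Bool) {
        userDefaults.set(enabled, forKey: Self.streamWhileGeneratingDefaultsKey)
    }
}
```

Injecting the `UserDefaults` instance keeps the sketch testable against a throwaway suite instead of the app's standard defaults.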
Important Files Changed
- Adds a `shouldStreamPartials` `Bool` captured from the snapshot on the main actor before dispatching work; wires it into the `onPartial` handler decision. Clean gate that preserves the streaming path fully when on.
- Adds `streamSuggestionsWhileGenerating: Bool` to `SuggestionSettingsSnapshot`. Well-documented and consistent with adjacent fields.
- Adds `streamSuggestionsWhileGenerating: Bool` to the durable data value type. Single initialization site updated in `SuggestionSettingsStore`; no default needed.
- Threads the `@Published` property through init, snapshot construction, setter, and Combine publisher (replacing `CombineLatest` with `CombineLatest3` to stay within the four-input cap). Correct and consistent with the existing pattern.
- Adds `streamWhileGeneratingDefaultsKey`, load/resolve, and save logic. UserDefaults key `cotabbyStreamSuggestionsWhileGenerating` defaults to false. Consistent with adjacent settings.
- Adds `streamWhileGeneratingBinding` and its Toggle into the Display section. Description text says "token-by-token", which is accurate.
- Adds a `streamWhileGenerating` case with correct pane routing, icon, and search keywords. Thorough keyword set for discoverability.
- Adds a `streamSuggestionsWhileGenerating: Bool = false` default parameter to the snapshot fixture factory. Default matches production default.

Sequence Diagram
sequenceDiagram
    participant User
    participant AppearancePaneView
    participant SuggestionSettingsModel
    participant SuggestionSettingsStore
    participant SuggestionCoordinator
    participant SuggestionEngine
    User->>AppearancePaneView: Toggle "Stream Suggestions While Generating"
    AppearancePaneView->>SuggestionSettingsModel: setStreamSuggestionsWhileGenerating(enabled)
    SuggestionSettingsModel->>SuggestionSettingsStore: saveStreamSuggestionsWhileGenerating(enabled)
    SuggestionSettingsStore->>SuggestionSettingsStore: userDefaults.set(enabled, forKey:)
    SuggestionSettingsModel->>SuggestionSettingsModel: Combine publisher fires → snapshot updated
    Note over SuggestionCoordinator: On next keystroke / focus event
    SuggestionCoordinator->>SuggestionCoordinator: dispatchGeneration() reads settingsSnapshot.streamSuggestionsWhileGenerating
    alt "streamSuggestionsWhileGenerating == true"
        SuggestionCoordinator->>SuggestionEngine: generateSuggestion(onPartial: queueStreamedPartial)
        SuggestionEngine-->>SuggestionCoordinator: onPartial(partial) per token
        SuggestionCoordinator->>SuggestionCoordinator: queueStreamedPartial → render ghost text live
        SuggestionEngine-->>SuggestionCoordinator: final result
        SuggestionCoordinator->>SuggestionCoordinator: apply(result)
    else "streamSuggestionsWhileGenerating == false (default)"
        SuggestionCoordinator->>SuggestionEngine: generateSuggestion(onPartial: nil)
        SuggestionEngine-->>SuggestionCoordinator: final result (no per-token hops)
        SuggestionCoordinator->>SuggestionCoordinator: apply(result), one-shot reveal
    end

Reviews (2): Last reviewed commit: "docs(settings): describe streaming revea..."