Context relevance: surface conditioning, token-true budgeting, wider prefix, OCR correction #686
Merged
b0e6238 to f06caeb
OCR text conditions the prompt, and the downstream hygiene filters can only drop garbled lines, not repair them. Language correction cuts the garbling at the source; capture runs once per focused field, so the extra Vision work stays off every hot path.
The base model previously received bare prefix text: on Mail, Slack, or Docs it had no idea what surface it was continuing, so completions read generic. The prompt preface now states the surface (An email being written in Mail.), the window title (the subject, document name, or channel is the highest-signal cue Accessibility offers), the web domain, and the field placeholder, all sanitized and length-capped. Code editors and terminals are deliberately excluded: app metadata biases small base models toward code and numbers exactly where the text already makes the language obvious. The Foundation Models prompt states the same sanitized facts. Capture is one Accessibility read per field session, cached and frozen for the session so the prompt bytes ahead of the prefix stay stable and llama KV prefix reuse keeps absorbing them; a retitling browser tab cannot thrash the cache. Secure fields are never probed. Classification moves to a shared AppSurfaceClassifier so both engines agree about what kind of app the user is in. New Include App Context toggle (default on, indexed for settings search); everything stays on device.
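A minimal sketch of the preface idea described above, not the PR's implementation. Every name here (AppSurface, SurfaceMetadata, sanitize, composePreface, SessionPrefaceCache), the label strings, and the length caps are assumptions made for illustration; only the email sentence wording comes from the description. It shows three things: suppressing the preface for code editors and terminals, sanitizing and capping each metadata field, and freezing the first result per field session so a later title change cannot alter the bytes ahead of the prefix.

```swift
/// Hypothetical surface classes; the real AppSurfaceClassifier may differ.
enum AppSurface { case email, chat, document, codeEditor, terminal, generic }

/// Metadata read once when a field gains focus (all field names assumed).
struct SurfaceMetadata {
    var surface: AppSurface
    var appName: String
    var windowTitle: String?
    var webDomain: String?
    var placeholder: String?
}

/// Replaces control and format scalars (newlines, bidi marks) with spaces,
/// collapses whitespace runs, and caps the length in characters.
func sanitize(_ raw: String?, maxLength: Int) -> String? {
    guard let raw = raw else { return nil }
    let space: Unicode.Scalar = " "
    var scalars = String.UnicodeScalarView()
    for scalar in raw.unicodeScalars {
        let category = scalar.properties.generalCategory
        let unsafe = category == .control || category == .format
        scalars.append(unsafe ? space : scalar)
    }
    let collapsed = String(scalars)
        .split(whereSeparator: { $0.isWhitespace })
        .joined(separator: " ")
    return collapsed.isEmpty ? nil : String(collapsed.prefix(maxLength))
}

/// Returns nil for surfaces where app metadata would only add bias.
func composePreface(for meta: SurfaceMetadata) -> String? {
    let description: String
    switch meta.surface {
    case .codeEditor, .terminal, .generic:
        return nil
    case .email:
        description = "An email being written in \(meta.appName)."
    case .chat:
        description = "A chat message being written in \(meta.appName)."
    case .document:
        description = "A document being written in \(meta.appName)."
    }
    var lines = [description]
    // The caps (80/60/60) and the labels are illustrative assumptions.
    if let title = sanitize(meta.windowTitle, maxLength: 80) { lines.append("Title: \(title)") }
    if let domain = sanitize(meta.webDomain, maxLength: 60) { lines.append("Site: \(domain)") }
    if let hint = sanitize(meta.placeholder, maxLength: 60) { lines.append("Field: \(hint)") }
    return lines.joined(separator: "\n")
}

/// Freezes the first preface computed for a session; later calls never
/// invoke `capture` again, so the prompt head stays byte-stable.
final class SessionPrefaceCache {
    private struct Entry { let preface: String? }
    private var frozen: [String: Entry] = [:]

    func preface(sessionID: String, capture: () -> SurfaceMetadata) -> String? {
        if let entry = frozen[sessionID] { return entry.preface }
        let value = composePreface(for: capture())
        frozen[sessionID] = Entry(preface: value)
        return value
    }
}
```

Wrapping the cached value in a small struct lets the cache remember a nil preface (for example, a terminal) without re-reading the metadata on every request.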
…window The token-aware section allocator existed but nothing called it; the shipped path budgeted 2400 characters flat, which misjudges code, CJK, and punctuation-heavy text. The factory now passes a budget derived from the runtime's per-sequence context window (2048) minus the output ceiling and a safety margin, with per-section character caps retained as a second bound. With token-true budgeting in place, the llama prefix window rises from 1000 characters / 50 words to Foundation Models parity (2500 / 150). The old cap predates KV prefix reuse: prefill for a larger window is now paid once per focused field rather than per keystroke, and the extra preceding sentences carry the topic and voice that multi-paragraph email and docs continuations need. New long-document eval cases show completions correctly referencing content 1500+ characters before the caret at 344-734ms cold start, well inside the existing p95.
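The budgeting arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the shipped allocator: RuntimeConfiguration, PromptBudget, estimateTokens, and fitPrefix are hypothetical names, the bytes-per-token estimate stands in for a real tokenizer, and the 64/50 split between output ceiling and safety margin is invented so that the total reserve is 114 tokens, which is what a 2048-token window and the 1934-token budget cited in a later commit imply.

```swift
/// Hypothetical mirror of the runtime configuration named in the PR.
struct RuntimeConfiguration {
    let contextWindowTokens: Int
    static let `default` = RuntimeConfiguration(contextWindowTokens: 2048)
}

enum PromptBudget {
    // Assumed split: the source only implies that both reserves total 114.
    static let outputCeilingTokens = 64
    static let safetyMarginTokens = 50

    /// Derived from the runtime constant so the two cannot drift apart.
    static var promptTokens: Int {
        max(0, RuntimeConfiguration.default.contextWindowTokens
            - outputCeilingTokens - safetyMarginTokens)
    }
}

/// Crude estimate (about four UTF-8 bytes per token); a real allocator
/// would ask the model's tokenizer instead.
func estimateTokens(_ text: Substring) -> Int {
    (text.utf8.count + 3) / 4
}

/// Keeps the text nearest the caret, bounded first by a character cap and
/// then by the token budget.
func fitPrefix(_ text: String, maxCharacters: Int, tokenBudget: Int) -> String {
    var slice = text.suffix(maxCharacters)          // character cap
    while !slice.isEmpty && estimateTokens(slice) > tokenBudget {
        slice = slice.dropFirst()                   // drop oldest text first
    }
    return String(slice)
}
```

Under this estimate, 2500 ASCII characters cost about 625 tokens while 2500 CJK characters cost about 1875, which is the kind of gap a flat character budget cannot see.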
…test teardown Stored-property @MainActor classes deallocated inside app-hosted tests double-free without an explicitly nonisolated deinit; FieldStyleCache carries the same workaround. Surfaced by the live resolver tests once this branch rebased onto them.
d2ea9a1 to c351bdf
…udget, FM placeholder parity The lowercased hasSuffix paired with an original-string dropLast count could clip the wrong amount for characters that expand under case folding; the strip now uses an anchored backwards case-insensitive range. The 1934 token budget is now derived from LlamaRuntimeConfiguration.default so a context-window change cannot silently desynchronize it, with the output ceiling and safety margin as named constants. The FM prompt now states the field placeholder exactly like the llama preface, and the prefix-window comment states the real latency contract on trim-rejecting catalog models instead of assuming reuse.
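The suffix-strip fix can be sketched like this. The function name stripTrailing and the window-title example are assumptions; the commit text does not say which string is being stripped. The point is that the matched range is computed on the original string, so the amount removed always corresponds to what actually matched, rather than to the character count of a separately lowercased copy.

```swift
import Foundation

/// Removes `suffix` from the end of `text`, ignoring case. The search is
/// anchored to the end (.anchored + .backwards), so a match elsewhere in
/// the string is ignored and the returned range indexes `text` directly.
func stripTrailing(_ suffix: String, from text: String) -> String {
    guard let match = text.range(
        of: suffix,
        options: [.anchored, .backwards, .caseInsensitive]
    ) else {
        return text
    }
    return String(text[..<match.lowerBound])
}
```

For example, `stripTrailing(" - google docs", from: "Q3 plan - Google Docs")` yields `"Q3 plan"`, while a title that merely contains the suffix somewhere before its end is returned unchanged.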
c351bdf to 5623a71
Summary
Stacked on #683 (the eval harness measures everything here). Three changes to what the model sees:
The prompt preface now states the surface (An email being written in Mail.), the sanitized window title (the email subject / document name / channel is the highest-signal cue Accessibility offers), the web domain, and the field placeholder. Code editors and terminals are deliberately excluded (app metadata biases small base models toward code/numbers exactly where the text already makes that obvious), as are anonymous generic apps. The FM prompt states the same sanitized facts. Capture is one AX read per field session, cached and frozen for the session so the prompt bytes ahead of the prefix stay byte-stable and llama KV prefix reuse keeps absorbing them. Secure fields are never probed. New "Include App Context" toggle, default on, settings-search indexed; everything stays on device.
Bundle classification moves to a shared AppSurfaceClassifier so the FM tone hints and the new surface preface can never disagree (FM behavior unchanged, pinned by tests).
Validation
Eval (Gemma E2B Q6_K, fixed seed, same dataset as the #683 baseline):
Linked issues
Refs #660 (Google Docs quality: browser surfaces now carry domain + document title context).
Risk / rollout notes
cotabbySurfaceContextEnabled (default on). Snapshot/data/store all migrated with write-back; fresh installs and upgrades both resolve to on.
SuggestionConfiguration.standard.
Greptile Summary
This PR adds three coordinated improvements to what the local completion model sees: a surface-conditioning preface that tells the model which app, window title, domain, and field placeholder the user is writing in; a token-true prompt budget derived from the runtime's actual KV capacity; and a wider prefix window (1000 → 2500 chars) that amortises its prefill cost via KV prefix reuse.
Surface conditioning (AppSurfaceClassifier, SurfaceContextComposer, SurfaceContextCache) introduces a shared classifier so the llama preface and the FM tone hint always agree about the current app class, with code editors and terminals explicitly suppressed to avoid biasing base models. Metadata is captured once per field session, frozen for its lifetime to keep prompt bytes byte-stable, and never probed on secure fields.
Token-true budgeting (SuggestionModels, SuggestionRequestFactory) wires up the previously zero-caller tokenBudget path by deriving the budget from LlamaRuntimeConfiguration.default.contextWindowTokens at compile time, preventing the budget from silently drifting if the context window constant changes.
OCR correction (ScreenTextExtractor) enables usesLanguageCorrection = true for the once-per-field visual context capture, cutting garbled recognitions at the source rather than relying on downstream filters.
Confidence Score: 5/5
Safe to merge — all new behaviour is gated behind the defaulted-on isSurfaceContextEnabled toggle, AX reads are cached per field session, secure fields are never probed, and 1119 tests pass with zero failures.
The surface-conditioning, token-budget, and OCR-correction changes are well-isolated: surface metadata is frozen per field session and sanitized before reaching any prompt, the token budget is now derived from the runtime constant so it cannot drift silently, and the wider prefix window is bounded by the same budget. The previously flagged Unicode suffix-strip and fieldPlaceholder parity issues are both resolved. The only remaining note is a doc-comment naming mismatch on registrableDomain which does not affect runtime behaviour.
No files require special attention.
Important Files Changed
registrableDomain name implies eTLD+1 but retains full subdomain intentionally.
Reviews (3): Last reviewed commit: "ci: retrigger checks after force-push sy..."