feat: schema-aware greetings via a QueryChatGreeter API #261
Open
gadenbuie wants to merge 22 commits into
Introduce a semi-internal QueryChatGreeter (R6) accessed via `qc$greeter`, which generates the opening greeting from a separate, leaner greeting system prompt (inst/prompts/greeting.md) rendered through the existing QueryChatSystemPrompt infrastructure, scoped to `greeter$tables`.

- `qc$greeter$tables` / `$prompt` invalidate the cached greeting on set
- constructor tables are always included; `add_table(include_in_greeting=)` and `add_tables(include_in_greeting=)` opt additional tables in
- `$generate_greeting()` now delegates to `$greeter$generate()`
- mod_server streams the greeting via the shared build_greeting_client()

Fixes the schema-blind greeting regression from multi-table support (#195). Reuses the regression insight and test infra from #260; the greeting is generated on a separate client, so #260's GREETING_MARKER sentinel and history filtering are intentionally not carried over.

Drop the fallback that re-included all tables when greeter$tables was empty; an explicitly cleared selection now yields a table-less generic greeting. Guard QueryChatSystemPrompt$render() to tolerate an empty data source set (never reached on the main-prompt path).
Setting greeter$tables or greeter$prompt no longer nulls qc$greeting. The constructor populates greeter$tables after setting the greeting, so the old invalidation wiped a user-supplied constructor greeting. Config changes now only affect the next $generate() call.
Mirror the R QueryChatGreeter API in Python:

- New QueryChatGreeter class (qc.greeter) holding tables/prompt; setters do not invalidate an existing greeting
- generate_greeting() delegates to greeter.generate()
- _build_greeting_client() builds a fresh client with a lean greeting system prompt (new prompts/greeting.md) over the greeter's tables
- add_table()/add_tables() gain include_in_greeting; constructor tables are always included
- QueryChatSystemPrompt.render() tolerates an empty data-source set
- Drop the dead GREETING_PROMPT history filter in AppState

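The store-only setter behavior can be illustrated with a small self-contained model. This is a sketch, not the querychat implementation: `ToyGreeter` and `ToyChat` are hypothetical stand-ins, and only the ordering described in the commits is modeled (the constructor sets the greeting first, then populates the greeter's tables).

```python
from typing import List, Optional


class ToyGreeter:
    """Hypothetical stand-in: holds greeting configuration and nothing else."""

    def __init__(self) -> None:
        self._tables: List[str] = []
        self._prompt: Optional[str] = None

    @property
    def tables(self) -> List[str]:
        return list(self._tables)

    @tables.setter
    def tables(self, value: List[str]) -> None:
        # Store only: setting config has no effect on an existing greeting.
        self._tables = list(value)

    @property
    def prompt(self) -> Optional[str]:
        return self._prompt

    @prompt.setter
    def prompt(self, value: Optional[str]) -> None:
        # Store only, same as the tables setter.
        self._prompt = value


class ToyChat:
    """Hypothetical stand-in for the chat object that owns a greeter."""

    def __init__(self, tables: List[str], greeting: Optional[str] = None) -> None:
        self.greeting = greeting
        self.greeter = ToyGreeter()
        # Tables are populated after the greeting is set. With an
        # invalidating setter, this line would wipe a user-supplied greeting.
        self.greeter.tables = tables


chat = ToyChat(["orders"], greeting="Welcome!")
chat.greeter.tables = ["orders", "customers"]
chat.greeter.prompt = "Say hello briefly."
```

In this model the user-supplied `"Welcome!"` greeting survives both the constructor and the later config changes; the new config would only matter the next time a greeting is generated.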
Route every backend's greeting generation through a fresh client built from _build_greeting_client() (lean greeting system prompt) instead of the shared session client:

- AppState gains greeting_client_factory + build_greeting_client(); wired via create_app_state and reattached on state deserialization
- Streamlit, Gradio, Dash stream GREETING_PROMPT through the greeting client, then inject only the result via set_greeting (session-local)
- Shiny mod_server gains greeting_client_fn, wired from all call sites

In-app greetings stay session-local; only generate_greeting() writes the shared greeting.

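A minimal sketch of the fresh-client idea, under stated assumptions: `FakeClient` and `ToyAppState` are hypothetical, the `GREETING_PROMPT` text is a placeholder, and `ensure_greeting()` is an invented helper name. Only the attribute `greeting_client_factory` and the method `build_greeting_client()` borrow their names from the commit message; their bodies here are guesses.

```python
from typing import Callable, List, Optional

# Placeholder text; the real prompt is not shown in this PR description.
GREETING_PROMPT = "Please greet the user."


class FakeClient:
    """Hypothetical chat client that records every user turn it receives."""

    def __init__(self, system_prompt: str) -> None:
        self.system_prompt = system_prompt
        self.turns: List[str] = []

    def chat(self, user_text: str) -> str:
        self.turns.append(user_text)
        return f"[reply to: {user_text}]"


class ToyAppState:
    """Hypothetical per-session state with a separate greeting client."""

    def __init__(
        self,
        session_client: FakeClient,
        greeting_client_factory: Callable[[], FakeClient],
    ) -> None:
        self.client = session_client
        self.greeting_client_factory = greeting_client_factory
        self.greeting: Optional[str] = None  # session-local

    def build_greeting_client(self) -> FakeClient:
        # A new client per call, so greeting turns never reach the session history.
        return self.greeting_client_factory()

    def ensure_greeting(self) -> str:
        if self.greeting is None:
            greeting_client = self.build_greeting_client()
            # Only the resulting text is kept; the prompt turn is discarded
            # together with the throwaway client.
            self.greeting = greeting_client.chat(GREETING_PROMPT)
        return self.greeting


session_client = FakeClient("full system prompt")
state = ToyAppState(session_client, lambda: FakeClient("lean greeting prompt"))
greeting = state.ensure_greeting()
```

Because the greeting prompt is sent on a throwaway client, the session client's history stays empty and no synthetic user turn needs to be filtered out of new sessions.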
Reject non-logical, non-character include_in_greeting instead of silently including no tables, which previously re-created the schema-blind greeting symptom with no signal.
Reject types other than bool/str/list[str] with TypeError, and accept a bare table-name string for parity with the R package (which accepts a length-1 character vector). Previously a non-iterable raised an opaque TypeError and a bare string silently iterated characters.
Reject non-logical include_in_greeting via check_bool instead of silently ignoring it through isTRUE.
Reject non-bool include_in_greeting with TypeError instead of relying on a truthiness check that silently accepted any non-empty value.
Drop data dicts that describe no included table before building the greeting prompt, so a curated greeter$tables subset no longer carries dict-level prose about excluded tables.
Drop data dicts that describe no included table before building the greeting prompt, so a curated greeter.tables subset no longer carries dict-level prose about excluded tables.
- Prune greeter$tables when a table is removed via remove_table, so it no longer keeps a stale name.
- Omit the table section (and avoid the doubled "SQL SQL" wording) from the greeting prompt when no tables are included, via a has_tables flag.

- Prune greeter.tables when a table is removed via remove_table, so it no longer keeps a stale name.
- Omit the table section (and avoid the doubled "SQL SQL" wording) from the greeting prompt when no tables are included, via a has_tables flag.
- Drop the redundant guard around the add_tables greeting update.

Hoist the include_in_greeting type check ahead of table normalization and registration so a rejected value leaves the QueryChat instance unchanged, rather than leaving tables half-registered after the error.
…ent spec

Two greeting-path fixes:

- add_tables() now validates include_in_greeting before normalizing and registering tables, so a rejected value leaves the instance unmutated.
- Shiny server(client=...) now threads the resolved client spec into the greeting client via _build_greeting_client(client_spec=...), so the greeting uses the same provider/model as the session client.

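The validate-before-mutate ordering can be sketched as follows. `ToyRegistry` is a hypothetical stand-in with a simplified check, not the querychat code; the point is only that the type check runs before any state is touched.

```python
class ToyRegistry:
    """Hypothetical table registry illustrating validate-before-mutate."""

    def __init__(self) -> None:
        self.tables = {}
        self.greeting_tables = []

    def add_tables(self, tables, include_in_greeting=False):
        # Step 1: validate first. Nothing has been mutated yet, so raising
        # here leaves the registry exactly as it was.
        if not isinstance(include_in_greeting, (bool, list)):
            raise TypeError(
                "include_in_greeting must be a bool or a list of table names"
            )

        # Step 2: only now register the tables.
        self.tables.update(tables)

        # Step 3: update the greeting selection.
        if include_in_greeting is True:
            names = list(tables)
        elif include_in_greeting is False:
            names = []
        else:
            names = include_in_greeting
        for name in names:
            if name not in self.greeting_tables:
                self.greeting_tables.append(name)


registry = ToyRegistry()
try:
    # A bare string is rejected before anything is registered.
    registry.add_tables({"orders": "orders data"}, include_in_greeting="orders")
except TypeError:
    rejected = True
else:
    rejected = False
```

After the rejected call, `registry.tables` is still empty. If the check ran after `self.tables.update(tables)`, the caller would be left with a registered table and an error.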
…ips/glossary

build_greeting_client() previously dropped any data dict whose tables did not intersect the greeting subset, discarding table-less dicts that only contribute global context. Now a table-less dict is kept for its dict-level description, and the cross-table global fields (relationships, glossary) are stripped from greeting dicts so a curated subset can't leak excluded-table prose.

…dict scoping

Two greeting fixes mirroring the R changes and a Python-only backward-compat fix:

- get_display_messages() again hides the synthetic GREETING_PROMPT user turn. Older releases generated greetings on the shared client, so state they serialized still restores that turn; without the filter it surfaced as a visible user message after upgrade. New sessions never create it.
- _build_greeting_client() keeps a table-less dict for its dict-level description and strips relationships/glossary from greeting dicts, instead of dropping any dict that doesn't intersect the greeting subset.

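One reading of the dict-scoping rule, sketched over plain dictionaries. The data-dict shape used here (`description`, `tables`, `relationships`, `glossary` keys) and the function name `scope_data_dicts` are assumptions for illustration. The sketch also assumes that a dict describing only excluded tables is still dropped, as in the earlier commits.

```python
def scope_data_dicts(data_dicts, greeting_tables):
    """Return copies of data dicts restricted to the greeting's tables.

    Assumed rules, per the commit messages:
    - a table-less dict is kept for its dict-level description;
    - a dict whose tables are all excluded is dropped;
    - relationships and glossary never reach the greeting prompt.
    """
    included = set(greeting_tables)
    scoped = []
    for data_dict in data_dicts:
        tables = data_dict.get("tables", {})
        kept = {name: text for name, text in tables.items() if name in included}
        if tables and not kept:
            # Describes tables, but none of them are in the greeting subset.
            continue
        # Building a new dict from only these two keys strips the
        # cross-table fields (relationships, glossary).
        scoped.append(
            {"description": data_dict.get("description", ""), "tables": kept}
        )
    return scoped


data_dicts = [
    {"description": "Company-wide notes", "glossary": {"ARR": "annual recurring revenue"}},
    {
        "description": "Sales data",
        "tables": {"orders": "One row per order", "refunds": "One row per refund"},
        "relationships": ["orders.id = refunds.order_id"],
    },
    {"description": "HR data", "tables": {"employees": "One row per employee"}},
]

scoped = scope_data_dicts(data_dicts, ["orders"])
```

With `["orders"]` as the greeting subset, the table-less "Company-wide notes" dict survives without its glossary, "Sales data" keeps only the `orders` entry, and "HR data" is dropped.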
…al dicts table-less

- $server(data_source=...) now registers the deferred table with include_in_greeting = TRUE, matching the constructor rule that primary data is always greeting-included. Without it the first greeting fell back to the generic no-tables prompt.
- render() no longer gates has_data_dicts on having a data source, and greeting.md renders data dicts independently of has_tables, so a global (table-less) dict description appears even in a generic zero-table greeting.

render() no longer gates has_data_dicts on having a data source, and greeting.md renders data dicts independently of has_tables. A global (table-less) dict description now appears even in a generic zero-table greeting, completing the earlier greeting dict-scoping fix.
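The independence of the two flags can be shown with a plain-Python stand-in for the template. This is not the real `greeting.md` or its rendering engine; `render_greeting_prompt` and the section wording are invented for illustration, and only the flag names `has_tables` and `has_data_dicts` come from the commit messages.

```python
from typing import Dict, List


def render_greeting_prompt(
    table_schemas: Dict[str, str], dict_descriptions: List[str]
) -> str:
    """Hypothetical renderer: each section is gated by its own flag."""
    has_tables = bool(table_schemas)
    # Computed on its own, not as `has_tables and ...`.
    has_data_dicts = bool(dict_descriptions)

    parts = ["Write a short, friendly opening message."]
    if has_tables:
        parts.append("Tables:")
        parts.extend(f"- {name}: {schema}" for name, schema in table_schemas.items())
    if has_data_dicts:
        parts.append("Data description:")
        parts.extend(f"- {text}" for text in dict_descriptions)
    return "\n".join(parts)


zero_table_prompt = render_greeting_prompt({}, ["Company-wide notes"])
full_prompt = render_greeting_prompt({"orders": "id INTEGER, total REAL"}, [])
```

In the zero-table case the table section is omitted entirely while the global description still appears, which is the behavior the commit describes.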
Drop the bare-`str` branch from `add_tables(include_in_greeting=...)` so only `bool` or `list[str]` is accepted, and harden the `QueryChatGreeter.tables` setter to raise on a bare string instead of silently iterating it character-by-character.
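A minimal sketch of the final acceptance rule (`bool` or `list[str]` only), assuming a hypothetical helper named `resolve_greeting_tables`; the actual function names and error messages in the package may differ.

```python
def resolve_greeting_tables(include_in_greeting, added_names):
    """Turn include_in_greeting into an explicit list of table names."""
    if isinstance(include_in_greeting, bool):
        # True opts in every table from this call; False opts in none.
        return list(added_names) if include_in_greeting else []
    if isinstance(include_in_greeting, str):
        # A bare string is iterable, so without this check "orders" would
        # silently become ["o", "r", "d", "e", "r", "s"].
        raise TypeError(
            "include_in_greeting must be a bool or a list of table names, "
            "not a bare string"
        )
    if isinstance(include_in_greeting, list) and all(
        isinstance(name, str) for name in include_in_greeting
    ):
        return list(include_in_greeting)
    raise TypeError("include_in_greeting must be a bool or a list of table names")


all_added = resolve_greeting_tables(True, ["orders", "customers"])
none_added = resolve_greeting_tables(False, ["orders", "customers"])
subset = resolve_greeting_tables(["orders"], ["orders", "customers"])
```

The `bool` check comes first so that `True` and `False` are handled explicitly rather than through truthiness, which is what previously let any non-empty value pass.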
Supersedes the constructor-based approach in #260 (kept open for discussion). Credit to @cpsievert for the regression insight and test infrastructure.
Summary
After multi-table support (#195), the schema moved out of the system prompt into a lazy `tool_get_schema` tool. Greetings are generated by a tool-free client, so the model became schema-blind when writing the opening message: it could no longer describe the data it was about to chat about.

This PR fixes that with a dedicated greeting API rather than a constructor parameter. The greeting-specific concern lives on a new semi-internal `QueryChatGreeter`, reached via `qc$greeter` (R) / `qc.greeter` (Python), instead of widening the `QueryChat` constructor for every single-table user.

Key design decisions:

- `QueryChatGreeter` holds `$tables` (table names whose schema to embed in the greeting) and `$prompt` (the greeting template). It is semi-internal: accessed through `qc.greeter`, not separately constructed (R class is `@noRd`). Setters store only; changing greeter config never invalidates an existing `qc$greeting`, so a user-supplied static greeting always survives.
- The greeting uses a separate, leaner system prompt (`greeting.md`) instead of the full `prompt.md`. The greeting client no longer receives SQL guidelines or tool descriptions, just the schema for the selected tables, the data description, and suggestion-card syntax. A single private `build_greeting_client()` is the source of truth for both `greeter$generate()` and the in-app greeting path.
- Constructor tables are always included; `add_table(include_in_greeting=)` / `add_tables(include_in_greeting=)` opt additional tables in (default off). `add_tables` accepts a logical/`bool` or a character vector / `list[str]` of names.
- `qc.greeting` is written only by an explicit `generate_greeting()` / `greeter.generate()`; in-app greetings stay session-local.

The Python and R implementations mirror each other. `generate_greeting()` is kept as a thin wrapper over `greeter.generate()`, so existing callers need no changes.

Verification
Tables passed to the constructor are automatically included in the greeting. Tables added later with `add_table()` are not included by default; pass `include_in_greeting = TRUE` to opt them in.

R:
Python:
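As a self-contained illustration of the inclusion rule, here is a toy model in plain Python. `ToyQueryChat` is a hypothetical stand-in rather than the querychat class: it tracks table names only and generates no greeting. The pruning in `remove_table` follows the behavior described in the commit list.

```python
from typing import List


class ToyQueryChat:
    """Hypothetical model of which tables end up in the greeting."""

    def __init__(self, tables: List[str]) -> None:
        self.tables = list(tables)
        # Constructor tables are always included in the greeting.
        self.greeting_tables = list(tables)

    def add_table(self, name: str, include_in_greeting: bool = False) -> None:
        # Validate before mutating, so a bad value leaves state unchanged.
        if not isinstance(include_in_greeting, bool):
            raise TypeError("include_in_greeting must be a bool")
        self.tables.append(name)
        if include_in_greeting:
            self.greeting_tables.append(name)

    def remove_table(self, name: str) -> None:
        self.tables.remove(name)
        # Prune the greeting selection so it never keeps a stale name.
        self.greeting_tables = [t for t in self.greeting_tables if t != name]


qc = ToyQueryChat(["orders"])
qc.add_table("customers")                            # default: not in greeting
qc.add_table("products", include_in_greeting=True)   # explicit opt-in
after_add = list(qc.greeting_tables)

qc.remove_table("products")
after_remove = list(qc.greeting_tables)
```

Here `after_add` is `["orders", "products"]` and `after_remove` is `["orders"]`: `customers` is registered but never appears in the greeting selection.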
Automated checks:
`make r-check` (testthat OK) and `make py-check` (ruff clean, pyright 0 errors, 616 tests passed), both green.