feat: schema-aware greetings via a QueryChatGreeter API #261

Open
gadenbuie wants to merge 22 commits into main from feat/greeting-generator
Conversation

@gadenbuie

Contributor

Supersedes the constructor-based approach in #260 (kept open for discussion). Credit to @cpsievert for the regression insight and test infrastructure.

Summary

After multi-table support (#195), the schema moved out of the system prompt into a lazy tool_get_schema tool. Greetings are generated by a tool-free client, so the model became schema-blind when writing the opening message — it could no longer describe the data it was about to chat about.

This PR fixes that with a dedicated greeting API rather than a constructor parameter. The greeting-specific concern lives on a new semi-internal QueryChatGreeter, reached via qc$greeter (R) / qc.greeter (Python), instead of widening the QueryChat constructor for every single-table user.

Key design decisions:

  • QueryChatGreeter holds $tables (table names whose schema to embed in the greeting) and $prompt (the greeting template). It is semi-internal — accessed through qc.greeter, not separately constructed (R class is @noRd). Setters store only; changing greeter config never invalidates an existing qc$greeting, so a user-supplied static greeting always survives.
  • Separate greeting system prompt (greeting.md) instead of the full prompt.md. The greeting client no longer receives SQL guidelines or tool descriptions — just the schema for the selected tables, the data description, and suggestion-card syntax. A single private build_greeting_client() is the source of truth for both greeter$generate() and the in-app greeting path.
  • Constructor tables are always included in the greeting; add_table(include_in_greeting=) / add_tables(include_in_greeting=) opt additional tables in (default off). add_tables accepts a logical/bool or a character vector / list[str] of names.
  • Greeting data dicts are scoped to the included tables: per-table entries are filtered, and table-less global dicts keep their description but drop cross-table relationships/glossary, so a curated greeting subset can't leak excluded-table metadata.
  • Greetings generate on a separate, throwaway client in every backend (Shiny, Streamlit, Gradio, Dash). The shared qc.greeting is written only by an explicit generate_greeting() / greeter.generate(); in-app greetings stay session-local.
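The data-dict scoping rule in the list above can be pictured with a small, self-contained sketch. It is only an illustration under stated assumptions: `DataDict` and `scope_for_greeting` are hypothetical names, the field layout is invented for the example, and stripping relationships/glossary from every greeting dict follows the commit wording in this PR rather than the actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass
class DataDict:
    """Hypothetical stand-in for a data dictionary; not querychat's real type."""

    description: str
    tables: dict[str, str] = field(default_factory=dict)  # table name -> per-table notes
    relationships: tuple[str, ...] = ()                   # cross-table prose
    glossary: dict[str, str] = field(default_factory=dict)


def scope_for_greeting(dicts: list[DataDict], included: list[str]) -> list[DataDict]:
    """Return copies of `dicts` restricted to the greeting's included tables."""
    keep = set(included)
    scoped: list[DataDict] = []
    for d in dicts:
        entries = {name: note for name, note in d.tables.items() if name in keep}
        # A dict that names tables, none of which are included, is dropped.
        # A table-less (global) dict is kept for its description.
        if d.tables and not entries:
            continue
        # Cross-table fields are stripped so excluded-table prose cannot leak.
        scoped.append(replace(d, tables=entries, relationships=(), glossary={}))
    return scoped
```

With `included=["airports"]`, a dict covering `flights` and `airports` keeps only its `airports` entry, a global dict keeps its description, and a dict that covers only `orders` is dropped.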

The Python and R implementations mirror each other. generate_greeting() is kept as a thin wrapper over greeter.generate(), so existing callers need no changes.
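As a rough mental model of the store-only setters and the thin `generate_greeting()` wrapper, consider the toy classes below. `Chat`, `Greeter`, and `fake_model` are hypothetical stand-ins written for this sketch; they are not the querychat classes, and the model call is replaced by a deterministic function.

```python
from __future__ import annotations

from typing import Callable, Optional


def fake_model(tables: list[str], prompt: str) -> str:
    """Stand-in for the LLM call; returns a deterministic string."""
    return f"Hello! Tables: {', '.join(tables)}"


class Greeter:
    """Hypothetical stand-in for QueryChatGreeter."""

    def __init__(self, owner: Chat, generate_fn: Callable[[list[str], str], str]) -> None:
        self._owner = owner
        self._generate_fn = generate_fn
        self._tables: list[str] = []
        self.prompt = "Greet the user and describe the data."

    @property
    def tables(self) -> list[str]:
        return list(self._tables)

    @tables.setter
    def tables(self, value: list[str]) -> None:
        # Store only: the owner's existing greeting is deliberately left alone.
        self._tables = list(value)

    def generate(self) -> str:
        greeting = self._generate_fn(self._tables, self.prompt)
        # Only an explicit generate() writes the shared greeting.
        self._owner.greeting = greeting
        return greeting


class Chat:
    """Hypothetical stand-in for QueryChat."""

    def __init__(
        self,
        table_name: str,
        greeting: Optional[str] = None,
        generate_fn: Callable[[list[str], str], str] = fake_model,
    ) -> None:
        self.greeting = greeting
        self.greeter = Greeter(self, generate_fn)
        # Populating the greeter after storing the greeting is safe because
        # the setter never invalidates it.
        self.greeter.tables = [table_name]

    def generate_greeting(self) -> str:
        # Thin wrapper: existing callers keep working unchanged.
        return self.greeter.generate()
```

In this model, a greeting passed to the constructor survives any later change to `greeter.tables`, and it is replaced only when `generate_greeting()` or `greeter.generate()` is called.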

Verification

Tables passed to the constructor are automatically included in the greeting. Tables added later with add_table() are not included by default — pass include_in_greeting = TRUE to opt them in.

R:

library(querychat)
qc <- QueryChat$new(mtcars, "mtcars")
# Constructor table is included in the greeting automatically
qc$greeter$tables
#> [1] "mtcars"

# add_table() does NOT include the table in the greeting by default
qc$add_table(flights, "flights")
qc$greeter$tables
#> [1] "mtcars"

# ...opt it in explicitly
qc$add_table(airports, "airports", include_in_greeting = TRUE)
qc$greeter$tables
#> [1] "mtcars"   "airports"

greeting <- qc$generate_greeting()  # builds a lean, schema-aware greeting client

Python:

from querychat import QueryChat
qc = QueryChat(data_source=df, table_name="mtcars")
qc.greeter.tables          # ['mtcars'] -- constructor table auto-included

# add_table() does NOT include the table in the greeting by default
qc.add_table(flights, "flights")
qc.greeter.tables          # ['mtcars']

# ...opt it in explicitly
qc.add_table(airports, "airports", include_in_greeting=True)
qc.greeter.tables          # ['mtcars', 'airports']

# add_tables() takes a list of names to include a subset
qc.add_tables(engine, include_in_greeting=["orders", "customers"])

qc.generate_greeting()     # schema-aware greeting on a separate client
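The handling of `include_in_greeting` (a bool or a list of names, validated before anything is registered) might look roughly like the sketch below. `resolve_greeting_tables` and `Registry` are hypothetical helpers for illustration only. Silently ignoring names that are not among the newly added tables is an assumption of this sketch; the PR does not state what the library does in that case.

```python
from __future__ import annotations


def resolve_greeting_tables(
    new_tables: list[str],
    include_in_greeting: bool | list[str] = False,
) -> list[str]:
    """Return the subset of `new_tables` to embed in the greeting."""
    if isinstance(include_in_greeting, bool):
        return list(new_tables) if include_in_greeting else []
    # A bare string is rejected; iterating it would yield single characters.
    if not isinstance(include_in_greeting, list) or not all(
        isinstance(name, str) for name in include_in_greeting
    ):
        raise TypeError("include_in_greeting must be a bool or a list of table names")
    # Assumption: names that are not among the new tables are ignored.
    return [name for name in new_tables if name in include_in_greeting]


class Registry:
    """Hypothetical stand-in for the table bookkeeping on QueryChat."""

    def __init__(self, primary: str) -> None:
        self.tables = [primary]
        self.greeting_tables = [primary]  # constructor table is always included

    def add_tables(self, names: list[str], include_in_greeting: bool | list[str] = False) -> None:
        # Validate first so a rejected argument leaves the registry untouched.
        selected = resolve_greeting_tables(names, include_in_greeting)
        self.tables.extend(names)
        self.greeting_tables.extend(selected)
```

Running the check before any mutation is what keeps a failed call from leaving tables half-registered.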

Automated checks: make r-check (testthat OK) and make py-check (ruff clean, pyright 0 errors, 616 tests passed), both green.

gadenbuie added 22 commits June 24, 2026 10:21
Introduce a semi-internal QueryChatGreeter (R6) accessed via `qc$greeter`,
which generates the opening greeting from a separate, leaner greeting
system prompt (inst/prompts/greeting.md) rendered through the existing
QueryChatSystemPrompt infrastructure, scoped to `greeter$tables`.

- `qc$greeter$tables` / `$prompt` invalidate the cached greeting on set
- constructor tables are always included; `add_table(include_in_greeting=)`
  and `add_tables(include_in_greeting=)` opt additional tables in
- `$generate_greeting()` now delegates to `$greeter$generate()`
- mod_server streams the greeting via the shared build_greeting_client()

Fixes the schema-blind greeting regression from multi-table support (#195).
Reuses the regression insight and test infra from #260; the greeting is
generated on a separate client, so #260's GREETING_MARKER sentinel and
history filtering are intentionally not carried over.

Drop the fallback that re-included all tables when greeter$tables was
empty; an explicitly cleared selection now yields a table-less generic
greeting. Guard QueryChatSystemPrompt$render() to tolerate an empty data
source set (never reached on the main-prompt path).

Setting greeter$tables or greeter$prompt no longer nulls qc$greeting.
The constructor populates greeter$tables after setting the greeting, so
the old invalidation wiped a user-supplied constructor greeting. Config
changes now only affect the next $generate() call.

Mirror the R QueryChatGreeter API in Python:
- New QueryChatGreeter class (qc.greeter) holding tables/prompt; setters
  do not invalidate an existing greeting
- generate_greeting() delegates to greeter.generate()
- _build_greeting_client() builds a fresh client with a lean greeting
  system prompt (new prompts/greeting.md) over the greeter's tables
- add_table()/add_tables() gain include_in_greeting; constructor tables
  are always included
- QueryChatSystemPrompt.render() tolerates an empty data-source set
- Drop the dead GREETING_PROMPT history filter in AppState

Route every backend's greeting generation through a fresh client built
from _build_greeting_client() (lean greeting system prompt) instead of
the shared session client:
- AppState gains greeting_client_factory + build_greeting_client(); wired
  via create_app_state and reattached on state deserialization
- Streamlit, Gradio, Dash stream GREETING_PROMPT through the greeting
  client, then inject only the result via set_greeting (session-local)
- Shiny mod_server gains greeting_client_fn, wired from all call sites

In-app greetings stay session-local; only generate_greeting() writes the
shared greeting.

Reject non-logical, non-character include_in_greeting instead of
silently including no tables, which previously re-created the
schema-blind greeting symptom with no signal.

Reject types other than bool/str/list[str] with TypeError, and accept
a bare table-name string for parity with the R package (which accepts a
length-1 character vector). Previously a non-iterable raised an opaque
TypeError and a bare string silently iterated characters.

Reject non-logical include_in_greeting via check_bool instead of
silently ignoring it through isTRUE.

Reject non-bool include_in_greeting with TypeError instead of relying
on a truthiness check that silently accepted any non-empty value.

Drop data dicts that describe no included table before building the
greeting prompt, so a curated greeter$tables subset no longer carries
dict-level prose about excluded tables.

Drop data dicts that describe no included table before building the
greeting prompt, so a curated greeter.tables subset no longer carries
dict-level prose about excluded tables.

- Prune greeter$tables when a table is removed via remove_table, so it
  no longer keeps a stale name.
- Omit the table section (and avoid the doubled "SQL SQL" wording) from
  the greeting prompt when no tables are included, via a has_tables flag.

- Prune greeter.tables when a table is removed via remove_table, so it
  no longer keeps a stale name.
- Omit the table section (and avoid the doubled "SQL SQL" wording) from
  the greeting prompt when no tables are included, via a has_tables flag.
- Drop the redundant guard around the add_tables greeting update.

Hoist the include_in_greeting type check ahead of table normalization and
registration so a rejected value leaves the QueryChat instance unchanged,
rather than leaving tables half-registered after the error.

…ent spec

Two greeting-path fixes:

- add_tables() now validates include_in_greeting before normalizing and
  registering tables, so a rejected value leaves the instance unmutated.
- Shiny server(client=...) now threads the resolved client spec into the
  greeting client via _build_greeting_client(client_spec=...), so the
  greeting uses the same provider/model as the session client.

…ips/glossary

build_greeting_client() previously dropped any data dict whose tables did
not intersect the greeting subset, discarding table-less dicts that only
contribute global context. Now a table-less dict is kept for its dict-level
description, and the cross-table global fields (relationships, glossary) are
stripped from greeting dicts so a curated subset can't leak excluded-table
prose.

…dict scoping

Two greeting fixes mirroring the R changes and a Python-only backward-compat fix:

- get_display_messages() again hides the synthetic GREETING_PROMPT user turn.
  Older releases generated greetings on the shared client, so state they
  serialized still restores that turn; without the filter it surfaced as a
  visible user message after upgrade. New sessions never create it.
- _build_greeting_client() keeps a table-less dict for its dict-level
  description and strips relationships/glossary from greeting dicts, instead of
  dropping any dict that doesn't intersect the greeting subset.
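The backward-compatibility filter described in the first bullet above can be sketched as a plain list filter. The message shape (dicts with `role` and `content`), the function name `visible_messages`, and the text assigned to `GREETING_PROMPT` are all assumptions for illustration; the real constant's value and the real message type are not shown in this PR.

```python
from __future__ import annotations

# Placeholder text: the real GREETING_PROMPT value is not shown in this PR.
GREETING_PROMPT = "Please give me a friendly greeting."


def visible_messages(history: list[dict[str, str]]) -> list[dict[str, str]]:
    """Hide the synthetic greeting request that older serialized state may contain."""
    return [
        message
        for message in history
        if not (message["role"] == "user" and message["content"] == GREETING_PROMPT)
    ]
```

State restored from an older release loses only the synthetic user turn; histories from new sessions never contain it and pass through unchanged.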
…al dicts table-less

- $server(data_source=...) now registers the deferred table with
  include_in_greeting = TRUE, matching the constructor rule that primary data
  is always greeting-included. Without it the first greeting fell back to the
  generic no-tables prompt.
- render() no longer gates has_data_dicts on having a data source, and
  greeting.md renders data dicts independently of has_tables, so a global
  (table-less) dict description appears even in a generic zero-table greeting.

render() no longer gates has_data_dicts on having a data source, and
greeting.md renders data dicts independently of has_tables. A global
(table-less) dict description now appears even in a generic zero-table
greeting, completing the earlier greeting dict-scoping fix.

Drop the bare-`str` branch from `add_tables(include_in_greeting=...)`
so only `bool` or `list[str]` is accepted, and harden the
`QueryChatGreeter.tables` setter to raise on a bare string instead of
silently iterating it character-by-character.
@gadenbuie gadenbuie marked this pull request as ready for review June 24, 2026 16:58
@gadenbuie gadenbuie requested a review from cpsievert June 24, 2026 16:58