test: integration tests for the NL→SQL query pipeline by kosminus · Pull Request #19 · kosminus/querywise

kosminus · 2026-06-09T19:51:18Z

What

Adds integration tests for the core NL→SQL query pipeline (execute_nl_query / execute_raw_sql), exercising the real composer, validator, error-handler, and interpreter agents end-to-end.

Why

The query pipeline — the product's core loop — had no test coverage: the retry logic, safety gates, policy enforcement, and audit/cost recording could all regress silently on refactor. These tests pin that behavior while staying fast and dependency-free (no live DB or LLM), consistent with the existing suite.

Changes

New backend/tests/test_query_pipeline.py (10 tests, no new dependencies):
- Test doubles at the process edges only: a scripted FakeLLMProvider (records every prompt), a scripted FakeConnector, a minimal FakeSession, and a pre-built BuiltContext — everything in between runs real code, including prompt templates, JSON repair/parsing, policy masking/limits, and audit/cost writes.
- Happy path: results, summary/highlights, success history row, query.executed audit event, cost attribution, and connection limits passed to the connector.
- Prompt integration: the composer prompt contains the assembled semantic context and the question.
- Validation repair: non-SELECT SQL is rejected and the error handler's corrected SQL is the only SQL executed (retry_count == 1).
- Retry bound: unsafe SQL that survives 3 repair attempts is refused with 403, audited as query.blocked, and never reaches the database; a handler that gives up yields 422.
- Execution-error loop: the handler sees the DB error message, corrected SQL succeeds on retry; exhausting all retries records an error history row and raises.
- Policy integration: masked columns are redacted in both the response and the interpreter LLM prompt; policy row/timeout caps override looser connection limits.
- Raw SQL: DML is blocked + audited; the happy path executes without LLM retry.
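
The edge-only faking and the retry bound listed above can be illustrated with a small self-contained sketch. It is not the code from test_query_pipeline.py: `FakeLLMProvider` and `FakeConnector` are named in this PR, but their fields, the `complete`/`execute` method names, `run_pipeline`, `is_safe`, `QueryBlocked`, and `MAX_REPAIR_ATTEMPTS` are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Assumption: mirrors the "3 repair attempts" bound described in the PR text.
MAX_REPAIR_ATTEMPTS = 3


@dataclass
class FakeLLMProvider:
    """Scripted LLM double: returns canned replies in order, records every prompt."""
    replies: list[str]
    prompts: list[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:  # hypothetical method name
        self.prompts.append(prompt)
        return self.replies.pop(0)


@dataclass
class FakeConnector:
    """Scripted connector double: records the SQL it is asked to run."""
    executed: list[str] = field(default_factory=list)

    def execute(self, sql: str) -> list[dict]:  # hypothetical method name
        self.executed.append(sql)
        return [{"n": 1}]


class QueryBlocked(Exception):
    """Stand-in for the 403 refusal; the real pipeline's error type may differ."""
    status_code = 403


def is_safe(sql: str) -> bool:
    # Toy validator: only statements starting with SELECT pass.
    return sql.lstrip().upper().startswith("SELECT")


def run_pipeline(question: str, llm: FakeLLMProvider, connector: FakeConnector):
    """Toy stand-in for the pipeline: compose, validate, repair up to the bound."""
    sql = llm.complete(f"Write SQL for: {question}")
    retry_count = 0
    while not is_safe(sql):
        if retry_count == MAX_REPAIR_ATTEMPTS:
            # Unsafe SQL survived every repair attempt: refuse before touching the DB.
            raise QueryBlocked("unsafe SQL survived all repair attempts")
        retry_count += 1
        sql = llm.complete(f"Fix this SQL (only SELECT is allowed): {sql}")
    return connector.execute(sql), retry_count


# Validation repair: the corrected SQL is the only SQL executed.
llm = FakeLLMProvider(replies=["DELETE FROM users", "SELECT id FROM users"])
db = FakeConnector()
rows, retries = run_pipeline("list user ids", llm, db)
assert db.executed == ["SELECT id FROM users"]
assert retries == 1
```

Because both doubles record what they receive, a test can assert on the prompts the LLM saw and on the SQL that reached the connector, without mocking any code in between.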

🤖 Generated with Claude Code

Run execute_nl_query / execute_raw_sql end-to-end through the real agents (composer, validator, error handler, interpreter), prompt plumbing, policy masking/limits, and audit/cost recording, faking only the process edges (scripted LLM provider, scripted connector, fake DB session, pre-built semantic context). Covers the happy path, validation repair, the 3-retry bound with unsafe SQL blocked + audited, execution-error retries, retry exhaustion, policy column masking that keeps masked values out of both the response and the interpreter prompt, and raw-SQL safety blocking.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
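
The masking check mentioned in this commit message can be sketched as follows: redact before anything leaves the pipeline, then assert the raw value appears in neither the response nor any recorded prompt. All names here (`mask_rows`, `build_interpreter_prompt`, `RecordingLLM`, `answer`, the `***` marker) are hypothetical and not taken from the querywise codebase.

```python
import json

MASK = "***"  # assumption: the real redaction marker may differ


def mask_rows(rows, masked_columns):
    """Return copies of the rows with the masked columns redacted."""
    return [
        {col: (MASK if col in masked_columns else val) for col, val in row.items()}
        for row in rows
    ]


def build_interpreter_prompt(question, rows):
    """Hypothetical prompt builder: embeds the already-masked rows as JSON."""
    return f"Question: {question}\nRows: {json.dumps(rows)}\nSummarize the result."


class RecordingLLM:
    """Minimal scripted LLM double that records every prompt it receives."""

    def __init__(self, reply):
        self.reply = reply
        self.prompts = []

    def complete(self, prompt):
        self.prompts.append(prompt)
        return self.reply


def answer(question, raw_rows, masked_columns, llm):
    # Mask first, so the same redacted rows feed both the prompt and the response.
    rows = mask_rows(raw_rows, masked_columns)
    summary = llm.complete(build_interpreter_prompt(question, rows))
    return {"rows": rows, "summary": summary}


llm = RecordingLLM(reply="One user found.")
raw = [{"name": "Ada", "ssn": "123-45-6789"}]
response = answer("who signed up?", raw, {"ssn"}, llm)

# The raw value must appear in neither the response nor any LLM prompt.
assert "123-45-6789" not in json.dumps(response)
assert all("123-45-6789" not in p for p in llm.prompts)
```

Masking once, upstream of both consumers, is the design choice that makes a single assertion pair sufficient; masking separately at each exit point would need a test per exit.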

kosminus merged commit caba258 into main Jun 9, 2026
2 checks passed

kosminus deleted the test/query-pipeline-integration branch June 9, 2026 19:55
