
test: integration tests for the NL→SQL query pipeline #19

Merged
kosminus merged 1 commit into main from test/query-pipeline-integration on Jun 9, 2026


@kosminus kosminus commented Jun 9, 2026

Owner

What

Adds integration tests for the core NL→SQL query pipeline (execute_nl_query / execute_raw_sql), exercising the real composer, validator, error-handler, and interpreter agents end-to-end.

Why

The query pipeline, the product's core loop, had no test coverage, so the retry logic, safety gates, policy enforcement, and audit/cost recording could all regress silently during a refactor. These tests pin that behavior while staying fast and dependency-free (no live DB or LLM), consistent with the existing suite.

Changes

  • New backend/tests/test_query_pipeline.py (10 tests, no new dependencies):
    • Test doubles at the process edges only: a scripted FakeLLMProvider (records every prompt), a scripted FakeConnector, a minimal FakeSession, and a pre-built BuiltContext — everything in between runs real code, including prompt templates, JSON repair/parsing, policy masking/limits, and audit/cost writes.
    • Happy path: results, summary/highlights, success history row, query.executed audit event, cost attribution, and connection limits passed to the connector.
    • Prompt integration: the composer prompt contains the assembled semantic context and the question.
    • Validation repair: non-SELECT SQL is rejected and the error handler's corrected SQL is the only SQL executed (retry_count == 1).
    • Retry bound: unsafe SQL that survives 3 repair attempts is refused with 403, audited as query.blocked, and never reaches the database; a handler that gives up yields 422.
    • Execution-error loop: the handler sees the DB error message, corrected SQL succeeds on retry; exhausting all retries records an error history row and raises.
    • Policy integration: masked columns are redacted in both the response and the interpreter LLM prompt; policy row/timeout caps override looser connection limits.
    • Raw SQL: DML is blocked + audited; the happy path executes without LLM retry.
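To illustrate the "test doubles at the process edges only" approach described above, here is a minimal sketch. It is not the code in backend/tests/test_query_pipeline.py: the `complete` and `execute` method names, their signatures, and the `toy_pipeline` function are assumptions made for illustration. Only the class names FakeLLMProvider and FakeConnector come from this PR.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeLLMProvider:
    """Replays scripted replies in order and records every prompt it is sent."""
    replies: list[str]
    prompts: list[str] = field(default_factory=list)

    async def complete(self, prompt: str) -> str:  # method name is assumed
        self.prompts.append(prompt)
        return self.replies.pop(0)  # IndexError means the script was too short


@dataclass
class FakeConnector:
    """Returns canned rows and records each SQL statement it is asked to run."""
    rows: list[dict]
    executed: list[str] = field(default_factory=list)

    async def execute(self, sql: str, max_rows: int) -> list[dict]:  # signature is assumed
        self.executed.append(sql)
        return self.rows[:max_rows]


async def toy_pipeline(question, context, llm, db, masked=frozenset()):
    """Hypothetical stand-in for the code under test: compose, execute, mask, interpret."""
    sql = await llm.complete(f"Context:\n{context}\n\nQuestion: {question}")
    rows = await db.execute(sql, max_rows=100)
    # Redact masked columns before anything downstream (response or LLM) sees them.
    safe = [{k: ("***" if k in masked else v) for k, v in r.items()} for r in rows]
    summary = await llm.complete(f"Summarize these rows: {safe}")
    return {"sql": sql, "rows": safe, "summary": summary}


def run_example():
    llm = FakeLLMProvider(replies=["SELECT name, email FROM users", "One user found."])
    db = FakeConnector(rows=[{"name": "Ada", "email": "ada@example.com"}])
    result = asyncio.run(
        toy_pipeline("Who signed up?", "table users(name, email)", llm, db, masked={"email"})
    )
    return result, llm, db
```

Because the fakes record their inputs, a test can assert on what crossed each edge: `llm.prompts[0]` should contain both the semantic context and the question, `db.executed` should hold exactly the SQL that ran, and the masked value should appear neither in `result["rows"]` nor in `llm.prompts[1]`.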

🤖 Generated with Claude Code

Run execute_nl_query / execute_raw_sql end-to-end through the real
agents (composer, validator, error handler, interpreter), prompt
plumbing, policy masking/limits, and audit/cost recording — faking only
the process edges (scripted LLM provider, scripted connector, fake DB
session, pre-built semantic context).

Covers the happy path, validation repair, the 3-retry bound with
unsafe SQL blocked + audited, execution-error retries, retry
exhaustion, policy column masking reaching neither the response nor
the interpreter prompt, and raw-SQL safety blocking.
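The retry bound and the two refusal paths mentioned above could look roughly like the following. This is a sketch under stated assumptions, not the pipeline's actual implementation: `MAX_REPAIRS`, `PipelineError`, `is_read_only`, and `validate_with_repair` are hypothetical names, and the SELECT-prefix check is a toy stand-in for the real validator. Only the status codes (403, 422), the `query.blocked` audit event, and the 3-attempt limit are taken from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_REPAIRS = 3  # the PR describes a 3-attempt bound; the constant name is assumed


class PipelineError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def is_read_only(sql: str) -> bool:
    """Toy validator: accept only statements that start with SELECT."""
    return sql.lstrip().upper().startswith("SELECT")


@dataclass
class Outcome:
    sql: str
    retry_count: int


def validate_with_repair(
    sql: str,
    repair: Callable[[str], Optional[str]],
    audit: list[str],
) -> Outcome:
    """Return SQL that passed validation, calling `repair` at most MAX_REPAIRS times."""
    retries = 0
    while not is_read_only(sql):
        if retries == MAX_REPAIRS:
            # Still unsafe after every allowed repair: audit and refuse.
            audit.append("query.blocked")
            raise PipelineError(403, "unsafe SQL survived all repair attempts")
        fixed = repair(sql)
        if fixed is None:  # the handler gave up
            raise PipelineError(422, "could not repair SQL")
        sql = fixed
        retries += 1
    return Outcome(sql, retries)
```

In this sketch, nothing is handed to a connector until validation has passed, which is the property the tests check when they assert that blocked SQL never reaches the database.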

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@kosminus kosminus merged commit caba258 into main Jun 9, 2026
2 checks passed
@kosminus kosminus deleted the test/query-pipeline-integration branch June 9, 2026 19:55