AutoSQL

Self-optimizing SQL query pipeline. Inspired by karpathy/autoresearch.

LLM rewrites query → run it → measure (speed + correctness) → keep if better → repeat

One file. Zero fluff.

The Problem

Slow SQL queries are everywhere. Correlated subqueries, redundant scans, missing pre-aggregations — they kill app performance and nobody catches them until prod is on fire.

AutoSQL fixes this automatically:

You give it a query
It runs the query, records the baseline time and result hash
An LLM rewrites the query trying to make it faster
It runs the new query, checks the result is identical, measures the speedup
Keeps the best version, feeds the history back into the next iteration
Repeats until the budget is spent

Quickstart

# 1. Install dependency
pip install groq

# 2. Get a free API key from https://console.groq.com
export GROQ_API_KEY=gsk_...

# 3. Optimize your query
python AutoSQL.py --query slow.sql --db myapp.sqlite

# Or inline
python AutoSQL.py --query "SELECT * FROM ..." --db myapp.sqlite --iterations 15

# Use a different model
python AutoSQL.py --query slow.sql --db myapp.sqlite --model llama-3.1-8b-instant

Example output

──────────────────────────────────────────────────────────────
  AutoSQL — Self-optimizing Query Pipeline
  Model  : llama-3.3-70b-versatile  (Groq)
──────────────────────────────────────────────────────────────

  Measuring baseline … 1842.3 ms

  [01/10] Generating … ✓  312.4 ms  (5.89× faster)  ← NEW BEST
  [02/10] Generating … ✓  298.1 ms  (6.18× faster)  ← NEW BEST
  [03/10] Generating … ✗  wrong result
  [04/10] Generating … ✓  291.7 ms  (6.32× faster)  ← NEW BEST
  ...

──────────────────────────────────────────────────────────────
  Final Report
──────────────────────────────────────────────────────────────
  Baseline : 1842.3 ms
  Best     : 291.7 ms  (6.32× faster)

What it optimizes

AutoSQL tackles the classic anti-patterns that kill query performance:

Anti-pattern	What AutoSQL does
Correlated subquery in `WHERE`	Replaces with pre-aggregated JOIN / CTE
Correlated subquery in `SELECT`	Replaces with pre-aggregated JOIN
Multiple passes over the same table	Collapses into a single scan

Correctness guarantee

AutoSQL never accepts a faster query that returns different results.
Every candidate is run and its output hashed. Only an exact match against the baseline hash counts as correct.

Outputs

After the run, autosql_log.json contains every attempt:

{
  "baseline_ms": 1842.3,
  "best_ms": 291.7,
  "speedup": 6.32,
  "iterations": [
    { "iteration": 1, "time_ms": 312.4, "speedup": 5.89, "correct": true, "query": "..." },
    ...
  ]
}

Options

Flag	Default	Description
`--query`	(required)	SQL string or `.sql` file path
`--db`	(required)	SQLite database path
`--iterations`	`10`	How many rewrites to attempt
`--model`	`llama-3.3-70b-versatile`	Groq model to use

Extending

AutoSQL intentionally stays minimal. Easy extension points:

Other databases — swap sqlite3 for psycopg2, duckdb, etc. Only run_query() changes.
Other metrics — add memory, I/O reads, or query plan cost alongside wall-clock time.
Indexes — let the LLM propose CREATE INDEX statements alongside the query rewrite.
Batch mode — feed a folder of .sql files, get a report for each.

Requirements

Python 3.10+
groq (pip install groq)
GROQ_API_KEY environment variable (free at console.groq.com)
SQLite database

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AutoSQL

The Problem

Quickstart

Example output

What it optimizes

Correctness guarantee

Outputs

Options

Extending

Requirements

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

AutoSQL

The Problem

Quickstart

Example output

What it optimizes

Correctness guarantee

Outputs

Options

Extending

Requirements