Skip to content

Latest commit

 

History

History
138 lines (96 loc) · 4.18 KB

File metadata and controls

138 lines (96 loc) · 4.18 KB

AutoSQL

Self-optimizing SQL query pipeline. Inspired by karpathy/autoresearch.

LLM rewrites query → run it → measure (speed + correctness) → keep if better → repeat

One file. Zero fluff.


The Problem

Slow SQL queries are everywhere. Correlated subqueries, redundant scans, missing pre-aggregations — they kill app performance and nobody catches them until prod is on fire.

AutoSQL fixes this automatically:

  1. You give it a query
  2. It runs the query, records the baseline time and result hash
  3. An LLM rewrites the query trying to make it faster
  4. It runs the new query, checks the result is identical, measures the speedup
  5. Keeps the best version, feeds the history back into the next iteration
  6. Repeats until the budget is spent

Quickstart

# 1. Install dependency
pip install groq

# 2. Get a free API key from https://console.groq.com
export GROQ_API_KEY=gsk_...

# 3. Optimize your query
python AutoSQL.py --query slow.sql --db myapp.sqlite

# Or inline
python AutoSQL.py --query "SELECT * FROM ..." --db myapp.sqlite --iterations 15

# Use a different model
python AutoSQL.py --query slow.sql --db myapp.sqlite --model llama-3.1-8b-instant

Example output

──────────────────────────────────────────────────────────────
  AutoSQL — Self-optimizing Query Pipeline
  Model  : llama-3.3-70b-versatile  (Groq)
──────────────────────────────────────────────────────────────

  Measuring baseline … 1842.3 ms

  [01/10] Generating … ✓  312.4 ms  (5.89× faster)  ← NEW BEST
  [02/10] Generating … ✓  298.1 ms  (6.18× faster)  ← NEW BEST
  [03/10] Generating … ✗  wrong result
  [04/10] Generating … ✓  291.7 ms  (6.32× faster)  ← NEW BEST
  ...

──────────────────────────────────────────────────────────────
  Final Report
──────────────────────────────────────────────────────────────
  Baseline : 1842.3 ms
  Best     : 291.7 ms  (6.32× faster)

What it optimizes

AutoSQL tackles the classic anti-patterns that kill query performance:

Anti-pattern What AutoSQL does
Correlated subquery in WHERE Replaces with pre-aggregated JOIN / CTE
Correlated subquery in SELECT Replaces with pre-aggregated JOIN
Multiple passes over the same table Collapses into a single scan

Correctness guarantee

AutoSQL never accepts a faster query that returns different results.
Every candidate is run and its output hashed. Only an exact match against the baseline hash counts as correct.


Outputs

After the run, autosql_log.json contains every attempt:

{
  "baseline_ms": 1842.3,
  "best_ms": 291.7,
  "speedup": 6.32,
  "iterations": [
    { "iteration": 1, "time_ms": 312.4, "speedup": 5.89, "correct": true, "query": "..." },
    ...
  ]
}

Options

Flag Default Description
--query (required) SQL string or .sql file path
--db (required) SQLite database path
--iterations 10 How many rewrites to attempt
--model llama-3.3-70b-versatile Groq model to use

Extending

AutoSQL intentionally stays minimal. Easy extension points:

  • Other databases — swap sqlite3 for psycopg2, duckdb, etc. Only run_query() changes.
  • Other metrics — add memory, I/O reads, or query plan cost alongside wall-clock time.
  • Indexes — let the LLM propose CREATE INDEX statements alongside the query rewrite.
  • Batch mode — feed a folder of .sql files, get a report for each.

Requirements

  • Python 3.10+
  • groq (pip install groq)
  • GROQ_API_KEY environment variable (free at console.groq.com)
  • SQLite database