Self-optimizing SQL query pipeline. Inspired by karpathy/autoresearch.
LLM rewrites query → run it → measure (speed + correctness) → keep if better → repeat
One file. Zero fluff.
Slow SQL queries are everywhere. Correlated subqueries, redundant scans, missing pre-aggregations — they kill app performance and nobody catches them until prod is on fire.
AutoSQL fixes this automatically:
- You give it a query
- It runs the query, records the baseline time and result hash
- An LLM rewrites the query trying to make it faster
- It runs the new query, checks the result is identical, measures the speedup
- Keeps the best version, feeds the history back into the next iteration
- Repeats until the budget is spent
# 1. Install dependency
pip install groq
# 2. Get a free API key from https://console.groq.com
export GROQ_API_KEY=gsk_...
# 3. Optimize your query
python AutoSQL.py --query slow.sql --db myapp.sqlite
# Or inline
python AutoSQL.py --query "SELECT * FROM ..." --db myapp.sqlite --iterations 15
# Use a different model
python AutoSQL.py --query slow.sql --db myapp.sqlite --model llama-3.1-8b-instant──────────────────────────────────────────────────────────────
AutoSQL — Self-optimizing Query Pipeline
Model : llama-3.3-70b-versatile (Groq)
──────────────────────────────────────────────────────────────
Measuring baseline … 1842.3 ms
[01/10] Generating … ✓ 312.4 ms (5.89× faster) ← NEW BEST
[02/10] Generating … ✓ 298.1 ms (6.18× faster) ← NEW BEST
[03/10] Generating … ✗ wrong result
[04/10] Generating … ✓ 291.7 ms (6.32× faster) ← NEW BEST
...
──────────────────────────────────────────────────────────────
Final Report
──────────────────────────────────────────────────────────────
Baseline : 1842.3 ms
Best : 291.7 ms (6.32× faster)
AutoSQL tackles the classic anti-patterns that kill query performance:
| Anti-pattern | What AutoSQL does |
|---|---|
Correlated subquery in WHERE |
Replaces with pre-aggregated JOIN / CTE |
Correlated subquery in SELECT |
Replaces with pre-aggregated JOIN |
| Multiple passes over the same table | Collapses into a single scan |
AutoSQL never accepts a faster query that returns different results.
Every candidate is run and its output hashed. Only an exact match against the baseline hash counts as correct.
After the run, autosql_log.json contains every attempt:
{
"baseline_ms": 1842.3,
"best_ms": 291.7,
"speedup": 6.32,
"iterations": [
{ "iteration": 1, "time_ms": 312.4, "speedup": 5.89, "correct": true, "query": "..." },
...
]
}| Flag | Default | Description |
|---|---|---|
--query |
(required) | SQL string or .sql file path |
--db |
(required) | SQLite database path |
--iterations |
10 |
How many rewrites to attempt |
--model |
llama-3.3-70b-versatile |
Groq model to use |
AutoSQL intentionally stays minimal. Easy extension points:
- Other databases — swap
sqlite3forpsycopg2,duckdb, etc. Onlyrun_query()changes. - Other metrics — add memory, I/O reads, or query plan cost alongside wall-clock time.
- Indexes — let the LLM propose
CREATE INDEXstatements alongside the query rewrite. - Batch mode — feed a folder of
.sqlfiles, get a report for each.
- Python 3.10+
groq(pip install groq)GROQ_API_KEYenvironment variable (free at console.groq.com)- SQLite database