Real-world examples of database optimization with measurable results.
An e-commerce platform was experiencing slow customer order lookups. The query was taking 2.5 seconds on average, causing a poor user experience.

```sql
SELECT *
FROM orders
WHERE customer_id = 123;
```

Profiling with `EXPLAIN ANALYZE`:

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
-- Result:
-- Seq Scan on orders (cost=0.00..125000.00 rows=50 width=128)
--   Filter: (customer_id = 123)
-- Planning Time: 0.123 ms
-- Execution Time: 2547.891 ms
```

Problems Identified:
- Sequential scan on a 5-million-row table
- No index on the `customer_id` column
- Fetching all columns with `SELECT *`
```sql
-- Step 1: Create index on customer_id
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Step 2: Optimize query to select only needed columns
SELECT id, order_date, total, status
FROM orders
WHERE customer_id = 123;

-- Step 3: Update table statistics
ANALYZE orders;
```

Verifying the fix:

```sql
EXPLAIN ANALYZE
SELECT id, order_date, total, status
FROM orders
WHERE customer_id = 123;
-- Result:
-- Index Scan using idx_orders_customer_id (cost=0.43..8.45 rows=50 width=24)
--   Index Cond: (customer_id = 123)
-- Planning Time: 0.156 ms
-- Execution Time: 48.234 ms
```

| Metric | Before | After | Improvement |
|---|---|---|---|
| Execution Time | 2,547 ms | 48 ms | 53x faster |
| Scan Type | Sequential | Index | ✅ |
| Rows Scanned | 5,000,000 | 50 | 99.999% reduction |
| Index Size | - | 120 MB | Small overhead |
- Development Time: 15 minutes
- Performance Gain: 53x faster
- User Impact: Improved page load time from 3s to 0.1s
- Cost: Minimal (120MB disk space)
- ✅ Always index foreign key columns
- ✅ Avoid `SELECT *`; fetch only needed columns
- ✅ Use `EXPLAIN ANALYZE` to identify issues
- ✅ Update statistics after creating indexes
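The indexing step above can be sketched end-to-end in application code. This is a minimal example using Python's built-in `sqlite3` as a stand-in for PostgreSQL (an assumption for portability; SQLite's `EXPLAIN QUERY PLAN` plays the role of `EXPLAIN ANALYZE`, and table/index names mirror the case study):

```python
import sqlite3

# SQLite as a stand-in for PostgreSQL: same table and index names as the case study.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT,
        total REAL,
        status TEXT
    )
""")
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, total, status) VALUES (?, ?, ?, ?)",
    [(i % 500, "2024-01-01", 10.0, "shipped") for i in range(5000)],
)

def plan(sql):
    """Concatenate the EXPLAIN QUERY PLAN detail rows for a query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id, total FROM orders WHERE customer_id = 123"
before = plan(query)  # full table scan before indexing
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after = plan(query)   # index search after indexing

print(before)
print(after)
```

The same before/after check works against PostgreSQL by swapping the connection and running `EXPLAIN ANALYZE` instead.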
A reporting query joining three large tables was taking 8 seconds to execute, causing timeout issues in the web application.
```sql
SELECT
    o.id,
    o.order_date,
    o.total,
    c.name AS customer_name,
    c.email,
    p.name AS product_name,
    oi.quantity
FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
LEFT JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN products p ON p.id = oi.product_id
WHERE o.order_date >= '2024-01-01';
```

```sql
EXPLAIN ANALYZE [query above]
-- Problems:
-- 1. Sequential scan on orders (no index on order_date)
-- 2. Sequential scan on order_items
-- 3. Multiple hash joins causing high memory usage
-- Execution Time: 8234.567 ms
```

Bottlenecks:
- No index on `orders.order_date`
- No index on `order_items.order_id`
- No index on `order_items.product_id`
```sql
-- Create index for WHERE clause
CREATE INDEX idx_orders_order_date ON orders(order_date);

-- Create indexes for JOIN conditions
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Create covering index for best performance
CREATE INDEX idx_orders_date_customer
ON orders(order_date, customer_id)
INCLUDE (id, total);

-- Update statistics
ANALYZE orders;
ANALYZE order_items;
ANALYZE customers;
ANALYZE products;
```

```sql
EXPLAIN ANALYZE [same query]
-- Now using index scans:
-- Index Scan using idx_orders_date_customer
-- Nested Loop joins instead of Hash joins
-- Execution Time: 312.456 ms
```

| Metric | Before | After | Improvement |
|---|---|---|---|
| Execution Time | 8,234 ms | 312 ms | 26x faster |
| Join Method | Hash Join | Nested Loop | Better for selective filters |
| I/O Operations | High | Low | 95% reduction |
| Memory Usage | 256 MB | 32 MB | 87% reduction |
The query was further improved by filtering earlier:

```sql
-- Optimized version: filter before joining
SELECT
    o.id,
    o.order_date,
    o.total,
    c.name AS customer_name,
    c.email,
    p.name AS product_name,
    oi.quantity
FROM (
    SELECT id, order_date, total, customer_id
    FROM orders
    WHERE order_date >= '2024-01-01'
) o
LEFT JOIN customers c ON c.id = o.customer_id
LEFT JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN products p ON p.id = oi.product_id;
```

Final Execution Time: 287 ms (29x faster than the original)
- ✅ Index all JOIN columns
- ✅ Index WHERE clause columns
- ✅ Consider covering indexes for frequently accessed columns
- ✅ Filter data as early as possible
- ✅ Update statistics after bulk data changes
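The covering-index idea can be demonstrated with a small sketch, again using Python's `sqlite3` as a stand-in for PostgreSQL (an assumption: SQLite has no `INCLUDE` clause, so the extra columns are simply appended to the index key, which achieves the same index-only effect):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT,
        total REAL
    )
""")
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, total) VALUES (?, ?, ?)",
    [(i % 100, f"2024-01-{i % 28 + 1:02d}", 10.0) for i in range(2000)],
)

# Covering index: key column first, then every column the query reads,
# so the query can be answered from the index alone (no heap lookups).
conn.execute("""
    CREATE INDEX idx_orders_date_customer
    ON orders(order_date, customer_id, id, total)
""")

plan = " ".join(
    row[3] for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, customer_id, total FROM orders "
        "WHERE order_date = '2024-01-15'"
    )
)
print(plan)  # SQLite reports a covering index when the index satisfies the query
```

In PostgreSQL the equivalent is `INCLUDE (id, total)`, which keeps the payload columns out of the index key while still enabling index-only scans.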
A dashboard query calculating customer statistics was taking 15 seconds, making the dashboard unusable. The query was run 100+ times per day.
```sql
SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(total) AS total_spent,
    AVG(total) AS avg_order_value,
    MAX(order_date) AS last_order_date,
    MIN(order_date) AS first_order_date
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 10
ORDER BY total_spent DESC;
```

```sql
EXPLAIN ANALYZE [query above]
-- Sequential scan + aggregation on 5M rows
-- Hash Aggregate: 14.8GB disk usage
-- Execution Time: 15234.789 ms
```

Problems:
- Aggregating 5 million rows every time
- No caching mechanism
- High memory and I/O usage
```sql
-- Create materialized view
CREATE MATERIALIZED VIEW customer_order_stats AS
SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(total) AS total_spent,
    AVG(total) AS avg_order_value,
    MAX(order_date) AS last_order_date,
    MIN(order_date) AS first_order_date,
    NOW() AS last_updated
FROM orders
GROUP BY customer_id;

-- Unique index (required for REFRESH ... CONCURRENTLY)
CREATE UNIQUE INDEX idx_customer_stats_id
ON customer_order_stats(customer_id);

-- Index the materialized view like a regular table
CREATE INDEX idx_customer_stats_spent
ON customer_order_stats(total_spent DESC);
CREATE INDEX idx_customer_stats_count
ON customer_order_stats(order_count);

-- Refresh strategy (scheduled job)
-- Option 1: Complete refresh nightly (blocks reads during refresh)
-- REFRESH MATERIALIZED VIEW customer_order_stats;
-- Option 2: Concurrent refresh (non-blocking)
-- REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_stats;
```

The dashboard now queries the materialized view directly:

```sql
SELECT *
FROM customer_order_stats
WHERE order_count > 10
ORDER BY total_spent DESC;
```

```sql
EXPLAIN ANALYZE [new query]
-- Index Scan using idx_customer_stats_spent
-- Execution Time: 8.234 ms
```

| Metric | Before | After | Improvement |
|---|---|---|---|
| Execution Time | 15,234 ms | 8 ms | 1,900x faster |
| I/O Operations | Very High | Minimal | 99.9% reduction |
| CPU Usage | High | Minimal | 99% reduction |
| Memory Usage | 14.8 GB | < 1 MB | Dramatic reduction |
```sql
-- Scheduled refresh (cron job every hour)
-- 0 * * * * psql -d mydb -c "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_stats;"

-- Or wrap the refresh in a function for scheduled jobs
CREATE OR REPLACE FUNCTION refresh_customer_stats()
RETURNS void AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_stats;
    RAISE NOTICE 'Customer stats refreshed at %', NOW();
END;
$$ LANGUAGE plpgsql;
```

Advantages:
- ✅ 1900x faster queries
- ✅ Reduced server load
- ✅ Better user experience
Disadvantages:
- ❌ Data slightly stale (up to 1 hour old)
- ❌ Refresh requires resources
- ❌ Additional storage (200 MB)
Trade-off: for this use case, hourly updates were acceptable. For real-time needs, consider incremental materialized views or trigger-based updates.
- ✅ Use materialized views for expensive aggregations
- ✅ Index materialized views like regular tables
- ✅ Use CONCURRENTLY for non-blocking refreshes
- ✅ Schedule refreshes during low-traffic periods
- ✅ Balance freshness vs performance needs
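Outside PostgreSQL, the same pattern can be emulated with a plain summary table that a scheduled job rebuilds. A sketch using Python's `sqlite3` (an assumption: SQLite has no materialized views, so `refresh_customer_stats` below is a hypothetical helper standing in for `REFRESH MATERIALIZED VIEW`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total, order_date) VALUES (?, ?, ?)",
    [(i % 5, float(i), "2024-01-01") for i in range(100)],
)

def refresh_customer_stats(conn):
    """Rebuild the summary table: the analogue of REFRESH MATERIALIZED VIEW."""
    conn.execute("DROP TABLE IF EXISTS customer_order_stats")
    conn.execute("""
        CREATE TABLE customer_order_stats AS
        SELECT customer_id,
               COUNT(*) AS order_count,
               SUM(total) AS total_spent,
               AVG(total) AS avg_order_value
        FROM orders
        GROUP BY customer_id
    """)
    # Index the summary table like a regular table
    conn.execute("CREATE INDEX idx_stats_spent ON customer_order_stats(total_spent DESC)")
    conn.commit()

refresh_customer_stats(conn)

# Dashboard reads hit the small precomputed table, not the orders table
rows = conn.execute(
    "SELECT customer_id, order_count FROM customer_order_stats ORDER BY total_spent DESC"
).fetchall()
print(rows[0])  # the customer with the highest total_spent
```

The staleness trade-off is identical: reads are cheap, but the data is only as fresh as the last `refresh_customer_stats` call.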
An API endpoint was loading customer data with their orders, resulting in 1 query to get customers + N queries to get orders for each customer. For 100 customers, this meant 101 queries!
```sql
-- Query 1: Get customers
SELECT * FROM customers LIMIT 100;

-- Then for each customer (100 times):
SELECT * FROM orders WHERE customer_id = ?;
```

Total: 101 queries, ~500 ms total time
```sql
-- Single query with LEFT JOIN
SELECT
    c.id AS customer_id,
    c.name,
    c.email,
    o.id AS order_id,
    o.order_date,
    o.total,
    o.status
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.id IN (1, 2, 3, ..., 100) -- List of customer IDs
ORDER BY c.id, o.order_date DESC;
```

To return one row per customer, aggregate the orders into JSON:

```sql
-- PostgreSQL: Aggregate orders into JSON array
SELECT
    c.id,
    c.name,
    c.email,
    COALESCE(
        json_agg(
            json_build_object(
                'order_id', o.id,
                'order_date', o.order_date,
                'total', o.total,
                'status', o.status
            ) ORDER BY o.order_date DESC
        ) FILTER (WHERE o.id IS NOT NULL),
        '[]'
    ) AS orders
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.id IN (1, 2, 3, ..., 100)
GROUP BY c.id, c.name, c.email;
```

| Metric | Before (N+1) | After (JOIN) | Improvement |
|---|---|---|---|
| Number of Queries | 101 | 1 | 101x fewer |
| Total Time | 500 ms | 45 ms | 11x faster |
| Network Overhead | High | Minimal | 99% reduction |
| Database Load | High | Low | Significant reduction |
- ✅ Always fetch related data in a single query when possible
- ✅ Use JOINs or JSON aggregation
- ✅ Watch for N+1 patterns in ORM code
- ✅ Use query profiling tools to detect N+1 issues
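The N+1 pattern is easy to reproduce and count in application code. A minimal sketch using Python's `sqlite3` (an assumption for portability; `set_trace_callback` records every statement the connection executes, which is how the 11-vs-1 query count is verified):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"c{i}") for i in range(10)])
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 10, 5.0) for i in range(50)],
)

# Record every SQL statement the connection runs from this point on
queries = []
conn.set_trace_callback(queries.append)

# N+1 pattern: one query for customers, then one per customer for orders
customers = conn.execute("SELECT id, name FROM customers").fetchall()
for cid, _name in customers:
    conn.execute("SELECT id, total FROM orders WHERE customer_id = ?", (cid,)).fetchall()
n_plus_one = len(queries)  # 1 + 10 = 11 queries for 10 customers

queries.clear()

# Single LEFT JOIN fetches the same data in one round trip
rows = conn.execute("""
    SELECT c.id, c.name, o.id, o.total
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
""").fetchall()
joined = len(queries)  # 1 query

print(n_plus_one, joined)
```

The same trace-and-count trick works as a cheap N+1 detector in test suites: fail the test if a request handler issues more than a fixed number of queries.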
Based on these case studies, here are the key principles:
- Always use `EXPLAIN ANALYZE` before optimizing
- Identify the actual bottleneck
- Set baseline metrics
- Index foreign keys
- Index WHERE clause columns
- Index ORDER BY columns
- Consider composite indexes
- Select only needed columns
- Filter early in the query
- Use appropriate JOIN types
- Avoid N+1 queries
- Materialized views for aggregations
- Application-level caching
- Database query cache (MySQL 5.7 and earlier; removed in MySQL 8.0)
- Regular VACUUM and ANALYZE
- Monitor slow query logs
- Review and remove unused indexes
- Update statistics
- Test with production-like data volumes
- Simulate concurrent users
- Measure under load
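A minimal way to "measure under load" is to time the query repeatedly and compare medians before and after a change. A sketch using only the Python standard library (`time_query` is a hypothetical helper, not a standard tool; `sqlite3` again stands in for the real database):

```python
import sqlite3
import statistics
import time

def time_query(conn, sql, params=(), runs=50):
    """Return the median wall-clock time of a query in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Example: compare the same lookup before and after adding an index
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, 1.0) for i in range(50000)],
)

before = time_query(conn, "SELECT total FROM orders WHERE customer_id = ?", (123,))
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after = time_query(conn, "SELECT total FROM orders WHERE customer_id = ?", (123,))
print(f"before={before:.3f} ms, after={after:.3f} ms")
```

Medians resist outliers from cold caches and background noise; for concurrency effects, run the same harness from several threads or processes against a staging copy of the data.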
- `EXPLAIN ANALYZE` - Query execution plans
- `pg_stat_statements` - Query statistics (PostgreSQL)
- `SHOW PROFILE` - Query profiling (MySQL)
- pgAdmin, MySQL Workbench - Visual tools
- pg_stat_monitor - Enhanced PostgreSQL monitoring
- PMM (Percona Monitoring and Management)
- Datadog, New Relic - APM solutions
- Review your slowest queries
- Run `EXPLAIN ANALYZE` on them
- Apply appropriate optimization techniques
- Measure improvements
- Document your changes
- Monitor ongoing performance
Remember: Every database is different. Always test optimizations in a staging environment before applying to production!