"The right data structure for the job makes all the difference."
B-trees and LSM trees are general-purpose workhorses, but sometimes specialized data structures offer better performance for specific access patterns. In this chapter, we'll explore hash indexes, bitmap indexes, and other specialized structures.
Hash indexes provide O(1) average-case lookup by key—faster than B-trees for exact matches.
Hash Function: key → bucket number
Example: hash("alice@example.com") mod 8 = 3
┌─────────────────────────────────────────────────────────────┐
│ HASH TABLE │
│ │
│ Bucket 0: [empty] │
│ Bucket 1: [bob@... → row] → [zoe@... → row] │
│ Bucket 2: [empty] │
│ Bucket 3: [alice@... → row] → [mike@... → row] │
│ Bucket 4: [carol@... → row] │
│ Bucket 5: [empty] │
│ Bucket 6: [dave@... → row] │
│ Bucket 7: [eve@... → row] → [frank@... → row] │
│ │
└─────────────────────────────────────────────────────────────┘
Find "alice@example.com":
1. Compute hash: hash("alice@example.com") = 3
2. Go to bucket 3
3. Search chain for exact match
4. Found! Return row pointer
Time: O(1) average (O(n) worst case with bad hash distribution)
No range queries: Hash destroys key ordering
-- Can use hash index:
SELECT * FROM users WHERE email = 'alice@example.com';
-- CANNOT use hash index:
SELECT * FROM users WHERE email > 'alice@example.com';
SELECT * FROM users WHERE email LIKE 'alice%';
SELECT * FROM users ORDER BY email;No prefix matching: Must match entire key
No partial index scans: Can't stop early
In-memory hash tables are straightforward, but on-disk is harder:
Problem: Hash table resizing requires rewriting everything
Solution: Extendible hashing or linear hashing
EXTENDIBLE HASHING
Use first N bits of hash to select bucket
Global depth = 2 (use 2 bits)
00 ──► Bucket A: [keys with hash starting 00...]
01 ──► Bucket B: [keys with hash starting 01...]
10 ──► Bucket C: [keys with hash starting 10...]
11 ──► Bucket C: [same bucket for 10 and 11]
When Bucket A overflows:
- Split it into A1 (000...) and A2 (001...)
- Increase global depth to 3 for those entries
- Other buckets unchanged
PostgreSQL supports hash indexes but with caveats:
-- Create hash index
CREATE INDEX idx_email_hash ON users USING hash (email);
-- Only for equality comparisons
SELECT * FROM users WHERE email = 'alice@example.com'; -- Uses index
SELECT * FROM users WHERE email > 'a'; -- Cannot use hash indexHistorical note: Before PostgreSQL 10, hash indexes weren't crash-safe (no WAL logging). Now they are, but B-trees are still usually preferred.
The Memory (HEAP) storage engine supports hash indexes:
-- Memory table with hash index
CREATE TABLE cache (
key_col VARCHAR(255),
value_col TEXT,
INDEX USING HASH (key_col)
) ENGINE=MEMORY;Useful for temporary tables and caches where data is in memory anyway.
Good use cases:
- High-cardinality exact-match lookups (UUIDs, email addresses)
- In-memory tables/caches
- Join keys when only equality joins are needed
Bad use cases:
- Range queries of any kind
- Columns used in ORDER BY
- Low cardinality columns
- General purpose indexing
We introduced bloom filters in Chapter 5. Let's understand them more deeply.
A bloom filter is a bit array with k hash functions:
Bloom Filter (m=16 bits, k=3 hash functions)
Initial state (empty):
[0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Insert "alice":
h1("alice") = 3
h2("alice") = 7
h3("alice") = 11
Set bits 3, 7, 11 to 1:
[0][0][0][1][0][0][0][1][0][0][0][1][0][0][0][0]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Insert "bob":
h1("bob") = 1
h2("bob") = 7 (already set!)
h3("bob") = 14
[0][1][0][1][0][0][0][1][0][0][0][1][0][0][1][0]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Query "alice":
Check bits 3, 7, 11 → All are 1 → MAYBE present
Query "carol":
h1("carol") = 5
h2("carol") = 9
h3("carol") = 11
Check bits 5, 9, 11
Bit 5 = 0 → DEFINITELY NOT present
(We can stop at first 0 bit)
Probability of false positive ≈ (1 - e^(-kn/m))^k
Where:
m = number of bits
n = number of inserted elements
k = number of hash functions
Example:
m = 10 bits per element
k = 7 hash functions
False positive rate ≈ 0.8%
Given n elements and desired false positive rate p:
Optimal m (bits) = -n * ln(p) / (ln(2))^2
Optimal k (hash functions) = (m/n) * ln(2)
Example: 1 million keys, 1% false positive rate
m ≈ 9.6 million bits ≈ 1.2 MB
k ≈ 7 hash functions
Counting Bloom Filter: Use counters instead of bits, allows deletes Cuckoo Filter: Supports deletes, often more space-efficient Quotient Filter: Cache-friendly, supports merging
Bitmap indexes are highly efficient for low-cardinality columns.
For each distinct value, store a bitmap indicating which rows have that value:
Table: users
┌────┬─────────┬────────┐
│ id │ name │ status │
├────┼─────────┼────────┤
│ 1 │ Alice │ active │
│ 2 │ Bob │ active │
│ 3 │ Carol │ pending│
│ 4 │ Dave │ active │
│ 5 │ Eve │ deleted│
│ 6 │ Frank │ pending│
└────┴─────────┴────────┘
Bitmap Index on status:
"active": [1][1][0][1][0][0] (rows 1, 2, 4)
"pending": [0][0][1][0][0][1] (rows 3, 6)
"deleted": [0][0][0][0][1][0] (row 5)
Queries become bitwise operations:
SELECT * FROM users WHERE status = 'active';
→ Return rows where bitmap "active" has bit = 1
→ Rows 1, 2, 4 SELECT * FROM users WHERE status = 'active' OR status = 'pending';
"active": [1][1][0][1][0][0]
"pending": [0][0][1][0][0][1]
OR result: [1][1][1][1][0][1]
→ Rows 1, 2, 3, 4, 6 SELECT * FROM users WHERE status = 'active' AND country = 'US';
"status=active": [1][1][0][1][0][0]
"country=US": [1][0][1][0][0][1]
AND result: [1][0][0][0][0][0]
→ Row 1 only- Compact for low cardinality: 3 status values × n bits << n × pointer
- Fast bitwise operations: Hardware-accelerated AND, OR, NOT
- Excellent for analytics: Count queries, complex predicates
- Poor for high cardinality: 1 million distinct values = 1 million bitmaps
- Expensive updates: Insert requires updating bitmaps
- Space for sparse bitmaps: Mostly-zero bitmaps waste space
Run-length encoding compresses sparse bitmaps:
Raw bitmap: [1][0][0][0][0][0][0][0][0][0][1]
RLE compressed: [1 at position 0, 1 at position 10]
Or: "1, skip 9, 1"
Word-Aligned Hybrid (WAH) and Roaring Bitmaps are common compression schemes:
Roaring Bitmaps:
- Divide into chunks of 65536 integers
- For each chunk, choose representation:
- Dense: raw bitmap (when > 4096 values)
- Sparse: sorted array of values (when ≤ 4096 values)
- Run: RLE (for runs of consecutive values)
PostgreSQL offers specialized index types for complex data.
A framework for building tree-structured indexes for custom data types:
-- Full-text search
CREATE INDEX idx_content ON documents USING gist (to_tsvector('english', content));
-- Geometric queries
CREATE INDEX idx_location ON places USING gist (coordinates);
-- Range types
CREATE INDEX idx_dates ON events USING gist (date_range);GiST supports operators:
- Containment:
@>,<@ - Overlap:
&& - Nearest neighbor:
<-> - Full-text search:
@@
For indexing composite values (arrays, JSONB, full-text):
-- Array containment
CREATE INDEX idx_tags ON articles USING gin (tags);
SELECT * FROM articles WHERE tags @> ARRAY['database', 'index'];
-- JSONB queries
CREATE INDEX idx_data ON events USING gin (data);
SELECT * FROM events WHERE data @> '{"type": "click"}';
-- Full-text search
CREATE INDEX idx_body ON posts USING gin (to_tsvector('english', body));GIN structure:
Inverted Index for tags column:
"database" → [article_ids: 1, 5, 12, 47, ...]
"index" → [article_ids: 1, 23, 47, 89, ...]
"postgres" → [article_ids: 5, 12, 89, ...]
Query: tags @> ARRAY['database', 'index']
Intersect posting lists:
"database": {1, 5, 12, 47}
"index": {1, 23, 47, 89}
Result: {1, 47}
| Aspect | GiST | GIN |
|---|---|---|
| Build time | Faster | Slower |
| Index size | Smaller | Larger |
| Query speed | Slower | Faster |
| Updates | Faster | Slower |
| Best for | Geometric, ranges | Full-text, arrays, JSONB |
For very large tables with naturally ordered data, BRIN indexes are extremely compact.
Store min/max values for ranges of physical pages:
Table: time_series_data (sorted by timestamp)
Pages 0-127: min_ts = 2024-01-01, max_ts = 2024-01-15
Pages 128-255: min_ts = 2024-01-15, max_ts = 2024-01-31
Pages 256-383: min_ts = 2024-02-01, max_ts = 2024-02-14
...
Query: WHERE timestamp BETWEEN '2024-01-20' AND '2024-01-25'
Check BRIN:
Pages 0-127: max < query_min → SKIP
Pages 128-255: range overlaps → SCAN
Pages 256-383: min > query_max → SKIP
Requirements:
1. Very large table (millions+ rows)
2. Data naturally ordered by indexed column
3. Queries filter by that column
Perfect for:
- Time-series data (ordered by time)
- Log tables (ordered by log_time)
- Append-only data
Example: 1 billion row table
B-tree index: ~30 GB
BRIN index: ~1 MB
CREATE INDEX idx_logs_ts ON logs USING brin (created_at);
-- With custom pages per range
CREATE INDEX idx_logs_ts ON logs USING brin (created_at)
WITH (pages_per_range = 32);- Only useful if data is physically ordered
- False positives: must scan matching ranges
- Not useful for randomly distributed data
Sometimes you don't need to index everything.
Index only a subset of rows:
-- Only index active users
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';
-- This query uses the partial index:
SELECT * FROM users WHERE email = 'alice@example.com' AND status = 'active';
-- This query cannot use it (no status filter):
SELECT * FROM users WHERE email = 'alice@example.com';Benefits:
- Smaller index size
- Faster index maintenance (fewer entries)
- Better cache efficiency
Common patterns:
-- Unprocessed items
CREATE INDEX idx_unprocessed ON jobs (created_at) WHERE processed = false;
-- Recent data
CREATE INDEX idx_recent ON orders (customer_id)
WHERE created_at > '2024-01-01';
-- Non-null values
CREATE INDEX idx_phone ON contacts (phone) WHERE phone IS NOT NULL;Index the result of an expression:
-- Case-insensitive search
CREATE INDEX idx_lower_email ON users (lower(email));
SELECT * FROM users WHERE lower(email) = 'alice@example.com';
-- Date extraction
CREATE INDEX idx_order_year ON orders (extract(year FROM created_at));
SELECT * FROM orders WHERE extract(year FROM created_at) = 2024;
-- JSON field
CREATE INDEX idx_type ON events ((data->>'type'));
SELECT * FROM events WHERE data->>'type' = 'click';The query must match the expression exactly to use the index.
Include extra columns in the index to avoid heap lookups:
-- Regular index
CREATE INDEX idx_email ON users (email);
-- Query needs name too:
SELECT name FROM users WHERE email = 'alice@example.com';
-- Must: Look up email in index → Get row pointer → Read heap for name
-- Covering index
CREATE INDEX idx_email_name ON users (email) INCLUDE (name);
-- or
CREATE INDEX idx_email_name ON users (email, name);
-- Now: Look up email in index → Name is already there!-- Include columns that aren't searchable, just needed in results
CREATE INDEX idx_email ON users (email) INCLUDE (name, avatar_url);
-- The included columns aren't in the search key
-- But they're stored in leaf pagesTrade-off: Larger index, but avoids heap access for covered queries.
┌─────────────────────────────────────────┐
│ What queries will use this index? │
└─────────────────────────────────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Equality │ │ Range │ │ Full-text│
│ only │ │ scans │ │ search │
└──────────┘ └──────────┘ └──────────┘
│ │ │
▼ ▼ ▼
Consider: B-tree is GIN for
Hash index usually best tsvector
│
▼
┌─────────────────┐
│ Data naturally │
│ ordered by col? │
└─────────────────┘
│ │
YES NO
│ │
▼ ▼
Consider Stick with
BRIN B-tree
| Index Type | Best For | Avoid When |
|---|---|---|
| B-tree | General purpose, ranges | N/A (default choice) |
| Hash | Equality only, high cardinality | Need ranges/ordering |
| GIN | Arrays, JSONB, full-text | Write-heavy, simple queries |
| GiST | Geometric, ranges, nearest neighbor | Simple equality queries |
| BRIN | Huge tables, naturally ordered | Random data distribution |
| Bitmap* | Low cardinality, analytics | High cardinality, OLTP |
*Bitmap indexes created on-the-fly in PostgreSQL for complex queries
Different index types serve different purposes:
- Hash indexes: O(1) lookup but no ranges
- Bloom filters: Probabilistic membership testing
- Bitmap indexes: Efficient for low-cardinality analytics
- GiST/GIN: Specialized queries (full-text, geometric, JSONB)
- BRIN: Extremely compact for ordered data
- Partial indexes: Index only what you query
- Expression indexes: Index computed values
- Covering indexes: Avoid heap lookups
Choose indexes based on your query patterns, not just data types.
In Chapter 7, we'll switch from indexing to durability. We'll explore Write-Ahead Logging (WAL)—the mechanism that ensures your data survives crashes.
"The best index is the one that turns a table scan into a direct lookup. The second best is knowing when no index is needed."