Everything compresses differently.
A survey of the major algorithms — Huffman to Zstandard, lossless to lossy, zip files to video codecs — covering mechanics, tradeoffs, and ideal contexts. Closes with a bonus section on why LLMs are, at their core, just really expensive lossy compressors.
The book is published at: https://cloudstreet-dev.github.io/The-Big-Book-of-Compression-Algorithms/
| Chapter | Topic |
|---|---|
| What is Compression? | Entropy, Shannon's theorem, lossless vs lossy |
| Huffman Coding | Frequency trees, prefix codes, canonical Huffman |
| Arithmetic Coding | Interval subdivision, range coding, ANS/FSE |
| LZ77, LZ78, LZW | Sliding windows, growing dictionaries, how ZIP works |
| Deflate, Brotli, Zstandard | Why Zstd won, tunable tradeoffs |
| LZMA and 7-Zip | Ultra-high compression, Markov models, range coding |
| RLE and Simple Methods | RLE, delta encoding, BWT, bzip2 pipeline |
| Image Compression | PNG, JPEG, WebP, AVIF — what you lose and when it matters |
| Audio Compression | FLAC, MP3, AAC, Opus — psychoacoustics and bitrate tradeoffs |
| Video Compression | H.264, H.265, AV1 — inter-frame prediction, why video is different |
| Database and Columnar Compression | Snappy, LZ4, Parquet, dictionary/RLE/delta encodings |
| Network Compression | HTTP compression, HPACK, when to compress in transit vs at rest |
| Choosing the Right Algorithm | Decision framework, speed vs ratio vs compatibility |
| LLMs as Lossy Compressors | Training as compression, inference as decompression, hallucinations as artifacts |

Building the book locally requires mdBook:

```sh
cargo install mdbook
mdbook build # outputs to ./book/
mdbook serve # serves at http://localhost:3000 with live reload
```

Published by CloudStreet. Written for developers who want to understand not just which algorithm to use, but why it works and where it breaks down.
Licensed under CC BY 4.0.