# Database Internals: Where Your Data Actually Lives

**A CloudStreet Educational Book**

*Written by Opus 4.5*

---

[![Deploy](https://github.com/cloudstreet-dev/Database-Internals/actions/workflows/deploy.yml/badge.svg)](https://github.com/cloudstreet-dev/Database-Internals/actions/workflows/deploy.yml)

## Read Online

**[Read the book online](https://cloudstreet-dev.github.io/Database-Internals/)**, hosted on GitHub Pages.

---

## About This Book

Ever wondered what happens when you hit COMMIT? Why does one query take 30 seconds while another returns instantly? What's actually going on when your database "recovers" after a crash?

This book takes you on a journey into the heart of database systems: the storage engines, B-trees, write-ahead logs, and MVCC implementations that power everything from your local SQLite database to planet-scale distributed systems. We'll explore how databases transform your SQL queries into disk operations, manage concurrent access from thousands of users, and guarantee your data survives power failures and hardware crashes.

Whether you're a developer trying to understand why your queries are slow, an engineer designing data-intensive systems, or simply curious about one of the most sophisticated pieces of software ever created, this book will give you the mental models to understand what's really happening beneath the abstraction layers.

## Who This Book Is For

- **Backend developers** who want to write better queries and design better schemas
- **Software engineers** building systems that interact heavily with databases
- **System architects** making decisions about data storage and retrieval
- **The curious** who want to understand the engineering marvels hiding behind `SELECT * FROM users`

## What You'll Learn

- How data is physically organized on disk and in memory
- The data structures that make queries fast (and when they don't)
- How databases handle multiple users reading and writing simultaneously
- What guarantees ACID actually provides and how they're implemented
- Why write-ahead logging is essential for crash recovery
- How query optimizers choose an execution plan for your SQL
- The trade-offs between different storage engine architectures
- How distributed databases maintain consistency across machines

## Table of Contents

### Part I: Foundations
1. [Introduction: The Journey of a Query](src/01-introduction.md)
2. [Storage Engines and File Formats](src/02-storage-engines.md)
3. [Disk I/O and Page Management](src/03-disk-io.md)

### Part II: Data Structures
4. [Indexing Structures: B-Trees and Beyond](src/04-indexing-structures.md)
5. [LSM Trees and Write-Optimized Structures](src/05-lsm-trees.md)
6. [Hash Indexes and Specialized Structures](src/06-hash-indexes.md)

### Part III: Transactions and Concurrency
7. [Write-Ahead Logging (WAL)](src/07-write-ahead-logging.md)
8. [MVCC and Transaction Isolation](src/08-mvcc-isolation.md)
9. [Locking and Concurrency Control](src/09-locking-concurrency.md)

### Part IV: Query Processing
10. [Query Parsing and Planning](src/10-query-parsing.md)
11. [Query Optimization](src/11-query-optimization.md)
12. [Buffer Pools and Caching](src/12-buffer-pools.md)

### Part V: Reliability and Scale
13. [Recovery and Crash Safety](src/13-recovery.md)
14. [Column Stores vs Row Stores](src/14-column-vs-row.md)
15. [Distributed Databases and Replication](src/15-distributed-databases.md)

### Appendices
- [Appendix A: Glossary of Terms](src/appendix-a-glossary.md)
- [Appendix B: Further Reading](src/appendix-b-reading.md)

## How to Read This Book

This book is designed to be read sequentially, as later chapters build on concepts introduced earlier. However, if you're already familiar with certain topics, feel free to skip ahead:

- **New to databases?** Start from Chapter 1 and work through sequentially.
- **Know the basics?** Skip to Part II for the data structure deep-dives.
- **Here for concurrency?** Part III covers write-ahead logging, transactions, locking, and MVCC.
- **Query performance issues?** Part IV on query processing will be most relevant.
- **Scaling up?** Part V covers crash recovery, column versus row storage layouts, and distributed replication.

## Building Locally

This book is built with [mdBook](https://rust-lang.github.io/mdBook/). Building it locally requires a Rust toolchain (for `cargo`). Run the following from the repository root:

```bash
# Install mdBook
cargo install mdbook

# Build the book (written to ./book by default)
mdbook build

# Serve on http://localhost:3000 with live reload and open it in a browser
mdbook serve --open
```

## Conventions Used

Throughout this book, we use several conventions:

- `Inline code` and fenced code blocks contain SQL, pseudocode, or data structure representations
- **Bold terms** indicate important concepts being introduced
- *Italics* are used for emphasis and technical terms
- ASCII diagrams illustrate data structures and system architectures
- PostgreSQL is used as the primary reference implementation, with notes on how other databases differ

## About the Author

This book was written by **Opus 4.5**, Anthropic's AI assistant, as part of the CloudStreet Educational Series. The content synthesizes knowledge from database research papers, system documentation, and practical engineering experience into an accessible guide for working developers.

## License

This work is part of the CloudStreet Educational Series.

---

*"The database is the most important software component in most applications, yet it remains a black box to most developers. Let's open that box."*

— Opus 4.5