recall

Database Internals book

Two halves: how a single database stores data on disk, and how distributed databases coordinate across nodes.

Alex Petrov · 2019 · systems

why it matters

Where DDIA (Kleppmann's Designing Data-Intensive Applications) goes broad, Database Internals goes deep on a narrower question: what's actually happening underneath the SQL. If you've ever wondered how indexes work, why some workloads kill your DB, or what the B-tree vs LSM-tree choice means in production, this is the book.

key ideas

  • Storage engines split into B-trees (in-place updates, read-optimized) and LSM-trees (append-only, write-optimized); the choice shapes write throughput, read latency, and space usage
  • Buffer pools (page caches) keep hot pages in memory, and a write-ahead log makes changes to those pages durable: the standard pattern for crash recovery
  • Indexes as separate data structures with their own access patterns; covering indexes, partial indexes, and why index choice often dominates query performance
  • Distributed databases stack consensus + replication + partitioning into a coherent system; understanding each layer separately makes the whole thing legible
  • Failure detection (phi-accrual, heartbeats), gossip protocols, anti-entropy
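To make the B-tree vs LSM split concrete, here is a minimal sketch of the LSM write and read path. It assumes a toy in-memory engine. `TinyLSM`, `memtable_limit`, and the `TOMBSTONE` marker are hypothetical names, not the book's code. Writes only ever land in a mutable memtable, which is frozen into an immutable sorted run when full. Reads check newest data first, and compaction merges runs back together.

```python
import bisect

TOMBSTONE = object()  # hypothetical marker recording that a key was deleted


class TinyLSM:
    """Toy LSM-tree (illustrative only): a mutable memtable plus immutable sorted runs.

    Real engines add a WAL, bloom filters, on-disk SSTables, and leveled or
    tiered background compaction; none of that is modeled here.
    """

    def __init__(self, memtable_limit=4):
        self.memtable = {}            # absorbs every write in memory
        self.runs = []                # (keys, values) sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value    # existing runs are never modified in place
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)      # a delete is just another append

    def flush(self):
        # Freeze the memtable into a sorted, immutable run (an SSTable on disk).
        if not self.memtable:
            return
        items = sorted(self.memtable.items())
        self.runs.insert(0, ([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        # Read path: binary-search runs newest to oldest; the first match wins.
        for keys, values in self.runs:
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return None if values[i] is TOMBSTONE else values[i]
        return None

    def compact(self):
        # Merge every run into one, keeping the newest value per key. Dropping
        # tombstones is safe only because all runs take part in the merge.
        merged = {}
        for keys, values in reversed(self.runs):   # oldest first; newer overwrite
            merged.update(zip(keys, values))
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.runs = [([k for k, _ in live], [v for _, v in live])] if live else []
```

The read cost grows with the number of runs a lookup may touch until `compact()` runs. That is the read amplification an LSM accepts in exchange for sequential, append-only writes.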
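The write-ahead log idea can be sketched as "log, fsync, then apply", with recovery replaying the log. This is a hypothetical toy (`WalKV`, one JSON record per line), not how any particular engine formats its log. Real WALs use binary records with checksums and log sequence numbers, and checkpoint so the log can be truncated.

```python
import json
import os


class WalKV:
    """Toy key-value store that logs every change before applying it (illustrative only)."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}                          # stands in for buffer-pool pages
        self._recover()
        self._log = open(wal_path, "a", encoding="utf-8")

    def _recover(self):
        # Redo pass: replay every complete record in log order.
        if not os.path.exists(self.wal_path):
            return
        good_end = 0
        with open(self.wal_path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break                       # torn final write from a crash
                self._apply(json.loads(line))
                good_end += len(line)
        os.truncate(self.wal_path, good_end)    # drop the torn tail before appending

    def _apply(self, rec):
        if rec["op"] == "put":
            self.data[rec["key"]] = rec["value"]
        else:
            self.data.pop(rec["key"], None)

    def _write(self, rec):
        # Log first and force it to stable storage; only then mutate in memory.
        self._log.write(json.dumps(rec) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(rec)

    def put(self, key, value):
        self._write({"op": "put", "key": key, "value": value})

    def delete(self, key):
        self._write({"op": "del", "key": key})

    def close(self):
        self._log.close()
```

Because a change is acknowledged only after its log record is fsynced, any state lost from memory in a crash can be rebuilt by replaying the log.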
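A quick way to see a covering index at work is SQLite's `EXPLAIN QUERY PLAN`, using Python's built-in `sqlite3` module. The table, columns, and `idx_email_name` are made-up examples. The point is that a query whose selected columns all live in the index never has to visit the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT);
    -- Composite index: a lookup by email can read name straight from the index.
    CREATE INDEX idx_email_name ON users (email, name);
""")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# name is stored in the index, so the plan reports a COVERING INDEX search.
print(plan("SELECT name FROM users WHERE email = 'a@example.com'"))
# bio is not in the index, so SQLite uses the index, then fetches each table row.
print(plan("SELECT bio FROM users WHERE email = 'a@example.com'"))
```

The exact wording of the plan rows varies a little across SQLite versions, but both queries are driven by `idx_email_name`, and only the first is covering.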
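For the partitioning + replication layers, one common scheme is a consistent-hash ring. Each key maps to a point on the ring, and its replicas are the next N distinct nodes clockwise. This is a hypothetical sketch in the Dynamo/Cassandra style, not the book's code. `HashRing`, `vnodes`, and the choice of MD5 (picked only for a stable hash) are assumptions.

```python
import bisect
import hashlib


def ring_hash(s):
    # Stable 64-bit ring position; MD5 is used only for determinism, not security.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes and N-way replica placement (illustrative)."""

    def __init__(self, nodes, vnodes=16):
        # Each physical node owns several points ("virtual nodes") to even out load.
        self.points = sorted(
            (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    def replicas(self, key, n=3):
        # Walk clockwise from the key's position, collecting n distinct nodes.
        start = bisect.bisect(self.hashes, ring_hash(key))
        chosen = []
        for offset in range(len(self.points)):
            node = self.points[(start + offset) % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == n:
                    break
        return chosen
```

The payoff of this design is that removing a node only reassigns the keys that node held. Every other key keeps the same replicas, in the same order.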
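A phi-accrual detector outputs a suspicion level instead of a yes/no verdict: phi = -log10(P(the next heartbeat arrives later than now)). The sketch below uses one common formulation and assumes heartbeat gaps are normally distributed. `PhiAccrualDetector`, `window`, and `min_std` are hypothetical names with illustrative defaults.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Phi-accrual failure detector assuming normally distributed heartbeat gaps."""

    def __init__(self, window=100, min_std=0.1):
        self.intervals = deque(maxlen=window)   # sliding window of recent gaps
        self.last = None                        # time of the latest heartbeat
        self.min_std = min_std                  # floor so perfectly regular gaps don't divide by ~0

    def heartbeat(self, now):
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        if not self.intervals:
            return 0.0                          # no history yet: no basis for suspicion
        mean = sum(self.intervals) / len(self.intervals)
        var = sum((x - mean) ** 2 for x in self.intervals) / len(self.intervals)
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last
        # Normal tail probability P(gap > elapsed), via the complementary error function.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))   # clamp to avoid log10(0)
```

Callers pick a threshold (say, suspect the node once phi exceeds some value). This separates "how late is this heartbeat" from "how aggressive should we be", which a fixed heartbeat timeout conflates.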

memorable framings

  • B-tree vs LSM is a read/write/space amplification trade-off, not a question of which is 'better'
  • Most database mysteries are explainable once you understand the storage engine

who should read it

Engineers who feel like databases are magic and want to stop feeling that. Pairs well with reading Postgres or RocksDB internals afterward.
