Designing Data-Intensive Applications book
The reference book for data systems — replication, partitioning, transactions, batch and stream processing, consistency, consensus.
why it matters
If you can only read one technical book this decade, it's this one. DDIA is the cross-cutting reference that connects database internals, distributed systems theory, and the real-world tradeoffs you face shipping data infra. Almost every term in this catalog has a DDIA citation because that's where most engineers first meet the idea.
key ideas
- Replication models: single-leader, multi-leader, leaderless — each with distinct failure modes and consistency properties
- Partitioning: range vs hash partitioning, hot partitions, secondary indexes, rebalancing
- Transactions: isolation levels (read committed, snapshot isolation, serializable), what each one actually buys you, and what it doesn't
- The final chapter ("The Future of Data Systems"): the end-to-end argument, why distributed transactions are usually the wrong answer, and what stronger semantics cost
- Stream processing as a generalization of databases: Kafka as a log, derived data, change data capture
- Consistency vs consensus: linearizability, total order broadcast, the FLP impossibility result framed accessibly
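To make the partitioning bullet concrete, here is a minimal sketch of range versus hash partitioning and how a hot partition arises. It is an illustration rather than code from the book: the boundary values, the key format, and the function names `range_partition` and `hash_partition` are all assumptions chosen for the example.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical split points: partition i holds keys below BOUNDARIES[i];
# the last partition holds everything else.
BOUNDARIES = ["g", "n", "t"]
NUM_PARTITIONS = len(BOUNDARIES) + 1


def range_partition(key: str) -> int:
    """Keep keys in sorted order across partitions, so range scans stay cheap."""
    return bisect.bisect_right(BOUNDARIES, key)


def hash_partition(key: str) -> int:
    """Use a stable hash to scatter adjacent keys, giving up efficient range scans."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# Keys that share a prefix (hypothetical sensor IDs) all fall in one key range.
keys = [f"sensor-{i:04d}" for i in range(1000)]
range_load = Counter(range_partition(k) for k in keys)
hash_load = Counter(hash_partition(k) for k in keys)

print("range:", dict(range_load))  # every key lands in a single partition
print("hash: ", dict(hash_load))   # keys are spread over several partitions
```

With these assumed boundaries, every key falls into one range partition, which is the hot-partition problem. Hashing spreads the load but loses key ordering. The `% NUM_PARTITIONS` step is only for brevity: taking the hash modulo the node count makes rebalancing expensive, because most keys move when the count changes.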
memorable framings
- Maintainability: operability, simplicity, evolvability — the operational triad
- The unbundled database: take the things a database does internally and run them as separate systems
- Stream-table duality: every log can be materialized into a table by replaying it, and every table's changes can be captured as a log
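The stream-table framing can be sketched in a few lines. This is a simplified illustration, not an API from the book or from any streaming system: the `ChangeEvent` shape (a key plus a value, with `None` meaning delete) and the helper names `materialize` and `changelog` are assumptions.

```python
from typing import Optional

# Hypothetical change event: (key, new_value); a value of None means "delete".
ChangeEvent = tuple[str, Optional[str]]


def materialize(log: list[ChangeEvent]) -> dict[str, str]:
    """Replay a log in order to build the table it describes."""
    table: dict[str, str] = {}
    for key, value in log:
        if value is None:
            table.pop(key, None)  # delete; ignore keys that are already gone
        else:
            table[key] = value    # insert or overwrite
    return table


def changelog(table: dict[str, str]) -> list[ChangeEvent]:
    """Emit a compacted log (one upsert per row) that reproduces the table."""
    return list(table.items())


log = [("alice", "NYC"), ("bob", "LA"), ("alice", "SF"), ("bob", None)]
table = materialize(log)
print(table)

# Round trip: replaying the table's changelog yields the same table.
assert materialize(changelog(table)) == table
```

The table keeps only the latest value per key, while the log keeps the history that produced it. The round trip at the end resembles what log compaction relies on: a log reduced to the last write per key still rebuilds the same table.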
who should read it
Senior engineers working on anything data-intensive. Read it once cover-to-cover, then return to chapters as you hit specific problems. The summary and references at the end of each chapter are worth following up.
covers
- replica-lag
- eventual-consistency
- leader-follower
- split-brain
- consensus
- paxos
- raft
- mvcc
- serializability
- linearizability
- two-phase-commit
- two-phase-locking
- optimistic-concurrency-control
- pessimistic-concurrency-control
- snapshot-isolation
- cap-theorem
- pacelc
- base
- change-data-capture
- hot-partition
- consistent-hashing
- rendezvous-hashing
- quorum
- sloppy-quorum
- vector-clock
- lamport-clock
- idempotency
- fallacies-of-distributed-computing