Designing Data-Intensive Applications book

The reference book for data systems — replication, partitioning, transactions, batch and stream processing, consistency, consensus.

Martin Kleppmann · 2017 · systems

why it matters

If you can only read one technical book this decade, it's this one. DDIA is the cross-cutting reference that connects database internals, distributed systems theory, and the real-world tradeoffs you face shipping data infra. Almost every term in this catalog has a DDIA citation because that's where most engineers first meet the idea.

key ideas

Replication models: single-leader, multi-leader, leaderless — each with distinct failure modes and consistency properties
Partitioning: range vs hash partitioning, hot partitions, secondary indexes, rebalancing
Transactions: isolation levels (read committed, snapshot isolation, serializable), what each one actually buys you, and what it doesn't
The final chapter ("The Future of Data Systems"): the end-to-end argument, why distributed transactions are usually the wrong answer, what stronger semantics cost
Stream processing as a generalization of databases: Kafka as a log, derived data, change data capture
Consistency and consensus: linearizability, total order broadcast, the FLP impossibility result framed accessibly
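
To make the partitioning item concrete, here is a minimal sketch of range versus hash partitioning and how a hot partition appears. It is an illustration, not code from the book: the function names, the split points, and the timestamp-keyed workload are all assumptions chosen for the example.

```python
import bisect
import hashlib


def range_partition(key: str, boundaries: list[str]) -> int:
    """Return the partition index for key, given sorted split points.

    Partition i holds keys k with boundaries[i-1] <= k < boundaries[i].
    """
    return bisect.bisect_right(boundaries, key)


def hash_partition(key: str, num_partitions: int) -> int:
    """Return the partition index from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical workload: one minute of timestamp-keyed writes.
keys = [f"2024-06-01T12:00:{s:02d}" for s in range(60)]
boundaries = ["2024-01-01", "2024-07-01"]  # assumed split points, 3 partitions

range_hits = {range_partition(k, boundaries) for k in keys}
hash_hits = {hash_partition(k, 3) for k in keys}

print("range partitions touched:", sorted(range_hits))
print("hash partitions touched:", sorted(hash_hits))
```

With range partitioning every write in this workload lands on the same partition (a hot partition), but a scan over that minute touches only one partition. Hashing spreads the writes out at the cost of efficient range scans. The modulo step is the simplest possible assignment; changing the partition count would move most keys, which is why rebalancing gets its own treatment.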

memorable framings

Maintainability: operability, simplicity, evolvability — the operational triad
The unbundled database: take the things a database does internally and run them as separate systems
Stream-table duality: every table can be derived from a log; every log can be materialized into a table
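
The stream-table duality can be sketched in a few lines. This is a toy illustration under stated assumptions, not an API from the book or from Kafka: the event shape (key, value, with None as a delete marker) and the function names materialize and changelog are hypothetical.

```python
from typing import Optional

# One changelog event: (key, new value). A value of None marks a delete.
Event = tuple[str, Optional[int]]


def materialize(log: list[Event]) -> dict[str, int]:
    """Replay a log in order to build the table it describes."""
    table: dict[str, int] = {}
    for key, value in log:
        if value is None:
            table.pop(key, None)  # delete marker removes the row
        else:
            table[key] = value  # later events overwrite earlier ones
    return table


def changelog(table: dict[str, int]) -> list[Event]:
    """Emit a log (a snapshot as events) that rebuilds the table."""
    return [(key, value) for key, value in table.items()]


log: list[Event] = [("alice", 10), ("bob", 5), ("alice", 7), ("bob", None)]
table = materialize(log)
print(table)

# Round trip: the table's changelog materializes back into the same table.
assert materialize(changelog(table)) == table
```

The log keeps the full history (four events) while the table keeps only the latest state (one row), so the table is a derived view that can always be rebuilt by replaying the log.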

who should read it

Senior engineers working on anything data-intensive. Read it once cover-to-cover, then return to chapters as you hit specific problems. The summary and reference list at the end of each chapter are worth following up.

references:

dataintensive.net