The Site Reliability Workbook
Companion to the SRE Book — more practical, less philosophical. How to actually implement SRE practices.
why it matters
Where the SRE Book argues for the practices, the Workbook shows you how to do them. The chapters on SLO definition, alerting, and capacity planning are particularly load-bearing. It is free to read online.
key ideas
- SLO definition is iterative — start with what you have, refine as you learn what users actually care about
- Multi-window multi-burn-rate alerting: alert on how fast the error budget is being consumed, so you catch both fast spikes and slow leaks
- Eliminating toil: identify it, measure it, fund the engineering to remove it
- Non-abstract large system design: practical scaling exercises
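To make the multi-window multi-burn-rate idea concrete, here is a minimal sketch in Python. It assumes you already have the observed error ratio for each lookback window. The names (`BurnRateRule`, `RULES`, `firing_alerts`) are hypothetical. The window pairs and thresholds are the commonly cited starting points for a 99.9% SLO over 30 days, so treat them as assumptions to tune rather than fixed values. A rule fires only when both its long and its short window burn faster than the threshold, which stops an alert from lingering after the problem has ended.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str    # catches sustained budget consumption
    short_window: str   # confirms the problem is still happening
    threshold: float    # burn-rate multiple that triggers the rule
    severity: str       # "page" or "ticket"


# Assumed starting thresholds for a 30-day SLO period; tune for your service.
RULES = (
    BurnRateRule("1h", "5m", 14.4, "page"),    # fast spike
    BurnRateRule("6h", "30m", 6.0, "page"),    # moderate burn
    BurnRateRule("3d", "6h", 1.0, "ticket"),   # slow leak
)


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    error_budget = 1.0 - slo_target
    return error_ratio / error_budget


def firing_alerts(error_ratios: dict[str, float], slo_target: float) -> list[str]:
    """Return the rules whose long AND short windows both exceed the threshold.

    error_ratios maps a window label (e.g. "1h") to the fraction of failed
    requests observed over that window.
    """
    fired = []
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], slo_target)
        short_burn = burn_rate(error_ratios[rule.short_window], slo_target)
        if long_burn > rule.threshold and short_burn > rule.threshold:
            fired.append(f"{rule.severity}:{rule.long_window}/{rule.short_window}")
    return fired


# A sudden spike: 2% of requests failing over the last hour against a 99.9% SLO.
spike = {"5m": 0.02, "30m": 0.02, "1h": 0.02, "6h": 0.004, "3d": 0.0005}
print(firing_alerts(spike, slo_target=0.999))
```

In this sketch a 2% error ratio against a 0.1% budget is a burn rate of roughly 20, which trips only the one-hour rule. A steady 0.2% error ratio in every window would instead trip only the three-day ticket rule.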
who should read it
Pair with the SRE Book. Read the Workbook second; it'll make more sense once you have the philosophical grounding.