The Site Reliability Workbook

Companion to the SRE Book — more practical, less philosophical. How to actually implement SRE practices.

Beyer, Murphy, Rensin, Kawahara, Thorne (eds.) · 2018 · platform

why it matters

Where the SRE Book argues for the practices, the Workbook shows you how to do them. The chapters on SLO definition, alerting, and capacity planning are particularly load-bearing. Free online.

key ideas

  • SLO definition is iterative — start with what you have, refine as you learn what users actually care about
  • Multi-window multi-burn-rate alerting: how to alert on SLO violation in a way that catches both fast spikes and slow leaks
  • Eliminating toil: identify it, measure it, fund the engineering to remove it
  • Non-abstract large system design: practical scaling exercises
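To make the multi-window multi-burn-rate idea concrete, here is a minimal sketch in Python. It is an illustration, not the Workbook's implementation. The names `BurnRateRule`, `burn_rate`, and `firing_severities` are hypothetical. The thresholds (14.4 over 1h/5m, 6 over 6h/30m, 1 over 3d/6h) are the commonly cited starting points for a 30-day SLO window and should be treated as assumptions to tune for your own service. Burn rate here means the observed error ratio divided by the error budget (1 minus the SLO target).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    """One alerting rule: a long window paired with a short confirmation window."""
    long_window: str
    short_window: str
    threshold: float  # burn-rate multiple that must be exceeded in both windows
    severity: str


# Assumed starting thresholds for a 30-day SLO period; tune for your service.
RULES = (
    BurnRateRule("1h", "5m", 14.4, "page"),    # fast spike: ~2% of budget in 1h
    BurnRateRule("6h", "30m", 6.0, "page"),    # medium burn: ~5% of budget in 6h
    BurnRateRule("3d", "6h", 1.0, "ticket"),   # slow leak: ~10% of budget in 3d
)


def burn_rate(error_ratio: float, slo: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / (1.0 - slo)


def firing_severities(error_ratios: dict[str, float], slo: float) -> list[str]:
    """Return the severity of every rule whose long AND short windows both burn too fast.

    error_ratios maps a window label (e.g. "1h") to the fraction of failed
    requests observed over that window.
    """
    fired = []
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], slo)
        short_burn = burn_rate(error_ratios[rule.short_window], slo)
        # The short window confirms the problem is still happening, so the
        # alert stops firing soon after the error rate recovers.
        if long_burn > rule.threshold and short_burn > rule.threshold:
            fired.append(rule.severity)
    return fired


if __name__ == "__main__":
    slo = 0.999  # 99.9% availability target, so the error budget is 0.1%
    spike = {"5m": 0.02, "30m": 0.02, "1h": 0.02, "6h": 0.004, "3d": 0.0005}
    leak = {"5m": 0.002, "30m": 0.002, "1h": 0.002, "6h": 0.002, "3d": 0.002}
    print("spike:", firing_severities(spike, slo))
    print("leak:", firing_severities(leak, slo))
```

Requiring both windows to exceed the threshold is the design choice that matters: the long window filters out brief blips, while the short window keeps an already-resolved incident from paging someone an hour later.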

who should read it

Pair with the SRE Book. Read the Workbook second; it'll make more sense once you have the philosophical grounding.
