Release It! book

Stability and capacity patterns for systems that have to stay up. The book where 'circuit breaker' and 'bulkhead' became standard vocabulary.

Michael Nygard · 2018 · resilience

why it matters

The first edition (2007) pre-dates and predicts most cloud-native resilience practice. The stability patterns it names are now baseline expectations: circuit breakers, bulkheads, timeouts, fail fast, steady state. If you've ever built a service that handles failure well, you're applying Nygard whether you've read him or not. The 2nd edition (2018) adds chaos engineering and microservices context.

key ideas

  • Stability anti-patterns: integration points, chain reactions, cascading failures, blocked threads, self-denial attacks, slow responses, unbounded result sets
  • Stability patterns: timeout, circuit breaker, bulkhead, steady state, fail fast, handshake, test harness, decoupling middleware
  • Capacity is about understanding the system's load profile, not just provisioning more servers
  • Steady state: design every server to run forever without manual intervention — log rotation, cache eviction, archive policies
  • Network behavior under failure: TCP keepalive, connection pools, and DNS all behave in surprising ways in production
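
To make the circuit breaker and fail fast ideas above concrete, here is a minimal sketch in Python. It is not code from the book: the class name, the `max_failures` and `reset_after` parameters, and the injectable `clock` are all assumptions chosen for illustration. The breaker counts consecutive failures, opens once the threshold is reached, rejects calls immediately while open, and lets one trial call through after the cool-down.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: closed -> open -> half-open -> closed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast: do not touch the struggling dependency at all.
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each call to an integration point, for example `breaker.call(fetch_profile, user_id)`, where `fetch_profile` is a hypothetical function. The sketch does not enforce timeouts itself, so the wrapped function should carry its own; otherwise a slow dependency blocks the caller before the breaker ever sees a failure.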

memorable framings

  • 'Every system will fail. The question is not whether but how.'
  • 'A failure in any one component should not cause others to fail.'

who should read it

Every engineer who deploys to production. Read once early in your career, again after your first major incident.
