The Tail at Scale

Dean & Barroso, 2013.

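Argues that at scale, tail latency dominates user experience and must be engineered against directly. The math: if a request fans out to N servers in parallel and each server has a 1% chance of being slow, the whole request is slow with probability 1 - 0.99^N; for N=100, that's ~63%. Mitigations it describes: hedged requests (send a duplicate to another replica if the first hasn't answered within the p95 latency, and use whichever reply arrives first), tied requests (enqueue on two servers at once, each tagged with the other, so whichever starts executing first cancels its twin), micro-partitions (many more partitions than machines, for finer-grained load balancing and faster recovery), and latency-induced probation (temporarily pulling slow machines out of rotation). The paper that taught the industry that p99 isn't a number you measure; it's a thing you architect against.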
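A minimal sketch in Python, assuming each server is slow independently and the client already knows the p95 latency for this request class. `p_any_slow` reproduces the 1 - 0.99^N fan-out arithmetic; `hedged` implements the hedged-request pattern with asyncio. It is not a tied request: tied requests need server-side cancellation at the moment a queued request starts executing. `make_replica`, the replica names, and the 0.05 s threshold are hypothetical stand-ins for real RPCs and measured latencies.

```python
import asyncio


def p_any_slow(n: int, p_slow: float = 0.01) -> float:
    """Chance that a request fanning out to n servers in parallel sees at least
    one slow reply, assuming each server is slow independently with p_slow."""
    return 1 - (1 - p_slow) ** n


async def hedged(replicas, hedge_after: float):
    """Hedged request: call replicas[0]; if it has not answered within
    hedge_after seconds (e.g. the p95 latency), send a duplicate to
    replicas[1], keep whichever answers first, and cancel the other."""
    first = asyncio.create_task(replicas[0]())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:  # fast path: no duplicate was ever sent
        return first.result()
    second = asyncio.create_task(replicas[1]())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:  # cancel the loser so it stops consuming resources
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


def make_replica(name: str, latency: float, log: list):
    """Hypothetical stand-in for a remote call: records that it was
    invoked, sleeps for `latency` seconds, then returns its name."""
    async def call():
        log.append(name)
        await asyncio.sleep(latency)
        return name
    return call


if __name__ == "__main__":
    print(round(p_any_slow(100), 2))  # 0.63
    log = []
    straggler = make_replica("replica-a", 0.5, log)
    healthy = make_replica("replica-b", 0.01, log)
    # 0.05 s is an assumed p95 for this request class
    print(asyncio.run(hedged([straggler, healthy], hedge_after=0.05)))  # replica-b
    print(log)  # ['replica-a', 'replica-b']
```

Because the duplicate only goes out for requests that have already outlived the p95, roughly 5% of requests get hedged, so the extra load stays small, while a straggler's 0.5 s delay shrinks to about the hedge threshold plus one fast reply.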


see also