exponential backoff pattern
When retrying a failed call, wait a doubling amount of time between attempts and add randomness so callers don't all retry at the same instant. Without jitter, retries lock-step into a herd that crushes the recovering service.
When retrying a failed call, wait a doubling amount of time between attempts and add randomness so callers don't all retry at the same instant. Without jitter, retries lock-step into a herd that crushes the recovering service.
symptoms
- clients retrying in lockstep cause spikes when a downstream recovers
- thundering-herd on the recovering service
causes
- fixed retry interval
- synchronized retry timers across many clients
fixes
- full jitter: sleep(random(0, base * 2^n))
- cap maximum backoff
- retry budget (cap retries as a fraction of requests)
you might say
- back off and retry
- exponential with jitter
- full jitter