recall

← recall

exponential backoff pattern

When retrying a failed call, wait a doubling amount of time between attempts and add randomness so callers don't all retry at the same instant. Without jitter, retries lock-step into a herd that crushes the recovering service.

When retrying a failed call, wait a doubling amount of time between attempts and add randomness so callers don't all retry at the same instant. Without jitter, retries lock-step into a herd that crushes the recovering service.

symptoms

  • clients retrying in lockstep cause spikes when a downstream recovers
  • thundering-herd on the recovering service

causes

  • fixed retry interval
  • synchronized retry timers across many clients

fixes

  • full jitter: sleep(random(0, base * 2^n))
  • cap maximum backoff
  • retry budget (cap retries as a fraction of requests)

you might say

  • back off and retry
  • exponential with jitter
  • full jitter

related

aliases: backoff and jitter

topics: resilience

references: