recall

← recall

request hedging pattern

If a request hasn't responded within (say) the p95 latency, fire a duplicate to a different replica. Take whichever returns first. Cheap insurance against tail latency at the cost of slight extra load. From Dean & Barroso's 'The Tail at Scale.'

If a request hasn't responded within (say) the p95 latency, fire a duplicate to a different replica. Take whichever returns first. Cheap insurance against tail latency at the cost of slight extra load. From Dean & Barroso's 'The Tail at Scale.'

symptoms

  • p99 dominated by a few stragglers
  • tail latency >> median

causes

  • one slow node poisoning every read
  • no retry path for slow-but-not-failed requests

fixes

  • hedge: fire duplicate after p95 threshold
  • cap hedging budget so it doesn't double load
  • use only for idempotent operations

you might say

  • hedge the request
  • speculative retry
  • tied request

related

aliases: hedged request, speculative request

topics: performance, resilience

references: