request hedging pattern

If a request hasn't responded within (say) the p95 latency, fire a duplicate to a different replica. Take whichever returns first. Cheap insurance against tail latency at the cost of slight extra load. From Dean & Barroso's 'The Tail at Scale.'

If a request hasn't responded within (say) the p95 latency, fire a duplicate to a different replica. Take whichever returns first. Cheap insurance against tail latency at the cost of slight extra load. From Dean & Barroso's 'The Tail at Scale.'

symptoms

p99 dominated by a few stragglers
tail latency >> median

causes

one slow node poisoning every read
no retry path for slow-but-not-failed requests

fixes

hedge: fire duplicate after p95 threshold
cap hedging budget so it doesn't double load
use only for idempotent operations

you might say

hedge the request
speculative retry
tied request

related

aliases: hedged request, speculative request

topics: performance, resilience

references:

The Tail at Scale (Dean & Barroso)