request hedging pattern
If a request hasn't responded within (say) the p95 latency, fire a duplicate to a different replica. Take whichever returns first. Cheap insurance against tail latency at the cost of slight extra load. From Dean & Barroso's 'The Tail at Scale.'
If a request hasn't responded within (say) the p95 latency, fire a duplicate to a different replica. Take whichever returns first. Cheap insurance against tail latency at the cost of slight extra load. From Dean & Barroso's 'The Tail at Scale.'
symptoms
- p99 dominated by a few stragglers
- tail latency >> median
causes
- one slow node poisoning every read
- no retry path for slow-but-not-failed requests
fixes
- hedge: fire duplicate after p95 threshold
- cap hedging budget so it doesn't double load
- use only for idempotent operations
you might say
- hedge the request
- speculative retry
- tied request