load shedding pattern

When the system can't handle all incoming load, reject or drop some requests early instead of attempting all of them and failing slowly. It is better to serve 80% of users well than 100% of users badly.

symptoms

latency climbing while the system tries to serve everyone
queues growing faster than they drain
thread pools saturated
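
The "queues growing faster than they drain" symptom can be checked mechanically. The sketch below is one simple way to do it, assuming queue depth is sampled at a fixed interval; the function name `queue_is_growing` and the `min_samples` parameter are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def queue_is_growing(depths: Sequence[int], min_samples: int = 3) -> bool:
    """Return True when each recent sample is deeper than the one before it.

    `depths` is an assumed series of queue-depth readings taken at a fixed
    interval, oldest first. A strictly increasing tail means arrivals have
    outpaced the drain rate across the whole window.
    """
    if len(depths) < min_samples:
        return False  # not enough data to call it a trend
    recent = depths[-min_samples:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A strictly increasing window is a deliberately conservative signal. A real deployment would likely combine it with latency and thread pool utilization before deciding to shed.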

causes

unexpected traffic spike
downstream slowness amplifying upstream load
no admission control at the edge

fixes

admission control at the edge (rate limiter, queue length cap)
priority lanes — drop low-priority work first
circuit breakers downstream
explicit error responses (429/503) instead of timeouts
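
Three of the fixes above (an admission cap, priority lanes, and explicit error responses) can be combined in a small amount of code. This is a minimal sketch under stated assumptions, not a production implementation: `AdmissionController`, `Priority`, `handle`, and both limits are hypothetical names and values, and the cap is on in-flight requests rather than on a real queue.

```python
import threading
from enum import IntEnum
from typing import Callable, Tuple


class Priority(IntEnum):
    LOW = 0
    HIGH = 1


class AdmissionController:
    """Caps in-flight requests and sheds low-priority work first."""

    def __init__(self, max_in_flight: int, low_priority_limit: int) -> None:
        if not 0 <= low_priority_limit <= max_in_flight:
            raise ValueError("low_priority_limit must be in [0, max_in_flight]")
        self._max_in_flight = max_in_flight
        self._low_priority_limit = low_priority_limit
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self, priority: Priority) -> bool:
        """Admit the request only if its priority lane still has room."""
        # Low-priority work hits its ceiling first, which leaves headroom
        # between the two limits that only high-priority work can use.
        if priority == Priority.HIGH:
            limit = self._max_in_flight
        else:
            limit = self._low_priority_limit
        with self._lock:
            if self._in_flight >= limit:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Call exactly once for each admitted request when it finishes."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


def handle(
    controller: AdmissionController,
    priority: Priority,
    work: Callable[[], str],
) -> Tuple[int, str]:
    """Return (status_code, body), rejecting immediately when overloaded."""
    if not controller.try_admit(priority):
        # An explicit 503 lets the caller back off instead of timing out.
        return 503, "overloaded, retry later"
    try:
        return 200, work()
    finally:
        controller.release()
```

With `AdmissionController(max_in_flight=2, low_priority_limit=1)`, a second concurrent low-priority request is rejected while a high-priority one is still admitted. The sketch returns 503 because the server itself is overloaded; 429 is the usual choice when the limit applies to one client's request rate.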

you might say

shed load
we're shedding traffic
drop the low-priority work

related

aliases: drop traffic, throttle

topics: resilience, scaling

references:

Amazon Builders' Library: Using load shedding to avoid overload