hot partition pattern
Most partitions are fine, but one partition gets a disproportionate share of traffic. That one shard becomes the bottleneck for the whole system regardless of how much capacity you have elsewhere.
Most partitions are fine, but one partition gets a disproportionate share of traffic. That one shard becomes the bottleneck for the whole system regardless of how much capacity you have elsewhere.
symptoms
- one shard at high CPU/IO while others idle
- throttling on a small subset of keys
- scaling the cluster doesn't help because the hot partition is the bottleneck
causes
- partition key with skewed distribution (e.g. tenant_id where one tenant is huge)
- time-based partition keys (everyone writes 'today')
- celebrity / popular item access
fixes
- repartition with higher-cardinality key
- salt / shard-suffix the hot key
- cache the hot keys
- request hedging for read hotspots
you might say
- we've got a hot key
- this one shard is melting
- the partition key isn't distributing well