recall

← recall

shuffle sharding pattern

Instead of giving each tenant one shard, give them a small random subset (e.g. 2 out of 8 nodes). Two random tenants almost never share both nodes — so even a tenant that overloads its nodes only affects the small fraction of tenants who unluckily share both. Limits blast radius without dedicating resources per tenant.

Instead of giving each tenant one shard, give them a small random subset (e.g. 2 out of 8 nodes). Two random tenants almost never share both nodes — so even a tenant that overloads its nodes only affects the small fraction of tenants who unluckily share both. Limits blast radius without dedicating resources per tenant.

symptoms

  • noisy neighbor wrecking many tenants
  • one bad tenant impacting all

causes

  • simple sharding puts groups of tenants together
  • no isolation between tenants

fixes

  • random subset of nodes per tenant
  • pair with circuit breakers per tenant
  • cells + shuffle-sharding inside each cell

you might say

  • shuffle shard
  • pick 2 of N
  • rare collision

related

topics: architecture, resilience, scaling

references: