split brain pattern
A network partition makes two halves of a cluster each believe they're the live one. Both accept writes; data diverges. The hardest distributed-systems failure to recover from cleanly.
A network partition makes two halves of a cluster each believe they're the live one. Both accept writes; data diverges. The hardest distributed-systems failure to recover from cleanly.
symptoms
- writes accepted on both sides of a partition
- diverged state after the partition heals
- leader election flapping
causes
- no quorum requirement for leadership
- asymmetric failure detection
- trusting heartbeats without majority confirmation
fixes
- quorum-based leader election (only one half can have a quorum)
- fencing tokens at the resource layer
- STONITH / shoot-the-other-node-in-the-head
you might say
- we split-brained
- two leaders at once