recall

heartbeat pattern

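Each node periodically sends a small message to a coordinator (or to its peers) saying 'still alive.' When enough consecutive heartbeats go missing, the receiver marks the node as failed and, if that node was the leader, triggers re-election. Trade-off: a timeout that is too short causes flapping (slow but healthy nodes are repeatedly declared dead and then recovered); one that is too long lets dead nodes go unnoticed.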
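
The receiving side of this pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `HeartbeatMonitor`, its parameters `interval_s` and `max_missed`, and the injectable `clock` are all hypothetical names chosen for the example.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and suspects nodes that miss too many.

    Hypothetical sketch: interval_s is the expected heartbeat period and
    max_missed is how many whole periods may pass before a node is suspected.
    """

    def __init__(self, interval_s, max_missed, clock=time.monotonic):
        self.interval_s = interval_s
        self.max_missed = max_missed
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # node_id -> time of last heartbeat

    def record(self, node_id):
        """Call whenever a heartbeat message from node_id arrives."""
        self.last_seen[node_id] = self.clock()

    def missed(self, node_id):
        """Number of whole heartbeat intervals elapsed since the last heartbeat."""
        elapsed = self.clock() - self.last_seen[node_id]
        return int(elapsed // self.interval_s)

    def suspected(self):
        """Sorted list of nodes that have missed at least max_missed heartbeats."""
        return sorted(
            node for node in self.last_seen
            if self.missed(node) >= self.max_missed
        )
```

The trade-off shows up directly in the two parameters: lowering `interval_s` or `max_missed` detects failures sooner but makes a briefly delayed node more likely to be suspected.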

symptoms

  • need to detect node failures
  • leadership leases need liveness signal

causes

  • no out-of-band liveness mechanism
  • TCP keepalive too coarse (Linux defaults to 2 hours idle before the first probe)

fixes

  • heartbeat every N ms
  • failure threshold = K missed heartbeats
  • phi-accrual for adaptive thresholds
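
The adaptive option in the list above can be sketched as follows. Instead of a fixed count of missed heartbeats, a phi-accrual detector outputs a suspicion level phi = -log10(P), where P is the estimated probability that a heartbeat could still arrive after the time already elapsed. This sketch makes a simplifying assumption that heartbeat inter-arrival times are exponentially distributed, which gives phi = elapsed / (mean * ln 10); the original formulation fits a normal distribution instead. The class name `PhiAccrualDetector` and the `window` parameter are hypothetical.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi-accrual detector for one node (exponential model assumed)."""

    def __init__(self, window=100):
        # Sliding window of recent inter-arrival times, in seconds.
        self.intervals = deque(maxlen=window)
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level at time `now`; grows the longer the node stays silent."""
        if not self.intervals:
            return 0.0  # not enough history to estimate anything
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0  # degenerate history; avoid division by zero
        elapsed = now - self.last_arrival
        # Under the exponential assumption, P(arrival later than elapsed)
        # is exp(-elapsed / mean), so -log10 of it is elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))
```

A caller would compare `phi(now)` against a threshold of its own choosing. Because the mean interval is re-estimated from recent arrivals, the same threshold tolerates longer silences on a slow network than on a fast one.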

you might say

  • heartbeat interval
  • phi-accrual
  • no heartbeat in N seconds

related

topics: distributed-systems
