recall

health check pattern

An endpoint (often /health or /ready) that the orchestrator polls to decide whether your instance is alive and ready for traffic. Kubernetes splits this in two: liveness (restart the container if the check fails) and readiness (stop routing traffic to it until the check passes). Cheap, easy to get wrong, surprisingly impactful.
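A minimal sketch of the two endpoints, using only Python's standard library. The paths /healthz and /readyz and the `ready_event` flag are illustrative assumptions, not a required convention. The point is that liveness and readiness answer different questions, so they get different handlers.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical process-wide flag, set once startup work (warm caches,
# open connection pools, ...) has finished.
ready_event = threading.Event()


def probe_status(path: str) -> int:
    """Map a probe request path to an HTTP status code."""
    path = urlsplit(path).path  # ignore any query string
    if path == "/healthz":
        # Liveness: deliberately shallow here; a real check might also
        # confirm the main work loop is still making progress.
        return 200
    if path == "/readyz":
        # Readiness: answer 503 so no traffic is routed here until ready.
        return 200 if ready_event.is_set() else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = probe_status(self.path)
        body = b"ok\n" if status == 200 else b"not ok\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Probes fire every few seconds; skip the per-request log line.
        pass


def serve(port: int = 8080) -> HTTPServer:
    """Run the probe server on a background daemon thread."""
    server = HTTPServer(("", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a Kubernetes pod spec, livenessProbe.httpGet.path would then point at /healthz and readinessProbe.httpGet.path at /readyz, with the service calling ready_event.set() only once it can actually serve.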

symptoms

  • traffic going to instances that can't serve it
  • instances marked healthy that are deeply broken
  • restart loops from too-aggressive liveness checks

causes

  • health check too shallow (always returns 200)
  • health check too deep (fails when any downstream is slow, causing cascading failures)
  • liveness and readiness conflated

fixes

  • readiness checks only the dependencies needed to serve requests
  • liveness checks only whether this process is unrecoverable (e.g. deadlocked)
  • startup probes for slow-booting services
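A minimal sketch of the three fixes, assuming a hypothetical `HealthChecker` whose `heartbeat()` is called from the service's main loop and whose `required_deps` are zero-argument callables returning True when a dependency is usable. The timeout and heartbeat thresholds are illustrative, not recommendations.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class HealthChecker:
    """Sketch: separate startup, liveness and readiness decisions."""

    def __init__(self, required_deps: Dict[str, Callable[[], bool]],
                 dep_timeout_s: float = 0.5,
                 max_heartbeat_age_s: float = 30.0) -> None:
        # Only dependencies the instance cannot serve without; optional
        # downstreams are left out so their slowness cannot cascade.
        self.required_deps = required_deps
        self.dep_timeout_s = dep_timeout_s
        self.max_heartbeat_age_s = max_heartbeat_age_s
        self.started = False  # set True by the app when boot completes
        self.last_heartbeat = time.monotonic()
        self._pool = ThreadPoolExecutor(max_workers=4)

    def heartbeat(self) -> None:
        # Hypothetical hook: the main work loop calls this each iteration.
        self.last_heartbeat = time.monotonic()

    def startup(self) -> bool:
        # Startup probe: reports whether boot has finished. Kubernetes
        # disables the other probes until this one succeeds.
        return self.started

    def liveness(self) -> bool:
        # Unrecoverable only: the main loop has stopped making progress.
        # Dependencies are never consulted, so a slow database cannot
        # trigger a restart loop.
        age = time.monotonic() - self.last_heartbeat
        return age < self.max_heartbeat_age_s

    def readiness(self) -> bool:
        if not self.started:
            return False
        for check in self.required_deps.values():
            future = self._pool.submit(check)
            try:
                ok = future.result(timeout=self.dep_timeout_s)
            except Exception:
                # Timed out (concurrent.futures.TimeoutError) or raised.
                ok = False
            if not ok:
                return False
        return True
```

Each method would back one probe endpoint. A check that hangs past its timeout still occupies a pool worker, so a real service might instead refresh dependency status on a background loop and have readiness() read the cached result.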

you might say

  • /healthz
  • failing readiness
  • pod stuck in NotReady

related

aliases: health endpoint, liveness probe, readiness probe

topics: operations, resilience