health check pattern
An endpoint (often /health or /ready) that the orchestrator polls to know if your instance is alive and ready for traffic. K8s splits this in two: liveness (restart the container if the check fails) and readiness (stop routing traffic to it while the check fails). Cheap to add, easy to get wrong, surprisingly impactful.
symptoms
- traffic going to instances that can't serve it
- instances marked healthy that are deeply broken
- restart loops from too-aggressive liveness checks
causes
- health check too shallow (always returns 200)
- health check too deep (fails when any downstream is slow, causing cascades)
- liveness and readiness conflated
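
To make the shallow/deep tradeoff concrete, here is a rough sketch of a readiness check that sits between the two extremes. It assumes dependencies are split into "critical" (needed to serve) and "optional", and that every check runs under one shared deadline so a slow downstream cannot stall the probe. The names run_checks, is_ready, and the 0.2 s default are illustrative, not taken from any framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]  # hypothetical: each dependency check returns True when healthy


def run_checks(checks: Dict[str, Check], timeout_s: float) -> Dict[str, bool]:
    """Run all checks concurrently; a timeout or an exception counts as failed."""
    if not checks:
        return {}
    pool = ThreadPoolExecutor(max_workers=len(checks))
    futures = {name: pool.submit(fn) for name, fn in checks.items()}
    deadline = time.monotonic() + timeout_s  # one deadline shared by every check
    results: Dict[str, bool] = {}
    for name, fut in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = bool(fut.result(timeout=remaining))
        except Exception:  # covers both a timed-out wait and a check that raised
            results[name] = False
    pool.shutdown(wait=False)  # do not block the probe on stragglers
    return results


def is_ready(
    critical: Dict[str, Check],
    optional: Dict[str, Check],
    timeout_s: float = 0.2,
) -> Tuple[bool, Dict[str, bool]]:
    """Ready only if every critical dependency passed; optional ones are reported only."""
    # Assumes critical and optional use distinct names.
    results = run_checks({**critical, **optional}, timeout_s)
    ready = all(results[name] for name in critical)
    return ready, results
```

With this shape, a slow optional dependency shows up in the results but does not pull the instance out of rotation, while a failed critical one does. Which dependencies count as critical is a per-service judgment call.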
fixes
- readiness checks only the dependencies needed to serve traffic
- liveness fails only when the process is unrecoverable and a restart would help (e.g. deadlocked), not when a downstream is slow
- startup probes for slow-booting services
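
The three fixes can be sketched as three separate endpoints, each answering a different question. This is a minimal sketch using only the Python standard library. The AppState flags, the /startup path, and port 8080 are assumptions made for illustration; /healthz and /ready follow the names used in this note. An orchestrator such as K8s treats a 2xx response as success and 503 as failure.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AppState:
    """Hypothetical flags the application sets as its condition changes."""

    def __init__(self) -> None:
        self.started = threading.Event()  # set once boot work has finished
        self.deps_ok = threading.Event()  # set while required dependencies are reachable
        self.wedged = threading.Event()   # set only if the process cannot recover


STATE = AppState()


def probe_status(path: str, state: AppState) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/startup":
        # Startup: has the slow boot finished yet?
        return 200 if state.started.is_set() else 503
    if path == "/healthz":
        # Liveness: fail only when a restart is the sole remedy.
        return 503 if state.wedged.is_set() else 200
    if path == "/ready":
        # Readiness: booted, required dependencies reachable, not wedged.
        ok = (
            state.started.is_set()
            and state.deps_ok.is_set()
            and not state.wedged.is_set()
        )
        return 200 if ok else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(probe_status(self.path, STATE))
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe polling out of the access log


def serve(port: int = 8080) -> None:
    """Blocks forever; call from the service's entry point."""
    ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Note that a dependency outage flips /ready to 503 but leaves /healthz at 200, so the instance is taken out of rotation without being restarted. Keeping those two answers independent is the point of the split.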
you might say
- /healthz
- failing readiness
- pod stuck in NotReady