Health Checks

Health checks let Kubernetes know if your application is working correctly. Kubernetes uses this information to route traffic and restart unhealthy containers.

What You'll Learn

  • The three Kubernetes probe types and when to use each
  • What to check (and what not to check) in each probe
  • How to configure probe timing parameters
  • Standard health endpoints for the Ybor Platform

Probe Types

Kubernetes provides three types of probes:

Liveness Probe

Question: Is the application running?

The liveness probe detects if your application is stuck or deadlocked. If the probe fails, Kubernetes restarts the container.

Use for: Detecting hung processes, infinite loops, deadlocks.

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3

Readiness Probe

Question: Is the application ready to receive traffic?

The readiness probe determines if your application can handle requests. If the probe fails, Kubernetes removes the pod from service endpoints (stops sending traffic).

Use for: Waiting for dependencies, handling temporary overload.

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3

Startup Probe

Question: Has the application finished starting?

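The startup probe runs during container startup. Liveness and readiness probes don't run until it succeeds, which protects slow-starting applications from being restarted before they finish initializing. If the startup probe never succeeds, Kubernetes restarts the container.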

Use for: Applications with long initialization (loading models, warming caches).

startupProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 30  # 30 failures x 10 s period = up to 5 minutes to start
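
The 5-minute figure in the startup probe example comes from multiplying two of its timing fields. A minimal Python sketch of that arithmetic follows; the function name `startup_budget_seconds` is illustrative only and not part of any Kubernetes API, and the result is an approximation that ignores how long each individual probe takes.

```python
def startup_budget_seconds(period_seconds: int,
                           failure_threshold: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time a container has to pass its startup probe.

    The kubelet waits initial_delay_seconds, then probes every
    period_seconds and gives up after failure_threshold consecutive
    failures.
    """
    return initial_delay_seconds + period_seconds * failure_threshold


# Values from the startupProbe example: 10 s period, 30 allowed failures.
budget = startup_budget_seconds(period_seconds=10, failure_threshold=30)
print(f"{budget} s = {budget // 60} min")  # 300 s = 5 min
```

If your application needs longer than this budget, raising failureThreshold is usually the simplest adjustment, since it leaves the probe interval unchanged.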

Standard Endpoints

The Ybor Platform uses these standard health endpoints:

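| Endpoint      | Purpose                    | Response      |
|---------------|----------------------------|---------------|
| /health       | Comprehensive health check | Detailed JSON |
| /health/live  | Liveness probe             | 200 OK or 503 |
| /health/ready | Readiness probe            | 200 OK or 503 |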

Liveness Endpoint (/health/live)

The liveness endpoint should be simple and fast. It only checks if the process is responsive.

Good liveness check:

@app.get("/health/live")
def liveness():
    return {"status": "ok"}

Bad liveness check (too complex):

@app.get("/health/live")
def liveness():
    # DON'T do this - database failures will restart your app
    db.ping()
    redis.ping()
    return {"status": "ok"}

Keep Liveness Simple

Never include dependency checks in liveness probes. If your database is down, restarting your application won't fix it, and a wave of restarts may make things worse during recovery.

Readiness Endpoint (/health/ready)

The readiness endpoint checks if the application can handle traffic. Include critical dependencies.

from fastapi.responses import JSONResponse

@app.get("/health/ready")
def readiness():
    checks = {}

    # Check database (db is the application's existing database handle)
    try:
        db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception as e:
        # Report the failure so Kubernetes stops routing traffic to this pod
        checks["database"] = str(e)
        return JSONResponse(
            status_code=503,
            content={"status": "unhealthy", "checks": checks}
        )

    # Check required external services
    # ...

    return {"status": "healthy", "checks": checks}

Comprehensive Endpoint (/health)

The comprehensive endpoint provides detailed status for debugging and dashboards.

{
  "status": "healthy",
  "version": "1.2.3",
  "uptime": "2d 4h 32m",
  "checks": {
    "database": {
      "status": "healthy",
      "latency_ms": 2
    },
    "redis": {
      "status": "healthy",
      "latency_ms": 1
    },
    "external_api": {
      "status": "degraded",
      "latency_ms": 250,
      "message": "Elevated latency"
    }
  }
}
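
One way to assemble a response of this shape is to time each dependency check and fold the results into a single report. The sketch below uses only the Python standard library; `format_uptime`, `run_check`, and `build_health_report` are hypothetical helper names, each check is assumed to be a callable that raises on failure, and the "degraded" state from the sample response is left out for brevity.

```python
import time
from typing import Callable, Dict, Optional


def format_uptime(seconds: float) -> str:
    """Render a duration as 'Xd Yh Zm', matching the sample response."""
    minutes = int(seconds) // 60
    days, remainder = divmod(minutes, 24 * 60)
    hours, mins = divmod(remainder, 60)
    return f"{days}d {hours}h {mins}m"


def run_check(check: Callable[[], None]) -> dict:
    """Run one dependency check and record its status and latency."""
    start = time.perf_counter()
    try:
        check()
        result = {"status": "healthy"}
    except Exception as exc:
        result = {"status": "unhealthy", "message": str(exc)}
    result["latency_ms"] = round((time.perf_counter() - start) * 1000)
    return result


def build_health_report(checks: Dict[str, Callable[[], None]],
                        version: str,
                        started_at: float,
                        now: Optional[float] = None) -> dict:
    """Combine per-dependency results into one /health payload."""
    now = time.time() if now is None else now
    results = {name: run_check(fn) for name, fn in checks.items()}
    healthy = all(r["status"] == "healthy" for r in results.values())
    return {
        "status": "healthy" if healthy else "unhealthy",
        "version": version,
        "uptime": format_uptime(now - started_at),
        "checks": results,
    }
```

Because this endpoint runs every check on each request, it is better suited to dashboards and debugging than to probes, which should stay fast.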

Response Codes

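| Code                          | Meaning   | Probe Result |
|-------------------------------|-----------|--------------|
| 200-399                       | Healthy   | Success      |
| Any other code (e.g. 400-599) | Unhealthy | Failure      |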

Return 200 OK when healthy, 503 Service Unavailable when unhealthy.
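
The rule Kubernetes applies to HTTP probe responses can be written as a one-line predicate. The function name `http_probe_succeeds` is illustrative; the range check itself reflects the documented kubelet behavior.

```python
def http_probe_succeeds(status_code: int) -> bool:
    """Return True for codes Kubernetes treats as a passing HTTP probe."""
    return 200 <= status_code < 400


# Print the probe result for a few representative status codes.
for code in (200, 302, 404, 503):
    result = "Success" if http_probe_succeeds(code) else "Failure"
    print(code, result)
```

Note that a redirect (3xx) counts as success, so a health endpoint that redirects to a login page can pass a probe without the application actually being healthy.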

What to Check

Liveness Probe

Check:

  • Process is responsive
  • Basic memory/thread sanity

Don't check:

  • Database connectivity
  • External service availability
  • Cache availability
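
A "thread sanity" check can stay free of dependencies by tracking whether the application's own worker loop is still making progress. The following is a sketch under assumptions: the `Heartbeat` class and `liveness_status` function are hypothetical names, and it presumes your application has a main loop that can call `beat()` on every iteration.

```python
import threading
import time


class Heartbeat:
    """Tracks when a worker loop last reported progress."""

    def __init__(self, max_age_seconds: float, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._last_beat = clock()

    def beat(self) -> None:
        """Call from the worker loop on every iteration."""
        with self._lock:
            self._last_beat = self._clock()

    def is_alive(self) -> bool:
        """True if the worker has reported within max_age_seconds."""
        with self._lock:
            return self._clock() - self._last_beat <= self._max_age


def liveness_status(heartbeat: Heartbeat) -> int:
    """HTTP status a /health/live handler could return."""
    return 200 if heartbeat.is_alive() else 503
```

If the worker deadlocks, it stops calling `beat()`, the heartbeat goes stale, and the liveness endpoint starts returning 503 without ever contacting a database or cache. Choose `max_age_seconds` comfortably above the longest normal iteration so that slow but healthy work is not mistaken for a hang.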

Readiness Probe

Check:

  • Database connectivity
  • Required configuration is loaded
  • Required external services are reachable
  • Application has finished initialization

Consider checking:

  • Cache connectivity (if required)
  • Message queue connectivity (if required)
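
The split between required and optional dependencies can be made explicit in code, so that only required failures take the pod out of rotation. This is a minimal sketch; `Check` and `evaluate_readiness` are hypothetical names, and each probe is assumed to be a callable that raises on failure.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class Check(NamedTuple):
    probe: Callable[[], None]  # raises an exception on failure
    required: bool             # False: report the failure but stay ready


def evaluate_readiness(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run every check and return (HTTP status code, response body)."""
    results = {}
    ready = True
    for name, check in checks.items():
        try:
            check.probe()
            results[name] = "ok"
        except Exception as exc:
            results[name] = str(exc)
            if check.required:
                ready = False
    status = "healthy" if ready else "unhealthy"
    return (200 if ready else 503), {"status": status, "checks": results}
```

Unlike returning on the first failure, this runs all checks, so the response body shows every failing dependency at once. An optional cache outage then appears in the body while the endpoint still returns 200.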

Probe Configuration

Timing Parameters

| Parameter           | Description               | Typical Value |
|---------------------|---------------------------|---------------|
| initialDelaySeconds | Wait before first probe   | 5-30 seconds  |
| periodSeconds       | Time between probes       | 5-10 seconds  |
| timeoutSeconds      | Probe timeout             | 1-5 seconds   |
| successThreshold    | Successes to mark healthy | 1             |
| failureThreshold    | Failures to mark unhealthy| 3             |
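
These parameters interact: together they determine how long a failure can go unnoticed. The sketch below models them as a small dataclass with Kubernetes' default values, checks the documented limits (including that successThreshold must be 1 for liveness and startup probes), and estimates a rough upper bound on detection time. `ProbeTiming` and its methods are hypothetical names, and the estimate is an approximation rather than an exact kubelet guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeTiming:
    # Defaults mirror the Kubernetes defaults for each field.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    success_threshold: int = 1
    failure_threshold: int = 3

    def validate(self, kind: str) -> None:
        """Raise ValueError if a value is outside Kubernetes' limits."""
        if self.initial_delay_seconds < 0:
            raise ValueError("initialDelaySeconds must be >= 0")
        minimum_one = {
            "periodSeconds": self.period_seconds,
            "timeoutSeconds": self.timeout_seconds,
            "successThreshold": self.success_threshold,
            "failureThreshold": self.failure_threshold,
        }
        for name, value in minimum_one.items():
            if value < 1:
                raise ValueError(f"{name} must be >= 1")
        if kind in ("liveness", "startup") and self.success_threshold != 1:
            raise ValueError(f"successThreshold must be 1 for {kind} probes")

    def worst_case_detection_seconds(self) -> int:
        """Rough upper bound from first failure to Kubernetes acting on it."""
        return self.period_seconds * self.failure_threshold + self.timeout_seconds


liveness = ProbeTiming(initial_delay_seconds=10, period_seconds=10,
                       timeout_seconds=3, failure_threshold=3)
liveness.validate("liveness")
print(liveness.worst_case_detection_seconds())  # 33
```

Lowering periodSeconds or failureThreshold shortens detection time but makes the probe more sensitive to transient failures, so tune the two together.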

Example Configuration

apiVersion: v1
kind: Pod
metadata:
  name: my-app  # added so the manifest is complete; use your own name
spec:
  containers:
    - name: app
      image: my-app:1.0.0
      ports:
        - containerPort: 8080

      livenessProbe:
        httpGet:
          path: /health/live
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
        timeoutSeconds: 3
        failureThreshold: 3

      readinessProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3

      startupProbe:
        httpGet:
          path: /health/ready
          port: 8080
        periodSeconds: 10
        failureThreshold: 30

Best Practices

  1. Keep liveness probes simple - Only check if the process is alive
  2. Be thorough in readiness probes - Check all required dependencies
  3. Return quickly - Probes should complete in < 1 second
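  4. Use appropriate timeouts - Set timeoutSeconds above the endpoint's normal worst-case latency so slow responses don't cause false failures (the Kubernetes default is 1 second)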
  5. Set failure thresholds - Allow for transient failures (usually 3)
  6. Use startup probes for slow starts - Prevent premature restarts during initialization

Troubleshooting

Pod keeps restarting

  • Liveness probe may be too aggressive
  • Increase failureThreshold or periodSeconds
  • Check if liveness probe includes slow dependency checks
  • Review the Events section of kubectl describe pod <pod-name> for probe failure details

Pod never becomes ready

  • Readiness probe may be checking an unavailable dependency
  • Check dependency health
  • Review application logs for startup errors
  • Verify probe endpoint returns correct status codes

Probe timeouts

  • Probe handler may be too slow
  • Increase timeoutSeconds
  • Optimize the health check logic
  • Consider async dependency checks with caching
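
The last suggestion, asynchronous dependency checks with caching, can be sketched with a background thread that runs the slow check on a schedule while the probe handler only reads the most recent result. `CachedCheck` is a hypothetical class name, and the check is assumed to be a callable that raises on failure.

```python
import threading
from typing import Callable, Optional, Tuple


class CachedCheck:
    """Runs a slow check in the background; readers get the last result."""

    def __init__(self, check: Callable[[], None], interval_seconds: float):
        self._check = check
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._healthy = False  # not healthy until the first check passes
        self._error: Optional[str] = None
        self._stop = threading.Event()

    def refresh(self) -> None:
        """Run the check once and store the outcome."""
        try:
            self._check()
            healthy, error = True, None
        except Exception as exc:
            healthy, error = False, str(exc)
        with self._lock:
            self._healthy, self._error = healthy, error

    def start(self) -> threading.Thread:
        """Refresh in a daemon thread until stop() is called."""
        def loop() -> None:
            while not self._stop.is_set():
                self.refresh()
                self._stop.wait(self._interval)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        """Ask the background thread to exit."""
        self._stop.set()

    def status(self) -> Tuple[bool, Optional[str]]:
        """Return (healthy, error) without touching the dependency."""
        with self._lock:
            return self._healthy, self._error
```

A readiness handler would call `status()` and return immediately, so a slow database no longer causes probe timeouts. The trade-off is staleness: the reported state can lag the real one by up to `interval_seconds`, so keep the interval shorter than the probe's periodSeconds.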

Languages

For language-specific health check implementations:

  • Python - FastAPI, Starlette
  • .NET - IHealthCheck, MapHealthChecks
  • Java - Spring Boot Actuator
  • Rust - Axum, Tonic
  • JavaScript - Express, Fastify