Health Checks
Health checks let Kubernetes know if your application is working correctly. Kubernetes uses this information to route traffic and restart unhealthy containers.
What You'll Learn
- The three Kubernetes probe types and when to use each
- What to check (and what not to check) in each probe
- How to configure probe timing parameters
- Standard health endpoints for the Ybor Platform
Probe Types
Kubernetes provides three types of probes:
Liveness Probe
Question: Is the application running?
The liveness probe detects if your application is stuck or deadlocked. If the probe fails, Kubernetes restarts the container.
Use for: Detecting hung processes, infinite loops, deadlocks.
```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
```
Readiness Probe
Question: Is the application ready to receive traffic?
The readiness probe determines if your application can handle requests. If the probe fails, Kubernetes removes the pod from service endpoints (stops sending traffic).
Use for: Waiting for dependencies, handling temporary overload.
```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
```
Startup Probe
Question: Has the application finished starting?
The startup probe runs during container startup. Liveness and readiness probes don't run until it succeeds, and if it fails failureThreshold times in a row, Kubernetes restarts the container. Useful for slow-starting applications.
Use for: Applications with long initialization (loading models, warming caches).
```yaml
startupProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 30  # Allow 5 minutes to start (30 × 10s)
```
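The 5-minute figure comes from multiplying periodSeconds by failureThreshold. The Python sketch below does that arithmetic and inverts it to choose a failureThreshold for a target startup budget. The function names are illustrative, and the result is a rule of thumb: probe duration and the exact moment of the first probe are ignored.

```python
def startup_budget_seconds(period_seconds: int, failure_threshold: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time a container has to pass its startup probe."""
    # Rule of thumb: initial delay plus one period per tolerated failure
    return initial_delay_seconds + period_seconds * failure_threshold


def failure_threshold_for(target_seconds: int, period_seconds: int) -> int:
    """Smallest failureThreshold giving at least target_seconds to start."""
    # Integer ceiling division, so the budget is never rounded down
    return -(-target_seconds // period_seconds)


# Values from the startupProbe above: 30 failures x 10s
print(startup_budget_seconds(period_seconds=10, failure_threshold=30))
# A 10-minute budget with the same period
print(failure_threshold_for(target_seconds=600, period_seconds=10))
```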
Standard Endpoints
The Ybor Platform uses these standard health endpoints:
| Endpoint | Purpose | Response |
|---|---|---|
| /health | Comprehensive health check | Detailed JSON |
| /health/live | Liveness probe | 200 OK or 503 |
| /health/ready | Readiness probe | 200 OK or 503 |
Liveness Endpoint (/health/live)
The liveness endpoint should be simple and fast. It only checks if the process is responsive.
Good liveness check:
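```python
from fastapi import FastAPI
app = FastAPI()

@app.get("/health/live")
def liveness():
    return {"status": "ok"}  # No I/O: responding at all shows the process is alive
```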
Bad liveness check (too complex):
```python
@app.get("/health/live")
def liveness():
    # DON'T do this - database failures will restart your app
    db.ping()
    redis.ping()
    return {"status": "ok"}
```
Never include dependency checks in liveness probes. If your database is down, restarting your application won't fix it, and restarting every replica at once adds reconnection load that can slow the database's recovery.
Readiness Endpoint (/health/ready)
The readiness endpoint checks if the application can handle traffic. Include critical dependencies.
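```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/health/ready")
def readiness():
    checks = {}

    # Check database (`db` stands in for your application's database client)
    try:
        db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception as e:
        checks["database"] = str(e)
        # 503 makes Kubernetes stop routing traffic to this pod
        return JSONResponse(
            status_code=503,
            content={"status": "unhealthy", "checks": checks},
        )

    # Check required external services
    # ...

    return {"status": "healthy", "checks": checks}
```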
Comprehensive Endpoint (/health)
The comprehensive endpoint provides detailed status for debugging and dashboards.
```json
{
  "status": "healthy",
  "version": "1.2.3",
  "uptime": "2d 4h 32m",
  "checks": {
    "database": {
      "status": "healthy",
      "latency_ms": 2
    },
    "redis": {
      "status": "healthy",
      "latency_ms": 1
    },
    "external_api": {
      "status": "degraded",
      "latency_ms": 250,
      "message": "Elevated latency"
    }
  }
}
```
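A minimal sketch of how a /health handler might assemble a response in this shape, assuming each dependency check is a zero-argument callable that raises on failure. The 200 ms "degraded" cutoff and the function names are hypothetical; the rule that only "unhealthy" (not "degraded") fails the top-level status is chosen to match the example response. In a FastAPI app, the /health route could return `comprehensive_health(...)` directly.

```python
import json
import time
from typing import Callable, Dict

# Hypothetical cutoff: a passing check slower than this is "degraded"
DEGRADED_LATENCY_MS = 200


def format_uptime(seconds: float) -> str:
    """Format elapsed seconds as e.g. '2d 4h 32m' (leftover seconds dropped)."""
    total_minutes = int(seconds) // 60
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m"


def run_check(check: Callable[[], None], degraded_ms: int) -> dict:
    """Time one check; an exception means unhealthy, slowness means degraded."""
    start = time.perf_counter()
    try:
        check()
        status, message = "healthy", None
    except Exception as exc:
        status, message = "unhealthy", str(exc)
    latency_ms = round((time.perf_counter() - start) * 1000)
    if status == "healthy" and latency_ms > degraded_ms:
        status, message = "degraded", "Elevated latency"
    result = {"status": status, "latency_ms": latency_ms}
    if message is not None:
        result["message"] = message
    return result


def comprehensive_health(version: str, started_at: float,
                         checks: Dict[str, Callable[[], None]],
                         degraded_ms: int = DEGRADED_LATENCY_MS) -> dict:
    """Build the /health body; started_at comes from time.monotonic()."""
    results = {name: run_check(fn, degraded_ms) for name, fn in checks.items()}
    # Only "unhealthy" fails the overall status; "degraded" is informational
    failed = any(r["status"] == "unhealthy" for r in results.values())
    return {
        "status": "unhealthy" if failed else "healthy",
        "version": version,
        "uptime": format_uptime(time.monotonic() - started_at),
        "checks": results,
    }


if __name__ == "__main__":
    STARTED_AT = time.monotonic()
    print(json.dumps(
        comprehensive_health("1.2.3", STARTED_AT, {"database": lambda: None}),
        indent=2,
    ))
```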
Response Codes
| Code | Meaning | Probe Result |
|---|---|---|
| 200-399 | Healthy | Success |
| 400-599 | Unhealthy | Failure |
Return 200 OK when healthy, 503 Service Unavailable when unhealthy.
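The success rule in the table reduces to a one-line predicate. This sketch (the helper name is hypothetical) also highlights a side effect: any 3xx redirect counts as a passing probe, so a probe path that accidentally redirects, for example to a login page, can still report success.

```python
from http import HTTPStatus


def probe_succeeds(status_code: int) -> bool:
    """Kubernetes counts an HTTP probe as passed when 200 <= code < 400."""
    return 200 <= status_code < 400


for status in (HTTPStatus.OK, HTTPStatus.FOUND,
               HTTPStatus.NOT_FOUND, HTTPStatus.SERVICE_UNAVAILABLE):
    outcome = "success" if probe_succeeds(status) else "failure"
    print(f"{status.value} {status.phrase}: {outcome}")
```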
What to Check
Liveness Probe
Check:
- Process is responsive
- Basic memory/thread sanity
Don't check:
- Database connectivity
- External service availability
- Cache availability
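One way to cover "process is responsive" beyond the HTTP server itself is a heartbeat: a background worker records progress on every loop iteration, and the liveness handler fails only when that record goes stale. This is a sketch under assumptions: the `Heartbeat` class, `liveness_response`, and the 30-second threshold are hypothetical, and the clock is injectable so the logic can be tested. It reads only in-process state, in line with the "Don't check" list.

```python
import threading
import time
from typing import Callable, Tuple


class Heartbeat:
    """Tracks when a worker loop last made progress."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._last = clock()

    def beat(self) -> None:
        # Call from inside the worker loop on every iteration
        with self._lock:
            self._last = self._clock()

    def age_seconds(self) -> float:
        with self._lock:
            return self._clock() - self._last


def liveness_response(heartbeat: Heartbeat,
                      max_age_seconds: float = 30.0) -> Tuple[int, dict]:
    """Return (status_code, body) for /health/live.

    Only in-process state is inspected: no database, cache, or network calls.
    """
    age = heartbeat.age_seconds()
    if age > max_age_seconds:
        # A stalled worker (deadlock, infinite loop) fails liveness
        return 503, {"status": "stuck", "heartbeat_age_seconds": round(age, 1)}
    return 200, {"status": "ok"}
```

In a FastAPI route, the returned tuple would map to `JSONResponse(status_code=code, content=body)`.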
Readiness Probe
Check:
- Database connectivity
- Required configuration is loaded
- Required external services are reachable
- Application has finished initialization
Consider checking:
- Cache connectivity (if required)
- Message queue connectivity (if required)
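The sketch below folds the "configuration is loaded" and "finished initialization" items into a readiness response alongside dependency checks. Unlike the short-circuiting readiness example earlier, it runs every check and reports all failures at once, which helps debugging at the cost of a slower response when several dependencies are down. `readiness_response`, the `DATABASE_URL` key, and the convention that a check raises on failure are assumptions.

```python
from typing import Callable, Dict, Iterable, Mapping, Tuple


def readiness_response(
    initialized: bool,
    config: Mapping[str, str],
    required_keys: Iterable[str],
    dependency_checks: Dict[str, Callable[[], None]],
) -> Tuple[int, dict]:
    """Return (status_code, body) for /health/ready, running every check."""
    checks: Dict[str, str] = {}
    ready = True

    # Application has finished initialization (e.g. a flag set when startup ends)
    checks["initialization"] = "ok" if initialized else "in progress"
    ready = ready and initialized

    # Required configuration is present and non-empty
    missing = sorted(key for key in required_keys if not config.get(key))
    checks["configuration"] = ("missing: " + ", ".join(missing)) if missing else "ok"
    ready = ready and not missing

    # Required dependencies are reachable; each check raises on failure
    for name, check in dependency_checks.items():
        try:
            check()
            checks[name] = "ok"
        except Exception as exc:
            checks[name] = str(exc)
            ready = False

    body = {"status": "healthy" if ready else "unhealthy", "checks": checks}
    return (200 if ready else 503), body
```

A call might look like `readiness_response(startup_done, os.environ, ["DATABASE_URL"], {"database": ping_db})`, where `startup_done` and `ping_db` are hypothetical names from your application.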
Probe Configuration
Timing Parameters
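| Parameter | Description | Typical Value |
|---|---|---|
| initialDelaySeconds | Wait before the first probe | 5-30 seconds |
| periodSeconds | Time between probes | 5-10 seconds |
| timeoutSeconds | Time before a single probe attempt counts as failed | 1-5 seconds |
| successThreshold | Consecutive successes needed to be marked healthy again after a failure (must be 1 for liveness and startup probes) | 1 |
| failureThreshold | Consecutive failures needed to be marked unhealthy | 3 |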
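These parameters combine into a rough reaction time. If an application hangs just after a successful probe, the kubelet needs failureThreshold more probes, periodSeconds apart, each waiting up to timeoutSeconds, before it acts (restarting the container for liveness, removing the pod from endpoints for readiness). The sketch below computes that upper bound, ignoring scheduling jitter, for the liveness and readiness values used in this guide; the function name is illustrative.

```python
def worst_case_detection_seconds(period_seconds: int, failure_threshold: int,
                                 timeout_seconds: int) -> int:
    """Rough upper bound from a hang to the kubelet acting on it.

    Assumes the hang starts right after a successful probe, so the next
    failure_threshold probes start period_seconds apart and each fails
    only when timeout_seconds expires. Scheduling jitter is ignored.
    """
    return period_seconds * failure_threshold + timeout_seconds


# (periodSeconds, failureThreshold, timeoutSeconds) as used in this guide
probes = {"liveness": (10, 3, 3), "readiness": (5, 3, 3)}
for name, (period, threshold, timeout) in probes.items():
    print(f"{name}: ~{worst_case_detection_seconds(period, threshold, timeout)}s")
```

Lowering these values speeds up detection but raises the risk of false failures during brief slowdowns.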
Example Configuration
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app  # placeholder; metadata.name is required for a Pod
spec:
  containers:
    - name: app
      image: my-app:1.0.0
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /health/live
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
        timeoutSeconds: 3
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
      startupProbe:
        httpGet:
          path: /health/ready
          port: 8080
        periodSeconds: 10
        failureThreshold: 30
```
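Before applying a manifest, it can help to lint probe settings. The sketch below fills in Kubernetes' defaults for omitted fields, flags the API rule that liveness and startup probes require successThreshold to be 1, and adds two heuristics that are assumptions rather than Kubernetes rules (timeout shorter than period; at least two tolerated failures). The probe timing values are copied from the manifest above as plain dicts to avoid a YAML dependency; `lint_probe` is a hypothetical helper.

```python
from typing import Dict, List

# Kubernetes defaults for fields omitted from a probe
DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "successThreshold": 1,
    "failureThreshold": 3,
}


def lint_probe(kind: str, probe: Dict[str, int]) -> List[str]:
    """Return warnings for a 'liveness', 'readiness' or 'startup' probe."""
    p = {**DEFAULTS, **probe}
    warnings: List[str] = []
    # Kubernetes API validation rejects other values for these probe kinds
    if kind in ("liveness", "startup") and p["successThreshold"] != 1:
        warnings.append(f"{kind}: successThreshold must be 1")
    # Heuristic (assumption): a probe should time out before the next one is due
    if p["timeoutSeconds"] >= p["periodSeconds"]:
        warnings.append(f"{kind}: timeoutSeconds should be less than periodSeconds")
    # Heuristic (assumption): tolerate at least one transient failure
    if p["failureThreshold"] < 2:
        warnings.append(f"{kind}: failureThreshold below 2 allows no transient failures")
    return warnings


# Timing fields from the manifest above (httpGet omitted)
manifest_probes = {
    "liveness": {"initialDelaySeconds": 10, "periodSeconds": 10,
                 "timeoutSeconds": 3, "failureThreshold": 3},
    "readiness": {"initialDelaySeconds": 5, "periodSeconds": 5,
                  "timeoutSeconds": 3, "failureThreshold": 3},
    "startup": {"periodSeconds": 10, "failureThreshold": 30},
}
for kind, probe in manifest_probes.items():
    print(kind, lint_probe(kind, probe) or "ok")
```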
Best Practices
- Keep liveness probes simple - Only check if the process is alive
- Be thorough in readiness probes - Check all required dependencies
- Return quickly - Probes should complete in < 1 second
- Use appropriate timeouts - Don't let probe timeouts cause false failures
- Set failure thresholds - Allow for transient failures (usually 3)
- Use startup probes for slow starts - Prevent premature restarts during initialization
Troubleshooting
Pod keeps restarting
- Liveness probe may be too aggressive
- Increase `failureThreshold` or `periodSeconds`
- Check if liveness probe includes slow dependency checks
- Review `kubectl describe pod` for probe failure details
Pod never becomes ready
- Readiness probe may be checking an unavailable dependency
- Check dependency health
- Review application logs for startup errors
- Verify probe endpoint returns correct status codes
Probe timeouts
- Probe handler may be too slow
- Increase `timeoutSeconds`
- Optimize the health check logic
- Consider async dependency checks with caching
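A minimal sketch of the caching idea, assuming a synchronous check callable: the expensive dependency check runs at most once per TTL, and probe calls in between return the cached result immediately. This is a TTL cache, a simpler stand-in for a fully asynchronous design in which a background task refreshes the result on its own schedule. `CachedCheck` and the 10-second TTL are hypothetical.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class CachedCheck:
    """Runs an expensive dependency check at most once per ttl_seconds.

    Probe handlers call result(); between refreshes they get the cached
    outcome immediately instead of waiting on the dependency.
    """

    def __init__(self, check: Callable[[], None], ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._checked_at: Optional[float] = None
        self._result: Tuple[bool, str] = (False, "not checked yet")

    def result(self) -> Tuple[bool, str]:
        # The refresh runs under the lock, so concurrent probes wait for it
        with self._lock:
            now = self._clock()
            stale = self._checked_at is None or now - self._checked_at >= self._ttl
            if stale:
                try:
                    self._check()
                    self._result = (True, "ok")
                except Exception as exc:
                    self._result = (False, str(exc))
                self._checked_at = now
            return self._result
```

As a rule of thumb, keep the TTL at or below periodSeconds so each probe sees a reasonably fresh result and a real outage still surfaces within the probe's failure window.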
Languages
For language-specific health check implementations:
- Python - FastAPI, Starlette
- .NET - IHealthCheck, MapHealthChecks
- Java - Spring Boot Actuator
- Rust - Axum, Tonic
- JavaScript - Express, Fastify
Related
- Deployments — Basic Walkthrough — Configure probes in PlatformApplication
- Logging — Log health check results for debugging
- Observability Overview — The four pillars and platform integration