Health Checks

Health checks let Kubernetes know if your application is working correctly. Kubernetes uses this information to route traffic and restart unhealthy containers.

What You'll Learn

The three Kubernetes probe types and when to use each
What to check (and what not to check) in each probe
How to configure probe timing parameters
Standard health endpoints for the Ybor Platform

Probe Types

Kubernetes provides three types of probes:

Liveness Probe

Question: Is the application running?

The liveness probe detects if your application is stuck or deadlocked. If the probe fails, Kubernetes restarts the container.

Use for: Detecting hung processes, infinite loops, deadlocks.

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3

Readiness Probe

Question: Is the application ready to receive traffic?

The readiness probe determines if your application can handle requests. If the probe fails, Kubernetes removes the pod from service endpoints (stops sending traffic).

Use for: Waiting for dependencies, handling temporary overload.

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
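`failureThreshold` counts consecutive results rather than a running total, and a single success resets the count. The following is a simplified Python model of those threshold rules. `ProbeState` is a hypothetical name, and the model sketches the documented behavior rather than the kubelet's source code:

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Simplified model of how consecutive probe results flip readiness."""

    failure_threshold: int = 3
    success_threshold: int = 1
    ready: bool = False  # a pod starts out not ready
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record(self, succeeded: bool) -> bool:
        """Record one probe result and return the resulting readiness."""
        if succeeded:
            self.consecutive_successes += 1
            self.consecutive_failures = 0  # any success resets the failure streak
            if self.consecutive_successes >= self.success_threshold:
                self.ready = True
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0  # any failure resets the success streak
            if self.consecutive_failures >= self.failure_threshold:
                self.ready = False
        return self.ready
```

With `failureThreshold: 3` and `periodSeconds: 5` as above, a pod leaves the service endpoints after roughly 15 seconds of consecutive failures. One successful probe brings it back.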

Startup Probe

Question: Has the application finished starting?

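The startup probe runs while the container is starting. Kubernetes holds off the liveness and readiness probes until it succeeds, and restarts the container if it fails `failureThreshold` times in a row. This makes it useful for slow-starting applications.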

Use for: Applications with long initialization (loading models, warming caches).

startupProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 30  # Allow 5 minutes to start
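The "5 minutes" figure comes from multiplying the two settings. A small helper can make the arithmetic explicit and also work backward from a target startup time. It uses the approximation given in the Kubernetes documentation (`failureThreshold * periodSeconds`, plus any `initialDelaySeconds`). The function names are hypothetical:

```python
import math


def startup_budget_seconds(period_seconds: int, failure_threshold: int, initial_delay_seconds: int = 0) -> int:
    """Approximate time a container has to start before the startup probe gives up."""
    # Rule of thumb: failureThreshold * periodSeconds, plus any initial delay.
    # Probe timeouts and scheduling jitter shift the exact moment slightly.
    return initial_delay_seconds + failure_threshold * period_seconds


def failure_threshold_for(target_seconds: int, period_seconds: int, initial_delay_seconds: int = 0) -> int:
    """Smallest failureThreshold that allows at least target_seconds to start."""
    remaining = max(0, target_seconds - initial_delay_seconds)
    return max(1, math.ceil(remaining / period_seconds))


print(startup_budget_seconds(10, 30))  # 300
print(failure_threshold_for(480, 10))  # 48
```

For example, an application that needs up to 8 minutes to load a model would use `failureThreshold: 48` with `periodSeconds: 10`.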

Standard Endpoints

The Ybor Platform uses these standard health endpoints:

Endpoint	Purpose	Response
`/health`	Comprehensive health check	Detailed JSON
`/health/live`	Liveness probe	200 OK or 503
`/health/ready`	Readiness probe	200 OK or 503

Liveness Endpoint (`/health/live`)

The liveness endpoint should be simple and fast. It only checks if the process is responsive.

Good liveness check:

from fastapi import FastAPI
app = FastAPI()

@app.get("/health/live")
def liveness():
    return {"status": "ok"}

Bad liveness check (too complex):

@app.get("/health/live")
def liveness():
    # DON'T do this - database failures will restart your app
    db.ping()
    redis.ping()
    return {"status": "ok"}

Keep Liveness Simple

Never include dependency checks in liveness probes. If your database is down, restarting your application won't fix it. The restarts only add churn and a burst of reconnections while the database is trying to recover.

Readiness Endpoint (`/health/ready`)

The readiness endpoint checks if the application can handle traffic. Include critical dependencies.

from fastapi.responses import JSONResponse

# `app` is the FastAPI instance; `db` is the application's database handle.

@app.get("/health/ready")
def readiness():
    checks = {}

    # Check database
    try:
        db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception as e:
        checks["database"] = str(e)
        return JSONResponse(
            status_code=503,
            content={"status": "unhealthy", "checks": checks}
        )

    # Check required external services
    # ...

    return {"status": "healthy", "checks": checks}
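Returning on the first failed check means the response names only one broken dependency. The sketch below is a framework-agnostic variant that runs every check and reports all of them. It assumes each check is a zero-argument function that raises on failure, and the names are hypothetical:

```python
from typing import Callable, Dict, Tuple


def run_readiness_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[int, dict]:
    """Run every named check and return (http_status, response_body)."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # any exception marks this dependency as failing
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "healthy" if healthy else "unhealthy", "checks": results}
    return (200 if healthy else 503), body


def check_database() -> None:
    """Hypothetical stand-in for db.execute("SELECT 1")."""


def check_payments_api() -> None:
    """Hypothetical required service that is currently down."""
    raise ConnectionError("connection refused")


status, body = run_readiness_checks({"database": check_database, "payments_api": check_payments_api})
print(status)  # 503
```

In FastAPI, a handler can pass the result to `JSONResponse(status_code=status, content=body)`. Running every check costs more time than stopping early, so keep each check short.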

Comprehensive Endpoint (`/health`)

The comprehensive endpoint provides detailed status for debugging and dashboards.

{
  "status": "healthy",
  "version": "1.2.3",
  "uptime": "2d 4h 32m",
  "checks": {
    "database": {
      "status": "healthy",
      "latency_ms": 2
    },
    "redis": {
      "status": "healthy",
      "latency_ms": 1
    },
    "external_api": {
      "status": "degraded",
      "latency_ms": 250,
      "message": "Elevated latency"
    }
  }
}
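A response like the sample above can be produced by timing each dependency check. This is a minimal sketch with two assumptions. First, a single hypothetical latency budget `slow_ms` separates "healthy" from "degraded". Second, to match the sample, degraded components are reported but do not change the top-level status. The function names are illustrative, not a platform API:

```python
import time
from typing import Callable, Dict


def timed_check(check: Callable[[], None], slow_ms: float) -> dict:
    """Run one check, time it, and classify it as healthy, degraded, or unhealthy."""
    start = time.perf_counter()
    try:
        check()
        error = None
    except Exception as exc:  # any exception means the dependency is unhealthy
        error = str(exc)
    latency_ms = round((time.perf_counter() - start) * 1000)
    if error is not None:
        return {"status": "unhealthy", "latency_ms": latency_ms, "message": error}
    if latency_ms > slow_ms:
        return {"status": "degraded", "latency_ms": latency_ms, "message": "Elevated latency"}
    return {"status": "healthy", "latency_ms": latency_ms}


def format_uptime(seconds: float) -> str:
    """Format a duration in seconds as e.g. '2d 4h 32m'."""
    minutes = int(seconds) // 60
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}d {hours}h {minutes}m"


def build_health_report(checks: Dict[str, Callable[[], None]], slow_ms: float,
                        version: str, uptime_seconds: float) -> dict:
    results = {name: timed_check(check, slow_ms) for name, check in checks.items()}
    # Assumption: only an unhealthy component makes the whole report unhealthy.
    unhealthy = any(r["status"] == "unhealthy" for r in results.values())
    return {
        "status": "unhealthy" if unhealthy else "healthy",
        "version": version,
        "uptime": format_uptime(uptime_seconds),
        "checks": results,
    }
```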

Response Codes

Code	Meaning	Probe Result
200-399	Healthy	Success
400-599	Unhealthy	Failure

Return 200 OK when healthy, 503 Service Unavailable when unhealthy.
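The probe rule in the table reduces to a single range check. The small sketch below maps an application's health to the recommended status code and then back to a probe result. The function names are illustrative:

```python
from http import HTTPStatus


def health_status_code(healthy: bool) -> int:
    """Status code a health endpoint should return."""
    return HTTPStatus.OK if healthy else HTTPStatus.SERVICE_UNAVAILABLE


def probe_succeeded(status_code: int) -> bool:
    """HTTP probe rule: 200 up to (but not including) 400 is a success."""
    return 200 <= status_code < 400
```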

What to Check

Liveness Probe

Check:

Process is responsive
Basic memory/thread sanity

Don't check:

Database connectivity
External service availability
Cache availability
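For the "basic memory/thread sanity" item, one common pattern is a heartbeat. The main worker loop records a timestamp on every iteration, and the liveness handler fails only when that timestamp goes stale. This catches hangs and deadlocks without contacting any dependency. The sketch assumes your application has such a loop; `Heartbeat` and `max_silence_seconds` are hypothetical names:

```python
import threading
import time


class Heartbeat:
    """Liveness signal fed by a worker loop that calls beat() on each iteration."""

    def __init__(self, max_silence_seconds: float) -> None:
        self.max_silence_seconds = max_silence_seconds
        self._last_beat = time.monotonic()  # monotonic clock ignores wall-clock jumps
        self._lock = threading.Lock()

    def beat(self) -> None:
        """Called by the worker loop to prove it is still making progress."""
        with self._lock:
            self._last_beat = time.monotonic()

    def is_alive(self) -> bool:
        """False once the worker has been silent for too long (hung or deadlocked)."""
        with self._lock:
            silence = time.monotonic() - self._last_beat
        return silence <= self.max_silence_seconds
```

A `/health/live` handler would return 200 when `is_alive()` is true and 503 otherwise. Set `max_silence_seconds` well above the longest legitimate loop iteration. Otherwise, slow but healthy work triggers restarts.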

Readiness Probe

Check:

Database connectivity
Required configuration is loaded
Required external services are reachable
Application has finished initialization

Consider checking:

Cache connectivity (if required)
Message queue connectivity (if required)

Probe Configuration

Timing Parameters

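Parameter	Description	Default	Typical Value
`initialDelaySeconds`	Wait after the container starts before the first probe	0	5-30 seconds
`periodSeconds`	Time between probes	10	5-10 seconds
`timeoutSeconds`	Time before a single probe attempt counts as failed	1	1-5 seconds
`successThreshold`	Consecutive successes (after a failure) to count as passing; must be 1 for liveness and startup probes	1	1
`failureThreshold`	Consecutive failures to count as failing	3	3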
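The constraints in the table can be checked before a manifest ships. The sketch below uses a hypothetical `ProbeConfig` type. The minimum-value and `successThreshold` rules come from the Kubernetes probe API. The `timeoutSeconds <= periodSeconds` check is an assumed house rule that Kubernetes itself does not enforce. `seconds_to_detect` uses the rough `failureThreshold * periodSeconds` estimate:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeConfig:
    kind: str  # "liveness", "readiness", or "startup"
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    success_threshold: int = 1
    failure_threshold: int = 3


def validate(probe: ProbeConfig) -> List[str]:
    """Return a list of problems; an empty list means the probe looks sane."""
    problems = []
    if probe.period_seconds < 1 or probe.timeout_seconds < 1:
        problems.append("periodSeconds and timeoutSeconds must be at least 1")
    if probe.success_threshold < 1 or probe.failure_threshold < 1:
        problems.append("successThreshold and failureThreshold must be at least 1")
    if probe.kind in ("liveness", "startup") and probe.success_threshold != 1:
        problems.append("successThreshold must be 1 for liveness and startup probes")
    if probe.timeout_seconds > probe.period_seconds:
        # House rule (assumption): not rejected by Kubernetes, but usually a mistake.
        problems.append("timeoutSeconds exceeds periodSeconds")
    return problems


def seconds_to_detect(probe: ProbeConfig) -> int:
    """Rough time from the first failure until the probe is considered failed."""
    return probe.failure_threshold * probe.period_seconds
```

With the liveness settings from the examples (`periodSeconds: 10`, `failureThreshold: 3`), a hung process is restarted after roughly 30 seconds. The readiness settings (5 and 3) take a pod out of rotation after roughly 15 seconds.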

Example Configuration

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:1.0.0
    ports:
    - containerPort: 8080

    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 3
      failureThreshold: 3

    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3

    startupProbe:
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 10
      failureThreshold: 30

Best Practices

Keep liveness probes simple - Only check if the process is alive
Be thorough in readiness probes - Check all required dependencies
Return quickly - Probes should complete in < 1 second
Use appropriate timeouts - Set timeoutSeconds above the endpoint's normal response time so slow but healthy responses aren't counted as failures
Set failure thresholds - Allow for transient failures (usually 3)
Use startup probes for slow starts - Prevent premature restarts during initialization
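To keep a health handler fast even when a dependency hangs, each check can run under its own deadline. The sketch below uses the standard library's `concurrent.futures`. `check_with_deadline`, `example_slow_check`, and `example_failing_check` are hypothetical names. The deadline should sit below the probe's `timeoutSeconds`:

```python
import concurrent.futures
import time
from typing import Callable

# One shared pool, so each probe request doesn't create new threads.
_HEALTH_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4, thread_name_prefix="health")


def check_with_deadline(check: Callable[[], None], deadline_seconds: float) -> str:
    """Run a dependency check, but stop waiting for it after deadline_seconds.

    Python cannot cancel a running thread: a hung check keeps occupying a
    pool worker in the background; the handler simply stops waiting for it.
    """
    future = _HEALTH_POOL.submit(check)
    try:
        future.result(timeout=deadline_seconds)
        return "ok"
    except concurrent.futures.TimeoutError:
        return "timeout"
    except Exception as exc:  # the check itself raised
        return f"error: {exc}"


def example_slow_check() -> None:
    """Hypothetical dependency check that hangs longer than the deadline."""
    time.sleep(0.5)


def example_failing_check() -> None:
    """Hypothetical dependency check that fails immediately."""
    raise ConnectionError("connection refused")
```

If a handler runs several checks one after another, their deadlines add up. Size them against the total, not per check.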

Troubleshooting

Pod keeps restarting

Liveness probe may be too aggressive
Increase failureThreshold or periodSeconds
Check if liveness probe includes slow dependency checks
Run `kubectl describe pod <pod-name>` and check the Events section for probe failure details

Pod never becomes ready

Readiness probe may be checking an unavailable dependency
Check dependency health
Review application logs for startup errors
Verify probe endpoint returns correct status codes

Probe timeouts

Probe handler may be too slow
Increase timeoutSeconds
Optimize the health check logic
Consider async dependency checks with caching
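One way to implement cached checks is to refresh dependency results on a background thread at a fixed interval. The readiness handler then only reads the cached values. The sketch below uses a thread-based refresher (an asyncio task would work similarly); `CachedHealth` and its parameters are hypothetical:

```python
import threading
import time
from typing import Callable, Dict


class CachedHealth:
    """Refresh dependency checks in the background; serve cached results."""

    def __init__(self, checks: Dict[str, Callable[[], None]],
                 interval_seconds: float, max_age_seconds: float) -> None:
        self._checks = checks
        self._interval = interval_seconds
        self._max_age = max_age_seconds
        self._lock = threading.Lock()
        self._results: Dict[str, bool] = {}
        self._updated_at = 0.0  # monotonic time of the last completed refresh
        self._stop = threading.Event()

    def refresh_once(self) -> None:
        """Run every check once and publish the results atomically."""
        results = {}
        for name, check in self._checks.items():
            try:
                check()
                results[name] = True
            except Exception:
                results[name] = False
        with self._lock:
            self._results = results
            self._updated_at = time.monotonic()

    def _loop(self) -> None:
        while not self._stop.is_set():
            self.refresh_once()
            self._stop.wait(self._interval)  # sleeps, but wakes early on stop()

    def start(self) -> None:
        threading.Thread(target=self._loop, daemon=True, name="health-refresh").start()

    def stop(self) -> None:
        self._stop.set()

    def is_ready(self) -> bool:
        """Cheap read for the readiness handler; stale or empty results fail."""
        with self._lock:
            fresh = time.monotonic() - self._updated_at <= self._max_age
            return fresh and bool(self._results) and all(self._results.values())
```

Call `start()` once at application startup, and have `/health/ready` return 200 when `is_ready()` is true and 503 otherwise. The `max_age_seconds` guard stops a stuck refresher from masking an outage with stale "ok" results.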

Languages

For language-specific health check implementations:

Python - FastAPI, Starlette
.NET - IHealthCheck, MapHealthChecks
Java - Spring Boot Actuator
Rust - Axum, Tonic
JavaScript - Express, Fastify

Related

Deployments: Basic Walkthrough - Configure probes in PlatformApplication
Logging - Log health check results for debugging
Observability Overview - The four pillars and platform integration
