
Observability

This guide covers adding observability to your migrated application. Unlike builds and deployments, observability can be adopted incrementally—start with the basics and add more sophisticated instrumentation over time.

What You'll Learn

  • The observability phases and what to prioritize
  • Minimum requirements for platform integration
  • How to add each observability pillar progressively
  • What to skip initially and what to add later

Requirements Overview

Incremental Adoption

Full observability is not required before you deploy, but Phase 1 health checks (readiness and liveness endpoints) are the minimum expectation for stable deployments. Start there and add additional capabilities progressively based on your operational needs.

| Phase | Components | Priority | Reference |
| --- | --- | --- | --- |
| Phase 1 | Health checks | High — Required for reliable deployments | Health Checks |
| Phase 2 | Structured logging | High — Essential for debugging | Logging |
| Phase 3 | Metrics | Medium — Important for monitoring | Metrics |
| Phase 4 | Tracing | Lower — Valuable for distributed systems | Tracing |

Phase 1: Health Checks

Health checks are the most important observability component for platform deployment. Kubernetes uses them to route traffic and restart unhealthy containers.

Minimum Viable Health Checks

Implement these two endpoints:

| Endpoint | Purpose | Returns |
| --- | --- | --- |
| /health/live | "Is the process running?" | 200 if alive |
| /health/ready | "Can it handle traffic?" | 200 if ready |

Quick Start Pattern

For most legacy applications, start simple:

```python
# Minimal FastAPI-style sketch; adapt the decorators to your framework.
from fastapi import FastAPI

app = FastAPI()

# The liveness endpoint - keep it simple
@app.get("/health/live")
def liveness():
    return {"status": "ok"}

# The readiness endpoint - can check dependencies
@app.get("/health/ready")
def readiness():
    return {"status": "ok"}
```

Keep Liveness Simple

Never check external dependencies in the liveness probe. If your database is down, restarting your application won't fix it.
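
If readiness needs to verify a dependency, keep that logic out of the liveness handler entirely. A minimal sketch of a dependency-aware readiness endpoint, assuming the FastAPI-style app above and a hypothetical check_database() helper that returns True when the database is reachable:

```python
from fastapi import Response

# Readiness may verify dependencies; liveness never should.
@app.get("/health/ready")
def readiness(response: Response):
    if not check_database():  # hypothetical helper - replace with your own check
        response.status_code = 503  # not ready: the platform stops routing traffic here
        return {"status": "unavailable", "reason": "database unreachable"}
    return {"status": "ok"}
```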

Update Your Manifest

```yaml
spec:
  deployment:
    readinessProbe:
      port: 8080
      path: /health/ready
    livenessProbe:
      port: 8080
      path: /health/live
```

For complete patterns including dependency checks, timing configuration, and language-specific implementations, see Health Checks.


Phase 2: Structured Logging

The platform collects logs from stdout/stderr. Structured (JSON) logs are much more useful than plain text.

Why Structured Logging?

| Plain Text | Structured JSON |
| --- | --- |
| Error connecting to database: timeout | {"level":"error","message":"Error connecting to database","error":"timeout","timestamp":"..."} |
| Hard to filter and search | Easy to filter by level, component, error type |

Migration Approach

If your application already logs to stdout, it works with the platform. To improve:

  1. Configure JSON output — Most logging libraries support this (see the sketch after this list)
  2. Add standard fields — level, timestamp, message at minimum
  3. Include context — request IDs, user IDs where appropriate
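
As a concrete starting point, here is one way to emit JSON logs on stdout with structlog (one of the Python libraries listed below); the event and field names are illustrative:

```python
import structlog

# Render every log call as one JSON line on stdout, with the level and
# an ISO-8601 timestamp added automatically.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()
log.info("order processed", order_id="A-123", duration_ms=42)
# emits something like {"event": "order processed", "level": "info", "timestamp": "...", "order_id": "A-123", ...}
```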

Language-Specific Libraries

| Language | Recommended Library |
| --- | --- |
| Python | structlog, python-json-logger |
| Node.js | pino, winston |
| Java | Logback with JSON encoder |
| .NET | Serilog with JSON formatter |
| Rust | tracing with JSON subscriber |

For configuration patterns and examples, see Logging and the Language Reference.


Phase 3: Metrics

Metrics provide numerical measurements for monitoring and alerting. The platform scrapes Prometheus-format metrics from your /metrics endpoint.

When to Add Metrics

Add metrics when you need:

  • Dashboards showing application performance
  • Alerts based on error rates or latency
  • Capacity planning data
  • SLI/SLO tracking

What Metrics to Start With

| Metric Type | Examples |
| --- | --- |
| RED metrics | Request rate, Error rate, Duration |
| USE metrics | Utilization, Saturation, Errors |
| Business metrics | Orders processed, Users active |

Exposing Metrics

Your application exposes metrics by implementing a /metrics endpoint that returns Prometheus format. The platform automatically scrapes this endpoint from any HTTP port.

```yaml
spec:
  deployment:
    ports:
      - port: 8080
        protocol: http # Platform scrapes /metrics on HTTP ports
```
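
On the application side, a minimal sketch of a /metrics endpoint with RED-style instruments, assuming the Python prometheus_client library and a FastAPI-style app (metric and route names are illustrative):

```python
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()

# RED-style instruments: request count (rate and errors) plus duration.
REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["path", "status"])
LATENCY = Histogram("http_request_duration_seconds", "Request duration", ["path"])

@app.get("/orders")
def list_orders():
    with LATENCY.labels(path="/orders").time():
        REQUESTS.labels(path="/orders", status="200").inc()
        return {"orders": []}

# Expose the Prometheus text format at /metrics for the platform to scrape.
app.mount("/metrics", make_asgi_app())
```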

For implementation patterns including auto-instrumentation options, see Metrics.


Phase 4: Tracing

Distributed tracing shows how requests flow through your services. It's most valuable when you have multiple services communicating.

When to Add Tracing

Add tracing when you need:

  • End-to-end request visibility across services
  • Latency breakdown by service
  • Dependency mapping
  • Root cause analysis for distributed failures

Migration Approach

Tracing requires more instrumentation than other pillars. For legacy apps:

  1. Start with auto-instrumentation — OpenTelemetry provides automatic instrumentation for common frameworks (see the sketch after this list)
  2. Add manual spans later — Instrument custom business logic after auto-instrumentation works
  3. Propagate context — Ensure trace context passes between services
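
A minimal sketch of that first step, assuming a Python service built on FastAPI and the requests library, reusing the app object from the earlier examples, with spans exported over OTLP (the collector address is a placeholder):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Export spans in batches to an OTLP endpoint (address is an assumption).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(provider)

# Auto-instrument incoming requests (FastAPI) and outgoing calls (requests);
# both also propagate trace context between services.
FastAPIInstrumentor.instrument_app(app)
RequestsInstrumentor().instrument()
```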

For implementation details including auto-instrumentation setup, see Tracing.


Prioritization by Application Type

Different application types benefit from different observability investments:

| Application Type | Recommended Priority |
| --- | --- |
| API/Web Service | Health checks → Logging → Metrics → Tracing |
| Background Worker | Health checks → Logging → Metrics |
| CLI/Batch Job | Logging → Metrics (if long-running) |
| Event Consumer | Health checks → Logging → Metrics → Tracing |

High-Traffic Services

Focus on:

  • Accurate metrics for capacity planning
  • Low-overhead tracing (use sampling; see the sketch below)
  • Efficient structured logging
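
For the sampling point, one way to configure a ratio-based sampler with the OpenTelemetry SDK from Phase 4 (the 10% ratio is an arbitrary example):

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample roughly 10% of new traces; follow the parent's decision for propagated ones.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))
```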

Critical Business Services

Focus on:

  • Comprehensive health checks including dependencies
  • Detailed business metrics
  • Full tracing for debugging

Internal Tools

Focus on:

  • Basic health checks
  • Structured logging for debugging
  • Skip metrics and tracing initially

Common Migration Challenges

Logging Library Conflicts

Problem: Application uses multiple logging libraries with different configurations.

Solution: Consolidate to one library or configure all to output JSON to stdout.
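
For the second option, a minimal sketch using python-json-logger to route everything that goes through the standard library's root logger to JSON on stdout, so libraries that log via the logging module inherit the format (the field mapping is illustrative):

```python
import logging
import sys
from pythonjsonlogger import jsonlogger

# One JSON handler on stdout for all stdlib-based loggers.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(jsonlogger.JsonFormatter("%(asctime)s %(levelname)s %(name)s %(message)s"))

root = logging.getLogger()
root.handlers = [handler]   # replace handlers other libraries may have installed
root.setLevel(logging.INFO)

logging.getLogger("some.library").info("now emitted as JSON")
```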

Performance Concerns

Problem: Concern that instrumentation will add unacceptable latency or resource overhead.

Solution:

  • Start with low-overhead options (structured logging, basic metrics)
  • Use sampling for tracing in high-traffic services
  • Profile before and after to quantify impact

Missing Request Context

Problem: Logs don't include request IDs or correlation IDs.

Solution: Add middleware to generate and propagate request IDs. See your language's reference page for patterns.
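
A minimal sketch of such middleware, assuming FastAPI and the structlog setup from Phase 2 (header and field names are illustrative):

```python
import uuid
import structlog
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
    # Reuse an incoming X-Request-ID header, or generate a new one.
    request_id = request.headers.get("X-Request-ID") or str(uuid.uuid4())
    # Bind it to the logging context so every log line in this request carries it
    # (requires structlog.contextvars.merge_contextvars in the processor chain).
    structlog.contextvars.clear_contextvars()
    structlog.contextvars.bind_contextvars(request_id=request_id)
    response = await call_next(request)
    response.headers["X-Request-ID"] = request_id  # echo back for correlation
    return response
```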


Next Steps

  1. Implement Phase 1 — Add health check endpoints and update your manifest
  2. Deploy and verify — Ensure health checks work in the platform
  3. Add Phase 2 — Configure structured logging
  4. Iterate — Add metrics and tracing as operational needs arise