Metrics

Metrics are numerical measurements collected over time. Unlike logs (which record discrete events), metrics capture the ongoing state and performance of your application. The Ybor Platform uses Prometheus for metrics collection.

What You'll Learn

  • The four Prometheus metric types and when to use each
  • Naming conventions and labeling best practices
  • What to measure (Golden Signals, RED, USE methods)
  • How to expose a /metrics endpoint

How Metrics Work

┌─────────────────┐     scrape /metrics      ┌────────────────┐
│  Your Service   │◄─────────────────────────│   Prometheus   │
│                 │                          │                │
│    /metrics     │─────────────────────────►│  Time Series   │
│    endpoint     │       metrics data       │    Database    │
└─────────────────┘                          └────────────────┘

  1. Your application exposes a /metrics endpoint
  2. Prometheus scrapes this endpoint at regular intervals (typically 15-30 seconds)
  3. Metrics are stored as time series data
  4. Grafana queries Prometheus to visualize metrics

Metric Types

Prometheus defines four core metric types:

Counter

A value that only increases (or resets to zero on restart).

Use for: Counting events that happen over time.

# Total HTTP requests processed
http_requests_total{method="GET", path="/api/users", status="200"} 1542
http_requests_total{method="POST", path="/api/users", status="201"} 89
http_requests_total{method="GET", path="/api/users", status="500"} 3

Common counters:

  • requests_total - Total requests processed
  • errors_total - Total errors encountered
  • items_processed_total - Total items processed
  • bytes_transferred_total - Total bytes sent/received
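
Because a counter can reset to zero on restart, code that measures growth between scrapes has to treat a drop as a reset rather than as negative growth. The following is a minimal sketch of that idea using only the Python standard library. `counter_increase` is a hypothetical helper written for this page, not a Prometheus API, and it skips the extrapolation that Prometheus applies in its own functions.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total growth of a counter across consecutive scraped samples.

    A drop between two samples is treated as a restart: the counter is
    assumed to have reset to zero, so the whole new value counts as growth.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # reset detected
    return total


# Five scrapes; the process restarted between the third and fourth scrape.
scrapes = [1500, 1520, 1542, 7, 30]
print(counter_increase(scrapes))  # 20 + 22 + 7 + 23 = 72.0
```

This is why counters are usually read as a rate or increase over a window rather than as a raw value.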

Gauge

A value that can go up or down.

Use for: Current state or measurements.

# Current number of active connections
active_connections 42

# Current queue depth
queue_size{queue="orders"} 156

# Current memory usage in bytes
memory_usage_bytes 1073741824

Common gauges:

  • active_connections - Current connections
  • queue_size - Items in queue
  • cache_size - Cache entries
  • temperature - Current temperature
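
As a rough illustration of gauge semantics, the sketch below implements a small thread-safe gauge and uses it to track in-flight work. The `Gauge` class and its method names are assumptions made for this example. A real client library from the Languages section would normally provide this for you.

```python
import threading
from contextlib import contextmanager


class Gauge:
    """Thread-safe value that can be set, raised, or lowered."""

    def __init__(self) -> None:
        self._value = 0.0
        self._lock = threading.Lock()

    def set(self, value: float) -> None:
        with self._lock:
            self._value = value

    def inc(self, amount: float = 1.0) -> None:
        with self._lock:
            self._value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.inc(-amount)

    @property
    def value(self) -> float:
        with self._lock:
            return self._value

    @contextmanager
    def track_inprogress(self):
        """Raise the gauge while the block runs, lower it afterwards."""
        self.inc()
        try:
            yield
        finally:
            self.dec()


active_connections = Gauge()
with active_connections.track_inprogress():
    print(active_connections.value)  # 1.0 while the block is running
print(active_connections.value)      # 0.0 once it has finished
```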

Histogram

Samples observations and counts them in configurable buckets.

Use for: Measuring distributions of values, especially latency.

# Request duration histogram
http_request_duration_seconds_bucket{le="0.005"} 24054
http_request_duration_seconds_bucket{le="0.01"} 33444
http_request_duration_seconds_bucket{le="0.025"} 100392
http_request_duration_seconds_bucket{le="0.05"} 129389
http_request_duration_seconds_bucket{le="0.1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53.112
http_request_duration_seconds_count 144320

Common histograms:

  • request_duration_seconds - Request latency
  • response_size_bytes - Response sizes
  • db_query_duration_seconds - Database query time
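
The bucket lines above are cumulative: each `le` ("less than or equal") bucket counts every observation at or below its bound, so the `+Inf` bucket always equals `_count`. A minimal sketch of that bookkeeping, assuming a hypothetical `Histogram` class written for this page (not a client-library API):

```python
import bisect
import math


class Histogram:
    """Counts observations into buckets and exposes them cumulatively."""

    def __init__(self, name, bounds):
        self.name = name
        self.bounds = sorted(bounds) + [math.inf]  # implicit +Inf bucket
        self.counts = [0] * len(self.bounds)       # per bucket, not cumulative
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left returns the first bound >= value, i.e. the smallest
        # bucket whose "le" bound still contains the observation.
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        self.total += value
        self.count += 1

    def expose(self):
        lines = []
        running = 0
        for bound, in_bucket in zip(self.bounds, self.counts):
            running += in_bucket  # turn per-bucket counts into cumulative ones
            le = "+Inf" if math.isinf(bound) else str(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.count}")
        return lines


latency = Histogram("http_request_duration_seconds", [0.005, 0.01, 0.025])
for seconds in (0.003, 0.01, 0.02, 0.4):
    latency.observe(seconds)
print("\n".join(latency.expose()))
```

Bucket bounds are fixed when the histogram is created, so they should be chosen to bracket the latencies you care about.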

Summary

Similar to a histogram, but quantiles are calculated on the client side, inside your application.

Use for: When you need pre-calculated percentiles.

# Request duration summary
http_request_duration_seconds{quantile="0.5"} 0.023
http_request_duration_seconds{quantile="0.9"} 0.042
http_request_duration_seconds{quantile="0.99"} 0.087

Prefer Histograms

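Histograms are more flexible than summaries. Bucket counts can be summed across instances, and any percentile can be estimated at query time in Prometheus (for example with histogram_quantile). Summary quantiles are fixed in the application and cannot be meaningfully aggregated across instances.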
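
To make the aggregation point concrete, the sketch below sums cumulative bucket counts from several instances and then estimates a quantile by linear interpolation inside the matching bucket. It follows the general approach of Prometheus' `histogram_quantile` but is a simplified approximation, not its exact implementation. `merge_buckets` and `estimate_quantile` are hypothetical names, and the code assumes every instance uses the same bucket bounds.

```python
import math


def merge_buckets(*instances):
    """Sum cumulative bucket counts from instances sharing the same bounds."""
    merged = {}
    for buckets in instances:
        for bound, cumulative in buckets:
            merged[bound] = merged.get(bound, 0) + cumulative
    return sorted(merged.items())


def estimate_quantile(q, buckets):
    """Estimate quantile q from sorted (upper_bound, cumulative_count) pairs."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            in_bucket = cumulative - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # Nothing to interpolate against: report the last finite bound.
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound


# Bucket counts taken from the histogram example on this page.
instance = [
    (0.005, 24054), (0.01, 33444), (0.025, 100392),
    (0.05, 129389), (0.1, 133988), (math.inf, 144320),
]
combined = merge_buckets(instance, instance)  # pretend two identical replicas
print(estimate_quantile(0.5, combined))   # falls in the (0.01, 0.025] bucket
print(estimate_quantile(0.99, combined))  # lands in +Inf, so capped at 0.1
```

The 0.99 result shows why bucket bounds matter: once a quantile falls into the `+Inf` bucket, the estimate can only report the highest finite bound.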

Naming Conventions

Follow Prometheus naming conventions for consistency:

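| Pattern                    | Example                       | Description               |
|----------------------------|-------------------------------|---------------------------|
| <namespace>_<name>_<unit>  | http_request_duration_seconds | Include unit suffix       |
| _total suffix for counters | http_requests_total           | Indicates counter type    |
| _info suffix for metadata  | app_info                      | Build/version information |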

Good names:

http_requests_total
http_request_duration_seconds
db_connections_active
cache_hits_total
queue_messages_pending

Bad names:

requests            # Missing namespace and unit
httpRequestsTotal   # CamelCase (use snake_case)
request_time_ms     # Use seconds, not milliseconds
num_errors          # Use _total suffix for counters
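
These conventions are easy to check mechanically. The sketch below is one possible linter for the rules on this page. `lint_metric_name` and the list of discouraged unit suffixes are assumptions made for illustration. Only the `VALID_NAME` pattern comes from the Prometheus data model.

```python
import re

# Characters Prometheus accepts in a metric name.
VALID_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
# Stricter house style: lowercase snake_case only.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")
# Illustrative, incomplete list of non-base unit suffixes.
NON_BASE_UNITS = ("_ms", "_milliseconds", "_us", "_minutes", "_kb", "_mb")


def lint_metric_name(name, metric_type):
    """Return a list of convention violations (empty means the name is fine)."""
    problems = []
    if not VALID_NAME.match(name):
        problems.append("not a valid Prometheus metric name")
    if not SNAKE_CASE.match(name):
        problems.append("use snake_case")
    if name.endswith(NON_BASE_UNITS):
        problems.append("use base units such as seconds or bytes")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counters should end in _total")
    if metric_type != "counter" and name.endswith("_total"):
        problems.append("_total is reserved for counters")
    return problems


for name, kind in [
    ("http_requests_total", "counter"),
    ("httpRequestsTotal", "counter"),
    ("request_time_ms", "histogram"),
    ("num_errors", "counter"),
]:
    print(name, lint_metric_name(name, kind))
```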

Labels

Labels add dimensions to metrics, allowing filtering and aggregation.

http_requests_total{method="GET", path="/api/users", status="200"}
http_requests_total{method="POST", path="/api/users", status="201"}
http_requests_total{method="GET", path="/api/orders", status="200"}
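
Filtering and aggregation both operate on the label set attached to each series. The sketch below imitates those two operations in plain Python, using the sample values from the counter example earlier on this page. `select` and `sum_by` are hypothetical helpers standing in for what a Prometheus query would do.

```python
from collections import defaultdict

# (labels, value) pairs for the http_requests_total metric.
SERIES = [
    ({"method": "GET", "path": "/api/users", "status": "200"}, 1542),
    ({"method": "POST", "path": "/api/users", "status": "201"}, 89),
    ({"method": "GET", "path": "/api/users", "status": "500"}, 3),
]


def select(series, **matchers):
    """Keep only the series whose labels equal every given matcher."""
    return [
        (labels, value)
        for labels, value in series
        if all(labels.get(key) == wanted for key, wanted in matchers.items())
    ]


def sum_by(series, label):
    """Collapse all other labels and total the values per value of `label`."""
    totals = defaultdict(int)
    for labels, value in series:
        totals[labels[label]] += value
    return dict(totals)


print(select(SERIES, status="500"))  # only the failing GET series
print(sum_by(SERIES, "method"))      # {'GET': 1545, 'POST': 89}
```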

Label Best Practices

DO use labels for:

  • HTTP method, route (the template, such as /api/users/{id}, not the raw URL), status code
  • Database operation type
  • Queue or topic name
  • Error type or code

DO NOT use labels for:

  • High cardinality values (user IDs, request IDs)
  • Unbounded sets (arbitrary user input)
  • Values that change on nearly every request or scrape (timestamps, durations)

Cardinality

Every unique combination of label values creates a new time series. Avoid labels with many possible values (user IDs, timestamps), as the resulting number of series can overwhelm Prometheus.
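
A quick way to reason about cardinality is to multiply the number of distinct values each label can take. This gives the worst-case series count for one metric. The helper name and the value counts below are assumptions chosen for illustration.

```python
from math import prod
from typing import Mapping


def worst_case_series(distinct_values: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric: product of label value counts."""
    return prod(distinct_values.values())


bounded = {"method": 5, "path": 20, "status": 10}
print(worst_case_series(bounded))  # 1000

# Adding a single high-cardinality label multiplies everything.
with_user_id = {**bounded, "user_id": 100_000}
print(worst_case_series(with_user_id))  # 100000000
```

For a histogram the count is multiplied again by the number of buckets plus the `_sum` and `_count` series.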

What to Measure

The Four Golden Signals

Google's Site Reliability Engineering book recommends monitoring these four signals:

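| Signal     | Description                | Metric Type |
|------------|----------------------------|-------------|
| Latency    | Time to service a request  | Histogram   |
| Traffic    | Requests per second        | Counter     |
| Errors     | Rate of failed requests    | Counter     |
| Saturation | How "full" your service is | Gauge       |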

RED Method

For request-driven services:

  • Rate - Requests per second
  • Errors - Failed requests per second
  • Duration - Time per request
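
All three RED values can be derived from two scrapes of a request counter, an error counter, and a duration histogram's `_sum` and `_count`. The sketch below shows the arithmetic under simplifying assumptions: no counter reset occurred between the scrapes, and Duration is reported as a mean rather than a percentile. `Snapshot` and `red` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    timestamp: float        # scrape time in seconds
    requests_total: float   # counter
    errors_total: float     # counter
    duration_sum: float     # histogram _sum, in seconds
    duration_count: float   # histogram _count


def red(previous: Snapshot, current: Snapshot) -> dict:
    """Rate, Errors, and mean Duration over the window between two scrapes."""
    window = current.timestamp - previous.timestamp
    observed = current.duration_count - previous.duration_count
    spent = current.duration_sum - previous.duration_sum
    return {
        "rate": (current.requests_total - previous.requests_total) / window,
        "errors": (current.errors_total - previous.errors_total) / window,
        "duration": spent / observed if observed else 0.0,
    }


before = Snapshot(0.0, 1000, 10, 50.0, 1000)
after = Snapshot(30.0, 1600, 13, 80.0, 1600)
print(red(before, after))  # about 20 req/s, 0.1 errors/s, 0.05 s per request
```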

USE Method

For resources (CPU, memory, disk):

  • Utilization - Percentage of resource used
  • Saturation - Amount of work queued
  • Errors - Error events

The /metrics Endpoint

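Your application should expose a /metrics endpoint that returns metrics in the Prometheus text exposition format. Each metric starts with a # HELP line and a # TYPE line, followed by its samples: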

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/users",status="200"} 1542
http_requests_total{method="POST",path="/api/users",status="201"} 89

# HELP http_request_duration_seconds HTTP request latency
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005"} 24054
http_request_duration_seconds_bucket{le="0.01"} 33444
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53.112
http_request_duration_seconds_count 144320

# HELP active_connections Current number of active connections
# TYPE active_connections gauge
active_connections 42
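
For illustration only, the sketch below serves a counter and a gauge in this format using the Python standard library. In practice one of the client libraries listed under Languages would handle rendering, label escaping, and thread safety. The names `render_metrics`, `MetricsHandler`, and `make_server`, the module-level sample data, and port 8000 are all assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state; a real service would update these live.
REQUESTS = {
    ("GET", "/api/users", "200"): 1542,
    ("POST", "/api/users", "201"): 89,
}
ACTIVE_CONNECTIONS = 42


def render_metrics(requests, active_connections):
    """Render a counter and a gauge as Prometheus text exposition format.

    Label values are written as-is; escaping of quotes, backslashes, and
    newlines is omitted to keep the sketch short.
    """
    lines = [
        "# HELP http_requests_total Total number of HTTP requests",
        "# TYPE http_requests_total counter",
    ]
    for (method, path, status), value in sorted(requests.items()):
        labels = f'method="{method}",path="{path}",status="{status}"'
        lines.append(f"http_requests_total{{{labels}}} {value}")
    lines += [
        "# HELP active_connections Current number of active connections",
        "# TYPE active_connections gauge",
        f"active_connections {active_connections}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS, ACTIVE_CONNECTIONS).encode("utf-8")
        self.send_response(200)
        # Content type used by the Prometheus text format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) an HTTP server exposing /metrics."""
    return HTTPServer(("", port), MetricsHandler)
```

Calling `make_server().serve_forever()` starts the endpoint, after which Prometheus can be pointed at it as a scrape target.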

Languages

For language-specific metrics implementations:

  • Python - prometheus-client
  • .NET - prometheus-net, OpenTelemetry
  • Java - Micrometer, Spring Boot Actuator
  • Rust - prometheus crate
  • JavaScript - prom-client