Metrics

Metrics are numerical measurements collected over time. Unlike logs (which record discrete events), metrics capture the ongoing state and performance of your application. The Ybor Platform uses Prometheus for metrics collection.

What You'll Learn

The four Prometheus metric types and when to use each
Naming conventions and labeling best practices
What to measure (Golden Signals, RED, USE methods)
How to expose a /metrics endpoint

How Metrics Work

┌─────────────────┐     scrape /metrics      ┌────────────────┐
│  Your Service   │◄────────────────────────│   Prometheus   │
│                 │                          │                │
│  /metrics       │────────────────────────►│   Time Series  │
│  endpoint       │     metrics data         │   Database     │
└─────────────────┘                          └────────────────┘

Your application exposes a /metrics endpoint
Prometheus scrapes this endpoint at regular intervals (typically 15-30 seconds)
Metrics are stored as time series data
Grafana queries Prometheus to visualize metrics
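The scrape-and-store cycle above can be sketched in a few lines of Python. This is a simplified illustration, not how Prometheus itself is implemented: `parse_sample` and `scrape_into` are hypothetical helpers, the parser only handles simple `name{labels} value` lines, and the "database" is a plain dictionary keyed by metric name and labels.

```python
import re
from collections import defaultdict

# Matches simple sample lines such as: name{a="x",b="y"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Parse one exposition line into (name, labels, value), or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or HELP/TYPE comment
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    name, raw_labels, value = match.groups()
    labels = tuple(sorted(LABEL_RE.findall(raw_labels or "")))
    return name, labels, float(value)


def scrape_into(store, body, timestamp):
    """Append every sample in a /metrics response body to its time series."""
    for line in body.splitlines():
        sample = parse_sample(line)
        if sample is not None:
            name, labels, value = sample
            store[(name, labels)].append((timestamp, value))


# Two simulated scrapes, 15 seconds apart, of the same series.
store = defaultdict(list)  # (name, labels) -> [(timestamp, value), ...]
scrape_into(store, 'http_requests_total{method="GET"} 1542\n', timestamp=0)
scrape_into(store, 'http_requests_total{method="GET"} 1560\n', timestamp=15)
```

Each distinct name-and-labels pair becomes its own list of timestamped values, which is what "time series" means in the steps above.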

Metric Types

Prometheus defines four core metric types:

Counter

A value that only increases (or resets to zero on restart).

Use for: Counting events that happen over time.

# Total HTTP requests processed
http_requests_total{method="GET", path="/api/users", status="200"} 1542
http_requests_total{method="POST", path="/api/users", status="201"} 89
http_requests_total{method="GET", path="/api/users", status="500"} 3

Common counters:

requests_total - Total requests processed
errors_total - Total errors encountered
items_processed_total - Total items processed
bytes_transferred_total - Total bytes sent/received
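As a rough illustration of counter semantics, the sketch below uses a hypothetical `Counter` class (not the API of any client library) and an `increase` helper that tolerates the reset-to-zero behavior described above. Prometheus's own `rate()` function is more involved (it also extrapolates to the edges of the query window), so treat this as the basic idea only.

```python
class Counter:
    """A value that can only go up (hypothetical minimal implementation)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


def increase(samples):
    """Total increase across successive scraped values, tolerating resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # the process restarted and counted up from zero
    return total


# Five scrapes, 15 seconds apart; the service restarted before the fourth.
samples = [100, 130, 160, 5, 25]
per_second = increase(samples) / 60  # (30 + 30 + 5 + 20) / 60 seconds
```

Because only the increase between scrapes matters, the absolute value of a counter is rarely interesting on its own; dashboards usually plot its per-second rate.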

Gauge

A value that can go up or down.

Use for: Current state or measurements.

# Current number of active connections
active_connections 42

# Current queue depth
queue_size{queue="orders"} 156

# Current memory usage in bytes
memory_usage_bytes 1073741824

Common gauges:

active_connections - Current connections
queue_size - Items in queue
cache_size - Cache entries
temperature_celsius - Current temperature
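A gauge is typically incremented when something starts and decremented when it ends. The sketch below assumes a hypothetical `Gauge` class and `handle_connection` wrapper to show that pattern for `active_connections`; real client libraries offer similar operations, but this is not their API.

```python
class Gauge:
    """A value that can go up or down (hypothetical minimal implementation)."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


active_connections = Gauge()


def handle_connection(work):
    """Run `work` while counting it as one active connection."""
    active_connections.inc()
    try:
        return work()
    finally:
        # Always decrement, even if `work` raises, so the gauge cannot drift.
        active_connections.dec()
```

The `try`/`finally` matters: a gauge that is incremented but never decremented on errors slowly drifts upward and stops reflecting the current state.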

Histogram

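Samples observations and counts them in configurable buckets. Buckets are cumulative: each one counts every observation less than or equal to its upper bound (the le label), so the +Inf bucket always equals the total count.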

Use for: Measuring distributions of values, especially latency.

# Request duration histogram
http_request_duration_seconds_bucket{le="0.005"} 24054
http_request_duration_seconds_bucket{le="0.01"} 33444
http_request_duration_seconds_bucket{le="0.025"} 100392
http_request_duration_seconds_bucket{le="0.05"} 129389
http_request_duration_seconds_bucket{le="0.1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53.112
http_request_duration_seconds_count 144320

Common histograms:

request_duration_seconds - Request latency
response_size_bytes - Response sizes
db_query_duration_seconds - Database query time
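To make the bucket layout concrete, here is a minimal sketch of how a histogram might record observations and then report them as `le` buckets. The `Histogram` class and its method names are hypothetical and chosen for illustration only.

```python
import bisect
import math


class Histogram:
    """Counts observations into buckets (hypothetical minimal implementation)."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds) + [math.inf]  # +Inf catches everything else
        self.counts = [0] * len(self.bounds)       # per-bucket, not yet cumulative
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Index of the first upper bound that is >= value.
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, as exposed via le."""
        running, result = 0, []
        for bound, n in zip(self.bounds, self.counts):
            running += n
            result.append((bound, running))
        return result


latency = Histogram([0.005, 0.01, 0.025])
for seconds in (0.003, 0.005, 0.02, 0.5):
    latency.observe(seconds)
```

Storing per-bucket counts internally and accumulating them only on output is one possible design; it keeps `observe` cheap, which matters because it runs on every request.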

Summary

Similar to histogram but calculates quantiles on the client side.

Use for: When you need pre-calculated percentiles.

# Request duration summary
http_request_duration_seconds{quantile="0.5"} 0.023
http_request_duration_seconds{quantile="0.9"} 0.042
http_request_duration_seconds{quantile="0.99"} 0.087

Prefer Histograms

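Histograms are more flexible than summaries. Their bucket counts can be summed across instances, and any percentile can be estimated at query time with histogram_quantile(). Summary quantiles are fixed when the metric is defined and cannot be meaningfully combined across instances.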
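The following sketch shows why bucket counts aggregate cleanly. It applies the bucket counts from the Histogram example above, merges two instances by adding their counts, and estimates a percentile by linear interpolation within a bucket. The helpers `merge` and `estimate_quantile` are hypothetical; they follow the same general approach as `histogram_quantile()` in simplified form.

```python
import math


def merge(*instances):
    """Sum cumulative bucket counts from instances that share bucket bounds."""
    merged = {}
    for buckets in instances:
        for bound, count in buckets:
            merged[bound] = merged.get(bound, 0) + count
    return sorted(merged.items())


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs."""
    rank = q * buckets[-1][1]  # the last bucket (+Inf) holds the total count
    lower_bound, lower_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == lower_count:
                return lower_bound  # nothing to interpolate within
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (bound - lower_bound) * fraction
        lower_bound, lower_count = bound, count
    return lower_bound


one_instance = [
    (0.005, 24054), (0.01, 33444), (0.025, 100392),
    (0.05, 129389), (0.1, 133988), (math.inf, 144320),
]
median = estimate_quantile(0.5, one_instance)  # falls in the 0.01-0.025 bucket
fleet_median = estimate_quantile(0.5, merge(one_instance, one_instance))
```

The estimate is only as precise as the bucket boundaries allow, so buckets should be chosen around the latencies you care about. In this data the 99th percentile lands in the `+Inf` bucket, so the best available answer is the highest finite bound, 0.1 seconds.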

Naming Conventions

Follow Prometheus naming conventions for consistency:

Pattern	Example	Description
`<namespace>_<name>_<unit>`	`http_request_duration_seconds`	Include unit suffix
`_total` suffix for counters	`http_requests_total`	Indicates counter type
`_info` suffix for metadata	`app_info`	Build/version information

Good names:

http_requests_total
http_request_duration_seconds
db_connections_active
cache_hits_total
queue_messages_pending

Bad names:

requests              # Missing namespace and unit
httpRequestsTotal     # CamelCase (use snake_case)
request_time_ms       # Use seconds, not milliseconds
num_errors            # Use _total suffix for counters
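These conventions are mechanical enough to check automatically. The sketch below is a hypothetical `check_name` linter covering only the rules listed above; the specific discouraged suffixes are an assumption made for illustration, not an official list.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Assumed examples of non-base units; extend to suit your own conventions.
DISCOURAGED_UNITS = ("_ms", "_millis", "_milliseconds", "_kb", "_mb")


def check_name(name, metric_type):
    """Return a list of naming problems (empty if the name looks fine)."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("use snake_case")
    if "_" not in name:
        problems.append("add a namespace prefix")
    if name.endswith(DISCOURAGED_UNITS):
        problems.append("use base units such as seconds or bytes")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counters should end in _total")
    return problems


good = check_name("http_requests_total", "counter")
bad = check_name("request_time_ms", "histogram")
```

A check like this could run in a unit test so that badly named metrics are caught before they reach dashboards, where renaming is more disruptive.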

Labels

Labels add dimensions to metrics, allowing filtering and aggregation.

http_requests_total{method="GET", path="/api/users", status="200"}
http_requests_total{method="POST", path="/api/users", status="201"}
http_requests_total{method="GET", path="/api/orders", status="200"}

Label Best Practices

DO use labels for:

HTTP method, path (the route template, such as /api/users/{id}, not the raw URL), status code
Database operation type
Queue or topic name
Error type or code

DO NOT use labels for:

High cardinality values (user IDs, request IDs)
Unbounded sets (arbitrary user input)
Values that change frequently

Cardinality

Every unique combination of label values creates a new time series, so the number of series grows multiplicatively with each label you add. Avoid labels with many possible values (user IDs, timestamps), as they can overwhelm Prometheus.
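A quick calculation shows how fast this grows. The sketch below multiplies the number of distinct values per label to get an upper bound on the series count for one metric; the label counts are made-up numbers for illustration.

```python
import math


def max_series(values_per_label):
    """Upper bound on time series: the product of distinct values per label."""
    return math.prod(values_per_label.values())


# Hypothetical service: 4 methods, 20 route templates, 6 status codes.
bounded = max_series({"method": 4, "path": 20, "status": 6})

# The same metric with a user_id label for 100,000 users.
unbounded = max_series({"method": 4, "path": 20, "status": 6, "user_id": 100_000})
```

Here the bounded labels allow at most 480 series, while adding `user_id` raises the ceiling to 48,000,000 for a single metric. Not every combination occurs in practice, but the worst case is what you need to plan for.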

What to Measure

The Four Golden Signals

Google's Site Reliability Engineering book recommends monitoring these four signals:

Signal	Description	Metric Type
Latency	Time to service a request	Histogram
Traffic	Requests per second	Counter
Errors	Rate of failed requests	Counter
Saturation	How "full" your service is	Gauge

RED Method

For request-driven services:

Rate - Requests per second
Errors - Failed requests per second
Duration - Time per request
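All three RED values can be derived from two counters and the sum of a latency histogram. The sketch below computes them from two consecutive scrapes; the `red` helper and the metric names in the dictionaries are hypothetical, and counter resets are ignored for brevity.

```python
def red(previous, current, interval_seconds):
    """Derive Rate, Errors, and Duration from two scrapes of the same service."""
    requests = current["requests_total"] - previous["requests_total"]
    errors = current["errors_total"] - previous["errors_total"]
    seconds = current["duration_seconds_sum"] - previous["duration_seconds_sum"]
    return {
        "rate": requests / interval_seconds,       # requests per second
        "errors": errors / interval_seconds,       # failed requests per second
        "duration": seconds / requests if requests else 0.0,  # mean seconds
    }


# Two made-up scrapes taken 30 seconds apart.
earlier = {"requests_total": 1000, "errors_total": 10, "duration_seconds_sum": 50.0}
later = {"requests_total": 1300, "errors_total": 13, "duration_seconds_sum": 65.0}
result = red(earlier, later, interval_seconds=30)
```

The mean duration hides outliers, so in practice Duration is usually reported as a percentile estimated from histogram buckets rather than as an average.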

USE Method

For resources (CPU, memory, disk):

Utilization - Percentage of resource used
Saturation - Amount of work queued
Errors - Error events
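As one small example of a USE measurement, the sketch below reads disk utilization with Python's standard library and formats it as a gauge line. The metric name `disk_utilization_ratio` and the `render_gauge` helper are hypothetical, and the value is expressed as a ratio from 0 to 1 rather than a percentage.

```python
import shutil


def disk_utilization(path="."):
    """Fraction of the filesystem containing `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total if usage.total else 0.0


def render_gauge(name, value, labels):
    """Format one gauge sample as a line of exposition text."""
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value:.4f}"


line = render_gauge("disk_utilization_ratio", disk_utilization("."), {"path": "."})
```

Saturation and errors for the same resource would need other sources, such as I/O queue depth or device error counts, which are platform-specific and not covered here.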

The /metrics Endpoint

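Your application should expose a /metrics endpoint that returns its current metric values in the Prometheus text exposition format. Each metric is introduced by HELP and TYPE comment lines, followed by one line per time series: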

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/users",status="200"} 1542
http_requests_total{method="POST",path="/api/users",status="201"} 89

# HELP http_request_duration_seconds HTTP request latency
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005"} 24054
http_request_duration_seconds_bucket{le="0.01"} 33444
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53.112
http_request_duration_seconds_count 144320

# HELP active_connections Current number of active connections
# TYPE active_connections gauge
active_connections 42
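In practice one of the client libraries listed under Languages generates this output for you. To show what happens underneath, here is a minimal sketch that serves the `active_connections` gauge using only Python's standard library. The `METRICS` dictionary, `render`, `MetricsHandler`, `serve`, and port 8000 are all assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

METRICS = {"active_connections": 42}  # hypothetical in-memory metric values


def render():
    """Build the response body in the text exposition format."""
    lines = [
        "# HELP active_connections Current number of active connections",
        "# TYPE active_connections gauge",
        f"active_connections {METRICS['active_connections']}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render().encode("utf-8")
        self.send_response(200)
        # Content type used for the Prometheus text format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted (blocks the calling thread)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/metrics` should return the three lines built by `render`. The body is rendered on each request, so every scrape sees the values as they are at that moment.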

Languages

For language-specific metrics implementations:

Python - prometheus-client
.NET - prometheus-net, OpenTelemetry
Java - Micrometer, Spring Boot Actuator
Rust - prometheus crate
JavaScript - prom-client

Related

Logging - Discrete events for debugging
Tracing - Request flow for latency analysis
Observability Overview - The four pillars and platform integration
