Istio Diagnostics

Global (Cluster‑Wide) Istio Troubleshooting

This section is for determining whether there is a cluster‑wide Istio problem.

Most issues are not caused by a cluster‑wide Istio problem. Follow this section only if multiple services or namespaces are affected.

Signs of a Global Istio Issue

Investigate Istio system health if one or more of the following are true:

  • Multiple unrelated services are failing at the same time
  • Traffic failures affect many namespaces or the entire cluster
  • External traffic through Gateways fails for all backends
  • New pods cannot communicate with any other services
  • Recent changes were made to:
    • Istio version or configuration
    • Cluster networking (CNI, nodes, kernel updates)

If the problem affects only one service, namespace, or route, skip this section and go to Service‑Specific Istio Troubleshooting.

Global Istio Health Checks

Start here if you suspect a global issue, and always work through these checks in order.

ztunnel DaemonSet Status

Ambient mode depends on ztunnel running on every node.

kubectl get daemonset -n istio-system

Confirm:

  • ztunnel exists
  • Desired = Current = Ready = number of nodes
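
For example, a quick way to compare the DaemonSet counts against the node count (ztunnel is the default DaemonSet name in ambient installations; the counts may legitimately differ if some nodes are tainted so that ztunnel is not scheduled there):

kubectl get daemonset ztunnel -n istio-system
kubectl get nodes --no-headers | wc -l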

If istiod is running but ztunnel is not

If istiod is healthy but ztunnel pods are failing to start or are crash looping, this usually indicates a lower-level networking issue, not an Istio configuration problem.

Common causes include:

  • CNI issues (e.g. Cilium or another CNI not functioning correctly)
  • Node-level networking problems
  • Recent node or kernel changes

In this case:

kubectl logs -n istio-system -l app=ztunnel

ztunnel logs will often clearly indicate a networking or CNI failure.
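
Pod events and restart counts are also worth checking, since they usually show why a ztunnel pod cannot start (app=ztunnel is the default label in ambient installations):

kubectl get pods -n istio-system -l app=ztunnel -o wide
kubectl describe pods -n istio-system -l app=ztunnel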

info

Follow our Networking Troubleshooting Guide before escalating Istio issues.

Namespace Is in Ambient Mode

kubectl get namespace <namespace> --show-labels

Look for:

istio.io/dataplane-mode=ambient

If this label is missing, traffic in that namespace is not flowing through the ambient mesh.
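
To list every namespace currently enrolled in ambient mode, you can also filter on the label directly:

kubectl get namespaces -l istio.io/dataplane-mode=ambient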


Service‑Specific Istio Troubleshooting

Use this section only after confirming that Istio is healthy at a global level or if only one service seems to be affected.

If istio-system is healthy, issues are usually caused by service‑specific Istio configuration, not Istio itself.

note

Examples include misconfigured Gateways, HTTPRoutes, AuthorizationPolicies, or missing waypoints.
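
A quick, read‑only way to list these resources in the affected namespace (assuming the Kubernetes Gateway API and Istio CRDs are installed; use the fully qualified resource name gateways.gateway.networking.k8s.io if gateways is ambiguous in your cluster):

kubectl get gateways,httproutes -n <namespace>
kubectl get authorizationpolicies -n <namespace>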

Common Issues That Look Like Istio Problems (But Aren't)

Before troubleshooting Istio configuration for a single service, make sure the service itself is healthy. Istio cannot route traffic to workloads that Kubernetes considers unavailable.

  • The Service has no healthy endpoints

    • If a Service has zero endpoints, Istio will return errors such as HTTP 503 or "no healthy upstream".
    • This is Istio reporting the problem correctly, not causing it.
  • Pods behind the Service are not Running and Ready

    • Pods that are crash looping, not Ready, or not listening on the expected port will produce errors that look like Istio failures even when the mesh is working correctly.

As a first step, always verify:

kubectl get pods -n <namespace>
kubectl get svc <service-name> -n <namespace>
kubectl get endpoints <service-name> -n <namespace>

If these checks fail, resolve the service health issue before continuing with Istio-specific troubleshooting.
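
In particular, if the endpoints list is empty, a common cause is a Service selector that does not match any Ready pods. One read‑only way to check this is to print the selector and then list the pods it should match (substitute the key/value pairs printed by the first command into the second):

kubectl get svc <service-name> -n <namespace> -o jsonpath='{.spec.selector}'
kubectl get pods -n <namespace> -l <key>=<value>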


Using istioctl for Analysis

istioctl is the Istio command‑line tool. It can inspect, validate, and diagnose Istio configuration, but it can also modify cluster state if used incorrectly.

Be Careful
  • istioctl has powerful commands
  • Some commands can change Istio configuration or restart components
  • Use only the commands described below unless support explicitly instructs otherwise

Installing istioctl

Install the version of istioctl that matches the Istio version running in your cluster.

If you are unsure which version to use, contact support before proceeding.
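
As a sketch of one installation method, the Istio project publishes a download script; pinning ISTIO_VERSION to your cluster's Istio version keeps the client and control plane in sync (replace the version placeholder with your actual version):

curl -L https://istio.io/downloadIstio | ISTIO_VERSION=<istio-version> sh -
export PATH=$PWD/istio-<istio-version>/bin:$PATH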

After installation, confirm it works:

istioctl version

What istioctl analyze Does

istioctl analyze is a safe, read‑only command that checks Istio resources for common problems.

It can detect issues such as:

  • Invalid or conflicting Istio configuration
  • Gateway or HTTPRoute resources that will never match traffic
  • Policies that reference missing services or namespaces
  • Configuration that requires a waypoint proxy that does not exist

It does not:

  • Modify resources
  • Restart pods
  • Change traffic behavior

Running istioctl analyze

Analyze the entire cluster:

istioctl analyze

Analyze a specific namespace:

istioctl analyze -n <namespace>

Include verbose output if requested:

istioctl analyze --verbose
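
Depending on your istioctl version, you can also analyze all namespaces at once:

istioctl analyze --all-namespaces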

Interpreting Results

istioctl analyze reports findings as:

  • Errors – Configuration that will not work
  • Warnings – Configuration that is suspicious or incomplete
  • Info – Observations that may or may not be relevant

When opening a support request, include:

  • The full output of istioctl analyze
  • The namespace(s) involved

Do not attempt to fix issues by trial‑and‑error changes unless you understand the impact.
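
When gathering output for a support request, redirecting it to a file keeps the findings intact (the file name is only an example):

istioctl analyze -n <namespace> > istio-analyze-output.txt 2>&1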


Simple, Safe Service‑Level Actions

These actions are low‑risk and often resolve transient issues.

Restart the Application Pods

kubectl rollout restart deployment <deployment-name> -n <namespace>

This refreshes workload identity and ambient tunnel connections.
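
To confirm the restart has completed before re-testing, you can wait on the rollout:

kubectl rollout status deployment <deployment-name> -n <namespace>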

Restart ztunnel (Only if Multiple Services Are Affected)

kubectl rollout restart daemonset ztunnel -n istio-system

Temporarily Remove a Namespace from Ambient Mode (Isolation Test)

kubectl label namespace <namespace> istio.io/dataplane-mode-
kubectl rollout restart deployment <deployment-name> -n <namespace>

If traffic works immediately afterward, the issue is almost certainly mesh‑related.

Re‑enable ambient mode:

kubectl label namespace <namespace> istio.io/dataplane-mode=ambient

Information to Collect Before Escalation

Please collect all of the following before contacting support.

Cluster and Namespace Context

kubectl get nodes
kubectl get namespace <namespace> --show-labels

Istio Component Status

kubectl get pods -n istio-system
kubectl get pods -n istio-system -o wide
kubectl get daemonset ztunnel -n istio-system

When using -o wide, check:

  • Whether failing pods are all on the same node
  • Whether healthy and unhealthy pods are split across nodes

If issues appear isolated to a specific node:

kubectl get nodes
kubectl describe node <node-name>

Node-level problems (NotReady, network unavailable, pressure conditions) can cause Istio components to fail on that node only.

info

Follow our Node Troubleshooting Guide if issues are node-specific.

Recent Logs (Last 10–15 Minutes)

kubectl logs -n istio-system deployment/istiod --since=15m
kubectl logs -n istio-system -l app=ztunnel --since=15m

If using an ingress gateway:

kubectl logs -n istio-system deployment/<gateway-deployment> --since=15m
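
If your istioctl version includes it, istioctl bug-report can gather much of the above (logs, resource state, and Istio configuration) into a single archive without changing cluster state:

istioctl bug-report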