Istio Diagnostics
Global (Cluster‑Wide) Istio Troubleshooting
This section is for determining whether there is a cluster‑wide Istio problem.
Most issues are not caused by a cluster‑wide Istio failure. Follow this section only if multiple services or namespaces are affected.
Signs of a Global Istio Issue
Investigate Istio system health if one or more of the following are true:
- Multiple unrelated services are failing at the same time
- Traffic failures affect many namespaces or the entire cluster
- External traffic through Gateways fails for all backends
- New pods cannot communicate with any other services
- Recent changes were made to:
  - Istio version or configuration
  - Cluster networking (CNI, nodes, kernel updates)
If the problem affects only one service, namespace, or route, skip this section and go to Service‑Specific Istio Troubleshooting.
Global Istio Health Checks
Start here if you suspect a global issue, and work through these checks in order.
ztunnel DaemonSet Status
Ambient mode depends on ztunnel running on every node.
kubectl get daemonset -n istio-system
Confirm:
- ztunnel exists
- Desired = Current = Ready = number of nodes
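A healthy DaemonSet looks roughly like this (the node count, node selector, and age shown here are illustrative only):

NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
ztunnel   3         3         3       3            3           kubernetes.io/os=linux   42d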
If istiod is running but ztunnel is not
If istiod is healthy but ztunnel pods are failing to start or are crash looping, this usually indicates a lower-level networking issue, not an Istio configuration problem.
Common causes include:
- CNI issues (e.g. Cilium or another CNI not functioning correctly)
- Node-level networking problems
- Recent node or kernel changes
In this case:
kubectl logs -n istio-system -l app=ztunnel
ztunnel logs will often clearly indicate a networking or CNI failure.
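To narrow things down quickly, you can filter recent ztunnel logs for error and warning lines; the grep pattern below is only a rough illustration:

# Keep only error/warning lines from the last 15 minutes of ztunnel logs
kubectl logs -n istio-system -l app=ztunnel --since=15m | grep -iE "error|warn"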
Follow our Networking Troubleshooting Guide before escalating Istio issues.
Namespace Is in Ambient Mode
kubectl get namespace <namespace> --show-labels
Look for:
istio.io/dataplane-mode=ambient
If this label is missing, traffic is not flowing through the ambient mesh.
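To check the label's value directly, a jsonpath query works as well; this sketch prints nothing if the label is not set:

# Prints "ambient" when the namespace is enrolled in the ambient mesh
kubectl get namespace <namespace> -o jsonpath='{.metadata.labels.istio\.io/dataplane-mode}'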
Service‑Specific Istio Troubleshooting
Use this section only after confirming that Istio is healthy at a global level or if only one service seems to be affected.
If istio-system is healthy, issues are usually caused by service‑specific Istio configuration, not Istio itself.
Examples include misconfigured Gateways, HTTPRoutes, AuthorizationPolicies, or missing waypoints.
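To get a quick inventory of the mesh-related configuration in a service's namespace, you can list those resource types directly. The fully qualified resource names below assume the Kubernetes Gateway API and Istio security CRDs are installed in your cluster:

# Gateway API routing resources in the namespace
kubectl get gateways.gateway.networking.k8s.io,httproutes.gateway.networking.k8s.io -n <namespace>
# Istio authorization policies in the namespace
kubectl get authorizationpolicies.security.istio.io -n <namespace>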
Common Issues That Look Like Istio Problems (But Aren't)
Before troubleshooting Istio configuration for a single service, make sure the service itself is healthy. Istio cannot route traffic to workloads that Kubernetes considers unavailable.
- The Service has no healthy endpoints
  - If a Service has zero endpoints, Istio will return errors such as 503 or "no healthy upstream".
  - This is Istio reporting the problem correctly, not causing it.
- Pods behind the Service are not Running and Ready
  - CrashLooping, unready, or non-listening pods will cause Istio-looking errors even when the mesh is functioning correctly.
As a first step, always verify:
kubectl get pods -n <namespace>
kubectl get svc <service-name> -n <namespace>
kubectl get endpoints <service-name> -n <namespace>
If these checks fail, resolve the service health issue before continuing with Istio-specific troubleshooting.
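To confirm the Service has backing endpoints without reading the full output, the jsonpath sketch below prints only the ready pod IPs; empty output means there is nothing for Istio to route to:

# No output here means the Service has zero healthy endpoints
kubectl get endpoints <service-name> -n <namespace> -o jsonpath='{.subsets[*].addresses[*].ip}'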
Using istioctl for Analysis
istioctl is the Istio command‑line tool. It can inspect, validate, and diagnose Istio configuration, but it can also modify cluster state if used incorrectly.
- istioctl has powerful commands
- Some commands can change Istio configuration or restart components
- Only use the commands described below unless explicitly instructed by support
Installing istioctl
Install the version of istioctl that matches the Istio version running in your cluster.
If you are unsure which version to use, contact support before proceeding.
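Istio publishes a download script for fetching a specific release; a minimal sketch, with <version> as a placeholder for the Istio version running in your cluster:

# Download the matching istioctl release and add it to PATH for this shell session
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=<version> sh -
export PATH="$PWD/istio-<version>/bin:$PATH"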
After installation, confirm it works:
istioctl version
What istioctl analyze Does
istioctl analyze is a safe, read‑only command that checks Istio resources for common problems.
It can detect issues such as:
- Invalid or conflicting Istio configuration
- Gateway or HTTPRoute resources that will never match traffic
- Policies that reference missing services or namespaces
- Configuration that requires a waypoint proxy that does not exist
It does not:
- Modify resources
- Restart pods
- Change traffic behavior
Running istioctl analyze
Analyze the current (default) namespace:
istioctl analyze
Analyze all namespaces in the cluster:
istioctl analyze --all-namespaces
Analyze a specific namespace:
istioctl analyze -n <namespace>
Include verbose output if requested:
istioctl analyze --verbose
Interpreting Results
istioctl analyze reports findings as:
- Errors – Configuration that will not work
- Warnings – Configuration that is suspicious or incomplete
- Info – Observations that may or may not be relevant
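Findings are printed one per line as a severity, a rule code, the affected resource, and a message. The line below is illustrative only; the code, resource, and message are placeholders:

Error [IST0101] (HTTPRoute <namespace>/<route-name>) Referenced resource not found: ...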
When opening a support request, include:
- The full output of istioctl analyze
- The namespace(s) involved
Do not attempt to fix issues by trial‑and‑error changes unless you understand the impact.
Simple, Safe Service‑Level Actions
These actions are low‑risk and often resolve transient issues.
Restart the Application Pods
kubectl rollout restart deployment <deployment-name> -n <namespace>
This refreshes workload identity and ambient tunnel connections.
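To confirm the restart completed cleanly, you can wait for the rollout to finish:

# Blocks until all replicas of the restarted Deployment are Ready again
kubectl rollout status deployment <deployment-name> -n <namespace>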
Restart ztunnel (Only if Multiple Services Are Affected)
kubectl rollout restart daemonset ztunnel -n istio-system
Temporarily Remove a Namespace from Ambient Mode (Isolation Test)
kubectl label namespace <namespace> istio.io/dataplane-mode-
kubectl rollout restart deployment <deployment-name> -n <namespace>
If traffic works immediately afterward, the issue is almost certainly mesh‑related.
Re‑enable ambient mode:
kubectl label namespace <namespace> istio.io/dataplane-mode=ambient
Information to Collect Before Escalation
Please collect all of the following before contacting support.
Cluster and Namespace Context
kubectl get nodes
kubectl get namespace <namespace> --show-labels
Istio Component Status
kubectl get pods -n istio-system
kubectl get pods -n istio-system -o wide
kubectl get daemonset ztunnel -n istio-system
When using -o wide, check:
- Whether failing pods are all on the same node
- Whether healthy and unhealthy pods are split across nodes
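One way to see the node distribution at a glance is to sort the wide output by node; a simple sketch:

# Groups istio-system pods by the node they are scheduled on
kubectl get pods -n istio-system -o wide --sort-by=.spec.nodeName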
If issues appear isolated to a specific node:
kubectl get nodes
kubectl describe node <node-name>
Node-level problems (NotReady, network unavailable, pressure conditions) can cause Istio components to fail on that node only.
Follow our Node Troubleshooting Guide if issues are node-specific.
Recent Logs (Last 10–15 Minutes)
kubectl logs -n istio-system deployment/istiod --since=15m
kubectl logs -n istio-system -l app=ztunnel --since=15m
If using an ingress gateway:
kubectl logs -n istio-system deployment/<gateway-deployment> --since=15m
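If it is easier, the commands above can be bundled into a small collection script. This is a sketch only; NAMESPACE and GATEWAY_DEPLOYMENT are placeholders you must set for your environment, and the gateway step can be dropped if you do not use an ingress gateway:

#!/usr/bin/env bash
# Collects the diagnostics listed above into a directory you can attach to a support request.
set -euo pipefail

NAMESPACE="<namespace>"                   # affected application namespace
GATEWAY_DEPLOYMENT="<gateway-deployment>" # only needed if you use an ingress gateway
OUT="istio-diagnostics-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$OUT"

kubectl get nodes                                          > "$OUT/nodes.txt"
kubectl get namespace "$NAMESPACE" --show-labels           > "$OUT/namespace-labels.txt"
kubectl get pods -n istio-system -o wide                   > "$OUT/istio-system-pods.txt"
kubectl get daemonset ztunnel -n istio-system              > "$OUT/ztunnel-daemonset.txt"
kubectl logs -n istio-system deployment/istiod --since=15m > "$OUT/istiod.log"
kubectl logs -n istio-system -l app=ztunnel --since=15m    > "$OUT/ztunnel.log"
kubectl logs -n istio-system "deployment/$GATEWAY_DEPLOYMENT" --since=15m > "$OUT/gateway.log" || true

echo "Diagnostics written to $OUT"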