Karpenter Concepts
How Karpenter Differs from Traditional Autoscaling
Karpenter provisions nodes directly in response to pending pods. This differs from traditional node groups or cluster autoscalers:
- Traditional autoscaling grows and shrinks predefined sets of nodes
- Karpenter creates individual nodes on demand, based on real workload requirements
Karpenter focuses on capacity. It ensures that the cluster has nodes capable of running your workloads, but it does not control application behavior or correctness.
How Karpenter Handles Node Deletions
Karpenter actively manages node removal, not just node creation.
Unlike traditional autoscalers that scale down entire node groups on a timer, Karpenter evaluates individual nodes and removes them when they are no longer needed.
At a high level, Karpenter will:
- Identify nodes that are empty or underutilized
- Safely evict workloads from those nodes
- Terminate the nodes and release cloud capacity
This behavior has several important benefits:
- Faster scale-down: unused nodes are removed quickly instead of waiting for group-level cooldowns
- Lower cost: capacity closely tracks actual workload demand
- Better handling of churn: nodes are treated as disposable, not long-lived pets
From an application perspective, this means:
- Nodes may disappear even when the cluster is healthy
- Workloads must tolerate pod restarts and rescheduling
- Node deletion is a normal part of steady-state operation
This design works especially well with stateless and horizontally scalable workloads, and it is a key reason Karpenter pairs well with spot capacity.
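Node removal is visible directly in the cluster with standard commands; no Karpenter-specific tooling is assumed, and node-level events usually explain why a node was or was not removed:

kubectl get nodes -w                                                                   # nodes appear and disappear as demand changes
kubectl get events -A --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp   # node-level events, including disruption activity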
High-Level Architecture
How Karpenter Provisions Capacity
When a pod cannot be placed by the Kubernetes scheduler, it enters a Pending state. Karpenter continuously watches for these unschedulable pods.
At a high level, Karpenter evaluates:
- Pod resource requests (CPU, memory, GPUs)
- Scheduling constraints (node selectors, affinities, taints, tolerations)
- NodePool or Provisioner configuration
- Available cloud provider capacity
If Karpenter determines that a suitable node can be created, it requests that node from the cloud provider.
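The NodePool side of this evaluation is easiest to see with a concrete example. A minimal sketch, assuming the karpenter.sh/v1 API and the AWS provider's EC2NodeClass; all names and values are illustrative, not recommendations:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose              # hypothetical NodePool name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws     # provider-specific node class; AWS assumed here
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # remove empty or underutilized nodes
    consolidateAfter: 1m                            # how long to wait before consolidating
  limits:
    cpu: "1000"                                     # cap the total CPU this NodePool may provision

Karpenter will only launch capacity that satisfies both the pod's requirements and a NodePool like this one.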
What You Will See When Karpenter Is Working Normally
When Karpenter provisions a node, you may observe the following behavior:
- Pods remain in a Pending state
- A node is selected or implied for those pods
- The node may not yet exist or may not be Ready
- Pods appear to be "waiting for" a node
This is expected. Karpenter has decided what node is required and requested it, but the instance may still be booting and joining the cluster.
Once the node becomes ready, the pods will start running.
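To watch this sequence as it happens, the following is usually enough (assuming the Karpenter CRDs are installed under their default names; the namespace is a placeholder):

kubectl get pods -n <namespace> -w     # pods stay Pending until their node is Ready
kubectl get nodeclaims -w              # the capacity Karpenter has requested, and whether it is ready yet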
What Karpenter Controls vs What It Does Not
Karpenter does control:
- When nodes are created
- What types of nodes are created
- When empty or underutilized nodes are removed
Karpenter does not control:
- Application health or readiness
- Pod admission or policy enforcement
- Cloud provider capacity availability
- How quickly instances boot or join the cluster
Global (Cluster-Wide) Karpenter Troubleshooting
This section helps determine whether Karpenter itself is unhealthy.
When to Suspect a Cluster-Wide Karpenter Issue
Most scheduling issues are not caused by Karpenter itself. You should suspect a cluster-wide Karpenter issue only when multiple workloads are affected.
Signs that may indicate a Karpenter issue include:
- Many pods across namespaces remain pending
- No new nodes are being created despite sustained pending pods
- Karpenter was recently upgraded or reconfigured
- Cloud provider changes were made (permissions, quotas, regions)
It is unlikely to be a Karpenter issue if:
- Only a single workload is affected
- Pods are blocked by admission or policy
- Pods are pending due to missing tolerations or selectors
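To gauge how widespread the problem is, list pending pods across all namespaces:

kubectl get pods -A --field-selector=status.phase=Pending

A handful of pending pods in one namespace points to a workload issue; sustained pending pods everywhere points toward Karpenter or the cloud provider.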
Check the Karpenter Controller
kubectl get pods -n karpenter
The Karpenter controller should be Running without frequent restarts.
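If the controller is restarting, its previous logs and pod events usually say why (the pod name is a placeholder; the namespace assumes a default installation):

kubectl describe pod -n karpenter <karpenter-pod>       # recent events, probe failures, OOM kills
kubectl logs -n karpenter <karpenter-pod> --previous     # logs from the last terminated container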
Dependency Chain
Karpenter depends on several systems working together:
Karpenter controller
↓
Cloud provider APIs
↓
Instance launch
↓
Node joins cluster
If any step in this chain fails, nodes will not become available.
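The later links in the chain are visible from inside the cluster (cloud-side checks such as quotas and credentials depend on your provider):

kubectl get nodeclaims     # capacity requests made to the cloud provider
kubectl get nodes          # instances that have successfully joined the cluster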
Check Karpenter Logs
kubectl logs -n karpenter deployment/karpenter
Look for errors related to:
- Permissions or credentials
- Instance type availability
- Quotas or limits
Karpenter logs are often explicit about why capacity cannot be created.
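A simple filter is often enough to surface these (the deployment name assumes a default installation; adjust the pattern as needed):

kubectl logs -n karpenter deployment/karpenter | grep -iE "error|fail|insufficient|unauthorized"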
Service- / Workload-Specific Karpenter Troubleshooting
Use this section if only a single workload or namespace is affected.
Common Issues That Look Like Karpenter Problems (But Aren't)
Before assuming Karpenter is broken, check the following:
- Pods are blocked by admission controllers or policies
- Pods request resources that cannot be satisfied
- Scheduling constraints prevent placement on any node
- Nodes are created but never become Ready
  - Nodes exist but remain NotReady
  - Pods appear bound or waiting but never start
  - This is often caused by unrelated DaemonSets failing, most commonly:
    - CNI / networking plugins
    - Node-level networking configuration
  - In this case, Karpenter has successfully created capacity, but the node cannot join the cluster correctly.
These issues will cause pods to remain pending even if Karpenter is healthy.
Inspect the Pod with kubectl describe pod
kubectl describe pod is one of the most useful troubleshooting tools when investigating possible Karpenter issues.
kubectl describe pod <pod-name> -n <namespace>
This command frequently surfaces explicit Karpenter-related messages when Karpenter is the limiting factor.
Look for events such as:
- Messages indicating that Karpenter is evaluating or provisioning capacity
- Errors indicating that no compatible instance types are available
- Messages referencing NodePools, Provisioners, or capacity constraints
If Karpenter is the problem, it is often visible here.
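Events can also be listed directly, which helps when the describe output is long (namespace and pod name are placeholders):

kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp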
Check NodeClaims (Is Karpenter Creating Capacity?)
Karpenter represents its provisioning decisions using NodeClaims. Checking NodeClaims helps determine whether Karpenter is actively attempting to create capacity.
kubectl get nodeclaims
What to look for:
- No NodeClaims
  - Karpenter is not attempting to create nodes
  - This usually means the pod is unschedulable for reasons outside Karpenter (constraints, policies, or configuration)
- NodeClaims exist, but nodes are not Ready
  - Karpenter has requested capacity
  - The issue is likely with node startup, cloud capacity, or node-level components (e.g., CNI / networking)
- NodeClaims repeatedly created and deleted
  - Karpenter is unable to successfully launch usable nodes
  - This often points to cloud provider limits, permissions, or incompatible configuration
NodeClaims provide a clear signal of where Karpenter is in the provisioning process.
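When NodeClaims do exist, inspecting one shows how far provisioning got; the status conditions and events on a NodeClaim typically include any errors returned by the cloud provider (the name is a placeholder):

kubectl get nodeclaims -o wide        # which node, instance type, and capacity type each claim maps to
kubectl describe nodeclaim <name>     # status conditions and launch or registration errors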
Confirm the Workload Can Be Scheduled
After reviewing the pod events, confirm that the workload itself is schedulable:
Check for:
- Resource requests that exceed available instance types
- Node selectors or affinities that match no nodes
- Missing tolerations for required taints
If the pod's constraints cannot be satisfied by any node Karpenter is allowed to create, Karpenter cannot help.
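All of these constraints live on the pod spec itself, so they are easy to cross-check against your NodePool requirements. A hypothetical pod showing the fields to review (names and values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: example-workload               # hypothetical name
spec:
  nodeSelector:
    kubernetes.io/arch: amd64          # must be satisfiable by at least one NodePool requirement
  tolerations:
    - key: dedicated                   # must match any taints on the nodes Karpenter creates
      operator: Equal
      value: batch
      effect: NoSchedule
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9 # placeholder image
      resources:
        requests:
          cpu: "2"                     # must fit on at least one allowed instance type
          memory: 4Gi

Each of these fields must be compatible with at least one NodePool before Karpenter will create a node for the pod.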