Karpenter Concepts
How Karpenter Differs from Traditional Autoscaling
Karpenter provisions nodes directly in response to pending pods. This differs from traditional node groups or cluster autoscalers:
- Traditional autoscaling grows and shrinks predefined sets of nodes
- Karpenter creates individual nodes on demand, based on real workload requirements
Karpenter focuses on capacity. It ensures that the cluster has nodes capable of running your workloads, but it does not control application behavior or correctness.
How Karpenter Handles Node Deletions
Karpenter actively manages node removal, not just node creation.
Unlike traditional autoscalers that scale down entire node groups on a timer, Karpenter evaluates individual nodes and removes them when they are no longer needed.
At a high level, Karpenter will:
- Identify nodes that are empty or underutilized
- Safely evict workloads from those nodes
- Terminate the nodes and release cloud capacity
This behavior has several important benefits:
- Faster scale-down: unused nodes are removed quickly instead of waiting for group-level cooldowns
- Lower cost: capacity closely tracks actual workload demand
- Better handling of churn: nodes are treated as disposable, not long-lived pets
From an application perspective, this means:
- Nodes may disappear even when the cluster is healthy
- Workloads must tolerate pod restarts and rescheduling
- Node deletion is a normal part of steady-state operation
This design works especially well with stateless and horizontally scalable workloads, and it is a key reason Karpenter pairs well with spot capacity.
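Node removal is visible directly in the cluster with standard commands; no Karpenter-specific tooling is assumed, and node-level events usually explain why a node was or was not removed:

kubectl get nodes -w                                                                   # nodes appear and disappear as demand changes
kubectl get events -A --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp   # node-level events, including disruption activity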
High-Level Architecture
How Karpenter Provisions Capacity
When a pod cannot be placed by the Kubernetes scheduler, it enters a Pending state. Karpenter continuously watches for these unschedulable pods.
At a high level, Karpenter evaluates:
- Pod resource requests (CPU, memory, GPUs)
- Scheduling constraints (node selectors, affinities, taints, tolerations)
- NodePool or Provisioner configuration
- Available cloud provider capacity
If Karpenter determines that a suitable node can be created, it requests that node from the cloud provider.
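The NodePool side of this evaluation is easiest to see with a concrete example. A minimal sketch, assuming the karpenter.sh/v1 API and the AWS provider's EC2NodeClass; all names and values are illustrative, not recommendations:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose              # hypothetical NodePool name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws     # provider-specific node class; AWS assumed here
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # remove empty or underutilized nodes
    consolidateAfter: 1m                            # how long to wait before consolidating
  limits:
    cpu: "1000"                                     # cap the total CPU this NodePool may provision

Karpenter will only launch capacity that satisfies both the pod's requirements and a NodePool like this one.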
What You Will See When Karpenter Is Working Normally
When Karpenter provisions a node, you may observe the following behavior:
- Pods remain in a Pending state
- A node is selected or implied for those pods
- The node may not yet exist or may not be Ready
- Pods appear to be "waiting for" a node
This is expected. Karpenter has decided what node is required and requested it, but the instance may still be booting and joining the cluster.
Once the node becomes ready, the pods will start running.
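To watch this sequence as it happens, the following is usually enough (assuming the Karpenter CRDs are installed under their default names; the namespace is a placeholder):

kubectl get pods -n <namespace> -w     # pods stay Pending until their node is Ready
kubectl get nodeclaims -w              # the capacity Karpenter has requested, and whether it is ready yet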
What Karpenter Controls vs What It Does Not
Karpenter does control:
- When nodes are created
- What types of nodes are created
- When empty or underutilized nodes are removed
Karpenter does not control:
- Application health or readiness
- Pod admission or policy enforcement
- Cloud provider capacity availability
- How quickly instances boot or join the cluster
Global (Cluster-Wide) Karpenter Troubleshooting
This section helps determine whether Karpenter itself is unhealthy.
When to Suspect a Cluster-Wide Karpenter Issue
Most scheduling issues are not caused by Karpenter itself. You should suspect a cluster-wide Karpenter issue only when multiple workloads are affected.
Signs that may indicate a Karpenter issue include:
- Many pods across namespaces remain pending
- No new nodes are being created despite sustained pending pods
- Karpenter was recently upgraded or reconfigured
- Cloud provider changes were made (permissions, quotas, regions)
It is unlikely to be a Karpenter issue if:
- Only a single workload is affected
- Pods are blocked by admission or policy
- Pods are pending due to missing tolerations or selectors
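To gauge how widespread the problem is, list pending pods across all namespaces:

kubectl get pods -A --field-selector=status.phase=Pending

A handful of pending pods in one namespace points to a workload issue; sustained pending pods everywhere points toward Karpenter or the cloud provider.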
Check the Karpenter Controller
kubectl get pods -n karpenter
The Karpenter controller should be Running without frequent restarts.
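If the controller is restarting, its previous logs and pod events usually say why (the pod name is a placeholder; the namespace assumes a default installation):

kubectl describe pod -n karpenter <karpenter-pod>       # recent events, probe failures, OOM kills
kubectl logs -n karpenter <karpenter-pod> --previous     # logs from the last terminated container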
Dependency Chain
Karpenter depends on several systems working together:
Karpenter controller
↓
Cloud provider APIs
↓
Instance launch
↓
Node joins cluster
If any step in this chain fails, nodes will not become available.
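The later links in the chain are visible from inside the cluster (cloud-side checks such as quotas and credentials depend on your provider):

kubectl get nodeclaims     # capacity requests made to the cloud provider
kubectl get nodes          # instances that have successfully joined the cluster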
Check Karpenter Logs
kubectl logs -n karpenter deployment/karpenter
Look for errors related to:
- Permissions or credentials
- Instance type availability
- Quotas or limits
Karpenter logs are often explicit about why capacity cannot be created.
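A simple filter is often enough to surface these (the deployment name assumes a default installation; adjust the pattern as needed):

kubectl logs -n karpenter deployment/karpenter | grep -iE "error|fail|insufficient|unauthorized"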
Service- / Workload-Specific Karpenter Troubleshooting
Use this section if only a single workload or namespace is affected.
Common Issues That Look Like Karpenter Problems (But Aren't)
Before assuming Karpenter is broken, check the following:
- Pods are blocked by admission controllers or policies
- Pods request resources that cannot be satisfied
- Scheduling constraints prevent placement on any node
- Nodes are created but never become Ready
  - Nodes exist but remain NotReady
  - Pods appear bound or waiting but never start
  - This is often caused by unrelated DaemonSets failing, most commonly:
    - CNI / networking plugins
    - Node-level networking configuration
  - In this case, Karpenter has successfully created capacity, but the node cannot join the cluster correctly.
These issues will cause pods to remain pending even if Karpenter is healthy.
Inspect the Pod with kubectl describe pod
kubectl describe pod is one of the most useful troubleshooting tools when investigating possible Karpenter issues.
kubectl describe pod <pod-name> -n <namespace>
This command frequently surfaces explicit Karpenter-related messages when Karpenter is the limiting factor.
Look for events such as:
- Messages indicating that Karpenter is evaluating or provisioning capacity
- Errors indicating that no compatible instance types are available
- Messages referencing NodePools, Provisioners, or capacity constraints
If Karpenter is the problem, it is often visible here.
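Events can also be listed directly, which helps when the describe output is long (namespace and pod name are placeholders):

kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp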
Check NodeClaims (Is Karpenter Creating Capacity?)
Karpenter represents its provisioning decisions using NodeClaims. Checking NodeClaims helps determine whether Karpenter is actively attempting to create capacity.
kubectl get nodeclaims
What to look for:
- No NodeClaims
  - Karpenter is not attempting to create nodes
  - This usually means the pod is unschedulable for reasons outside Karpenter (constraints, policies, or configuration)
- NodeClaims exist, but nodes are not Ready
  - Karpenter has requested capacity
  - The issue is likely with node startup, cloud capacity, or node-level components (e.g., CNI / networking)
- NodeClaims repeatedly created and deleted
  - Karpenter is unable to successfully launch usable nodes
  - This often points to cloud provider limits, permissions, or incompatible configuration
NodeClaims provide a clear signal of where Karpenter is in the provisioning process.
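When NodeClaims do exist, inspecting one shows how far provisioning got; the status conditions and events on a NodeClaim typically include any errors returned by the cloud provider (the name is a placeholder):

kubectl get nodeclaims -o wide        # which node, instance type, and capacity type each claim maps to
kubectl describe nodeclaim <name>     # status conditions and launch or registration errors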
Confirm the Workload Can Be Scheduled
After reviewing the pod events, confirm that the workload itself is schedulable:
Check for:
- Resource requests that exceed available instance types
- Node selectors or affinities that match no nodes
- Missing tolerations for required taints
If the pod's constraints cannot be satisfied by any node Karpenter is allowed to create, Karpenter cannot help.
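All of these constraints live on the pod spec itself, so they are easy to cross-check against your NodePool requirements. A hypothetical pod showing the fields to review (names and values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: example-workload               # hypothetical name
spec:
  nodeSelector:
    kubernetes.io/arch: amd64          # must be satisfiable by at least one NodePool requirement
  tolerations:
    - key: dedicated                   # must match any taints on the nodes Karpenter creates
      operator: Equal
      value: batch
      effect: NoSchedule
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9 # placeholder image
      resources:
        requests:
          cpu: "2"                     # must fit on at least one allowed instance type
          memory: 4Gi

Each of these fields must be compatible with at least one NodePool before Karpenter will create a node for the pod.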