Pod scheduling issues are among the most common Kubernetes errors. There are several reasons why a new Pod can get stuck in a Pending state with FailedScheduling as its reason. A Pod that displays this status won’t start any containers, so you’ll be unable to use your application.
Pending Pods caused by scheduling problems don’t normally start running without some manual intervention. You’ll need to investigate the root cause and take action to fix your cluster. In this article, you’ll learn how to diagnose and resolve this problem so you can bring your workloads up.
Identifying a FailedScheduling Error
It’s normal for Pods to show a Pending status for a short period after you add them to your cluster. Kubernetes needs to schedule container instances to your Nodes, and those Nodes have to pull the image from its registry. The first sign that a Pod has failed scheduling is when it’s still Pending after the usual startup period has elapsed. You can check the status by running kubectl’s get pods command:
$ kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
demo-pod   0/1     Pending   0          4m05s
demo-pod is over four minutes old but it’s still in the Pending state. Pods don’t usually take this long to start their containers, so it’s time to investigate what Kubernetes is waiting for.
The next diagnosis step is to retrieve the Pod’s event history using the describe pod command:
$ kubectl describe pod demo-pod
…
Events:
Type      Reason             Age   From                Message
----      ------             ---   ----                -------
…
Warning   FailedScheduling   4m    default-scheduler   0/4 nodes are available: 1 Too many pods, 3 Insufficient cpu.
The event history confirms that a FailedScheduling error is the reason for the prolonged Pending state. This event is reported when Kubernetes can’t place the Pod onto any of the worker nodes in your cluster.
The event’s message reveals why scheduling is currently impossible: there are four nodes in the cluster, but none of them can take the Pod. Three of the nodes have insufficient CPU capacity, while the fourth has reached a cap on the number of Pods it can accept.
Understanding FailedScheduling Errors and Similar Problems
Kubernetes can only schedule Pods onto nodes that have spare resources available. Nodes with exhausted CPU or memory capacity can’t take any more Pods. Pods can also fail scheduling if they explicitly request more resources than any node can provide. This maintains your cluster’s stability.
The Kubernetes control plane is aware of the Pods already allocated to the nodes in your cluster. It uses this information to determine the set of nodes that can receive a new Pod. A scheduling error results when there are no candidates available, leaving the Pod stuck Pending until capacity is freed up.
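As a rough sketch of how this plays out, the hypothetical Pod below requests far more CPU and memory than a typical small node offers, so it would sit in Pending with a FailedScheduling event until a large enough node joins the cluster (the image and resource figures are illustrative only):
spec:
  containers:
    - name: app
      image: nginx:latest
      resources:
        requests:
          cpu: "16"
          memory: 64Gi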
Kubernetes can fail to schedule Pods for other reasons too. There are several ways in which nodes can be deemed ineligible to host a Pod, despite having adequate system resources:
- The node might have been cordoned by an administrator to stop it receiving new Pods ahead of a maintenance operation.
- The node could be tainted with an effect that prevents Pods from scheduling. Your Pod won’t be accepted by the node unless it has a corresponding toleration.
- Your Pod might be requesting a hostPort which is already bound on the node. Nodes can only provide a particular port number to a single Pod at a time.
- Your Pod could be using a nodeSelector that means it has to be scheduled to a node with a particular label. Nodes that lack the label won’t be eligible.
- Pod and Node affinities and anti-affinities might be unsatisfiable, causing a scheduling conflict that prevents new Pods from being accepted.
- The Pod might have a nodeName field that identifies a specific node to schedule to. The Pod will be stuck pending if that node is offline or unschedulable.
It’s the responsibility of kube-scheduler, the Kubernetes scheduler, to work through these conditions and identify the set of nodes that can take a new Pod. A FailedScheduling event occurs when none of the nodes satisfy the criteria.
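To make these conditions concrete, here’s a hypothetical Pod spec that combines two of them: a nodeSelector and a hostPort. If no node carries the disktype=ssd label, or every matching node already has something bound to port 8080, kube-scheduler will reject every candidate and emit a FailedScheduling event (the label and port number are examples only):
spec:
  nodeSelector:
    disktype: ssd
  containers:
    - name: app
      image: nginx:latest
      ports:
        - containerPort: 80
          hostPort: 8080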
Resolving the FailedScheduling State
The message displayed next to FailedScheduling events usually reveals why each node in your cluster was unable to take the Pod. You can use this information to start addressing the problem. In the example shown above, the cluster had four nodes: three had insufficient CPU capacity, and one had reached its limit on the number of Pods it can run.
Cluster capacity is the root cause in this case. You can scale your cluster with new nodes to resolve hardware consumption problems, adding resources that will provide extra flexibility. As this will also raise your costs, it’s worthwhile checking whether you’ve got any redundant Pods in your cluster first. Deleting unused resources will free up capacity for new ones.
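If the metrics-server add-on is installed in your cluster, kubectl top gives a quick view of live consumption on each node, which can help you judge whether scaling out or cleaning up is the better option (the figures below are illustrative):
$ kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1   920m         97%    1190Mi          75%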
You can inspect the available resources on each of your nodes using the describe node command:
$ kubectl describe node demo-node
…
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource           Requests      Limits
--------           --------      ------
cpu                812m (90%)    202m (22%)
memory             905Mi (57%)   715Mi (45%)
ephemeral-storage  0 (0%)        0 (0%)
hugepages-2Mi      0 (0%)        0 (0%)
Pods on this node are already requesting 57% of the available memory. If a new Pod requested 1Gi of memory for itself, the node would be unable to accept the scheduling request. Monitoring this information for each of your nodes can help you assess whether your cluster is becoming over-committed. It’s important to keep spare capacity available in case one of your nodes becomes unhealthy and its workloads have to be rescheduled to another.
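For reference, the request that would tip this particular node over its capacity looks something like the snippet below; with 57% of memory already requested, asking for another 1Gi leaves the scheduler no room (the container details are hypothetical):
spec:
  containers:
    - name: app
      image: nginx:latest
      resources:
        requests:
          memory: 1Gi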
Scheduling failures caused by all of your nodes being unschedulable show a message similar to the following in the FailedScheduling event:
0/4 nodes are available: 4 node(s) were unschedulable
Nodes that are unschedulable because they’ve been cordoned will include SchedulingDisabled in their status field:
$ kubectl get nodes
NAME     STATUS                     ROLES                  AGE   VERSION
node-1   Ready,SchedulingDisabled   control-plane,master   26m   v1.23.3
You can uncordon the node to allow it to receive new Pods:
$ kubectl uncordon node-1
node/node-1 uncordoned
When nodes aren’t cordoned and have sufficient resources, scheduling errors are normally caused by tainting or an incorrect nodeSelector field on your Pod. If you’re using nodeSelector, check that you haven’t made a typo and that there are nodes in your cluster with the labels you’ve specified.
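You can list each node’s labels to confirm that your selector will match at least one of them; the disktype label below is just an example:
$ kubectl get nodes --show-labels
NAME     STATUS   ROLES                  AGE   VERSION   LABELS
node-1   Ready    control-plane,master   26m   v1.23.3   disktype=ssd,kubernetes.io/hostname=node-1,...
You can also filter to the matching nodes with kubectl get nodes -l disktype=ssd; an empty result means nothing in the cluster satisfies your nodeSelector.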
When nodes are tainted, make sure you’ve included the corresponding toleration in your Pod’s manifest. As an example, here’s a node that’s been tainted so Pods won’t schedule to it unless they tolerate demo-taint=allow:
$ kubectl taint nodes node-1 demo-taint=allow:NoSchedule
Modify your Pod manifests so they can schedule onto the Node:
spec:
  tolerations:
    - key: demo-taint
      operator: Equal
      value: allow
      effect: NoSchedule
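If you’re not sure which taints are currently applied to a node, they’re listed in the describe node output, so you can check them before writing the toleration:
$ kubectl describe node node-1 | grep Taints
Taints:             demo-taint=allow:NoSchedule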
Resolving the problem that caused the FailedScheduling state will allow Kubernetes to resume scheduling your pending Pods. They’ll start running automatically shortly after the control plane detects the changes to your nodes. You don’t need to manually restart or recreate your Pods unless the issue is due to mistakes in your Pod’s manifest, such as incorrect affinity or nodeSelector fields.
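If the manifest itself was at fault, fields like nodeSelector and affinity can’t be edited on an existing Pod, so delete the stuck Pod and re-apply the corrected definition (demo-pod.yaml is a hypothetical file name):
$ kubectl delete pod demo-pod
$ kubectl apply -f demo-pod.yaml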
Summary
FailedScheduling errors occur when Kubernetes can’t place a new Pod onto any node in your cluster. This is often because your existing nodes are running low on hardware resources such as CPU, memory, and disk. When this is the case, you can resolve the problem by scaling your cluster to include additional nodes.
Scheduling failures also arise when Pods specify affinities, anti-affinities, and node selectors that can’t currently be satisfied by the nodes available in your cluster. Cordoned and tainted nodes further reduce the options available to Kubernetes. This kind of issue can be addressed by checking your manifests for typos in labels and removing constraints you no longer need.
Source: www.howtogeek.com