Advanced Scheduling in Kubernetes with Dynamic Node Pools — Part I

Upendra Kumarage
3 min read · Nov 4, 2020

Let's start with Dynamic Node Pools. In essence, a dynamic node pool is a node pool whose nodes are spun up and down on demand by a managed Kubernetes service. Such node pools are cost-efficient because they scale up and down with demand, saving the cost of idle compute resources. Beyond the cost benefit, they also help the deployed application absorb unexpected spikes in traffic without manual intervention.

However, the dynamic node pools offered by managed Kubernetes providers are often subject to frequent updates and node recreation/restarts. In such cases, it is difficult to rely on the usual methods of assigning pods to nodes. Custom node labels are of little help because they are lost when a node is recreated, and when the pool scales out, those labels would have to be applied to the new nodes manually or through some other automated mechanism.
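Built-in node labels, in contrast, are applied automatically every time a node joins the cluster, so they survive recreation. A quick way to see what labels a node pool already carries is shown below; the exact label set depends on your Kubernetes version and cloud provider, and the node name is a placeholder.

# List every node along with all of its labels
kubectl get nodes --show-labels

# Inspect the labels and details of a single node (name is a placeholder)
kubectl describe node <node-name>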

As a solution, we can use Kubernetes advanced scheduling options such as

  • Node Affinity/anti-affinity
  • Pod Affinity/anti-affinity
  • Built-in Node Labels
  • Custom Schedulers

to extend the pod scheduling capability.

Let’s have a look at Node Affinity, Pod Affinity, and Built-in Node Labels.

First of all, why would we need to assign pods to selected nodes? There can be many reasons, such as

  • Application-related constraints, such as not running multiple replicas of the same pod on a single node (see the sketch after this list)
  • The need to schedule certain pods only on a selected set of nodes. For example, if we are using Kubernetes clusters through a cloud service, there might be a requirement to run some pods only on nodes located in a specific region/zone.
  • Cases where certain pods need to be scheduled only on specialized hardware that has sufficient/dedicated resources.
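To illustrate the first point, here is a minimal sketch of how pod anti-affinity (which we will cover properly in the next article) can keep replicas of the same Deployment on different nodes. The app label and the name "web" are placeholders, not part of any real workload.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Do not place a replica on a node that already runs a pod with app=web
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web
            topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: nginx

Note that topologyKey uses the built-in kubernetes.io/hostname label, so the rule keeps working even when nodes are recreated.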

Node Affinity/Anti Affinity

Node affinity is similar to nodeSelector but much more expressive. It allows us to schedule a pod onto selected nodes with far more flexibility. One of the most interesting things for me in node affinity/anti-affinity is the ability to define soft requirements, where the scheduler will still schedule the pod even though the preferred requirements are not met.
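For comparison, this is roughly what the same intent looks like with nodeSelector: a plain label match that acts as a hard requirement with no notion of preference. This is only a sketch; topology.kubernetes.io/zone is a standard built-in label, and the zone value is a placeholder.

apiVersion: v1
kind: Pod
metadata:
  name: with-node-selector
spec:
  # nodeSelector only supports exact label matches and hard requirements
  nodeSelector:
    topology.kubernetes.io/zone: region-az1
  containers:
  - name: with-node-selector
    image: nginx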

Let’s have a look at node affinity a bit more deeply. Node affinity has two types of scheduling mechanisms, namely,

requiredDuringSchedulingIgnoredDuringExecution
preferredDuringSchedulingIgnoredDuringExecution

As mentioned earlier, these can be thought of as hard and soft requirements respectively. Let’s see a small example of how node affinity can be used to schedule a pod only in two availability zones of a cloud-managed Kubernetes service.

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity-for-two-az
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - region-az1
            - region-az2
  containers:
  - name: with-node-affinity
    image: nginx

In the above, we can see the usual manifest fields starting from apiVersion. Here the nodeAffinity is written with the scheduling option requiredDuringSchedulingIgnoredDuringExecution, which requires the pod to be scheduled on nodes in either the region-az1 or region-az2 availability zone. “IgnoredDuringExecution” means that the pod will keep running if the labels on a node change and the affinity rules are no longer met. In simpler words, once the pod is scheduled, it will stay on that node for its lifetime even if the affinity rules are no longer satisfied.
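Once the pod is applied, a simple way to verify the placement is to check which node it landed on and what zone label that node carries; the node name below is a placeholder.

# Show which node the pod was scheduled onto
kubectl get pod with-node-affinity-for-two-az -o wide

# Show that node's zone label as an extra column
kubectl get node <node-name> -L kubernetes.io/e2e-az-name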

Let’s see one more example with nodeAffinity, where the pod is preferably scheduled on nodes with selected hostnames.

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity-on-selected-hosts
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - hostname1
            - hostname2
  containers:
  - name: with-node-affinity
    image: nginx

This uses the scheduling option preferredDuringSchedulingIgnoredDuringExecution, which means that even if the given conditions are not met, the pod will still be scheduled; the scheduler simply prefers nodes with the given hostnames.
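The weight field becomes meaningful once there are several preferred terms: for each candidate node, the scheduler adds up the weights of the terms the node satisfies and favours the highest score. The sketch below combines a hard zone requirement with two weighted preferences; the instance-type and disk-type labels and their values are hypothetical and would need to match labels your nodes actually carry.

apiVersion: v1
kind: Pod
metadata:
  name: with-weighted-node-affinity
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: the pod may only land in region-az1
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - region-az1
      # Soft preferences: matching nodes score 80 and/or 20 points
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: instance-type        # hypothetical label
            operator: In
            values:
            - compute-optimized
      - weight: 20
        preference:
          matchExpressions:
          - key: disk-type            # hypothetical label
            operator: In
            values:
            - ssd
  containers:
  - name: with-weighted-node-affinity
    image: nginx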

We will discuss node anti-affinity and pod affinity in more detail in the next article. You can also refer to the official Kubernetes documentation for further information, and there are plenty of other resources on advanced scheduling as well.

References

[1] https://kubernetes.io/blog/2017/03/advanced-scheduling-in-kubernetes/

[2] https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/

[3] https://thenewstack.io/implement-node-and-pod-affinity-anti-affinity-in-kubernetes-a-practical-example/
