Scheduling pods on nodes in Kubernetes using labels

This post assumes that you have a basic understanding of Kubernetes terms like pods, deployments and nodes.

A Kubernetes cluster can have many nodes. Each node in turn can run multiple pods. By default, Kubernetes decides which pod runs on which node, and this is something we do not need to worry about.

However, sometimes we want to ensure that certain pods do not run on the same node. For example, we have an application called wheel. We have both staging and production versions of this app, and we want to ensure that production pods and staging pods are not on the same host.

To keep such pods on separate hosts, we can label the nodes and use the nodeSelector constraint in the PodSpec so that each pod is scheduled only on nodes with a matching label.

Kubernetes cluster

We will use kops to provision the cluster. We can check the health of the cluster using kops validate cluster.

$ kops validate cluster
Using cluster from kubectl context: test-k8s.nodes-staging.com

Validating cluster test-k8s.nodes-staging.com

INSTANCE GROUPS
NAME              ROLE   MACHINETYPE MIN MAX SUBNETS
master-us-east-1a Master m4.large    1   1 us-east-1a
master-us-east-1b Master m4.large    1   1 us-east-1b
master-us-east-1c Master m4.large    1   1 us-east-1c
nodes-wheel-stg   Node   m4.large    2   5 us-east-1a,us-east-1b
nodes-wheel-prd   Node   m4.large    2   5 us-east-1a,us-east-1b

NODE STATUS
NAME                           ROLE   READY
ip-192-10-110-59.ec2.internal  master True
ip-192-10-120-103.ec2.internal node   True
ip-192-10-42-9.ec2.internal    master True
ip-192-10-73-191.ec2.internal  master True
ip-192-10-82-66.ec2.internal   node   True
ip-192-10-72-68.ec2.internal   node   True
ip-192-10-182-70.ec2.internal  node   True

Your cluster test-k8s.nodes-staging.com is ready

Here we can see that there are two instance groups for nodes: nodes-wheel-stg and nodes-wheel-prd.

nodes-wheel-stg might have application pods like pod-wheel-stg-sidekiq, pod-wheel-stg-unicorn and pod-wheel-stg-redis. Similarly, nodes-wheel-prd might have application pods like pod-wheel-prd-sidekiq, pod-wheel-prd-unicorn and pod-wheel-prd-redis.

As we can see, the maximum number of nodes for the instance groups nodes-wheel-stg and nodes-wheel-prd is 5, so new nodes may be created in the future. If the labels are defined on the instance group, newly created nodes are labeled automatically based on their instance group and no manual work is required.

Labelling a Node

We will use Kubernetes labels to label a node. To add a label, we need to edit the instance group using kops.

$ kops edit ig nodes-wheel-stg

This will open the instance group configuration file. We will add the following label to the instance group spec.

nodeLabels:
  type: wheel-stg

The complete instance group configuration looks like this.

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-10-12T06:24:53Z
  labels:
    kops.k8s.io/cluster: k8s.nodes-staging.com
  name: nodes-wheel-stg
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m4.large
  maxSize: 5
  minSize: 2
  nodeLabels:
    type: wheel-stg
  role: Node
  subnets:
  - us-east-1a
  - us-east-1b
  - us-east-1c

Similarly, we can label the instance group nodes-wheel-prd with the label type: wheel-prod.

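After making the changes, update the cluster using kops rolling-update cluster --yes --force (kops normally expects the edited configuration to be applied first with kops update cluster --yes). This replaces the nodes so that they come up with the specified labels.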

New nodes added in the future will have labels based on their respective instance groups.

Once the nodes are labeled, we can verify the labels using kubectl describe node.

$ kubectl describe node ip-192-10-82-66.ec2.internal
Name:               ip-192-10-82-66.ec2.internal
Roles:              node
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1a
                    kubernetes.io/hostname=ip-192-10-82-66.ec2.internal
                    kubernetes.io/role=node
                    type=wheel-stg

In this way we have our node labeled using kops.
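
To check every node of a group at once instead of describing nodes one by one, a label selector can be passed to kubectl get nodes. The sketch below wraps that in a small shell function; the function name nodes_with_label is hypothetical, while -l (label selector) and -L (show a label as an extra column) are standard kubectl flags.

```shell
# List only the nodes that carry a given label, e.g. type=wheel-stg.
# nodes_with_label is an illustrative helper name, not part of kubectl.
nodes_with_label() {
  label="$1"            # full label, e.g. type=wheel-stg
  key="${label%%=*}"    # label key only, e.g. type
  # -l filters nodes by label; -L prints the label's value as a column.
  kubectl get nodes -l "$label" -L "$key"
}
```

Calling nodes_with_label type=wheel-stg should list only the nodes of the staging instance group, assuming the rolling update has finished.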

Labelling nodes using kubectl

We can also label a node directly using kubectl.

$ kubectl label node ip-192-20-44-136.ec2.internal type=wheel-stg
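
A label applied this way lives only on that one Node object, so a node that later replaces it will not carry the label unless it is also defined in the instance group. For cases where a manual label needs to be changed or removed, here is a minimal sketch. The helper names relabel_node and unlabel_node are hypothetical; --overwrite and the trailing-dash syntax are standard kubectl label features.

```shell
# Illustrative helpers; the function names are not part of kubectl.

# Change the value of an existing "type" label on a node. Without
# --overwrite, kubectl refuses to replace a label that is already set.
relabel_node() {
  kubectl label node "$1" "type=$2" --overwrite
}

# Remove the "type" label from a node: a trailing dash after the key
# tells kubectl to delete that label.
unlabel_node() {
  kubectl label node "$1" type-
}
```

For example, relabel_node ip-192-20-44-136.ec2.internal wheel-prod would move that node from the staging pool to the production pool for any pod scheduled afterwards; pods already running on it are not evicted by a label change.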

After labeling a node, we will add the nodeSelector field to the PodSpec in our deployment template.

We will add the following block to the deployment manifest.

nodeSelector:
  type: wheel-stg

Added to the original deployment manifest, the configuration looks like this.

# apps/v1 is the current Deployment API; a cluster as old as the
# Kubernetes 1.7 image shown earlier would use apps/v1beta1 instead.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-staging-node
  labels:
    app: test-staging
  namespace: test
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels.
  selector:
    matchLabels:
      app: test-staging
  template:
    metadata:
      labels:
        app: test-staging
    spec:
      containers:
      - image: <your-repo>/<your-image-name>:latest
        name: test-staging
        imagePullPolicy: Always
        env:
        - name: REDIS_HOST
          value: test-staging-redis
        - name: APP_ENV
          value: staging
        - name: CLIENT
          value: test
        ports:
        - containerPort: 80
      # Schedule these pods only on nodes labeled type=wheel-stg.
      nodeSelector:
        type: wheel-stg
      imagePullSecrets:
        - name: registrykey

Let's launch this deployment and check where the pod is scheduled.

$ kubectl apply -f test-deployment.yml
deployment "test-staging-node" created

We can verify that our pod is running on a node labeled type=wheel-stg.

$ kubectl describe pod test-staging-2751555626-9sd4m
Name:           test-staging-2751555626-9sd4m
Namespace:      default
Node:           ip-192-10-82-66.ec2.internal/192.10.82.66
...
...
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
QoS Class:       Burstable
Node-Selectors:  type=wheel-stg
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
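
When a deployment has several replicas, describing each pod gets tedious. A quicker check is kubectl get pods with -o wide, which adds a NODE column to the listing. The following is a small sketch; the function name pods_with_nodes is hypothetical, and the namespace argument defaults to default only as an assumption for this example.

```shell
# Show which node each pod with a given label landed on.
# pods_with_nodes is an illustrative helper name, not part of kubectl.
pods_with_nodes() {
  # $1: pod label selector, e.g. app=test-staging
  # $2: namespace (optional, falls back to "default")
  # -o wide adds the NODE column to the usual pod listing.
  kubectl get pods -n "${2:-default}" -l "$1" -o wide
}
```

For instance, pods_with_nodes app=test-staging lists the pods of the deployment above together with their nodes, which can then be compared against the nodes labeled type=wheel-stg.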

Similarly, we run the production pods on nodes labeled with type: wheel-prod.
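
For a production deployment that already exists, one option is to add the selector with a patch instead of editing the full manifest. The sketch below only writes the patch file; the file name nodeselector-prod.yml is an arbitrary choice, and the deployment name in the commented command is a placeholder.

```shell
# Write a patch that pins a deployment's pods to production nodes.
# The file name is an assumption made for this example.
cat > nodeselector-prod.yml <<'EOF'
spec:
  template:
    spec:
      nodeSelector:
        type: wheel-prod
EOF

# One way to apply it to an existing deployment (placeholder name):
#   kubectl patch deployment <deployment-name> --patch "$(cat nodeselector-prod.yml)"
```

Because the patch changes the pod template, applying it would trigger a rollout, and the replacement pods would be scheduled only on nodes labeled type=wheel-prod.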

Please note that when we specify a nodeSelector and no node has a matching label, the pods stay in the Pending state because the scheduler cannot find a node to place them on.
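
If a rollout seems stuck, a quick way to spot this situation is to list the pods that are still Pending. The helper below is a sketch with a hypothetical name, pending_pods; status.phase is a field selector that kubectl supports for pods.

```shell
# List pods that have not been scheduled or started yet.
# pending_pods is an illustrative helper name, not part of kubectl.
pending_pods() {
  # $1: namespace (optional, falls back to "default")
  kubectl get pods --field-selector=status.phase=Pending -n "${1:-default}"
}

# Follow-up checks to run by hand (names are placeholders):
#   kubectl describe pod <pod-name>        # inspect the Events section
#   kubectl get nodes -l type=wheel-stg    # does any node match the selector?
```

For a pod blocked by its nodeSelector, the Events section of kubectl describe pod usually contains a scheduling failure message, and the second command returns no nodes.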

In this way we schedule our pods to run on specific nodes for certain use-cases.
