Ante Miličević
January 31, 2024

Kubernetes Cluster Autoscaler

In this article, we will look at the Kubernetes Cluster Autoscaler and how we can use it to scale worker nodes up and down.

Kubernetes offers the Cluster Autoscaler as an automation tool for hardware capacity management. It adjusts the size of the cluster and is supported by all major managed cloud providers. Today, we'll talk about the autoscaler and walk through a real use case to help you understand how the Kubernetes Cluster Autoscaler works.

Introduction to Cluster Autoscaling

As we keep adding new pods, at some point the worker nodes run out of resources and no node has capacity left to schedule them. In that scenario, the pods wait for CPU and memory and sit in a Pending state. To resolve this, a Kubernetes admin can manually add more worker nodes so the newly created pods can be scheduled. However, the manual approach is slow and does not scale.

This is where the Cluster Autoscaler helps: it manages cluster capacity automatically, adding worker nodes when pods cannot be scheduled and removing them when they are no longer needed.

This is a cloud-based solution and is generally not available in on-prem, self-hosted Kubernetes environments, because those deployments typically lack an API through which virtual machines can be created and deleted automatically.

You can install the Cluster Autoscaler in your cloud environment manually, although most managed Kubernetes offerings already ship it as a built-in feature.
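For a self-managed installation, the upstream cluster-autoscaler Helm chart is a common route. The commands below are only a sketch: the cluster name and region are placeholders, and the values shown assume an AWS-style setup with node-group auto-discovery, so adjust them for your own provider.

<pre class="codeWrap"><code># Add the upstream autoscaler chart repository
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

# Install cluster-autoscaler into kube-system
# ("my-cluster" and "us-east-1" are placeholder values)
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=us-east-1
</code></pre>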

Prerequisites of Cluster Autoscaling and Supported Platforms

All the major managed Kubernetes platforms, such as Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), and Amazon Elastic Kubernetes Service (EKS), provide cluster autoscaler capabilities. However, each of these platforms has its own limitations.

The cluster autoscaler can be enabled through the GUI or the command line. For instance, in GKE you can use the following command to create a cluster with autoscaling enabled:

<pre class="codeWrap"><code>gcloud container clusters create example-cluster
   --num-nodes 2
   --zone us-central1-a
   --node-locations us-central1-a,us-central1-b,us-central1-f
   --enable-autoscaling --min-nodes 1 --max-nodes 4
</code></pre>

The above command creates a multi-zonal cluster with autoscaling enabled, allowing between 1 and 4 nodes per zone.

Cluster Autoscaling: Hands-on example

To utilize resources efficiently, Kubernetes automatically assigns pods to worker nodes. For this to work well, resource requests and limits need to be defined on the pods; the cluster autoscaler relies on these requests to decide how much capacity is needed.

The cluster autoscaler considers two factors when making its decisions: pod scheduling and node utilization. When it detects a pending pod that cannot be scheduled, it adds more nodes. Similarly, when node utilization drops, it removes nodes from the cluster.
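You can see the same signal the autoscaler reacts to by listing unschedulable pods yourself. The commands below are a generic illustration, not part of the demo that follows; the pod name is a placeholder.

<pre class="codeWrap"><code># List pods stuck in the Pending phase across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect why a specific pod is pending (look for "Insufficient cpu" or
# "Insufficient memory" in the events at the bottom of the output)
kubectl describe pod &lt;pod-name&gt;
</code></pre>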

Let’s understand this with an example: deploying an application on GKE with the cluster autoscaler feature enabled. We will test the autoscaler by adding pods to the cluster and observing how it detects the pending pods and decides to add more nodes. Then we will remove the pods to see how the autoscaler scales back down.

Create a cluster with 3 worker nodes.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ gcloud container clusters create scaling-demo --num-nodes=3
  NAME          LOCATION       MASTER_VERSION  MASTER_IP       MACHINE_TYPE  NODE_VERSION    NUM_NODES  STATUS
 scaling-demo  us-central1-a  1.20.8-gke.900  35.225.137.158  e2-medium     1.20.8-gke.900  3          RUNNING</code></pre>

Use the following command to enable autoscaling on the cluster.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ gcloud beta container clusters update scaling-demo --enable-autoscaling --min-nodes 1 --max-nodes 5
 Updating scaling-demo...done.Updated [https://container.googleapis.com/v1beta1/projects/qwiklabs-gcp-03-a94f05d7b8a0/zones/us-central1-a/clusters/scaling-demo].</code></pre>

Now list the nodes that were created.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get nodes
 NAME                                          STATUS   ROLES    AGE   VERSION
 gke-scaling-demo-default-pool-b182e404-5l2v   Ready    <none>   6m    v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-87gq   Ready    <none>   6m    v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-kwfc   Ready    <none>   6m    v1.20.8-gke.900
</code></pre>

Create an application with predefined resource requests and limits. These requests tell the Kubernetes scheduler how much capacity each pod needs and give the cluster autoscaler the information it needs to decide when to add more nodes.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ cat deployment.yaml
 apiVersion: v1
 kind: Service
 metadata:
   name: application-cpu
   labels:
     app: application-cpu
 spec:
   type: ClusterIP
   selector:
     app: application-cpu
   ports:
     - protocol: TCP
       name: http
       port: 80
       targetPort: 80
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: application-cpu
   labels:
     app: application-cpu
 spec:
   selector:
     matchLabels:
       app: application-cpu
   replicas: 1
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
       maxUnavailable: 0
   template:
     metadata:
       labels:
         app: application-cpu
     spec:
       containers:
       - name: application-cpu
         image: aimvector/application-cpu:v1.0.2
         imagePullPolicy: Always
         ports:
         - containerPort: 80
         resources:
           requests:
             memory: "50Mi"
             cpu: "500m"
           limits:
             memory: "500Mi"
             cpu: "2000m"
 student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl create -f deployment.yaml
 service/application-cpu created
 deployment.apps/application-cpu created
</code></pre>

The application deployment is running with one pod up.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get pods
 NAME                               READY   STATUS    RESTARTS   AGE
 application-cpu-7879778795-8t8bn   1/1     Running   0          9s</code></pre>
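Each replica requests 500m of CPU, so a small node type such as e2-medium fills up quickly once system pods are accounted for. Before scaling up, you can check how much CPU a node has left to allocate; the node name below is simply one of the nodes from the earlier output.

<pre class="codeWrap"><code># Compare requests already allocated on a node against its allocatable capacity
kubectl describe node gke-scaling-demo-default-pool-b182e404-5l2v | grep -A 8 "Allocated resources"
</code></pre>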

Scale the deployment to two replicas to add one more pod.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl scale deploy/application-cpu --replicas 2 deployment.apps/application-cpu scaled
 student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get pods
 NAME                               READY   STATUS    RESTARTS   AGE
 application-cpu-7879778795-8t8bn   1/1     Running   0          2m29s
 application-cpu-7879778795-rzxc7   0/1     Pending   0          5s
</code></pre>

Here you can see that the new pod is in a Pending state. The cluster autoscaler detects this and starts creating a new node for the cluster. You can follow the process in the cluster events.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get events
 LAST SEEN   TYPE      REASON                    OBJECT                                             MESSAGE
 3m24s       Normal    Scheduled                 pod/application-cpu-7879778795-8t8bn               Successfully assigned default/application-cpu-7879778795-8t8bn to gke-scaling-demo-default-pool-b182e404-87gq
 3m23s       Normal    Pulling                   pod/application-cpu-7879778795-8t8bn               Pulling image "aimvector/application-cpu:v1.0.2"
 3m20s       Normal    Pulled                    pod/application-cpu-7879778795-8t8bn               Successfully pulled image "aimvector/application-cpu:v1.0.2" in 3.035424763s  
 3m20s       Normal    Created                   pod/application-cpu-7879778795-8t8bn               Created container application-cpu
 3m20s       Normal    Started                   pod/application-cpu-7879778795-8t8bn               Started container application-cpu
 60s         Warning   FailedScheduling          pod/application-cpu-7879778795-rzxc7               0/3 nodes are available: 3 Insufficient cpu.
 56s         Normal    TriggeredScaleUp          pod/application-cpu-7879778795-rzxc7               pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/qwiklabs-gcp-03-a94f05d7b8a0/zones/us-central1-a/instanceGroups/gke-scaling-demo-default-pool-b182e404-grp 3->4 (max: 5)}]
 2s          Warning   FailedScheduling          pod/application-cpu-7879778795-rzxc7               0/4 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 3 Insufficient cpu.
 3m24s       Normal    SuccessfulCreate          replicaset/application-cpu-7879778795              Created pod: application-cpu-7879778795-8t8bn
 60s         Normal    SuccessfulCreate          replicaset/application-cpu-7879778795              Created pod: application-cpu-7879778795-rzxc7
 3m24s       Normal    ScalingReplicaSet         deployment/application-cpu                         Scaled up replica set application-cpu-7879778795 to 1
 60s         Normal    ScalingReplicaSet         deployment/application-cpu                         Scaled up replica set application-cpu-7879778795 to 2
 5m50s       Normal    RegisteredNode            node/gke-scaling-demo-default-pool-b182e404-5l2v   Node gke-scaling-demo-default-pool-b182e404-5l2v event: Registered Node gke-scaling-demo-default-pool-b182e404-5l2v in Controller
 5m50s       Normal    RegisteredNode            node/gke-scaling-demo-default-pool-b182e404-87gq   Node gke-scaling-demo-default-pool-b182e404-87gq event: Registered Node gke-scaling-demo-default-pool-b182e404-87gq in Controller
 13s         Normal    Starting                  node/gke-scaling-demo-default-pool-b182e404-ccft   Starting kubelet.
 13s         Warning   InvalidDiskCapacity       node/gke-scaling-demo-default-pool-b182e404-ccft   invalid capacity 0 on image filesystem
 12s         Normal    NodeHasSufficientMemory   node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeHasSufficientMemory
 12s         Normal    NodeHasNoDiskPressure     node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeHasNoDiskPressure
 12s         Normal    NodeHasSufficientPID      node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeHasSufficientPID
 12s         Normal    NodeAllocatableEnforced   node/gke-scaling-demo-default-pool-b182e404-ccft   Updated Node Allocatable limit across pods
 10s         Normal    RegisteredNode            node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft event: Registered Node gke-scaling-demo-default-pool-b182e404-ccft in Controller
 9s          Normal    Starting                  node/gke-scaling-demo-default-pool-b182e404-ccft   Starting kube-proxy.
 6s          Warning   ContainerdStart           node/gke-scaling-demo-default-pool-b182e404-ccft   Starting containerd container runtime...
 6s          Warning   DockerStart               node/gke-scaling-demo-default-pool-b182e404-ccft   Starting Docker Application Container Engine...
 6s          Warning   KubeletStart              node/gke-scaling-demo-default-pool-b182e404-ccft   Started Kubernetes kubelet.
 6s          Warning   NodeSysctlChange          node/gke-scaling-demo-default-pool-b182e404-ccft   {"unmanaged": {"net.netfilter.nf_conntrack_buckets": "32768"}}
 1s          Normal    NodeReady                 node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeReady</code></pre>

Check the status of the worker nodes.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get nodes
 NAME                                          STATUS   ROLES    AGE   VERSION
 gke-scaling-demo-default-pool-b182e404-5l2v   Ready    <none>   13m   v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-87gq   Ready    <none>   13m   v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-ccft   Ready    <none>   83s   v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-kwfc   Ready    <none>   13m   v1.20.8-gke.900
</code></pre>


The new worker node has been added and the previously pending pod is now up and running.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get pods
 NAME                               READY   STATUS    RESTARTS   AGE
 application-cpu-7879778795-8t8bn   1/1     Running   0          4m45s
 application-cpu-7879778795-rzxc7   1/1     Running   0          2m21s</code></pre>
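
If you want to confirm which node the previously pending pod landed on, the wide output adds a NODE column. This step is optional and not part of the original walkthrough.

<pre class="codeWrap"><code># Show the node each pod is scheduled on
kubectl get pods -o wide
</code></pre>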

Add one more pod. 

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl scale deploy/application-cpu --replicas 3  
deployment.apps/application-cpu scaled</code></pre>

The newly added pod goes to a pending state.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get pods
 NAME                               READY   STATUS    RESTARTS   AGE
 application-cpu-7879778795-56l6d   0/1     Pending   0          16s
 application-cpu-7879778795-8t8bn   1/1     Running   0          5m22s
 application-cpu-7879778795-rzxc7   1/1     Running   0          2m58s</code></pre>

In the events you can observe the scale-up triggered by the new pending pod, which adds another node.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get events
 LAST SEEN   TYPE      REASON                    OBJECT                                             MESSAGE
 33s         Warning   FailedScheduling          pod/application-cpu-7879778795-56l6d               0/4 nodes are available: 4 Insufficient cpu.
 29s         Normal    TriggeredScaleUp          pod/application-cpu-7879778795-56l6d               pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/qwiklabs-gcp-03-a94f05d7b8a0/zones/us-central1-a/instanceGroups/gke-scaling-demo-default-pool-b182e404-grp 4->5 (max: 5)}]
 5m39s       Normal    Scheduled                 pod/application-cpu-7879778795-8t8bn               Successfully assigned default/application-cpu-7879778795-8t8bn to gke-scaling-demo-default-pool-b182e404-87gq
 5m38s       Normal    Pulling                   pod/application-cpu-7879778795-8t8bn               Pulling image "aimvector/application-cpu:v1.0.2"
 5m35s       Normal    Pulled                    pod/application-cpu-7879778795-8t8bn               Successfully pulled image "aimvector/application-cpu:v1.0.2" in 3.035424763s  
 5m35s       Normal    Created                   pod/application-cpu-7879778795-8t8bn               Created container application-cpu
 5m35s       Normal    Started                   pod/application-cpu-7879778795-8t8bn               Started container application-cpu
 3m15s       Warning   FailedScheduling          pod/application-cpu-7879778795-rzxc7               0/3 nodes are available: 3 Insufficient cpu.
 3m11s       Normal    TriggeredScaleUp          pod/application-cpu-7879778795-rzxc7               pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/qwiklabs-gcp-03-a94f05d7b8a0/zones/us-central1-a/instanceGroups/gke-scaling-demo-default-pool-b182e404-grp 3->4 (max: 5)}]
 2m17s       Warning   FailedScheduling          pod/application-cpu-7879778795-rzxc7               0/4 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 3 Insufficient cpu.
 2m7s        Normal    Scheduled                 pod/application-cpu-7879778795-rzxc7               Successfully assigned default/application-cpu-7879778795-rzxc7 to gke-scaling-demo-default-pool-b182e404-ccft
 2m6s        Normal    Pulling                   pod/application-cpu-7879778795-rzxc7               Pulling image "aimvector/application-cpu:v1.0.2"
 2m5s        Normal    Pulled                    pod/application-cpu-7879778795-rzxc7               Successfully pulled image "aimvector/application-cpu:v1.0.2" in 1.493155878s
 2m5s        Normal    Created                   pod/application-cpu-7879778795-rzxc7               Created container application-cpu
 2m5s        Normal    Started                   pod/application-cpu-7879778795-rzxc7               Started container application-cpu
 5m39s       Normal    SuccessfulCreate          replicaset/application-cpu-7879778795              Created pod: application-cpu-7879778795-8t8bn
 3m15s       Normal    SuccessfulCreate          replicaset/application-cpu-7879778795              Created pod: application-cpu-7879778795-rzxc7
 33s         Normal    SuccessfulCreate          replicaset/application-cpu-7879778795              Created pod: application-cpu-7879778795-56l6d
 5m39s       Normal    ScalingReplicaSet         deployment/application-cpu                         Scaled up replica set application-cpu-7879778795 to 1
 3m15s       Normal    ScalingReplicaSet         deployment/application-cpu                         Scaled up replica set application-cpu-7879778795 to 2
 33s         Normal    ScalingReplicaSet         deployment/application-cpu                         Scaled up replica set application-cpu-7879778795 to 3
 2m28s       Normal    Starting                  node/gke-scaling-demo-default-pool-b182e404-ccft   Starting kubelet.
 2m28s       Warning   InvalidDiskCapacity       node/gke-scaling-demo-default-pool-b182e404-ccft   invalid capacity 0 on image filesystem
 2m27s       Normal    NodeHasSufficientMemory   node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeHasSufficientMemory
 2m27s       Normal    NodeHasNoDiskPressure     node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeHasNoDiskPressure
 2m27s       Normal    NodeHasSufficientPID      node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeHasSufficientPID
 2m27s       Normal    NodeAllocatableEnforced   node/gke-scaling-demo-default-pool-b182e404-ccft   Updated Node Allocatable limit across pods
 2m25s       Normal    RegisteredNode            node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft event: Registered Node gke-scaling-demo-default-pool-b182e404-ccft in Controller
 2m24s       Normal    Starting                  node/gke-scaling-demo-default-pool-b182e404-ccft   Starting kube-proxy.
 2m21s       Warning   ContainerdStart           node/gke-scaling-demo-default-pool-b182e404-ccft   Starting containerd container runtime...
 2m21s       Warning   DockerStart               node/gke-scaling-demo-default-pool-b182e404-ccft   Starting Docker Application Container Engine...
 2m21s       Warning   KubeletStart              node/gke-scaling-demo-default-pool-b182e404-ccft   Started Kubernetes kubelet.
 2m21s       Warning   NodeSysctlChange          node/gke-scaling-demo-default-pool-b182e404-ccft   {"unmanaged": {"net.netfilter.nf_conntrack_buckets": "32768"}}  
 2m16s       Normal    NodeReady                 node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeReady</code></pre>

Check that the cluster autoscaler has added another worker node in response to the pending pod.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get nodes
 
 NAME                                          STATUS   ROLES    AGE     VERSION
 gke-scaling-demo-default-pool-b182e404-5l2v   Ready    <none>   14m     v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-7sg8   Ready    <none>   2s      v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-87gq   Ready    <none>   14m     v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-ccft   Ready    <none>   2m43s   v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-kwfc   Ready    <none>   14m     v1.20.8-gke.900
</code></pre>

You can see that the new pod has been scheduled successfully and is now running.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get pods

 NAME                               READY   STATUS    RESTARTS   AGE
 application-cpu-7879778795-56l6d   1/1     Running   0          82s
 application-cpu-7879778795-8t8bn   1/1     Running   0          6m28s
 application-cpu-7879778795-rzxc7   1/1     Running   0          4m4s
</code></pre>

Let’s see how the cluster autoscaler behaves when we scale down. Remove the extra pods, leaving just one pod in the cluster.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl scale
deploy/application-cpu --replicas 1  deployment.apps/application-cpu scaled

 student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get pods

 NAME                               READY   STATUS    RESTARTS   AGE
 application-cpu-7879778795-8t8bn   1/1     Running   0          7m41s
</code></pre>

Check the number of nodes. The nodes haven't been scaled down yet, as the autoscaler needs some time to detect and react to the drop in utilization.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get nodes
 
 NAME                                          STATUS   ROLES    AGE   VERSION
 gke-scaling-demo-default-pool-b182e404-5l2v   Ready    <none>   25m   v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-7sg8   Ready    <none>   10m   v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-87gq   Ready    <none>   25m   v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-ccft   Ready    <none>   13m   v1.20.8-gke.900
 gke-scaling-demo-default-pool-b182e404-kwfc   Ready    <none>   25m   v1.20.8-gke.900
</code></pre>
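
By default, the upstream cluster autoscaler waits roughly ten minutes after a node becomes underutilized before removing it. On GKE this behaviour is managed for you, but on a self-managed installation it can be tuned with flags on the cluster-autoscaler binary such as the ones below; the values shown are the usual defaults, not a recommendation.

<pre class="codeWrap"><code># Flags that control cluster-autoscaler scale-down behaviour
--scale-down-unneeded-time=10m          # how long a node must be unneeded before removal
--scale-down-delay-after-add=10m        # cool-down after a scale-up before scale-down resumes
--scale-down-utilization-threshold=0.5  # utilization below which a node is considered unneeded
</code></pre>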

After 5-10 minutes, check the cluster events again. This time you can see ScaleDown events: the extra nodes are removed and the cluster is scaled back to its initial number of nodes.

<pre class="codeWrap"><code>student_02_c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get events
 LAST SEEN   TYPE      REASON                                                                                                      OBJECT                                             MESSAGE

 9m31s       Normal    ScaleDown                                                                                                   node/gke-scaling-demo-default-pool-b182e404-7sg8   node removed by cluster autoscaler
 8m53s       Normal    NodeNotReady                                                                                                node/gke-scaling-demo-default-pool-b182e404-7sg8   Node gke-scaling-demo-default-pool-b182e404-7sg8 status is now: NodeNotReady
 7m44s       Normal    Deleting node gke-scaling-demo-default-pool-b182e404-7sg8 because it does not exist in the cloud provider   node/gke-scaling-demo-default-pool-b182e404-7sg8   Node gke-scaling-demo-default-pool-b182e404-7sg8 event: DeletingNode
 7m42s       Normal    RemovingNode                                                                                                node/gke-scaling-demo-default-pool-b182e404-7sg8   Node gke-scaling-demo-default-pool-b182e404-7sg8 event: Removing Node gke-scaling-demo-default-pool-b182e404-7sg8 from Controller
 9m30s       Normal    ScaleDown                                                                                                   node/gke-scaling-demo-default-pool-b182e404-ccft   node removed by cluster autoscaler
 8m42s       Normal    NodeNotReady                                                                                                node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft status is now: NodeNotReady
 7m38s       Normal    Deleting node gke-scaling-demo-default-pool-b182e404-ccft because it does not exist in the cloud provider   node/gke-scaling-demo-default-pool-b182e404-ccft   Node gke-scaling-demo-default-pool-b182e404-ccft event: DeletingNode
 7m37s       Normal    RemovingNode
</code></pre>
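
To confirm the cluster is back to its original size, you can list the nodes again and count them; this is a small addition to the walkthrough.

<pre class="codeWrap"><code># The node count should be back to the original three
kubectl get nodes --no-headers | wc -l
</code></pre>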

Autoscaler Limitations

The cluster autoscaler provides an efficient way to manage resources, but it has its own limitations. Let's look at a few of them and how we can work around them.

  • The cluster autoscaler is not available out of the box for on-prem deployments, since it depends on a cloud provider API to create and delete nodes.
  • The cluster autoscaler takes a few minutes to scale up and down, which means a pod can stay in a Pending state for several minutes.
  • Interdependencies between pods can be a hurdle when draining nodes. The autoscaler will not remove a node that hosts pods using local storage, even when removing it would otherwise be the right choice (see the annotation example after this list).
  • The cluster autoscaler can lead to resource waste when requests are misconfigured. It allocates nodes based on resource requests, not actual usage, so if requests are higher than what is really needed, capacity is wasted. Additional tooling can help analyze real usage and right-size the requests.
  • The cluster autoscaler adds nodes, but administrators are still responsible for choosing the right node size. The same kind of tooling can help optimize node sizes and identify capacity wasted by unused resource requests.
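
Regarding the local-storage point above, the upstream cluster autoscaler honors a pod annotation that explicitly marks a pod as safe (or not) to evict during scale-down. Below is a minimal sketch of setting it on the pod template of the demo deployment; whether you want `"false"` or `"true"` depends on your workload.

<pre class="codeWrap"><code># Pod template metadata inside a Deployment; the annotation tells the
# cluster autoscaler whether this pod may be evicted during scale-down.
template:
  metadata:
    labels:
      app: application-cpu
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
</code></pre>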

These limitations are manageable, but they do affect Kubernetes deployments when dealing with large-scale applications. You can read more about optimizing container infrastructure for complex environments.
