How to Auto-Scale Your Application Running on a Kubernetes Cluster Using KEDA with the Prometheus Scaler


Before jumping into the main tutorial, we first have to understand "what is scaling?" and "why is it important?"

Scaling refers to the ability to adjust the capacity of a system or application based on workload demands. It involves increasing or decreasing the available resources to accommodate changes in traffic, processing requirements, or user demand.

The importance of scaling lies in effectively managing workload demands to ensure optimal performance, resource utilization, and user experience. For example, suppose an application running on a Kubernetes cluster has 2 replicas, and more traffic is coming into those pods than they can handle. You would have to increase the replica count manually to keep your application running smoothly. The reverse also happens: there may be too many replicas running compared to the incoming traffic, and at that point your resources are being consumed without any need. That's where auto-scaling saves us.

In this blog, I discuss how to auto-scale with KEDA. KEDA (Kubernetes-based Event Driven Autoscaling) is an open-source framework that provides efficient autoscaling for applications running in a Kubernetes environment. KEDA allows you to scale your application based on external metrics from different sources, such as message queues (Kafka, RabbitMQ) or monitoring systems (Prometheus).

I have already installed KEDA on my minikube cluster. Here is how you can install KEDA with Helm in your cluster in just a few steps -

helm repo add kedacore https://kedacore.github.io/charts

helm repo update

kubectl create namespace keda

helm install keda kedacore/keda --namespace keda

If the installation is successful, you should see the KEDA pods running in the keda namespace of your Kubernetes cluster -
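
You can verify this with a quick check - you should see pods such as keda-operator and keda-operator-metrics-apiserver in a Running state:

kubectl get pods -n keda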

This time, I am going to use Prometheus metrics to auto-scale my application. So you need Prometheus installed in your Kubernetes cluster and configured to scrape metrics from your application. I have already installed Prometheus and deployed a demo application on my cluster.

Here are the YAML manifests for my deployment and its service, listening on port 3000 -

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
  labels:
    app: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: rono007/demo-app:5
          ports:
            - containerPort: 3000

---
apiVersion: v1
kind: Service
metadata:
  name: api-service
  labels:
    job: node-api
    app: api
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
    - name: web
      protocol: TCP
      port: 3000
      targetPort: 3000
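
Prometheus also needs to know how to scrape this service. If you installed Prometheus via the kube-prometheus-stack Helm chart (which the pod name later in this post suggests), one way to wire it up is a ServiceMonitor. The following is only a minimal sketch: the release: prometheus label and the /metrics path are assumptions that must match your Prometheus installation and your application's metrics endpoint -

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-servicemonitor
  labels:
    release: prometheus # assumption: must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: api
  endpoints:
    - port: web # the named port from api-service above
      path: /metrics # assumption: where the demo app exposes its metrics
      interval: 15s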

Now you have to connect your application to the KEDA operator with a ScaledObject.

A KEDA ScaledObject is a Kubernetes custom resource used to define the autoscaling behaviour for a specific workload or application in a Kubernetes environment. It is part of the KEDA (Kubernetes-based Event Driven Autoscaling) framework.

A ScaledObject specifies the scaling rules, triggers, and the target workload that needs to be scaled. It defines the minimum and maximum replica counts, polling intervals, and metric thresholds to trigger scaling actions. The ScaledObject acts as a bridge between the KEDA controller and the target workload, enabling autoscaling based on external metrics.

Here is the ScaledObject that I am going to use -

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: demo-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    kind: Deployment
    name: api-deployment # name of the deployment, must be in the same namespace as ScaledObject
  minReplicaCount: 1
  maxReplicaCount: 10
  pollingInterval: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://<prometheus-host>:9090
      metricName: api_all_request_total
      threshold: '20'
      query: sum(rate(api_all_request_total[2m]))

Now let's discuss those components.

metadata: Here you need to provide the name of the ScaledObject.

scaleTargetRef: It specifies the target workload to be scaled, such as a Deployment, StatefulSet, or ReplicaSet. The ScaledObject monitors this workload and adjusts the replica count based on the defined scaling rules.

minReplicaCount and maxReplicaCount: These parameters define the minimum and maximum number of replicas allowed for the target workload.

pollingInterval: It defines how frequently the ScaledObject checks the metrics and triggers scaling actions. Here it is set to 10, which means KEDA will check for metric updates and evaluate scaling every 10 seconds.

triggers: Triggers define the metrics and thresholds that determine when scaling should occur. Here I am going to use the Prometheus scaler, so the type is prometheus, and serverAddress points to the Prometheus instance that serves the metric referenced in metricName and query. You can change metricName, threshold, and query according to your needs.
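
Under the hood, KEDA feeds this metric to the Horizontal Pod Autoscaler, which roughly targets ceil(currentMetricValue / threshold) replicas. For example, if the query above returns about 85 requests per second against a threshold of 20, the deployment would be scaled to ceil(85 / 20) = 5 replicas, always bounded by minReplicaCount and maxReplicaCount.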

Before applying this ScaledObject, we have to configure the serverAddress of the Prometheus trigger. To get the actual Prometheus host, we need the IP of the prometheus-prometheus-kube-prometheus-prometheus-0 pod.

kubectl describe pod prometheus-prometheus-kube-prometheus-prometheus-0
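
Alternatively, you can grab just the pod IP in one line (this assumes the pod is in your current namespace; add -n <namespace> if it lives elsewhere):

kubectl get pod prometheus-prometheus-kube-prometheus-prometheus-0 -o jsonpath='{.status.podIP}'

Keep in mind that pod IPs change when the pod restarts, so if your Prometheus installation created a Service, its cluster DNS name makes a more stable serverAddress.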

Copy the IP and paste it into the <prometheus-host> placeholder in serverAddress.

  - type: prometheus
    metadata:
      serverAddress: http://10.244.1.25:9090
      metricName: api_all_request_total
      threshold: '20'
      query: sum(rate(api_all_request_total[2m]))

Next, apply the ScaledObject.

kubectl apply -f scaledobject.yaml
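
You can confirm that the ScaledObject is ready and that KEDA created the backing HorizontalPodAutoscaler (KEDA names it keda-hpa-<scaledobject-name>):

kubectl get scaledobject demo-scaledobject

kubectl get hpa keda-hpa-demo-scaledobject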

Now let's expose the application on localhost:3000 and generate some load on it to produce Prometheus metrics.
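
One simple way to do this is to port-forward the service we created earlier:

kubectl port-forward svc/api-service 3000:3000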

To generate the load, I am going to use Apache Bench (ab).

ab -n 10000 -c 800 http://localhost:3000/

This will simulate a scenario where 10,000 requests are made to the specified URL with a concurrency level of 800, so that we can see the application auto-scale.

After a few seconds, you will see the replica count of api-deployment automatically scale up.
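
You can watch it happen live - the -w flag streams updates as replicas are added:

kubectl get deployment api-deployment -w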

And vice-versa: when the workload decreases, the replica count will be scaled back down.

That's all for today, hope you enjoyed it 😊.
