[AEWS Cohort 3] Auto Scaling (1): HPA, KEDA, and VPA


james_janghun 2025. 3. 8. 19:37

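This post summarizes what I learned in the AEWS study group led by Gasida.

Three ways to scale in Kubernetes

Kubernetes offers the following three scaling approaches.

HPA (Horizontal Pod Autoscaler): as the name says, scales out by increasing the number of pods.
VPA (Vertical Pod Autoscaler): scales up by raising the CPU/memory resources allocated to each pod (not the specs of the node).
CA (Cluster Autoscaler): scales the cluster itself by adding worker nodes.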

How does HPA measure usage?

cAdvisor collects the container metrics and exposes them through the kubelet. metrics-server then gathers this data from the kubelet on each node.

By default, most autoscalers get their CPU and memory metrics through metrics-server.

Load test

To check that HPA works properly, let's run a load test.

This application performs about one million calculations on every request, so each request generates CPU load.

cat << EOF > php-apache.yaml
apiVersion: apps/v1
kind: Deployment
metadata: 
  name: php-apache
spec: 
  selector: 
    matchLabels: 
      run: php-apache
  template: 
    metadata: 
      labels: 
        run: php-apache
    spec: 
      containers: 
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports: 
        - containerPort: 80
        resources: 
          limits: 
            cpu: 500m
          requests: 
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata: 
  name: php-apache
  labels: 
    run: php-apache
spec: 
  ports: 
  - port: 80
  selector: 
    run: php-apache
EOF
kubectl apply -f php-apache.yaml

The PHP code behind it looks like this.

<?php
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
	$x += sqrt($x);
}
echo "OK!";
?>

Now let's set up the HPA.

cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 50
        type: Utilization
EOF
Or, equivalently:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

You can check it as follows. This HPA adds replicas, up to a maximum of 10, when the average CPU utilization across the pods exceeds 50% of the requested CPU.

kubectl describe hpa
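
The 50% target in this HPA is measured against each pod's CPU request (200m in the Deployment above), not against the limit or the node's capacity. Here is a rough sketch of that arithmetic; the function name and the usage samples are made up for illustration.

```python
def average_cpu_utilization(usages_millicores, request_millicores):
    """Average CPU utilization (%) across pods, relative to the per-pod CPU request."""
    total_usage = sum(usages_millicores)
    total_request = request_millicores * len(usages_millicores)
    return 100 * total_usage / total_request


# Hypothetical usage samples for two php-apache pods (request: 200m each).
print(average_cpu_utilization([150, 90], 200))  # 60.0, which is above the 50% target
```

With a 200m request, a 50% target works out to an average of about 100m of CPU per pod.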

Now, to monitor what happens, let's keep an eye on the current state with the watch command.

watch -d 'kubectl get hpa,pod;echo;kubectl top pod;echo;kubectl top node'
kubectl exec -it deploy/php-apache -- top

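With monitoring in place, let's apply some load.

The following command calls the php-apache service in an endless loop. Because every request burns CPU, the load keeps building for as long as the loop runs.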

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

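Before the load

Once the load starts, you can see the number of pods keep increasing.

As pods are added, the CPU load is spread across them. When the average CPU utilization shown under Targets falls back below 50%, no more pods are created, and the count settles at 7.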
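
Why the count settles instead of running up to the maximum follows from the replica formula in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), with no change when the ratio is within a tolerance (0.1 by default). The sketch below is a simplified model of that formula only. The utilization figures are hypothetical, not values measured in this test, and it ignores scale-up rate limits and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA replica calculation for a single utilization metric."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds of the HPA.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical readings during a load test with a 50% target.
print(desired_replicas(1, 250, 50))  # 5
print(desired_replicas(5, 68, 50))   # 7
print(desired_replicas(7, 48, 50))   # 7 (ratio 0.96 is inside the tolerance)
```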

Now let's stop the load and wait. You can also use describe to see in detail what the HPA has been doing.

kubectl describe hpa

KEDA - Kubernetes based Event Driven Autoscaler

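How does KEDA differ from HPA?

HPA is resource-based scaling: it reacts to resource usage. It is easy to use and gives a clear view of resource consumption, but resource usage only rises after the workload has already grown, so scaling decisions can lag behind.

KEDA can make up for this. Because it is event-driven, it reacts more directly to the workload or to events.

Instead of resource usage, it can base its decisions on the actual workload, such as the number of messages or pending jobs. This lets scaling start before resource usage climbs, and when nothing needs processing it can scale the pods down to 0, which uses resources more efficiently.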
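
To make the idea of scaling on the backlog itself concrete, here is a minimal sketch. It is not KEDA's code: the function name, the messages_per_replica parameter, and the numbers are assumptions for illustration, and real behavior also depends on settings such as the polling interval and cooldown period.

```python
import math


def replicas_for_queue(queue_length, messages_per_replica,
                       min_replicas=0, max_replicas=10):
    """Pick a replica count from the backlog instead of from CPU usage."""
    if queue_length == 0:
        # No pending work: fall back to the minimum, which may be zero.
        return min_replicas
    desired = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, desired))


print(replicas_for_queue(0, 5))    # 0: nothing to process, scale to zero
print(replicas_for_queue(12, 5))   # 3
print(replicas_for_queue(500, 5))  # 10: capped at max_replicas
```

The decision depends only on how much work is waiting, so it can be made before the pods that will process that work have used any CPU.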

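Three core components of KEDA

1. Agent

keda-operator plays this role and keeps resource usage efficient by activating and deactivating workloads.

- When there are no events, it scales replicas down to 0.

- When events occur, it activates the workload again and scales it up.

2. Metrics Server

keda-operator-metrics-apiserver plays the role of the metrics server.

- Because KEDA is event-driven, it exposes a wide range of event data, unlike the standard metrics-server, which only reports resource metrics.

3. Admission Webhooks

- They automatically validate resource changes.

- For example, they prevent multiple ScaledObjects from targeting the same workload.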
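
The kind of check the webhook performs can be pictured with the sketch below. It is only an illustration on plain dictionaries: the data layout and the function name are hypothetical, not KEDA's actual implementation.

```python
from collections import defaultdict


def find_conflicting_targets(scaled_objects):
    """Return scale targets that are referenced by more than one ScaledObject."""
    by_target = defaultdict(list)
    for obj in scaled_objects:
        ref = obj["scaleTargetRef"]
        key = (obj["namespace"], ref["kind"], ref["name"])
        by_target[key].append(obj["name"])
    return {target: names for target, names in by_target.items() if len(names) > 1}


# Hypothetical objects: two of them point at the same Deployment.
objects = [
    {"name": "cron-scaled", "namespace": "keda",
     "scaleTargetRef": {"kind": "Deployment", "name": "php-apache"}},
    {"name": "queue-scaled", "namespace": "keda",
     "scaleTargetRef": {"kind": "Deployment", "name": "php-apache"}},
    {"name": "other", "namespace": "keda",
     "scaleTargetRef": {"kind": "Deployment", "name": "worker"}},
]
print(find_conflicting_targets(objects))
```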

Theory alone is hard to follow, so let's try it hands-on.

Installing KEDA

cat <<EOT > keda-values.yaml
metricsServer:
  useHostNetwork: true

prometheus:
  metricServer:
    enabled: true
    port: 9022
    portName: metrics
    path: /metrics
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
  operator:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
  webhooks:
    enabled: true
    port: 8020
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus webhooks
      enabled: true
EOT


helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --version 2.16.0 --namespace keda --create-namespace -f keda-values.yaml

Verifying the KEDA installation

kubectl get crd | grep keda
kubectl get all -n keda
kubectl get validatingwebhookconfigurations keda-admission -o yaml
kubectl get podmonitor,servicemonitors -n keda
kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml

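As the APIService above shows, KEDA runs its own metrics server, separate from metrics-server, and serves a variety of metrics through it.

Creating a ScaledObject

ScaledObject is KEDA's core resource: a custom resource that defines what to scale and how.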

cat <<EOT > keda-cron.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: php-apache-cron-scaled
spec:
  minReplicaCount: 0
  maxReplicaCount: 2  # Specifies the maximum number of replicas to scale up to (defaults to 100).
  pollingInterval: 30  # Specifies how often KEDA should check for scaling events
  cooldownPeriod: 300  # Specifies the cool-down period in seconds after a scaling event
  scaleTargetRef:  # Identifies the Kubernetes deployment or other resource that should be scaled.
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  triggers:  # Defines the specific configuration for your chosen scaler, including any required parameters or settings
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: 00,15,30,45 * * * *
      end: 05,20,35,50 * * * *
      desiredReplicas: "1"
EOT
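# A ScaledObject must be in the same namespace as its scale target, so this assumes php-apache is also deployed in the keda namespace
kubectl apply -f keda-cron.yaml -n keda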

This part is important, so let's look at it in more detail.

Scaling Rules: define the minimum/maximum replica counts, the scaling behavior (polling interval, cooldown period, and so on), and advanced scaling options.

spec:
  minReplicaCount: 0
  maxReplicaCount: 2  # Specifies the maximum number of replicas to scale up to (defaults to 100).
  pollingInterval: 30  # Specifies how often KEDA should check for scaling events
  cooldownPeriod: 300

Scale Target: specifies which Kubernetes resource (usually a Deployment or StatefulSet) to scale.

  scaleTargetRef:  # Identifies the Kubernetes deployment or other resource that should be scaled.
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache

Triggers: define which events or metrics cause scaling. For example, you can scale on details such as the number of messages in a RabbitMQ queue or a Prometheus metric.

  triggers: 
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: 00,15,30,45 * * * *
      end: 05,20,35,50 * * * *
      desiredReplicas: "1"

In this exercise the trigger is a cron schedule: one replica during minutes 00-05, 15-20, 30-35, and 45-50 of every hour.

In Grafana you can also see the number of pods changing accordingly.
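
The schedule can be modeled with a few lines of code. This is a simplified sketch, not how KEDA evaluates cron expressions: the windows are hard-coded from the start/end values above, and cooldownPeriod and pollingInterval are ignored, so the real scale-down may lag behind the end of each window.

```python
from datetime import datetime

# (start minute, end minute) pairs taken from the start/end cron fields above.
WINDOWS = [(0, 5), (15, 20), (30, 35), (45, 50)]


def cron_desired_replicas(now, desired=1, min_replicas=0):
    """Replica count the cron trigger asks for at the given time."""
    in_window = any(start <= now.minute < end for start, end in WINDOWS)
    return desired if in_window else min_replicas


print(cron_desired_replicas(datetime(2025, 3, 8, 19, 3)))   # 1: inside 00-05
print(cron_desired_replicas(datetime(2025, 3, 8, 19, 10)))  # 0: between windows
print(cron_desired_replicas(datetime(2025, 3, 8, 19, 45)))  # 1: window just started
```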

VPA - Vertical Pod Autoscaler

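VPA should not be used together with HPA on the same CPU or memory metric, and it may restart pods in order to apply the optimized values.

As for how it calculates, it determines a base value (the minimum the pod needs to run) and adds a modest margin on top.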

https://devocean.sk.com/blog/techBoardDetail.do?ID=164786

Optimizing Pod CPU/Memory Resources (Analyzing the VPA and Kubecost Recommendation Logic)
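
The "base value plus margin" idea can be sketched roughly as follows. This is a toy model under stated assumptions, not the VPA recommender itself: the 90th percentile, the 15% margin, and the usage samples are illustrative values chosen here, and the real recommender works on usage histograms rather than a raw list.

```python
import math


def recommend_request(usage_samples, percentile=0.9, margin_fraction=0.15):
    """Take a high percentile of observed usage as the base, then add a margin."""
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: the smallest sample covering `percentile` of the data.
    index = max(0, math.ceil(percentile * len(ordered)) - 1)
    base = ordered[index]
    return base * (1 + margin_fraction)


# Hypothetical CPU usage samples in millicores, including one short spike.
samples = [100, 120, 110, 300, 130, 125, 115, 105, 140, 135]
print(round(recommend_request(samples)))  # 161
```

Using a percentile instead of the maximum keeps the one-off spike to 300m from inflating the request.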


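At its core, VPA is a tool that optimizes a pod's resources (CPU, memory). Requesting too much wastes resources, and requesting too little slows the pod down, so VPA adjusts the values to fit. In other words, it automatically tunes resources.requests, which makes it a very important feature.

Installing VPA and the sample app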

# [Operations server EC2] Download the code
git clone https://github.com/kubernetes/autoscaler.git # already cloned by userdata
cd ~/autoscaler/vertical-pod-autoscaler/
tree hack

# Check the openssl version
openssl version
OpenSSL 1.0.2k-fips  26 Jan 2017

# Remove openssl 1.0
yum remove openssl -y

# Install openssl 1.1.1 or later and check the version
yum install openssl11 -y
openssl11 version
OpenSSL 1.1.1g FIPS  21 Apr 2020

# Change openssl to openssl11 inside the script
sed -i 's/openssl/openssl11/g' ~/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/gencerts.sh
git status
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
git add .
git commit -m "openssl version modify"

# Deploy the Vertical Pod Autoscaler to your cluster with the following command.
watch -d kubectl get pod -n kube-system
cat hack/vpa-up.sh
./hack/vpa-up.sh

# Run it again!
sed -i 's/openssl/openssl11/g' ~/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/gencerts.sh
./hack/vpa-up.sh

kubectl get crd | grep autoscaling
kubectl get mutatingwebhookconfigurations vpa-webhook-config
kubectl get mutatingwebhookconfigurations vpa-webhook-config -o json | jq

You can see that VPA adjusts the requests automatically. If you look closely, after a while the Requests cpu value has doubled.

Monitoring as follows, I could watch the resources being adjusted.

watch -d "kubectl top pod; echo '----------------------'; kubectl describe pod | grep Requests: -A2"

# Deploy the official example
cd ~/autoscaler/vertical-pod-autoscaler/
cat examples/hamster.yaml
kubectl apply -f examples/hamster.yaml && kubectl get vpa -w

# Check the pod resource Requests
kubectl describe pod | grep Requests: -A2