[AEWS Cohort 3] Auto Scaling (2) CA - About CAS and CPA

james_janghun 2025. 3. 8. 20:21

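What is Cluster Autoscaler (CA)?

Cluster Autoscaler is the component that automatically adjusts the size of a Kubernetes cluster. Its 1.0 GA (Generally Available) release shipped alongside Kubernetes 1.8. CA does the following.

- Pods cannot run because resources are insufficient → add nodes (scale out)

- Nodes stay underutilized for an extended period → remove nodes (scale in)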

https://catalog.us-east-1.prod.workshops.aws/workshops/9c0aa9ab-90a9-44a6-abe1-8dff360ae428/ko-KR/100-scaling/200-cluster-scaling

How Cluster Autoscaler works

Cluster Autoscaler typically runs as a Deployment inside the cluster and works as follows.

Scale Out

- Cluster Autoscaler periodically monitors the state of the cluster.

- When it finds Pending pods, it checks whether a new node is needed for those pods to run.

- If a new node is needed, it requests one through the cloud provider's API.

- Once the new node joins the cluster, the Kubernetes scheduler automatically places the pending pods on it.
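
The scale-out trigger described above can be observed from the command line. The following is a minimal sketch, assuming kubectl is already pointed at the cluster; the helper name count_pending_pods is hypothetical and not part of any tool.

```shell
# Count pods in the Pending phase across all namespaces.
# Note: Pending also covers pods that are still pulling images;
# Cluster Autoscaler reacts only to the ones the scheduler marked unschedulable.
count_pending_pods() {
  kubectl get pods --all-namespaces \
    --field-selector=status.phase=Pending \
    --no-headers 2>/dev/null | wc -l | tr -d ' '
}

# Usage: count_pending_pods
```

A count that stays above zero while CA is running usually means either the ASG is already at MaxSize or the pods would not fit on any new node either.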

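Scale In

- Cluster Autoscaler periodically checks the utilization of each node.

- If a node stays below the configured utilization threshold for a set period, CA checks whether that node's pods can be moved to other nodes.

- If they can, it evicts the pods so they are rescheduled elsewhere (drain) and removes the node from the cluster.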
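
The threshold check in the second step can be sketched with integer arithmetic. This is a simplification under stated assumptions: it looks at CPU requests only (CA also takes memory into account), and the default of 50 mirrors CA's default --scale-down-utilization-threshold of 0.5. The function name is hypothetical.

```shell
# Succeed (exit 0) when requested CPU is below the utilization threshold.
# Arguments: requested millicores, allocatable millicores, threshold percent (default 50).
is_scale_down_candidate() {
  requested_m=$1
  allocatable_m=$2
  threshold_pct=${3:-50}
  # Compare requested/allocatable < threshold/100 without floating point.
  [ $(( requested_m * 100 )) -lt $(( allocatable_m * threshold_pct )) ]
}

# Usage: is_scale_down_candidate 600 1930 && echo "candidate for removal"
```

Being a candidate is not enough on its own: the node must stay that way for the unneeded period (CA's --scale-down-unneeded-time, 10 minutes by default), and its pods must be movable.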

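For CA to discover and manage a node group, its ASG must carry the auto-discovery tags k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>.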
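
One way to confirm the tags is to list them with the AWS CLI. This is a sketch rather than a required step: the helper name show_ca_tags is hypothetical, and it assumes the tag keys start with k8s.io/cluster-autoscaler/, which is what the auto-discovery example manifest looks for.

```shell
# Print the Cluster Autoscaler discovery tags (key and value) on one ASG.
# Argument: the Auto Scaling group name.
show_ca_tags() {
  asg_name=$1
  aws autoscaling describe-tags \
    --filters "Name=auto-scaling-group,Values=${asg_name}" \
    --query "Tags[?starts_with(Key, 'k8s.io/cluster-autoscaler/')].[Key, Value]" \
    --output text
}

# Usage: show_ca_tags "$ASG_NAME"
```

On EKS managed node groups these tags are normally applied for you; an empty result suggests the group was created another way and needs to be tagged manually.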

Checking the ASG

The following command prints each AutoScalingGroupName along with three values: MinSize, MaxSize, and DesiredCapacity.

aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
    --output table

Let's raise MaxSize to 6.

export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 3 --desired-capacity 3 --max-size 6

Deploying CAS

The manifest is simple: a ServiceAccount, a ClusterRole, a Role, their bindings, and a Deployment.

curl -s -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml


sed -i -e "s|<YOUR CLUSTER NAME>|$CLUSTER_NAME|g" cluster-autoscaler-autodiscover.yaml
kubectl apply -f cluster-autoscaler-autodiscover.yaml
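
Before generating load, it can help to confirm that the autoscaler actually came up. A minimal sketch, assuming the example manifest's defaults (a Deployment named cluster-autoscaler in the kube-system namespace); the helper name wait_for_cas is hypothetical.

```shell
# Block until the Cluster Autoscaler Deployment has rolled out, or time out.
# Argument: optional timeout (default 120s).
wait_for_cas() {
  kubectl -n kube-system rollout status deployment/cluster-autoscaler \
    --timeout="${1:-120s}"
}

# Usage: wait_for_cas 60s
```

Once it reports success, kubectl -n kube-system logs -f deployment/cluster-autoscaler shows the scale-up and scale-down decisions as they happen.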

With monitoring running, let's deploy a sample app. Its resource requests are deliberately on the heavy side.

cat << EOF > nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-to-scaleout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-to-scaleout
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
EOF
kubectl apply -f nginx.yaml
kubectl get deployment/nginx-to-scaleout

Now scale the app to 15 replicas so that the total resource requests grow.

kubectl scale --replicas=15 deployment/nginx-to-scaleout && date
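
A rough calculation shows why this forces a scale-out. The sketch below assumes about 1930m of allocatable CPU per node, which is typical for a 2-vCPU instance on EKS; the post does not state the instance type, so treat that number and the function name as assumptions.

```shell
# Estimate the minimum number of nodes needed to fit N identical pods by CPU request.
# Arguments: replicas, CPU request per pod (millicores), allocatable CPU per node (millicores).
nodes_needed() {
  replicas=$1
  request_m=$2
  allocatable_m=$3
  per_node=$(( allocatable_m / request_m ))   # pods per node, rounded down
  if [ "$per_node" -le 0 ]; then
    echo "a single pod does not fit on one node" >&2
    return 1
  fi
  echo $(( (replicas + per_node - 1) / per_node ))   # rounded up
}

# Usage: nodes_needed 15 500 1930
```

With these inputs the estimate is 5 nodes, and that ignores the CPU already requested by system pods on every node, so three nodes cannot hold 15 replicas at 500m each.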

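There were 3 nodes to begin with.

As the pending pods piled up, three more nodes were added, bringing the total to 6.

Now delete the sample app. After about 10 minutes you can see the node count go back down.

Removing nodes is more disruptive than removing pods, so CA deliberately does not scale in quickly.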

kubectl delete -f nginx.yaml && date

CPA - Cluster Proportional Autoscaler

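CPA is a Kubernetes controller that automatically adjusts the number of replicas (pods) of an application in proportion to the number of nodes in the cluster. As the cluster grows, certain core services scale up with it; as the cluster shrinks, the replica count shrinks too.

So where is CPA mainly needed?

- DNS services (coredns/kube-dns): more nodes and pods mean more DNS queries.

- Monitoring agents / logging collectors: more nodes mean more to monitor.

- Service mesh components: with a service mesh, service-to-service traffic grows in proportion to the number of nodes.

The key difference from HPA is that HPA scales on resource metrics, whereas CPA scales on the node count, so the two serve very different purposes.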

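How CPA works

- Cluster monitoring: periodically checks the number of nodes in the cluster

- Scaling calculation: computes the required number of pods from a predefined formula

- Automatic adjustment: sets the target application's replicas to the computed number

Scaling algorithms

- Linear model: replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica))

- Ladder model: picks the replicas from predefined ranges (steps)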
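
The linear model can be sketched as a small shell function. This is an illustration only: it takes the larger of the core-based and node-based results, uses whole numbers, and omits the optional min/max clamping; the function names are hypothetical.

```shell
# Integer ceiling division: ceil(a / b).
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Linear model: take the larger of the core-based and node-based results.
# Arguments: cores, nodes, coresPerReplica, nodesPerReplica.
linear_replicas() {
  by_cores=$(ceil_div "$1" "$3")
  by_nodes=$(ceil_div "$2" "$4")
  if [ "$by_cores" -gt "$by_nodes" ]; then
    echo "$by_cores"
  else
    echo "$by_nodes"
  fi
}

# Usage: linear_replicas 13 4 2 1
```

For a cluster with 13 cores and 4 nodes, coresPerReplica 2 and nodesPerReplica 1, the core-based result is ceil(13/2) = 7 and the node-based result is 4, so the sketch returns 7.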

This example uses the ladder model.

Installing CPA

We will define the CPA rules and deploy with Helm. First, add the chart repository and install the release.

helm repo add cluster-proportional-autoscaler https://kubernetes-sigs.github.io/cluster-proportional-autoscaler

helm upgrade --install cluster-proportional-autoscaler cluster-proportional-autoscaler/cluster-proportional-autoscaler

Let's deploy a sample app as well.

cat <<EOT > cpa-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          limits:
            cpu: "100m"
            memory: "64Mi"
          requests:
            cpu: "100m"
            memory: "64Mi"
        ports:
        - containerPort: 80
EOT
kubectl apply -f cpa-nginx.yaml

Next, set the CPA rules and upgrade the Helm release. This example uses the ladder model.

The ladder maps 1 node to 1 replica, 4 nodes to 3 replicas, and 5 nodes to 5 replicas.

cat <<EOF > cpa-values.yaml
config:
  ladder:
    nodesToReplicas:
      - [1, 1]
      - [2, 2]
      - [3, 3]
      - [4, 3]
      - [5, 5]
options:
  namespace: default
  target: "deployment/nginx-deployment"
EOF
kubectl describe cm cluster-proportional-autoscaler

helm upgrade --install cluster-proportional-autoscaler -f cpa-values.yaml cluster-proportional-autoscaler/cluster-proportional-autoscaler
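
The ladder lookup configured in cpa-values.yaml can be sketched as follows. It is a minimal illustration, assuming the steps are sorted by node count and that a cluster smaller than the first step falls back to the first step's value; the variable and function names are hypothetical.

```shell
# Ladder steps as "nodes:replicas" pairs, same values as nodesToReplicas in cpa-values.yaml.
LADDER="1:1 2:2 3:3 4:3 5:5"

# Return the replicas of the largest step whose node threshold is <= the node count.
ladder_replicas() {
  nodes=$1
  replicas=""
  for step in $LADDER; do
    threshold=${step%%:*}
    value=${step#*:}
    # Seed with the first step so very small clusters still get a value.
    if [ -z "$replicas" ]; then
      replicas=$value
    fi
    if [ "$threshold" -le "$nodes" ]; then
      replicas=$value
    fi
  done
  echo "$replicas"
}

# Usage: ladder_replicas 4
```

With this ladder, 4 nodes map to 3 replicas and 5 or more nodes map to 5 replicas, which is the behavior exercised in the steps that follow.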

Let's increase the node count to 5. The ASG can be set straight to 5.

As configured, with 5 nodes the Deployment scales out to 5 pods.

export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks-1']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 5 --desired-capacity 5 --max-size 5
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks-1']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

Now let's shrink to 4 nodes. One node starts draining right away.

Per the ladder configuration, you can then confirm that 3 pods are running.

aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 4 --desired-capacity 4 --max-size 4
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks-1']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

Finally, delete the resources.

helm uninstall cluster-proportional-autoscaler && kubectl delete -f cpa-nginx.yaml