References:
- https://zhuanlan.zhihu.com/p/32328591
- https://juejin.im/post/5d397288f265da1bce3e1585
- http://www.yfshare.vip/2019/01/28/k8s%E9%9B%86%E7%BE%A4%E6%B0%B4%E5%B9%B3%E6%89%A9%E5%B1%95-HPA/
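Introduction
Horizontal Pod Autoscaling (HPA) is the Kubernetes feature that automatically scales the number of Pods horizontally.
A K8s cluster can scale a service out or in through the scale mechanism of a Replication Controller, which makes the service elastic.
Scaling in K8s comes in two forms:
- Manual scaling with scale. See k8s rolling upgrade (RollingUpdate).
- Automatic scaling with autoscale. See HPA.
Automatic scaling itself falls into two categories:
- Horizontal scaling (scale out): increasing or decreasing the number of instances.
- Vertical scaling (scale up): increasing or decreasing the resources a single instance may use, for example adding CPU or memory.
HPA belongs to the former. It automatically adjusts the number of Pods based on CPU utilization or application-defined custom metrics (supported for replication controllers, deployments, and replica sets).
There are two ways to obtain metrics:
- Heapster: heapster provides the metrics service, but the v1 API (autoscaling/v1) only supports CPU as the scaling metric. Other metrics such as memory, network traffic, and QPS are currently in beta (autoscaling/v2beta1).
- Custom: also in beta (autoscaling/v2beta1). It requires developing a custom REST API, so it is more complex, and when data comes from custom monitoring only absolute values can be set, not utilization.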
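Workflow
- Create the HPA resource, setting the target CPU utilization and the maximum/minimum number of replicas. The Pods must declare resource requests (resources.requests); otherwise HPA will not work.
- The controller manager queries resource usage from metrics every 30s (configurable in kube-controller-manager.service via --horizontal-pod-autoscaler-sync-period).
- It then compares the result with the values and metrics set at creation time (sum of the averages / the target limit) and computes the target number of replicas.
- The target number of replicas is kept within the minimum/maximum set in step 1. If it is within the bounds, the workload is scaled to that number; if it exceeds the maximum, it is scaled only to the maximum.
- Repeat steps 2-4.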
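The comparison in step 3 relies on utilization being measured relative to the Pod's CPU request, which is why a missing request stops HPA from working. A minimal Python sketch of that calculation follows; the function name `average_utilization` and the millicore inputs are assumptions made for illustration, not part of the Kubernetes API.

```python
def average_utilization(usage_millicores, request_millicores):
    """Return average CPU utilization in percent of the per-Pod request.

    usage_millicores: list of current CPU usage values, one per Pod.
    request_millicores: the CPU request of a single Pod (hypothetical input).
    """
    if not request_millicores:
        # Without a request there is nothing to divide by, so no
        # utilization percentage can be computed.
        raise ValueError("the Pod must declare a CPU request")
    if not usage_millicores:
        raise ValueError("no Pod metrics available")
    total_usage = sum(usage_millicores)
    total_request = request_millicores * len(usage_millicores)
    return 100.0 * total_usage / total_request


# Three Pods with a 1-core (1000m) request using 600m, 700m and 800m.
print(average_utilization([600, 700, 800], 1000))
```

With a very small request such as `cpu: 1m`, even a few millicores of usage produce a utilization of several hundred percent, which makes scale-out easy to trigger in a test.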
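Autoscaling algorithm
The HPA Controller adjusts the replica count so that CPU utilization moves as close as possible to the target value, without requiring an exact match. The upstream project also takes into account that a scaling decision may take some time to have an effect: for example, when Pods are under heavy CPU load, system CPU usage may keep climbing while a new Pod is being created. For this reason, no further scaling decision is made for a period after each decision. This period is 3 minutes for scale-out and 5 minutes for scale-in (adjustable with --horizontal-pod-autoscaler-upscale-delay and --horizontal-pod-autoscaler-downscale-delay).
- The HPA Controller has a notion of tolerance, which permits usage to fluctuate within a certain range. The default is currently 0.1, again for the sake of system stability. For example, if the HPA policy targets 50% CPU utilization, scaling is triggered only when utilization rises above 55% or falls below 45%; HPA tries to keep Pod utilization within this range.
- The number of Pods after each scale-out or scale-in is computed as: ceil((currently measured utilization / user-defined target utilization) * number of Pods).
- A single scale-out never raises the Pod count above twice the current number of replicas.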
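The three rules above (tolerance, the ceil formula, and the 2x cap) can be combined into one function. This is only a sketch under the assumptions stated in this article, not the controller's actual code; `desired_replicas` and its parameter names are made up for illustration.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Sketch of the replica calculation described in the text."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band nothing changes.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # ceil((current utilization / target utilization) * number of Pods)
    desired = math.ceil(ratio * current_replicas)
    # A single scale-out is capped at twice the current replica count.
    desired = min(desired, 2 * current_replicas)
    # Finally clamp to the configured minimum and maximum.
    return max(min_replicas, min(desired, max_replicas))


print(desired_replicas(3, 70, 50, min_replicas=3, max_replicas=10))
print(desired_replicas(2, 850, 80, min_replicas=2, max_replicas=8))
```

The second call shows why a heavily overloaded workload grows in steps (2, then 4, then 8) rather than jumping straight to the maximum.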
The three HPA API versions
- autoscaling/v1 supports only one scaling metric, CPU.
- autoscaling/v2beta1 adds support for Resource Metrics and Custom Metrics.
- autoscaling/v2beta2 additionally adds support for External Metrics.
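Basic principle
Suppose there is a Deployment named A with 3 Pods, each replica has a Request of 1 core, and the CPU utilization of the 3 Pods is currently 60%, 70%, and 80%. We set the HPA threshold to 50%, the minimum replicas to 3, and the maximum replicas to 10. Plugging these numbers into the formula:
- The total utilization of the Pods is 60% + 70% + 80% = 210%.
- The current Target is 3.
- The formula gives 210% / 3 = 70%, which is above the 50% threshold, so the current Target is too small and a scale-out is needed.
- Setting Target to 5 gives 210% / 5 = 42%, which is below 50%, so two more containers need to be added.
- HPA then sets Replicas to 5 and scales the Pods out horizontally.
This walk-through helps developers quickly grasp the core principle of HPA. However, its result differs from what happens in practice: a developer who runs the experiment will find that Replicas ends up at 6 rather than 5. This is caused by some details in how HPA works, mainly in the following three areas: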
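The arithmetic of the simplified walk-through can be checked with a few lines of Python. This reproduces only the idealized result of 5; it does not model the details that lead to 6 in practice. The variable names are chosen for this illustration.

```python
import math

utilizations = [60, 70, 80]   # per-Pod CPU utilization in percent
threshold = 50                # HPA target utilization in percent

total = sum(utilizations)                 # 210
current = len(utilizations)               # 3 Pods (the current Target)
average = total / current                 # 70.0, above the 50% threshold
replicas = math.ceil(current * average / threshold)   # ceil(4.2) = 5
load_after = total / replicas             # 42.0, below the threshold

print(total, average, replicas, load_after)   # 210 70.0 5 42.0
```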
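Noise handling
As the formula shows, the value of Target strongly affects the final result. In Kubernetes, changes and upgrades tend to be handled by Recreate rather than Restart. As a result, at some point in the lifecycle of a Deployment, Target may become very large because Starting or Stopping Pods are counted. This introduces a great deal of noise into the HPA calculation. When the HPA Controller finds that the target object has Starting or Stopping Pods, it skips the current calculation cycle and waits until all Pods are Running before calculating again.
Cooldown period
In elastic scaling, the cooldown period is an unavoidable topic. We often want fast scale-out and fast reclamation, yet we also do not want the cluster to oscillate, so the right value for the cooldown period has long been debated by developers. In HPA, the default scale-out cooldown is 3 minutes and the default scale-in cooldown is 5 minutes.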
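The cooldown check itself is simple. The sketch below assumes timestamps expressed in seconds and uses the 3-minute and 5-minute defaults quoted above; the names `may_rescale`, `UPSCALE_DELAY_S`, and `DOWNSCALE_DELAY_S` are hypothetical and do not come from Kubernetes.

```python
UPSCALE_DELAY_S = 3 * 60     # scale-out cooldown from the text
DOWNSCALE_DELAY_S = 5 * 60   # scale-in cooldown from the text


def may_rescale(now_s, last_rescale_s, scale_up):
    """Return True if enough time has passed since the last rescale."""
    delay = UPSCALE_DELAY_S if scale_up else DOWNSCALE_DELAY_S
    return now_s - last_rescale_s >= delay


# 200 seconds after the last rescale: scale-out is allowed, scale-in is not.
print(may_rescale(200, 0, scale_up=True))
print(may_rescale(200, 0, scale_up=False))
```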
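Boundary value calculation
Back to the formula: the first calculation says 5 containers are needed, and the overall load after that scale-out is 42%. But we seem to have overlooked a question: does a brand-new Pod consume some resources of its own when it starts? And is an 8% buffer enough to relieve the overall load? Keep in mind that once a scale-out completes, the next scale-out has to wait at least 3 minutes. To address these issues, HPA introduces a boundary value △. At present a 10% buffer is added automatically when the boundary condition is computed, which is why the final result in the example above is 6.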
Preparation before testing
Environment
IP | Role |
---|---|
192.168.1.155 | master |
192.168.1.156 | node01 |
192.168.1.157 | node02 |
$ kubectl get hpa
No resources found.
[root@master rbtest]# kubectl cluster-info
Kubernetes master is running at https://192.168.1.155:6443
KubeDNS is running at https://192.168.1.155:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
$ kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
wget http://download.baiyongjie.com/kubernetes/yaml/heapster-1.5.3.tar.gz
tar zxf heapster-1.5.3.tar.gz
cd heapster
kubectl apply -f .
kubectl apply -f influxdb/influxdb.yaml
kubectl apply -f influxdb/heapster.yaml
# Check startup status
$ kubectl get pods -n kube-system |grep -E 'heap|influxdb'
heapster-5478bf8664-qttvz 1/1 Running 0 23s
monitoring-influxdb-c5c9dfd5d-tqxtm 1/1 Running 0 27s
Verify that heapster is working
# kubectl top nodes node02
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
node02 62m 0% 792Mi 21%
# kubectl top pod -n kube-system heapster-5478bf8664-qttvz
NAME CPU(cores) MEMORY(bytes)
heapster-5478bf8664-qttvz 2m 39Mi
Testing HPA (apiVersion: autoscaling/v1)
Deploy a Deployment for testing
$ vim hpatest.yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: hpatest
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hpatest
  template:
    metadata:
      labels:
        app: hpatest
    spec:
      containers:
      - name: hpatest
        image: nginx:1.10
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh"]
        args: ["-c","/usr/sbin/nginx; while true;do echo `hostname -I` > /usr/share/nginx/html/index.html; sleep 120;done"]
        ports:
        - name: http
          containerPort: 80
        resources:
          requests:
            cpu: 1m
            memory: 100Mi
          limits:
            cpu: 3m
            memory: 400Mi
---
apiVersion: v1
kind: Service
metadata:
  name: hpatest-svc
spec:
  selector:
    app: hpatest
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
# Create the Service and the Deployment
$ kubectl apply -f hpatest.yaml
# Check startup status
$ kubectl get pods,svc | grep hpatest
pod/hpatest-5fb79d5cd-9w2kv 1/1 Running 0 9s
pod/hpatest-5fb79d5cd-k4pb8 1/1 Running 0 16s
service/hpatest-svc ClusterIP 10.99.75.184 <none> 80/TCP 42s
Modify the kube-controller-manager parameters
Because I installed the cluster with kubeadm, I edit /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    ......
    - --horizontal-pod-autoscaler-use-rest-clients=false # added
    image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.13.4
==For a binary installation, edit the /etc/systemd/system/kube-controller-manager.service unit file instead==
Create the HPA resource file
$ vim hpatest-hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpatest-nginx
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: hpatest
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 50
# Check whether the Pod CPU value is being collected
# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpatest-nginx Deployment/hpatest 0%/50% 2 6 2 2m
Simulate load and check whether autoscaling works
Load test script
$ vim hpatest.sh
while true
do
wget -q -O- http://10.99.75.184
done
$ sh hpatest.sh
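Watch how the Pod replica count changes during the load test
Start watching Pod status before the load test begins to see whether scale-out happens automatically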
$ kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hpatest-5fb79d5cd-9w2kv 1/1 Running 0 46m 10.244.1.121 node01 <none> <none>
hpatest-5fb79d5cd-k4pb8 1/1 Running 0 46m 10.244.0.201 master <none> <none>
--- new Pods being added can be seen below ---
hpatest-5fb79d5cd-w7q8z 0/1 Pending 0 0s <none> <none> <none> <none>
hpatest-5fb79d5cd-wcpf4 0/1 Pending 0 0s <none> <none> <none> <none>
hpatest-5fb79d5cd-w7q8z 0/1 Pending 0 0s <none> node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 0/1 Pending 0 0s <none> node01 <none> <none>
hpatest-5fb79d5cd-w7q8z 0/1 ContainerCreating 0 0s <none> node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 0/1 ContainerCreating 0 0s <none> node01 <none> <none>
hpatest-5fb79d5cd-w7q8z 1/1 Running 0 6s 10.244.2.156 node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 1/1 Running 0 7s 10.244.1.122 node01 <none> <none>
hpatest-5fb79d5cd-4vgpb 0/1 Pending 0 0s <none> <none> <none> <none>
hpatest-5fb79d5cd-4zp2q 0/1 Pending 0 0s <none> <none> <none> <none>
hpatest-5fb79d5cd-4vgpb 0/1 Pending 0 0s <none> node02 <none> <none>
hpatest-5fb79d5cd-4zp2q 0/1 Pending 0 0s <none> master <none> <none>
hpatest-5fb79d5cd-4vgpb 0/1 ContainerCreating 0 0s <none> node02 <none> <none>
hpatest-5fb79d5cd-4zp2q 0/1 ContainerCreating 0 0s <none> master <none> <none>
hpatest-5fb79d5cd-4zp2q 1/1 Running 0 6s 10.244.0.202 master <none> <none>
hpatest-5fb79d5cd-4vgpb 1/1 Running 0 6s 10.244.2.157 node02 <none> <none>
After stopping the load test, check whether the Pods scale back in automatically
# kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hpatest-5fb79d5cd-4vgpb 1/1 Running 0 2m4s 10.244.2.157 node02 <none> <none>
hpatest-5fb79d5cd-4zp2q 1/1 Running 0 2m4s 10.244.0.202 master <none> <none>
hpatest-5fb79d5cd-9w2kv 1/1 Running 0 50m 10.244.1.121 node01 <none> <none>
hpatest-5fb79d5cd-k4pb8 1/1 Running 0 50m 10.244.0.201 master <none> <none>
hpatest-5fb79d5cd-w7q8z 1/1 Running 0 2m19s 10.244.2.156 node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 1/1 Running 0 2m19s 10.244.1.122 node01 <none> <none>
--- deletion has started ---
hpatest-5fb79d5cd-w7q8z 1/1 Terminating 0 6m45s 10.244.2.156 node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 1/1 Terminating 0 6m45s 10.244.1.122 node01 <none> <none>
hpatest-5fb79d5cd-4vgpb 1/1 Terminating 0 6m30s 10.244.2.157 node02 <none> <none>
hpatest-5fb79d5cd-4zp2q 1/1 Terminating 0 6m30s 10.244.0.202 master <none> <none>
View the events generated by autoscaling
$ kubectl describe hpa hpatest-nginx
.....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 21m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 21m horizontal-pod-autoscaler New size: 6; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 14m horizontal-pod-autoscaler New size: 2; reason: All metrics below target
Testing HPA (autoscaling/v2beta1)
Delete the autoscaling/v1 HPA
$ kubectl delete horizontalpodautoscalers.autoscaling hpatest-nginx
horizontalpodautoscaler.autoscaling "hpatest-nginx" deleted
Create the autoscaling/v2beta1 HPA
$ vim hpatest-hpa-v2beta1.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpav2beta1
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: hpatest
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 200Mi
$ kubectl apply -f hpatest-hpa-v2beta1.yaml
horizontalpodautoscaler.autoscaling/hpav2beta1 created
$ kubectl get hpa hpav2beta1
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 2 14m
Simulate load and check whether autoscaling works
Get the IP address of the Service
$ kubectl get svc|grep hpatest
hpatest-svc ClusterIP 10.99.75.184 <none> 80/TCP 4h32m
$ vim hpatest.sh
while true
do
wget -q -O- http://10.99.75.184
done
$ sh hpatest.sh
The following watches run at the same time, in no particular order
$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
hpatest-59fc9f47b6-fmnm6 1/1 Running 0 36m
hpatest-59fc9f47b6-vmgz7 1/1 Running 0 36m
--- new Pods being added can be seen below ---
hpatest-59fc9f47b6-xhvv4 0/1 Pending 0 0s
hpatest-59fc9f47b6-k5gmz 0/1 Pending 0 0s
hpatest-59fc9f47b6-xhvv4 0/1 Pending 0 0s
hpatest-59fc9f47b6-k5gmz 0/1 Pending 0 0s
hpatest-59fc9f47b6-xhvv4 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-k5gmz 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-k5gmz 1/1 Running 0 7s
hpatest-59fc9f47b6-xhvv4 1/1 Running 0 8s
hpatest-59fc9f47b6-lx7dv 0/1 Pending 0 0s
hpatest-59fc9f47b6-mdjbl 0/1 Pending 0 0s
hpatest-59fc9f47b6-lx7dv 0/1 Pending 0 0s
hpatest-59fc9f47b6-dj6dj 0/1 Pending 0 0s
hpatest-59fc9f47b6-zncgh 0/1 Pending 0 0s
hpatest-59fc9f47b6-mdjbl 0/1 Pending 0 0s
hpatest-59fc9f47b6-dj6dj 0/1 Pending 0 0s
hpatest-59fc9f47b6-zncgh 0/1 Pending 0 0s
hpatest-59fc9f47b6-mdjbl 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-zncgh 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-lx7dv 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-dj6dj 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-mdjbl 1/1 Running 0 7s
hpatest-59fc9f47b6-dj6dj 1/1 Running 0 7s
hpatest-59fc9f47b6-lx7dv 1/1 Running 0 7s
hpatest-59fc9f47b6-zncgh 1/1 Running 0 9s
$ kubectl get hpa -o wide -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpav2beta1 Deployment/hpatest 4%/70%, 50%/80% 2 8 2 20m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 2 21m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 4 21m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 8 21m
hpav2beta1 Deployment/hpatest 4%/70%, 750%/80% 2 8 8 22m
hpav2beta1 Deployment/hpatest 4%/70%, 587%/80% 2 8 8 23m
Watches after stopping the load test, in no particular order
$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
hpatest-59fc9f47b6-dj6dj 1/1 Running 0 2m42s
hpatest-59fc9f47b6-fmnm6 1/1 Running 0 41m
hpatest-59fc9f47b6-k5gmz 1/1 Running 0 2m58s
hpatest-59fc9f47b6-lx7dv 1/1 Running 0 2m42s
hpatest-59fc9f47b6-mdjbl 1/1 Running 0 2m42s
hpatest-59fc9f47b6-vmgz7 1/1 Running 0 41m
hpatest-59fc9f47b6-xhvv4 1/1 Running 0 2m58s
hpatest-59fc9f47b6-zncgh 1/1 Running 0 2m42s
--- deletion has started ---
hpatest-59fc9f47b6-zncgh 1/1 Terminating 0 8m16s
hpatest-59fc9f47b6-mdjbl 1/1 Terminating 0 8m16s
hpatest-59fc9f47b6-xhvv4 1/1 Terminating 0 8m32s
hpatest-59fc9f47b6-k5gmz 1/1 Terminating 0 8m32s
hpatest-59fc9f47b6-xhvv4 0/1 Terminating 0 9m4s
hpatest-59fc9f47b6-mdjbl 0/1 Terminating 0 8m48s
hpatest-59fc9f47b6-zncgh 0/1 Terminating 0 8m48s
hpatest-59fc9f47b6-zncgh 0/1 Terminating 0 8m48s
hpatest-59fc9f47b6-mdjbl 0/1 Terminating 0 8m49s
hpatest-59fc9f47b6-mdjbl 0/1 Terminating 0 8m49s
hpatest-59fc9f47b6-xhvv4 0/1 Terminating 0 9m5s
hpatest-59fc9f47b6-xhvv4 0/1 Terminating 0 9m5s
hpatest-59fc9f47b6-k5gmz 0/1 Terminating 0 9m5s
hpatest-59fc9f47b6-k5gmz 0/1 Terminating 0 9m10s
hpatest-59fc9f47b6-k5gmz 0/1 Terminating 0 9m10s
hpatest-59fc9f47b6-zncgh 0/1 Terminating 0 8m54s
hpatest-59fc9f47b6-zncgh 0/1 Terminating 0 8m54s
$ kubectl get hpa -o wide -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpav2beta1 Deployment/hpatest 4%/70%, 50%/80% 2 8 2 20m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 2 21m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 4 21m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 8 21m
hpav2beta1 Deployment/hpatest 4%/70%, 750%/80% 2 8 8 22m
hpav2beta1 Deployment/hpatest 4%/70%, 587%/80% 2 8 8 23m
--- watch the REPLICAS column: scale-in only happens after about 5 minutes ---
hpav2beta1 Deployment/hpatest 4%/70%, 387%/80% 2 8 8 24m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 8 25m
hpav2beta1 Deployment/hpatest 4%/70%, 37%/80% 2 8 8 26m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 8 27m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 8 28m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 8 29m
hpav2beta1 Deployment/hpatest 5%/70%, 0%/80% 2 8 4 30m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 4 31m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 2 32m