In production, we ran into coredns pods stuck in Pending, as shown below:
# kubectl get pod -n kube-system |grep coredns
coredns-5479d79657-6gvvs 1/1 Running 3 28d
coredns-5479d79657-7l7tn 1/1 Running 3 28d
coredns-5479d79657-98qz8 1/1 Running 3 28d
coredns-5479d79657-bsx7h 1/1 Running 3 28d
coredns-5479d79657-btbl8 1/1 Running 3 28d
coredns-5479d79657-f6pwq 1/1 Running 3 28d
coredns-5479d79657-fbht5 0/1 Pending 0 28d
coredns-5479d79657-g7xhz 1/1 Running 3 28d
coredns-5479d79657-gw27m 1/1 Running 5 28d
coredns-5479d79657-h7g29 1/1 Running 3 28d
coredns-5479d79657-jqhj9 1/1 Running 2 28d
coredns-5479d79657-k94lh 1/1 Running 0 28d
coredns-5479d79657-kg5hv 1/1 Running 3 28d
coredns-5479d79657-khjdk 1/1 Running 3 28d
coredns-5479d79657-khp2l 0/1 Pending 0 2d22h
coredns-5479d79657-lwjb7 0/1 Pending 0 28d
coredns-5479d79657-p7ks6 1/1 Running 6 28d
coredns-5479d79657-p8c4v 1/1 Running 3 28d
coredns-5479d79657-tqdhz 0/1 Pending 0 7h15m
coredns-5479d79657-v6qfb 1/1 Running 3 28d
coredns-5479d79657-wcq7t 1/1 Running 3 28d
coredns-5479d79657-zbbck 0/1 Pending 0 28d
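Business traffic is currently unaffected; only the pod status is abnormal. Deleting a Pending coredns pod immediately spawns a replacement, but the replacement is Pending as well. Describing one of the Pending pods shows the reason: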
Events:
  Type     Reason            Age                        From               Message
  ----     ------            ----                       ----               -------
  Warning  FailedScheduling  2m18s (x2460351 over 21d)  default-scheduler  0/17 nodes are available: 17 node(s) didn't match pod affinity/anti-affinity, 17 node(s) didn't satisfy existing pods anti-affinity rules.
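Roughly, this means that none of the 17 existing nodes satisfies the pod's affinity/anti-affinity rules, so the pod cannot be scheduled.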
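The same scheduling events can also be listed without reading the full pod description. This is a minimal sketch: the function name pod_scheduling_events is made up for illustration, and it assumes the events for the pod have not yet expired.

```shell
# List FailedScheduling events recorded for one pod in kube-system.
# pod_scheduling_events is a hypothetical helper name.
pod_scheduling_events() {
  kubectl get events -n kube-system \
    --field-selector "involvedObject.name=$1,reason=FailedScheduling"
}
```

For example, pod_scheduling_events coredns-5479d79657-fbht5 queries one of the Pending pods listed above.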
# kubectl get pod -n kube-system |grep coredns |wc -l
22
# kubectl get pod -n kube-system |grep coredns |grep Pending |wc -l
5
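The two counts above can also be obtained in one pass by grouping on the STATUS column. This is a sketch, assuming the pods carry the k8s-app=coredns label used by the Deployment's selector; coredns_status_counts is a hypothetical helper name.

```shell
# Count coredns pods per STATUS (third column of "kubectl get pod").
# Assumes the pods are labelled k8s-app=coredns.
coredns_status_counts() {
  kubectl get pod -n kube-system -l k8s-app=coredns --no-headers |
    awk '{ count[$3]++ } END { for (s in count) print s, count[s] }' |
    sort
}
```

Calling coredns_status_counts prints one line per status, each followed by its pod count.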
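The cluster has 17 nodes and 22 coredns pods, 5 of which are Pending. Since business is unaffected, my guess is that the affinity rules allow only one coredns pod per node, so the 5 extra pods find a running coredns pod on every node and stay Pending. Check the endpoints first, then the affinity settings on the Deployment.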
# kubectl get ep -n kube-system
NAME                                          ENDPOINTS                                                               AGE
coredns                                       10.233.64.116:53,10.233.65.138:53,10.233.66.25:53 + 48 more...          33d
kube-controller-manager                       <none>                                                                  33d
kube-scheduler                                <none>                                                                  33d
kubernetes-dashboard                          10.233.67.14:8443                                                       33d
prometheus-operator-coredns                   10.233.64.116:9153,10.233.65.138:9153,10.233.66.25:9153 + 14 more...    33d
prometheus-operator-kube-controller-manager   <none>                                                                  33d
prometheus-operator-kube-etcd                 <none>                                                                  33d
prometheus-operator-kube-scheduler            <none>                                                                  33d
prometheus-operator-kubelet                   172.29.11.10:10255,172.29.11.12:10255,172.29.11.14:10255 + 48 more...   33d
tiller-deploy                                 10.233.66.24:44134                                                      33d
# kubectl edit deployment coredns -n kube-system
// Only the affinity/anti-affinity section is of interest here
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: In
            values:
            - ""
        weight: 100
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            k8s-app: coredns
        topologyKey: kubernetes.io/hostname
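The required podAntiAffinity on topologyKey kubernetes.io/hostname means a node that already runs a pod labelled k8s-app=coredns cannot take a second one. The arithmetic behind the 5 Pending pods can be sketched as follows, assuming every node is otherwise schedulable; unschedulable_replicas is a hypothetical helper name.

```shell
# With at most one matching pod per node, any replica beyond the node
# count has nowhere to be scheduled.
unschedulable_replicas() {
  replicas=$1
  nodes=$2
  if [ "$replicas" -gt "$nodes" ]; then
    echo $((replicas - nodes))
  else
    echo 0
  fi
}

unschedulable_replicas 22 17   # prints 5
```

This matches the observation: 22 desired replicas on 17 nodes leave 5 pods Pending.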
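The cluster has only 17 nodes, so why are there 22 coredns replicas? Start by checking the replica count on the replica controller. Older Kubernetes versions manage replicas with a ReplicationController (kubectl get rc -n kube-system), while newer versions use a ReplicaSet (rs) instead.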
# kubectl get rs -n kube-system
NAME                             DESIRED   CURRENT   READY   AGE
coredns-5479d79657               22        22        17      28d
dns-autoscaler-55944959bd        1         1         1       28d
kubernetes-dashboard-86b759667   1         1         1       28d
tiller-deploy-597b9b5f7c         1         1         1       28d
The ReplicaSet is indeed set to 22 replicas. First change it to 17 and see whether the Pending pods get removed.
# kubectl edit rs coredns-5479d79657 -n kube-system
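// This command edits the configuration of the coredns-5479d79657 ReplicaSet; following the documentation, change only spec.replicas
spec:
  replicas: 17  # changed to match the node count
  selector:
    matchLabels:
      k8s-app: coredns
      pod-template-hash: 5479d7965
// Save and exit once the change is made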
replicaset.extensions/coredns-5479d79657 edited
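After saving this change, however, deleting a Pending pod still spawns a replacement, and the total does not change. Try setting the replica count with the scale command instead: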
# kubectl scale rs coredns-5479d79657 -n kube-system --replicas=17
replicaset.extensions/coredns-5479d79657 scaled
# kubectl get pod -n kube-system |grep coredns |wc -l
22
The command reports success, but the number of coredns pods still does not change.
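A ReplicaSet created by a Deployment is reconciled back to the Deployment's replica count, which would explain why direct changes to the ReplicaSet do not stick. One way to confirm who owns the ReplicaSet is to read its ownerReferences. This is a sketch; rs_owner is a hypothetical helper name.

```shell
# Print "Kind/name" of the first owner of a ReplicaSet in kube-system.
rs_owner() {
  kubectl get rs "$1" -n kube-system \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
}
```

For a ReplicaSet created by the coredns Deployment, rs_owner coredns-5479d79657 should report Deployment/coredns.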
Next, try editing the Deployment, which manages this ReplicaSet:
# kubectl edit deployments coredns -n kube-system
// Change only spec.replicas
spec:
  progressDeadlineSeconds: 2147483647
  replicas: 17  # changed to match the node count
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: coredns
After saving and exiting, there are still 22 pods.
Setting the Deployment's replica count with patch does not change the pod count either:
# kubectl patch deployment coredns -p '{"spec":{"replicas":17}}' -n kube-system
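This suggests that the coredns replica count is set by some other process or configuration, which overrides any change made to the ReplicaSet or the Deployment.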
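At this point the dns-autoscaler Deployment caught my attention and brought cluster-autoscaler to mind. CA (cluster-autoscaler) elastically scales the Kubernetes cluster itself, so dns-autoscaler presumably scales the set of coredns pods.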
# kubectl get deployment -n kube-system
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
coredns                22        22        22           22          33d
dns-autoscaler         1         1         1            1           33d
kubernetes-dashboard   1         1         1            1           33d
tiller-deploy          1         1         1            1           33d
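Before touching dns-autoscaler, its container command line can hint at what it controls. The sketch below assumes that dns-autoscaler runs cluster-proportional-autoscaler, whose command normally carries --target= and --configmap= flags; the source does not confirm this, and autoscaler_command is a hypothetical helper name.

```shell
# Print the command of the first container in the dns-autoscaler Deployment.
# Assumption: the container is cluster-proportional-autoscaler, so the output
# would include flags such as --target= and --configmap=.
autoscaler_command() {
  kubectl get deployment dns-autoscaler -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].command}'
}
```

If the output names the coredns Deployment as its target, that supports the guess that dns-autoscaler is the component resetting the replica count.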
To test this hypothesis, first stop dns-autoscaler, then set the pod count to 17.
# kubectl scale deployment --replicas=0 dns-autoscaler -n kube-system
deployment.extensions/dns-autoscaler scaled
# kubectl patch deployment coredns -p '{"spec":{"replicas":17}}' -n kube-system
deployment.extensions/coredns patched
# kubectl get pod -n kube-system |grep coredns |wc -l
17
Checking again, there are now 17 coredns pods, all in Running state. Problem solved. As a follow-up, I will look into why dns-autoscaler set the coredns target to 22 and how to change that preset.
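As a starting point for that follow-up: if dns-autoscaler is cluster-proportional-autoscaler running in linear mode (an assumption, not confirmed above), the target is the larger of two ceilings, one based on total cores and one on node count. The sketch below leaves out the optional min, max and preventSinglePointFailure settings. The input numbers are hypothetical and were not measured on this cluster.

```shell
# Ceiling division for positive integers.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica))
linear_replicas() {
  cores=$1
  cores_per_replica=$2
  nodes=$3
  nodes_per_replica=$4
  by_cores=$(ceil_div "$cores" "$cores_per_replica")
  by_nodes=$(ceil_div "$nodes" "$nodes_per_replica")
  if [ "$by_cores" -gt "$by_nodes" ]; then
    echo "$by_cores"
  else
    echo "$by_nodes"
  fi
}

# Hypothetical inputs: 88 cores, coresPerReplica=4, 17 nodes, nodesPerReplica=1.
linear_replicas 88 4 17 1   # prints 22
```

Under this formula the cores term can push the target above the node count. With one-pod-per-node anti-affinity, such a target produces Pending pods, so the autoscaler's parameters would need adjusting.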