Target version: Kubernetes 1.17.5
Cluster OS: CentOS 7
Note: performance testing differs between Kubernetes versions; this guide is not guaranteed to work on other versions.
Preface: kubemark is a node-simulation tool for measuring Kubernetes API latency and scheduling performance in large clusters.
We need to set up two clusters, A and B: B is the cluster under test, and A runs the kubemark workload.
The kubemark containers are deployed in A as a workload (a ReplicationController in the reference config below), and each replica registers itself as a node in B.
After deployment, cluster B has one node for every kubemark replica running in A.
Assume two usable clusters A and B already exist.
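Before starting, it helps to confirm that both clusters are reachable. A minimal sanity check, assuming cluster A's kubeconfig is the default one and cluster B's kubeconfig has been copied to kubemark.kubeconfig (the file name used throughout this guide):

kubectl get nodes                                    # cluster A, which will run the kubemark pods
kubectl --kubeconfig=kubemark.kubeconfig get nodes   # cluster B, the cluster under test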
Build and package the kubemark image
- Download the kubernetes source; the version must match cluster B's
- Compile the kubemark binary
./hack/build-go.sh cmd/kubemark/
cp $GOPATH/src/k8s.io/kubernetes/_output/bin/kubemark $GOPATH/src/k8s.io/kubernetes/cluster/images/kubemark/
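A quick sanity check that the binary was built and runs (it should print the usage text, including the --morph flag used by the manifests below):

$GOPATH/src/k8s.io/kubernetes/_output/bin/kubemark --help 2>&1 | head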
- 构建kubemark镜像
cd $GOPATH/src/k8s.io/kubernetes/cluster/images/kubemark/
make build
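make build tags the image with the registry configured in the image Makefile, while the manifests below pull test.cargo.io/release/kubemark:latest, so the fresh image has to be retagged and pushed to a registry that cluster A can reach. A sketch (substitute your own registry; <built-image> is whatever the first command shows):

docker images | grep kubemark
docker tag <built-image> test.cargo.io/release/kubemark:latest
docker push test.cargo.io/release/kubemark:latest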
- 创建namespace、configmap、secret、rbac
kubectl create ns kubemark
kubectl create cm node-configmap --from-literal=content.type="" --from-file=kernel.monitor="kernel-monitor.json" -n kubemark (kernel-monitor.json comes from ./test/kubemark/resources/kernel-monitor.json)
kubectl create secret generic kubeconfig --type=Opaque --from-file=kubelet.kubeconfig=kubemark.kubeconfig --from-file=kubeproxy.kubeconfig=kubemark.kubeconfig --from-file=npd.kubeconfig=kubemark.kubeconfig --from-file=heapster.kubeconfig=kubemark.kubeconfig --from-file=cluster_autoscaler.kubeconfig=kubemark.kubeconfig --from-file=dns.kubeconfig=kubemark.kubeconfig -n kubemark (kubemark.kubeconfig is cluster B's kubeconfig, i.e. /root/.kube/config)
kubectl apply -f addons/ -n kubemark (the manifests are in ./test/kubemark/resources/manifests/addons)
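At this point the kubemark namespace should contain the configmap, the secret, and the addon resources; a quick check:

kubectl get cm,secret -n kubemark
kubectl get all -n kubemark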
- Create the kubemark workload
kubectl apply -f hollow-node_template.yaml -n kubemark
Reference configuration:
apiVersion: v1
kind: ReplicationController
metadata:
  name: hollow-node
  labels:
    name: hollow-node
spec:
  replicas: 200
  selector:
    name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
      initContainers:
      - name: init-inotify-limit
        image: busybox
        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=1000']
        securityContext:
          privileged: true
      volumes:
      - name: kubeconfig-volume
        secret:
          secretName: kubeconfig
      - name: kernelmonitorconfig-volume
        configMap:
          name: node-configmap
      - name: logs-volume
        hostPath:
          path: /var/log
      - name: no-serviceaccount-access-to-real-master
        emptyDir: {}
      containers:
      - name: hollow-kubelet
        image: test.cargo.io/release/kubemark:latest
        ports:
        - containerPort: 4194
        - containerPort: 10250
        - containerPort: 10255
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=kubelet --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubelet.kubeconfig $(CONTENT_TYPE) --alsologtostderr 1>>/var/log/kubelet-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
        securityContext:
          privileged: true
      - name: hollow-proxy
        image: test.cargo.io/release/kubemark:latest
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=proxy --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubeproxy.kubeconfig $(CONTENT_TYPE) --alsologtostderr 1>>/var/log/kubeproxy-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
      - name: hollow-node-problem-detector
        image: test.cargo.io/release/node-problem-detector:v0.8.0
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /node-problem-detector --system-log-monitors=/config/kernel.monitor --apiserver-override="https://192.168.0.16:6443?inClusterConfig=false&auth=/kubeconfig/npd.kubeconfig" --alsologtostderr 1>>/var/log/npd-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: kernelmonitorconfig-volume
          mountPath: /config
          readOnly: true
        - name: no-serviceaccount-access-to-real-master
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
        securityContext:
          privileged: true
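Once the replicas in A are running, each one registers itself as a node in B, so the two counts below should converge to 200 (the replica count above):

kubectl -n kubemark get pods | grep -c Running                             # on cluster A
kubectl --kubeconfig=kubemark.kubeconfig get nodes | grep -c hollow-node   # on cluster B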
- Run the e2e performance tests
There is a big pitfall here: most articles you can find online compile the e2e test binary and run it directly:
make WHAT="test/e2e/e2e.test"
./e2e.test --kube-master=192.168.0.16 --host=https://192.168.0.16:6443 --ginkgo.focus="\[Performance\]" --provider=local --kubeconfig=kubemark.kubeconfig --num-nodes=10 --v=3 --ginkgo.failFast --e2e-output-dir=. --report-dir=.
But the e2e performance tests have actually been moved out of the main repository (https://github.com/kubernetes/kubernetes/pull/83322), so on versions released after 2019-10-01 the commands above can no longer run the performance tests.
Run the performance tests with the perf-tests repo (execute on cluster B's master node)
- Download the perf-tests source matching your Kubernetes version: https://github.com/kubernetes/perf-tests
- Run the test command (the cluster needs >= 100 nodes; otherwise the parameters in job.yaml must be adjusted, e.g. via an overrides file as sketched after the command below)
./run-e2e.sh --testconfig=job.yaml --kubeconfig=config.yaml --provider=local --masterip=192.168.0.16,192.168.0.23,192.168.0.40 --mastername=kube-master-1,kube-master-2,kube-master-3 --master-internal-ip=192.168.0.16,192.168.0.23,192.168.0.40 --enable-prometheus-server --tear-down-prometheus-server=false
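If the cluster has fewer than 100 nodes, clusterloader2 can also take the adjusted parameters from an overrides file instead of editing job.yaml in place. A sketch, assuming the NODES_PER_NAMESPACE variable of the 1.17-era test configs (check your job.yaml for the exact variable names it consumes):

cat > overrides.yaml <<EOF
NODES_PER_NAMESPACE: 10   # hypothetical value; pick one that divides your node count
EOF
./run-e2e.sh --testconfig=job.yaml --testoverrides=overrides.yaml --kubeconfig=config.yaml --provider=local --masterip=192.168.0.16 --mastername=kube-master-1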
Troubleshooting:
- If a target's port needs to be changed, edit the Service directly and update the corresponding Endpoints (a sketch follows).
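For example, assuming a Prometheus scrape target in the monitoring namespace (the service name is a placeholder):

kubectl -n monitoring edit svc <target-service>        # change the service port
kubectl -n monitoring edit endpoints <target-service>  # keep the endpoint port in sync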
- Collecting etcd metrics fails with: etcdmetrics: failed to collect etcd database size. The script cannot pull the data directly over port 2379, so the source needs to be patched:
In perf-tests/clusterloader2/pkg/measurement/common/etcd_metrics.go, change "curl http://localhost:2379/metrics" to "curl -L https://localhost:2379/metrics --key /etc/kubernetes/etcd/etcd.key --cert /etc/kubernetes/etcd/etcd.crt --insecure"
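The patched command can be verified by hand on a master node before rerunning the tests (same key/cert paths as above; adjust if your etcd certificates live elsewhere):

curl -L https://localhost:2379/metrics --key /etc/kubernetes/etcd/etcd.key --cert /etc/kubernetes/etcd/etcd.crt --insecure | grep db_total_size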
- A few endpoints in the Prometheus target list may not be scrapeable and return 401 authorization errors. The fix is as follows:
Edit the corresponding ServiceMonitor and add bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token to the endpoints field:
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 5s
    port: apiserver
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
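The edit can be done in place (list the ServiceMonitor names first; <name> is a placeholder):

kubectl -n monitoring get servicemonitors
kubectl -n monitoring edit servicemonitor <name>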
Add --authentication-token-webhook=true and --authorization-mode=Webhook to the kubelet flags and restart it (one way to do this is sketched below).
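A sketch for CentOS 7 with a kubeadm-managed kubelet; the drop-in location is an assumption, adjust to however your kubelet is configured:

# append the flags to KUBELET_EXTRA_ARGS in /etc/sysconfig/kubelet:
# KUBELET_EXTRA_ARGS=--authentication-token-webhook=true --authorization-mode=Webhook
systemctl daemon-reload && systemctl restart kubelet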
Bind the system:kubelet-api-admin ClusterRole to the prometheus-k8s ServiceAccount:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s-1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubelet-api-admin
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
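Save the manifest to a file and apply it, then confirm the 401 errors disappear from the Prometheus targets page:

kubectl apply -f prometheus-k8s-binding.yaml   # the file name is arbitrary
kubectl get clusterrolebinding prometheus-k8s-1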