Profiling and monitoring
Taking etcd 3.5 as an example: pprof is disabled by default, so the following startup flags need to be changed to enable it.
--enable-pprof 'false'
Enable runtime profiling data via HTTP server. Address is at client URL + "/debug/pprof/"
--metrics 'basic'
Set level of detail for exported metrics, specify 'extensive' to include server side grpc histogram metrics.
--listen-metrics-urls ''
List of URLs to listen on for the metrics and health endpoints.
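As a minimal sketch, the three flags above could be combined as follows to turn on pprof, export the extensive metrics, and serve metrics on a separate plain-HTTP listener. The listen address http://127.0.0.1:2381 and the manifest path /etc/kubernetes/manifests/etcd.yaml are assumptions based on a kubeadm-style deployment, and the function name is hypothetical; adjust them for your cluster.

```shell
# Print the extra etcd flags for profiling and monitoring.
# $1 is the metrics listen URL (assumed default: http://127.0.0.1:2381).
etcd_profiling_flags() {
  metrics_url="${1:-http://127.0.0.1:2381}"
  printf '%s\n' \
    "--enable-pprof=true" \
    "--metrics=extensive" \
    "--listen-metrics-urls=${metrics_url}"
}

etcd_profiling_flags

# Show which of these flags a static pod manifest already sets.
# The manifest path is an assumption; the check is skipped if the file is absent.
manifest=/etc/kubernetes/manifests/etcd.yaml
if [ -r "$manifest" ]; then
  grep -E -- '--(enable-pprof|metrics|listen-metrics-urls)' "$manifest" || true
fi
```

The boolean flag is written as --enable-pprof=true rather than with a space-separated value, since boolean flags are normally given either bare or in the name=value form.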
pprof
Because google/pprof supports TLS authentication, the pprof tool can be installed and used directly, as shown below:
go install github.com/google/pprof@latest
pprof -tls_cert /etc/kubernetes/pki/etcd/peer.crt -tls_key /etc/kubernetes/pki/etcd/peer.key -tls_ca /etc/kubernetes/pki/etcd/ca.crt https://10.9.8.72:2379/debug/pprof/heap
/etc/kubernetes/manifests > pprof -tls_cert /etc/kubernetes/pki/etcd/peer.crt -tls_key /etc/kubernetes/pki/etcd/peer.key -tls_ca /etc/kubernetes/pki/etcd/ca.crt https://10.9.8.72:2379/debug/pprof/heap
Fetching profile over HTTP from https://10.9.8.72:2379/debug/pprof/heap
Saved profile in /root/pprof/pprof.etcd.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
File: etcd
Type: inuse_space
Time: Sep 8, 2023 at 6:18pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 28101.11kB, 83.20% of 33775.50kB total
Showing top 10 nodes out of 169
flat flat% sum% cum cum%
11317.15kB 33.51% 33.51% 11317.15kB 33.51% go.etcd.io/etcd/raft/v3/raftpb.(*Entry).Unmarshal
4839.24kB 14.33% 47.83% 4839.24kB 14.33% go.etcd.io/etcd/server/v3/etcdserver/api/rafthttp.startStreamWriter
2419.62kB 7.16% 55.00% 7258.86kB 21.49% go.etcd.io/etcd/server/v3/etcdserver/api/rafthttp.startPeer
2368.55kB 7.01% 62.01% 2368.55kB 7.01% go.etcd.io/etcd/server/v3/etcdserver/api/rafthttp.newMsgAppV2Decoder
2368.55kB 7.01% 69.02% 2368.55kB 7.01% go.etcd.io/etcd/server/v3/etcdserver/api/rafthttp.newMsgAppV2Encoder
1184.27kB 3.51% 72.53% 1184.27kB 3.51% go.etcd.io/etcd/server/v3/wal.newEncoder
1027.18kB 3.04% 75.57% 1027.18kB 3.04% vendor/golang.org/x/net/http2/hpack.(*headerFieldTable).addEntry
1024.54kB 3.03% 78.60% 1024.54kB 3.03% crypto/tls.(*Conn).readHandshake
783.76kB 2.32% 80.93% 783.76kB 2.32% go.etcd.io/etcd/raft/v3.(*MemoryStorage).Append
768.26kB 2.27% 83.20% 768.26kB 2.27% go.uber.org/zap/zapcore.newCounters
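The same TLS flags can be wrapped in a small helper so that other profiles can be fetched without retyping the certificate paths. This is only a sketch: the function name etcd_pprof and the PPROF_BIN override (used here for a dry run) are hypothetical, and it assumes the certificates live in the directory used above. The -top flag prints the report directly instead of entering interactive mode.

```shell
# Directory holding the etcd certificates (same paths as the commands above).
CERT_DIR="${CERT_DIR:-/etc/kubernetes/pki/etcd}"

# etcd_pprof NODE_IP PROFILE [extra pprof flags...]
# PPROF_BIN is a hypothetical override, e.g. PPROF_BIN=echo for a dry run.
etcd_pprof() {
  node_ip=$1
  profile=$2
  shift 2
  "${PPROF_BIN:-pprof}" \
    -tls_ca "${CERT_DIR}/ca.crt" \
    -tls_cert "${CERT_DIR}/peer.crt" \
    -tls_key "${CERT_DIR}/peer.key" \
    "$@" \
    "https://${node_ip}:2379/debug/pprof/${profile}"
}

# Dry run: print the command lines instead of contacting etcd.
PPROF_BIN=echo etcd_pprof 10.9.8.72 heap -top
PPROF_BIN=echo etcd_pprof 10.9.8.72 goroutine -top
PPROF_BIN=echo etcd_pprof 10.9.8.72 'profile?seconds=30' -top
```

Dropping PPROF_BIN=echo runs the real pprof binary; the goroutine and profile endpoints are the standard Go net/http/pprof handlers served next to heap.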
pprof can also be run from a container, as follows:
NODE_IP='10.9.8.72'
docker run --rm -ti -p 10086:10086 -e NODE_IP=${NODE_IP} \
-v /etc/kubernetes/pki/etcd/:/etc/kubernetes/pki/etcd/:ro \
registry.cn-shenzhen.aliyuncs.com/rancher/tools \
pprof \
-tls_ca /etc/kubernetes/pki/etcd/ca.crt \
-tls_cert /etc/kubernetes/pki/etcd/peer.crt \
-tls_key /etc/kubernetes/pki/etcd/peer.key \
-http=0.0.0.0:10086 "https://${NODE_IP}:2379/debug/pprof/heap"
docker run --rm -ti -e NODE_IP=${NODE_IP} \
-v /etc/kubernetes/pki/etcd/:/etc/kubernetes/pki/etcd/:ro \
registry.cn-shenzhen.aliyuncs.com/rancher/tools \
pprof \
-tls_ca /etc/kubernetes/pki/etcd/ca.crt \
-tls_cert /etc/kubernetes/pki/etcd/peer.crt \
-tls_key /etc/kubernetes/pki/etcd/peer.key \
"https://${NODE_IP}:2379/debug/pprof/heap"
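NOTE: According to the issue https://github.com/golang/go/issues/20939, go tool pprof supports https, but it did not seem to work when I tried it. The "bad certificate" error in the output suggests that etcd rejected the connection because no client certificate was presented; https+insecure only skips verification of the server certificate.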
/etc/kubernetes/manifests > go tool pprof https+insecure://10.9.8.72:2379/debug/pprof/heap
Fetching profile over HTTP from https+insecure://10.9.8.72:2379/debug/pprof/heap
https+insecure://10.9.8.72:2379/debug/pprof/heap: Get "https://10.9.8.72:2379/debug/pprof/heap": remote error: tls: bad certificate
failed to fetch any source profiles
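One possible workaround, sketched here on the assumption that the failure is caused by the missing client certificate, is to download the profile with curl (which can present the certificate) and then point go tool pprof at the local file. The helper names and the output file name are hypothetical.

```shell
CERT_DIR="${CERT_DIR:-/etc/kubernetes/pki/etcd}"

# Build the pprof endpoint URL for an etcd member: pprof_url NODE_IP PROFILE
pprof_url() {
  echo "https://$1:2379/debug/pprof/$2"
}

# fetch_profile NODE_IP PROFILE OUTPUT_FILE
# Downloads a profile, presenting the client certificate for mutual TLS.
fetch_profile() {
  curl --silent --show-error --fail \
    --cacert "${CERT_DIR}/ca.crt" \
    --cert "${CERT_DIR}/peer.crt" \
    --key "${CERT_DIR}/peer.key" \
    --output "$3" \
    "$(pprof_url "$1" "$2")"
}

# Usage (needs network access to the etcd member and a Go toolchain):
#   fetch_profile 10.9.8.72 heap heap.pb.gz && go tool pprof -top heap.pb.gz
pprof_url 10.9.8.72 heap
```

Because the profile is read from a local file, go tool pprof never has to negotiate TLS with etcd itself.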
metrics
curl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
https://localhost:2379/metrics
curl http://127.0.0.1:2381/metrics
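The /metrics output is long, so a small filter can help when checking a member by hand. The sketch below keeps a few leader and disk-latency series that etcd exports; which metrics to keep and the function name are my own choices for illustration, and the sample input values are made up.

```shell
# Keep only a few leader and disk-latency series from Prometheus text input.
filter_etcd_metrics() {
  grep -E '^etcd_(server_has_leader|server_leader_changes_seen_total|disk_wal_fsync_duration_seconds_(sum|count)|disk_backend_commit_duration_seconds_(sum|count))'
}

# Demo on a small made-up sample; on a node, pipe in the real endpoint instead:
#   curl -s http://127.0.0.1:2381/metrics | filter_etcd_metrics
printf '%s\n' \
  '# TYPE etcd_server_has_leader gauge' \
  'etcd_server_has_leader 1' \
  'etcd_disk_wal_fsync_duration_seconds_count 1024' \
  'go_goroutines 120' | filter_etcd_metrics
```

Comment lines and unrelated Go runtime metrics are dropped because the pattern is anchored to the start of the line.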
References
https://www.xtplayer.cn/etcd/etcd-pprof-performance-analysis/
https://etcd.io/docs/v3.5/op-guide/configuration/#profiling-and-monitoring
https://github.com/google/pprof
https://etcd.io/docs/v3.5/op-guide/monitoring/