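Background and monitoring goals
The platform our team develops uses a containerized microservice architecture and is deployed on Linux virtual machines. While it runs, we have essentially no visibility into the state of the containers or the servers; when something goes wrong, someone has to log in to the server to investigate.
What we want to see through monitoring:
- CPU, memory, disk, and network usage of the servers
- The running state of each container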
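Required monitoring components
- prometheus: needs no introduction, it is genuinely excellent
- cadvisor: monitors the containers
- node-exporter: monitors the server
- grafana: custom visualization views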
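Monitoring deployment
All four monitoring components are deployed as containers. For convenience the Docker containers run as the root user; otherwise Prometheus and Grafana may run into permission problems. Prometheus also accesses cadvisor and node-exporter without authentication.
prometheus
The configuration file prometheus.yml is placed in the /data/prom directory on the host: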
global:
  scrape_interval: 10s # Scrape targets every 10 seconds. The default is every 1 minute.
  evaluation_interval: 10s # Evaluate rules every 10 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  # One job that scrapes cadvisor (host port 58080) and node-exporter (host port 59100,
  # as published by the node-exporter docker run command in this post).
  - job_name: "node-34"
    static_configs:
      - targets: ['192.168.3.34:58080', '192.168.3.34:59100']
        labels:
          Node: 192.168.3.34:9100
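Before starting the server, it can help to validate the configuration file. The following is a minimal sketch, assuming the /data/prom path and the prom/prometheus:v2.30.3 image used in this post; the function name `check_prom_config` and the `PROM_DIR` and `PROM_IMAGE` variables are hypothetical helpers, not part of Prometheus. It relies on promtool, which ships inside the prom/prometheus image.

```shell
# Assumed defaults taken from this post; override them via the environment if needed.
PROM_DIR="${PROM_DIR:-/data/prom}"
PROM_IMAGE="${PROM_IMAGE:-prom/prometheus:v2.30.3}"

# Hypothetical helper: run promtool from the Prometheus image against prometheus.yml.
check_prom_config() {
  # Override the image entrypoint so promtool runs instead of the server.
  docker run --rm -u root \
    -v "${PROM_DIR}:/etc/prometheus:ro" \
    --entrypoint /bin/promtool \
    "${PROM_IMAGE}" \
    check config /etc/prometheus/prometheus.yml
}
```

Calling `check_prom_config` exits non-zero when promtool reports a syntax or schema problem, so it can be used as a gate in a deployment script.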
Run docker run to start Prometheus:
docker run \
-d -u root \
-p 9090:9090 \
-v /data/prom:/etc/prometheus \
-v "/etc/localtime:/etc/localtime" \
--name=prometheus \
prom/prometheus:v2.30.3
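Once the container is up, a quick check from the command line shows whether Prometheus is healthy and whether its targets are being scraped. This is a sketch under assumptions: it is run on the Docker host, so `http://localhost:9090` reaches the published port, and the names `PROM_URL`, `prom_healthy`, and `prom_up` are hypothetical. The `/-/healthy` endpoint and the `/api/v1/query` HTTP API are standard Prometheus endpoints.

```shell
# Assumption: Prometheus is reachable on the Docker host at the published port 9090.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Succeeds (HTTP 200) when the Prometheus server is up.
prom_healthy() {
  curl -fsS "${PROM_URL}/-/healthy"
}

# Instant query for the built-in `up` metric.
# A value of "1" for a target means its last scrape succeeded, "0" means it failed.
prom_up() {
  curl -fsS --get "${PROM_URL}/api/v1/query" --data-urlencode 'query=up'
}
```

Run `prom_healthy && prom_up` after startup; in the JSON response, each configured target should appear with the value "1".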
cadvisor
The official image on gcr.io is blocked from here, so a cadvisor image from Docker Hub is used instead:
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=58080:8080 \
--detach=true \
--name=cadvisor \
--privileged \
--device=/dev/kmsg \
peytonyip/cadvisor:v0.39.2
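To confirm that cadvisor actually sees the containers on the host, its metrics endpoint can be inspected directly. The sketch below assumes the host address 192.168.3.34 and the published port 58080 from this post; `CADVISOR_URL` and `list_monitored_containers` are hypothetical names. It reads the `container_last_seen` metric and prints the non-empty `name` label of each series.

```shell
# Assumption: cadvisor is published on host port 58080 as in the command above.
CADVISOR_URL="${CADVISOR_URL:-http://192.168.3.34:58080}"

# Hypothetical helper: print the names of containers that cadvisor reports.
list_monitored_containers() {
  curl -fsS "${CADVISOR_URL}/metrics" \
    | grep '^container_last_seen{' \
    | sed -n 's/.*[{,]name="\([^"][^"]*\)".*/\1/p' \
    | sort -u
}
```

If the output lists prometheus, grafana, and the platform's own service containers, the volume mounts and privileges given to cadvisor are sufficient.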
node-exporter
docker run -d -p 59100:9100 \
-v "/proc:/host/proc" \
-v "/sys:/host/sys" \
-v "/:/rootfs" \
-v "/etc/localtime:/etc/localtime" \
--name=node-exporter \
prom/node-exporter
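A similar check works for node-exporter: the server metrics this post cares about (CPU, memory, disk, network) should all appear on its /metrics endpoint. This is a minimal sketch, assuming the host address 192.168.3.34 and the published port 59100; `NODE_EXPORTER_URL` and `check_node_metrics` are hypothetical names, while the four metric names are standard node_exporter metrics on Linux.

```shell
# Assumption: node-exporter is published on host port 59100 as in the command above.
NODE_EXPORTER_URL="${NODE_EXPORTER_URL:-http://192.168.3.34:59100}"

# Hypothetical helper: report whether one metric per resource type is exposed.
# Returns 0 if all are present, 1 if any is missing or the endpoint is unreachable.
check_node_metrics() {
  metrics="$(curl -fsS "${NODE_EXPORTER_URL}/metrics")" || return 1
  missing=0
  for name in node_cpu_seconds_total node_memory_MemAvailable_bytes \
              node_filesystem_avail_bytes node_network_receive_bytes_total; do
    if printf '%s\n' "$metrics" | grep "^${name}" >/dev/null; then
      echo "ok      ${name}"
    else
      echo "missing ${name}"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_node_metrics` prints one line per metric, which makes it easy to spot a collector that is disabled or failing.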
grafana
docker run \
-d -u root --name=grafana \
-p 3000:3000 \
-v "/etc/localtime:/etc/localtime" \
-v /data/grafana:/var/lib/grafana \
grafana/grafana-enterprise:8.2.1
grafana dashboards
Configure the Prometheus data source
Set the IP address and port of Prometheus.
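The data source can also be created without the web UI, through Grafana's HTTP API. The sketch below makes several assumptions: Grafana is reachable at `http://localhost:3000` with the default admin:admin credentials (which should be changed), and Prometheus runs on 192.168.3.34 with port 9090 published. The names `GRAFANA_URL`, `GRAFANA_AUTH`, `PROM_URL`, `datasource_json`, and `add_prometheus_datasource` are hypothetical.

```shell
# Assumptions: adjust these to match the actual deployment.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"        # Grafana's default login; change it
PROM_URL="${PROM_URL:-http://192.168.3.34:9090}"   # must be reachable from the Grafana container

# Build the JSON body describing a Prometheus data source.
datasource_json() {
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}' \
    "$PROM_URL"
}

# Create the data source through POST /api/datasources.
add_prometheus_datasource() {
  curl -fsS -u "$GRAFANA_AUTH" \
    -H 'Content-Type: application/json' \
    -X POST "${GRAFANA_URL}/api/datasources" \
    -d "$(datasource_json)"
}
```

With `access` set to `proxy`, the Grafana server itself queries Prometheus, so the URL must use an address the Grafana container can reach; localhost would point back at the container.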
Grafana officially provides many dashboard templates; import one and it is ready to use.