1. Alertmanager alert configuration
The Alertmanager controller set up earlier has already deployed the alertmanager instance:
# kubectl get po -n monitoring |grep alertmanager
alertmanager-main-0 2/2 Running 0 14m
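1.2. Modify the alertmanager configuration
The default configuration does not meet our alerting needs, so it has to be changed.
When alertmanager is deployed by the operator, its configuration is stored in a Secret and mounted into the container. Secret values are base64-encoded, which is an encoding rather than encryption. The change is therefore made in that Secret's manifest: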
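To look at the configuration that is currently loaded before changing it, the Secret value can be decoded. The sketch below shows the base64 round trip on a tiny stand-in string; the commented kubectl line assumes the Secret name `alertmanager-main` and key `alertmanager.yaml` used in this article, and the helper name `decode_secret_value` is made up for illustration.

```shell
# Kubernetes stores Secret values base64-encoded under .data.
decode_secret_value() {
  # Read base64 text from stdin and print the decoded bytes.
  base64 -d
}

# Round trip on a stand-in value (not the real configuration).
sample='resolve_timeout: 5m'
encoded=$(printf '%s' "$sample" | base64)
decoded=$(printf '%s' "$encoded" | decode_secret_value)
printf '%s\n' "$decoded"

# Against a live cluster (assumes the Secret name and key from this article):
# kubectl get secret alertmanager-main -n monitoring \
#   -o jsonpath='{.data.alertmanager\.yaml}' | decode_secret_value
```

Because the manifest edited in this section uses `stringData`, the configuration is written there in plain text and Kubernetes encodes it on apply.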
# change to the directory that holds your prometheus-operator manifests
cd /opt/basic-server-charts/prometheus/kube-prometheus-0.10.0/manifests/
cp alertmanager-secret.yaml{,.bak}
vim alertmanager-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.23.0
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    # global settings
    "global":
      # Time after which an alert is declared resolved if it has not been updated.
      # Alerts sent by Prometheus always carry an end time, so this value does not affect them.
      "resolve_timeout": "5m"
    # inhibition rules
    "inhibit_rules":
    # While a critical alert is firing, warning and info alerts with the same
    # namespace and alertname are inhibited and send no notifications.
    - "equal":
      - "namespace"
      - "alertname"
      "source_matchers":
      - "severity = critical"
      "target_matchers":
      - "severity =~ warning|info"
    # In the same way, a firing warning alert inhibits matching info alerts.
    - "equal":
      - "namespace"
      - "alertname"
      "source_matchers":
      - "severity = warning"
      "target_matchers":
      - "severity = info"
    "receivers":
    - "name": "prometheusalert"
      "webhook_configs":
      - "url": 'http://prometheusalert-dingding:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx'
    # Root route. Every alert enters the routing tree here, so the root route matches all alerts.
    # Child routes inherit any option they do not set themselves.
    # An alert is handed to the first child route under "routes" whose matchers
    # (match / match_re in older configurations) all match the alert's labels;
    # if no child route matches, the alert is handled by the root route's receiver.
    "route":
      # labels used to group alerts into a single notification
      "group_by":
      - "namespace"
      # how long to wait before notifying about new alerts added to a group that has already been notified
      "group_interval": "5m"
      # when a new group of alerts appears, wait group_wait before the first notification,
      # so that alerts arriving in quick succession are merged into one message
      "group_wait": "30s"
      # default receiver
      "receiver": "prometheusalert"
      # if a notification was sent successfully and the alert is still not resolved
      # after repeat_interval, the notification is sent again
      "repeat_interval": "10m"
      # child routes
      "routes":
      # alerts labeled severity = critical are sent to prometheusalert
      - "matchers":
        - "severity = critical"
        "receiver": "prometheusalert"
      - "matchers":
        - "severity = warning"
        "receiver": "prometheusalert"
type: Opaque
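The two inhibition rules above can be read as a small decision table. The sketch below is a deliberately simplified model, not Alertmanager's implementation: the function name `is_inhibited` and its arguments are hypothetical, and the label comparison on namespace and alertname is reduced to a single "same" flag.

```shell
# Simplified model of the two inhibit_rules in this article.
# $1 = severity of the target alert
# $2 = severity of a currently firing source alert
# $3 = "same" when namespace and alertname are equal on both alerts
is_inhibited() {
  [ "$3" = "same" ] || return 1
  case "$2:$1" in
    critical:warning|critical:info|warning:info) return 0 ;;
    *) return 1 ;;
  esac
}

if is_inhibited warning critical same; then
  echo "warning alert is inhibited by the critical alert"
fi
if ! is_inhibited warning critical different; then
  echo "no inhibition when namespace or alertname differ"
fi
```

A critical alert is never a target in these rules, so it is always delivered.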
After editing, apply the manifest and restart the pod. This completes the Alertmanager configuration.
kubectl apply -f alertmanager-secret.yaml
kubectl delete po -n monitoring alertmanager-main-0
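A syntax error in the embedded configuration is easier to catch before the manifest is applied. As a sketch, the helper below pulls the `alertmanager.yaml` block out of the Secret manifest so it can be checked with `amtool check-config`. The helper name `extract_alertmanager_yaml` is made up, it assumes the key is indented by two spaces and its content by four, and the demo runs on a small stand-in file rather than the real manifest.

```shell
extract_alertmanager_yaml() {
  # $1 = path to the Secret manifest.
  # Print the literal block under "alertmanager.yaml: |-" without its 4-space indent.
  awk '
    /^  alertmanager\.yaml: \|-/ { inblock = 1; next }
    inblock && /^    /           { print substr($0, 5); next }
    inblock && /^[[:space:]]*$/  { print ""; next }
    inblock                      { inblock = 0 }
  ' "$1"
}

# Demo on a stand-in manifest.
demo=$(mktemp)
cat > "$demo" <<'EOF'
kind: Secret
stringData:
  alertmanager.yaml: |-
    "global":
      "resolve_timeout": "5m"
type: Opaque
EOF
extract_alertmanager_yaml "$demo"
rm -f "$demo"

# With the real file, assuming amtool is installed locally:
# extract_alertmanager_yaml alertmanager-secret.yaml > /tmp/alertmanager.yaml
# amtool check-config /tmp/alertmanager.yaml
```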
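Note: the webhook_configs url field in detail
API reference: https://github.com/feiyu563/PrometheusAlert/blob/master/doc/readme/base-restful.md
- "url": 'http://prometheusalert-dingding:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx'
prometheusalert-dingding: the Service name of the prometheusalert-dingding deployment described below
8080: the Service port of prometheusalert-dingding
type=dd: deliver the message through the DingTalk channel
tpl=prometheus-dd: the name of the PrometheusAlert template used to render the message
ddurl: the DingTalk robot webhook address that receives the message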
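The pieces of the webhook URL described above can be pulled apart with plain shell parameter expansion, which is a quick way to check that each parameter holds what is intended. This is only an illustrative sketch using the placeholder URL from this article.

```shell
url='http://prometheusalert-dingding:8080/prometheusalert?type=dd&tpl=prometheus-dd&ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx'

# Service name and port: strip the scheme, then everything from the first "/".
hostport=${url#http://}
hostport=${hostport%%/*}

# Query string: everything after the first "?".
query=${url#*\?}

printf 'service and port: %s\n' "$hostport"
# One parameter per line.
printf '%s\n' "$query" | tr '&' '\n'
```

Splitting on `&` works here because the DingTalk address itself contains no `&`. If a robot URL carried additional `&`-separated parameters, it would probably need to be URL-encoded before being placed in `ddurl`.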
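2. PrometheusAlert alert delivery
Reference: https://github.com/feiyu563/PrometheusAlert
2.1 Deployment (several deployment methods are available; Helm is used here)
Reference: https://github.com/feiyu563/PrometheusAlert/blob/master/doc/readme/base-install.md
The Helm chart supports configuring an Ingress hostname, which can be set in values.yaml.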
git clone https://github.com/feiyu563/PrometheusAlert.git
cd PrometheusAlert/example/helm/prometheusalert
ll
total 40
drwxr-xr-x 4 root root 4096 Oct 20 14:03 ./
drwxr-xr-x 3 root root 4096 Oct 20 13:29 ../
-rw-r--r-- 1 root root 399 Oct 20 13:29 Chart.yaml
drwxr-xr-x 2 root root 4096 Oct 20 15:36 config/
-rw-r--r-- 1 root root 333 Oct 20 13:29 .helmignore
-rw-r--r-- 1 root root 9355 Oct 20 13:29 README.md
drwxr-xr-x 2 root root 4096 Oct 20 14:01 templates/
-rw-r--r-- 1 root root 828 Oct 20 14:03 values.yaml
Edit the configuration. Only the DingTalk alert channel is configured here:
cd config && cp app.conf{,.bak}
vim app.conf
#---------------------↓global settings-----------------------
appname = PrometheusAlert
#login user name
login_user=prometheusalert
#login password
login_password=prometheusalert
#listen address
httpaddr = "0.0.0.0"
#listen port
httpport = 8080
runmode = dev
#proxy setting, e.g. proxy = http://123.123.123.123:8080
proxy =
#enable JSON request bodies
copyrequestbody = true
#alert message title
title=PrometheusAlert
#log output target: file or console
logtype=file
#log file path
logpath=logs/prometheusalertcenter.log
#convert the time zone of Prometheus and Graylog alert messages to CST (do not enable if they are already in CST)
prometheus_cst_time=0
#database driver: sqlite3, mysql, or postgres; when using mysql or postgres, uncomment db_host, db_port, db_user, db_password, and db_name
db_driver=sqlite3
#---------------------↓webhook-----------------------
#enable the DingTalk alert channel; several channels can be enabled at once (0 = off, 1 = on)
open-dingding=1
#default DingTalk robot URL
ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxx
#whether to @ everyone (0 = off, 1 = on)
dd_isatall=1
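Before deploying, it can be worth confirming that the DingTalk switch and robot URL in app.conf hold the intended values. The helper below is a hypothetical convenience (`conf_get` is not part of PrometheusAlert) that prints the value of the first uncommented `key=value` line; the demo uses a stand-in file with a few of the keys shown above.

```shell
conf_get() {
  # $1 = key, $2 = file. Commented lines start with "#", so they never match.
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

# Demo on a stand-in file containing a few keys from app.conf.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#listen port
httpport = 8080
#open-dingding=0
open-dingding=1
dd_isatall=1
EOF
printf 'open-dingding=%s\n' "$(conf_get open-dingding "$demo")"
printf 'httpport=%s\n' "$(conf_get httpport "$demo")"
rm -f "$demo"

# With the real file:
# conf_get open-dingding app.conf
# conf_get ddurl app.conf
```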
2.1.1 Start
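cd ../..   # back to PrometheusAlert/example/helm, so that "prometheusalert" resolves to the chart directory
helm upgrade --install prometheusalert-dingding prometheusalert -n monitoring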
# kubectl get po -n monitoring |grep prometheusalert-dingding
prometheusalert-dingding-56d7848dd8-lkjwc 1/1 Running 0 11d
# kubectl get svc -n monitoring |grep prometheusalert-dingding
prometheusalert-dingding ClusterIP 192.168.124.248 <none> 8080/TCP 11d
# kubectl get ing -n monitoring |grep prometheusalert-dingding
prometheusalert-dingding <none> test.prom-alter.test.cn 172.23.11.36 80 11d
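With both components running, the whole chain can be exercised by posting a hand-made alert to Alertmanager's v2 API and watching for the DingTalk message. The sketch below only builds and prints the payload; the actual send is left commented out because it needs cluster access. The alert name `ManualTest`, the summary text, and the function names are made up for illustration, and the Service name `alertmanager-main` on port 9093 is assumed from a standard kube-prometheus install.

```shell
build_test_alert() {
  # $1 = severity label. Prints a one-element JSON array of alerts.
  printf '[{"labels":{"alertname":"ManualTest","namespace":"monitoring","severity":"%s"},"annotations":{"summary":"manual test alert"}}]\n' "$1"
}

send_test_alert() {
  # $1 = Alertmanager base URL, $2 = severity label.
  build_test_alert "$2" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$1/api/v2/alerts"
}

# Show the payload that would be sent.
build_test_alert critical

# Against a live cluster:
# kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093 &
# send_test_alert http://localhost:9093 critical
```

With the route settings used in this article, the first notification for a new group is expected after the 30s `group_wait`.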