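One-sentence summary: combine blackbox_exporter with Prometheus to monitor web pages.
More precisely:
Use blackbox_exporter to monitor URL status, IP availability, port status, and TLS certificate expiry.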
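We already monitor host resource usage, container run state, and runtime data from databases and middleware. These are the infrastructure that supports business services. White-box monitoring exposes their actual internal state, and watching the resulting metrics lets us anticipate likely problems and tune away potential sources of uncertainty.
From the standpoint of a complete monitoring strategy, however, the large body of white-box monitoring should be complemented by an appropriate amount of black-box monitoring. Black-box monitoring tests a service's externally visible behavior from the user's point of view. Common forms include HTTP probes and TCP probes, which check whether a site or service is reachable and how quickly it responds.
The biggest difference between the two is that black-box monitoring is failure-oriented: when a failure occurs, black-box monitoring detects it quickly, whereas white-box monitoring focuses on proactively finding or predicting potential problems. A well-rounded monitoring setup should spot potential problems from the white-box side and quickly detect problems that have already occurred from the black-box side.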
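1. Introduction to blackbox_exporter
Blackbox Exporter is the official black-box monitoring solution provided by the Prometheus community. It lets users probe the network over HTTP, HTTPS, DNS, TCP, and ICMP.
HTTP/HTTPS: URL/API availability checks
TCP: port listening checks
ICMP: host liveness checks
DNS: domain name resolution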
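To make the idea of a probe concrete, here is a minimal Python sketch of what a TCP port check amounts to conceptually: try to open a connection and report success or failure. It is only an illustration; the function name tcp_probe and the default timeout are assumptions made here, and this is not how blackbox_exporter itself is implemented.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; roughly analogous to a
    probe_success value of 1 (True) or 0 (False).
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, tcp_probe("127.0.0.1", 9115) would return True once the exporter installed in the next section is listening on its port.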
2. Installing and deploying blackbox_exporter
This assumes Prometheus and Alertmanager are already installed and configured.
Installation
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.22.0/blackbox_exporter-0.22.0.linux-amd64.tar.gz
tar -xf blackbox_exporter-0.22.0.linux-amd64.tar.gz -C /apps/
cd /apps/
mv blackbox_exporter-0.22.0.linux-amd64/ blackbox_exporter
Start on boot (systemd)
[root@monitoring ~]# vim /etc/systemd/system/blackbox-exporter.service
[root@monitoring ~]# cat /etc/systemd/system/blackbox-exporter.service
[Unit]
Description=Prometheus Blackbox Exporter
After=network.target
[Service]
Type=simple
User=root
Group=root
ExecStart=/apps/blackbox_exporter/blackbox_exporter \
--config.file=/apps/blackbox_exporter/blackbox.yml \
--web.listen-address=:9115
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@monitoring ~]# systemctl enable --now blackbox-exporter.service
Created symlink /etc/systemd/system/multi-user.target.wants/blackbox-exporter.service → /etc/systemd/system/blackbox-exporter.service.
[root@monitoring ~]# systemctl status blackbox-exporter.service
● blackbox-exporter.service - Prometheus Blackbox Exporter
Loaded: loaded (/etc/systemd/system/blackbox-exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-09-27 16:56:04 CST; 1min 13s ago
Main PID: 29832 (blackbox_export)
Tasks: 8 (limit: 49440)
Memory: 4.9M
CGroup: /system.slice/blackbox-exporter.service
└─29832 /apps/blackbox_exporter/blackbox_exporter --config.file=/apps/blackbox_exporter/blackbox.yml --web.listen-address=:9115
Sep 27 16:56:04 monitoring systemd[1]: Started Prometheus Blackbox Exporter.
...
Creating the blackbox job in Prometheus
Go to the Prometheus installation directory (e.g. /opt/prometheus) and edit the prometheus.yml file. Under scrape_configs, add a new job named blackbox with the following snippet:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    static_configs:
      - targets:
        - https://gritfy.com
        - https://www.google.com
        - https://middlewareinventory.com
        - https://devopsjunction.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115
You can replace the URLs with targets of your choice, but the relabel_configs block should remain the same.
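Here one scrape job is defined per probe module (such as http_2xx), and the job's targets are set directly to the sites to be probed. Before samples are scraped, relabel_configs rewrites each target dynamically:
1. The target's address (__address__) is copied into __param_target, so it is passed to the exporter as the target URL parameter.
2. __param_target is copied into the instance label, so the resulting series are labeled with the probed site.
3. __address__ is replaced with 127.0.0.1:9115, so Prometheus scrapes the Blackbox Exporter rather than the site itself.
These three relabel steps greatly reduce the complexity of the Prometheus job configuration.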
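The effect of the three relabel steps can be traced with a small Python simulation. This is a sketch for illustration only, not Prometheus code: the function names relabel and scrape_url are made up here, and the label handling is simplified to plain dictionary assignments. It does rely on two real Prometheus conventions: URL parameters appear as __param_<name> labels during relabeling, and the scrape path is held in __metrics_path__.

```python
from urllib.parse import urlencode


def relabel(target: str, module: str = "http_2xx",
            exporter: str = "127.0.0.1:9115") -> dict:
    """Apply the three relabel steps from the job above to one target (simplified)."""
    labels = {
        "__address__": target,          # initially the site listed under targets
        "__param_module": module,       # from params.module
        "__metrics_path__": "/probe",   # from metrics_path
    }
    # Step 1: copy the original address into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: expose the probed site as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Step 3: point the scrape at the exporter instead of the site.
    labels["__address__"] = exporter
    return labels


def scrape_url(labels: dict) -> str:
    """Build the URL that would be scraped after relabeling."""
    params = {k[len("__param_"):]: v
              for k, v in labels.items() if k.startswith("__param_")}
    return f"http://{labels['__address__']}{labels['__metrics_path__']}?{urlencode(params)}"


if __name__ == "__main__":
    result = relabel("https://gritfy.com")
    print(result["instance"])
    print(scrape_url(result))
```

The printed URL is the request the exporter receives for that target; with the exporter running, fetching it manually (for instance with curl) is a quick way to check a probe before adding it to Prometheus.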
Configuring Alerts and Rules in Prometheus
As part of alert triggering, we are going to set up alerts for two scenarios:
- SSLCertExpiringSoon (certificate expires within 24 days)
- EndpointDown (endpoint down or returning an invalid response)
To generate alerts, we first need to create rules in Prometheus. When a rule's condition is met, Prometheus sends the alert to Alertmanager.
Create a new rule file
Go to the Prometheus installation directory (e.g. /opt/prometheus) and create a new directory named rules. Under the rules directory, create a new file named blackbox-rules.yml:
groups:
  - name: Blackbox rules
    rules:
      - alert: SSLCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 24
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "TLS certificate will expire in {{ $value | humanizeDuration }} (instance {{ $labels.instance }})"
      - alert: EndpointDown
        expr: probe_success == 0
        for: 10m
        labels:
          severity: "critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
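For SSLCertExpiringSoon: the alert fires when the time remaining until the earliest certificate expiry stays below 86400 * 24 seconds (24 days) for 1 minute. EndpointDown fires when probe_success has been 0 for 10 minutes.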
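The threshold arithmetic can be checked with a short Python sketch. It mirrors the comparison in the rule expression (expiry timestamp minus current time, compared with 86400 * 24) but is not Prometheus code; the function name cert_expiring_soon and the sample timestamps are made up for illustration.

```python
SECONDS_PER_DAY = 86400
THRESHOLD_SECONDS = SECONDS_PER_DAY * 24  # 24 days expressed in seconds


def cert_expiring_soon(expiry_ts: float, now_ts: float,
                       threshold: int = THRESHOLD_SECONDS) -> bool:
    """Mirror `probe_ssl_earliest_cert_expiry - time() < 86400 * 24`."""
    return expiry_ts - now_ts < threshold


if __name__ == "__main__":
    now = 1_700_000_000  # arbitrary example Unix timestamp
    print(THRESHOLD_SECONDS)
    # A certificate with 10 days left is under the threshold.
    print(cert_expiring_soon(now + 10 * SECONDS_PER_DAY, now))
    # A certificate with 30 days left is not.
    print(cert_expiring_soon(now + 30 * SECONDS_PER_DAY, now))
```

To alert earlier or later, only the multiplier needs to change, for example 86400 * 30 for a 30-day window.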
Adding the rule file to prometheus.yml
The rule file can now be added to the prometheus.yml configuration file. Under rule_files, add the newly created file rules/blackbox-rules.yml:
rule_files:
  - "rules/blackbox-rules.yml"
Enabling Alertmanager in prometheus.yml
When adding new rules, you also have to enable the alertmanager configuration, which is disabled by default. It should point to localhost:9093:
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
refs:
https://www.jianshu.com/p/0c2d2528f310