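When nginx is used as a reverse proxy, health checking of the backend nodes and failover are important.
When I first started using nginx, I handled failover with the following (rather crude) configuration: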
upstream portals
{
server 172.16.68.134:8082 max_fails=1 fail_timeout=5;
server 172.16.68.135:8082 max_fails=1 fail_timeout=5;
server 172.16.68.136:8082 max_fails=1 fail_timeout=5;
server 172.16.68.137:8082 max_fails=1 fail_timeout=5;
}
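In actual testing, once the first server node stopped responding, nginx stopped forwarding requests to it for 5 s. When the 5 s fail_timeout expired and round robin reached that server again, the next request was sent to it, failed, and the server was skipped for another 5 s. This does not achieve the desired effect, which is that no requests reach the server at all until its service has recovered.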
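To see why this setup keeps sending traffic to a dead node, here is a minimal Python sketch of a simplified model of passive checking. The function simulate and its parameters are hypothetical. The model assumes plain round robin, one client request per second, and one backend that is permanently down. It is not nginx's exact algorithm: real nginx uses weighted round robin and counts failures within the fail_timeout window.

```python
# Simplified model of nginx passive checks (max_fails / fail_timeout).
# Assumptions: plain round robin, one client request per second, and one
# backend that is permanently down and fails immediately.


def simulate(servers, dead, max_fails=1, fail_timeout=5, seconds=60):
    """Return how many client requests were sent to the dead backend."""
    fails = {s: 0 for s in servers}        # consecutive failures (no time window)
    down_until = {s: -1 for s in servers}  # second until which a backend is skipped
    hits_on_dead = 0
    cursor = 0                             # round-robin position
    for now in range(seconds):
        # Take the next backend in turn that is not currently marked unavailable.
        for _ in range(len(servers)):
            server = servers[cursor % len(servers)]
            cursor += 1
            if now >= down_until[server]:
                break
        if server == dead:
            hits_on_dead += 1              # a real client request hits the dead node
            fails[server] += 1
            if fails[server] >= max_fails:
                down_until[server] = now + fail_timeout  # skip it for fail_timeout s
                fails[server] = 0
    return hits_on_dead
```

With four backends and the settings above (max_fails=1, fail_timeout=5), simulate(["A", "B", "C", "D"], dead="A") returns 9. In this model, 9 of 60 client requests still go to the dead node, about one every 7 s (the 5 s timeout plus the wait for its round-robin turn).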
I then searched online for nginx health-check modules and found Taobao's nginx_upstream_check_module.
Installation is fairly simple: recompile nginx with this module added.
GitHub:
https://github.com/yaoweibin/nginx_upstream_check_module
Download:
https://pan.baidu.com/s/1cSdTwiQPlIxGahjXzdCuEA
The nginx and pcre source trees are both assumed to be under /home/soft. Upload the downloaded nginx_upstream_check_module-0.3.0.tar.gz to /home/soft and extract it:
tar -zxvf nginx_upstream_check_module-0.3.0.tar.gz
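Recompile:
# Enter the nginx source directory
cd /home/soft/nginx-1.10.1
# Apply the patch (choose the check_*.patch file that matches your nginx version)
patch -p0 < ../nginx_upstream_check_module-0.3.0/check_1.11.1+.patch
# Keep the configure arguments identical to the production build; only add the one module
./configure --prefix=/usr/local/nginx --with-pcre=/home/soft/pcre-8.36/ --with-http_stub_status_module --with-http_ssl_module --add-module=/home/soft/nginx_upstream_check_module-0.3.0/
# Run make
make
# Back up the production nginx binary
cd /usr/local/nginx/sbin
mv nginx nginx.old.20181016
# Copy the upgraded binary into the production directory
cp /home/soft/nginx-1.10.1/objs/nginx /usr/local/nginx/sbin
# Check the nginx version and build arguments, then test the configuration
./nginx -V
./nginx -t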
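The configure step requires the arguments to match the production build except for the one new module. A small Python sketch can compare the `nginx -V` output of the backed-up binary with that of the new one. The helper names are hypothetical. The sketch assumes the output contains a line beginning with "configure arguments:", which is how nginx reports its build options, and that nginx writes this output to stderr.

```python
import shlex
import subprocess


def version_output(binary):
    """Run `<binary> -V`; nginx prints this information on stderr."""
    return subprocess.run([binary, "-V"], capture_output=True, text=True).stderr


def configure_args(version_text):
    """Extract the configure arguments from `nginx -V` output as a set."""
    prefix = "configure arguments:"
    for line in version_text.splitlines():
        if line.startswith(prefix):
            return set(shlex.split(line[len(prefix):]))
    return set()


def only_added_module(old_text, new_text, module_path):
    """True if the new build keeps every old argument and adds only --add-module=module_path."""
    old, new = configure_args(old_text), configure_args(new_text)
    return old <= new and new - old == {"--add-module=" + module_path}
```

As a usage example, only_added_module(version_output("/usr/local/nginx/sbin/nginx.old.20181016"), version_output("/usr/local/nginx/sbin/nginx"), "/home/soft/nginx_upstream_check_module-0.3.0/") should return True if the rebuild only added the check module. Arguments are compared as sets, so their order does not matter.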
The nginx configuration used:
upstream portals {
server 192.166.62.137:8080;
server 192.166.66.85:8080;
server 192.166.62.231:8080;
server 192.166.66.88:8080;
check interval=5000 rise=2 fall=5 timeout=1000 type=http;
check_http_send "HEAD / HTTP/1.0\r\n\r\n";
check_http_expect_alive http_2xx http_3xx;
}
server {
listen 8080;
charset utf-8;
location /status {
check_status;
access_log off;
#allow 192.166.62.25;
#deny all;
}
location / {
proxy_pass http://portals;
index index.html;
}
}
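The check parameters work as follows:
- interval=5000: probe every 5 s (the value is in milliseconds).
- fall=5: mark the node down after 5 consecutive failures.
- rise=2: mark it up again after 2 consecutive successes.
- timeout=1000: each probe times out after 1 s.
- type=http: probe over HTTP.
The probe sends a HEAD request, and a 2xx or 3xx response (for example 200 or 302) means the service is running normally.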
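The rise/fall parameters describe a small state machine. Below is a minimal Python sketch of it under simplifying assumptions. Each entry of results is one probe outcome, with True meaning a 2xx/3xx reply arrived within timeout, and probes are taken every interval milliseconds. The starting state is a parameter of the sketch, not the module's own default. The function name track is hypothetical.

```python
def track(results, rise=2, fall=5, initially_up=True):
    """Return the backend state (True = up) after each probe result."""
    up = initially_up          # starting state is an assumption of this sketch
    ok_streak = fail_streak = 0
    states = []
    for ok in results:
        if ok:
            ok_streak, fail_streak = ok_streak + 1, 0
            if not up and ok_streak >= rise:
                up = True      # enough consecutive successes: back in rotation
        else:
            fail_streak, ok_streak = fail_streak + 1, 0
            if up and fail_streak >= fall:
                up = False     # enough consecutive failures: taken out of rotation
        states.append(up)
    return states
```

For example, track([False] * 5 + [True] * 2) returns [True, True, True, True, False, False, True]. The node goes down on the fifth consecutive failure and comes back on the second consecutive success. With interval=5000, a node that stops responding is removed after about five probe intervals (roughly 25 s), and a recovered node rejoins after about two (roughly 10 s).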
You can uncomment the allow/deny lines so that only a fixed IP can view the status page; all other IPs are then denied access to this location.
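If the allow/deny lines are enabled, nginx checks them in order and the first matching rule decides; if no rule matches, access is allowed. This Python sketch mirrors that first-match logic for the two commented-out rules. The permitted function and RULES list are illustrative only, and the sketch handles just single addresses, CIDR networks and all.

```python
import ipaddress

# Mirrors the commented-out rules in the /status location; order matters.
RULES = [("allow", "192.166.62.25"), ("deny", "all")]


def permitted(client_ip, rules=RULES):
    """Apply allow/deny rules in order; the first matching rule wins."""
    addr = ipaddress.ip_address(client_ip)
    for action, target in rules:
        if target == "all" or addr in ipaddress.ip_network(target, strict=False):
            return action == "allow"
    return True  # no rule matched: access is allowed
```

With the rules above, permitted("192.166.62.25") is True and any other address is denied. Swapping the two rules would deny everyone, because deny all would then match first.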
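After making the changes, reload nginx so the configuration takes effect. A reload only makes the running master process re-read its configuration. If that master was started from the old binary, it does not know the check directives, so after replacing the executable, restart nginx (or upgrade the binary on the fly with the USR2 signal) instead.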
sbin/nginx -s reload
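The module's status page can also be polled by a script, for example from monitoring. A minimal Python sketch follows. It assumes the module version in use supports /status?format=json, and that the JSON has the layout {"servers": {"server": [{"name": ..., "status": "up" or "down", ...}]}}. Both the layout and the helper names are assumptions to verify against your own status output. The host 192.166.62.104 is the nginx server used in the ab test.

```python
import json
import urllib.request

# Assumed URL: the /status location above, requested in JSON format.
STATUS_URL = "http://192.166.62.104:8080/status?format=json"


def fetch_status(url=STATUS_URL, timeout=3):
    """Fetch and parse the status page (JSON layout is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def down_servers(status):
    """Return the names of backends whose status is not "up"."""
    return [s["name"] for s in status["servers"]["server"] if s["status"] != "up"]
```

For example, print(down_servers(fetch_status())) lists the backends the module currently considers unhealthy. If the allow/deny rules are enabled, the script must run from the allowed IP.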
Observing the effect of the active health check:
Run an ab load test from another server:
ab -n 20000 -c 10 "http://192.166.62.104:8080/PortalServer-App/index.html"
Check the nginx access log on 192.166.62.104:
tail -f logs/access.log
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.000"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.66.88:8080""0.002"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 567 "-" "ApacheBench/2.3" "-" "192.166.62.137:8080""0.001"
192.166.62.100 - - [16/Oct/2018:13:46:44 +0800] "GET /PortalServer-App/index.html HTTP/1.0" 200 541 "-" "ApacheBench/2.3" "-" "192.166.62.231:8080""0.001"
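To quantify the distribution rather than eyeball the log, you can count the upstream address in each line. This Python sketch assumes, as the sample lines suggest, that the log_format ends with the upstream address and the upstream response time as its last two quoted fields. upstream_counts is a hypothetical helper.

```python
import re
from collections import Counter

QUOTED = re.compile(r'"([^"]*)"')  # contents of each double-quoted field


def upstream_counts(lines):
    """Count requests per upstream address (second-to-last quoted field)."""
    counts = Counter()
    for line in lines:
        fields = QUOTED.findall(line)
        if len(fields) >= 2:
            counts[fields[-2]] += 1
    return counts
```

For example, with open("logs/access.log") as f: print(upstream_counts(f).most_common()) lists backends by request count. For the 14 lines above, it counts 5 requests each for 192.166.62.231:8080 and 192.166.66.88:8080, 4 for 192.166.62.137:8080, and none for 192.166.66.85:8080.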
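In this excerpt, requests go only to 192.166.62.137, 192.166.62.231 and 192.166.66.88, while 192.166.66.85 receives none. Only the backends that the status page reports as healthy are load-balanced to, which is the active health-check behavior we wanted.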