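1 Background
As a newcomer to the cloud industry, I am recording my experience of building systems in the cloud step by step: filling in the pitfalls I run into and, where there are none, digging a few of my own and filling those in too. I hope these notes also serve as a practical guide for anyone who reads them.
All cloud operations below are performed on JD Cloud, but the principles carry over: once you understand them, you can move freely between cloud providers.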
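2 Concept summary
For the full concepts and description of JD Cloud's Application Load Balancer, see its product documentation. Here are the points I found most important while learning:
- The Application Load Balancer provides traffic listening and forwarding at layer 4 (TCP, TLS) and layer 7 (HTTP, HTTPS)
- JD Cloud's Application Load Balancer is built on an nginx-based architecture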
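3 Load balancing architecture diagram
This is not the architecture diagram already found in the JD Cloud documentation. It is the diagram of the specific sample architecture built in this article, and the deployment and performance tests that follow are all based on it.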
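4 Preparing the load-balanced nodes
4.1 Preparing the application node to be imaged
First, create a 64-bit CentOS 7.4 cloud host with a single core and 4 GB of memory on JD Cloud: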
VPC: manan_test
sub_net: manan_sub_lb
AZ: Availability Zone A
4.1.1 Installing the JDK
[root@manan-inst-tomcat-base ~]# ls
apache-tomcat-9.0.21.tar.gz jdk-12.0.1_linux-x64_bin.rpm
[root@manan-inst-tomcat-base ~]# rpm -ivh jdk-12.0.1_linux-x64_bin.rpm
warning: jdk-12.0.1_linux-x64_bin.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:jdk-12.0.1-2000:12.0.1-ga ################################# [100%]
[root@manan-inst-tomcat-base ~]# java -version
java version "12.0.1" 2019-04-16
Java(TM) SE Runtime Environment (build 12.0.1+12)
Java HotSpot(TM) 64-Bit Server VM (build 12.0.1+12, mixed mode, sharing)
[root@manan-inst-tomcat-base ~]# rpm -q --whatprovides java
jdk-12.0.1-12.0.1-ga.x86_64
[root@manan-inst-tomcat-base ~]#
4.1.2 Installing Tomcat
Create the tomcat user
[root@manan-inst-tomcat-base ~]# adduser tomcat
[root@manan-inst-tomcat-base ~]# passwd tomcat
Changing password for user tomcat.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@manan-inst-tomcat-base ~]#
Switch to the tomcat user with su
[root@manan-inst-tomcat-base ~]# su - tomcat
[tomcat@manan-inst-tomcat-base ~]$
Extract apache-tomcat-9.0.21.tar.gz into /tomcat
[root@manan-inst-tomcat-base /]# mkdir tomcat
[root@manan-inst-tomcat-base /]# chown tomcat:tomcat tomcat/
[root@manan-inst-tomcat-base /]# su - tomcat
Last login: Tue Jun 11 20:45:24 CST 2019 on pts/1
[tomcat@manan-inst-tomcat-base ~]$ ls
[tomcat@manan-inst-tomcat-base ~]$ cd /tmp
[tomcat@manan-inst-tomcat-base tmp]$ tar zxvf apache-tomcat-9.0.21.tar.gz -C /tomcat
apache-tomcat-9.0.21/conf/
apache-tomcat-9.0.21/conf/catalina.policy
apache-tomcat-9.0.21/conf/catalina.properties
....
apache-tomcat-9.0.21/bin/tool-wrapper.sh
apache-tomcat-9.0.21/bin/version.sh
Check the extraction result
[tomcat@manan-inst-tomcat-base tmp]$ cd /tomcat/
[tomcat@manan-inst-tomcat-base tomcat]$ ls
apache-tomcat-9.0.21
[tomcat@manan-inst-tomcat-base tomcat]$ cd apache-tomcat-9.0.21/
[tomcat@manan-inst-tomcat-base apache-tomcat-9.0.21]$ ls -lrt
total 124
drwxr-x--- 2 tomcat tomcat 6 Jun 5 04:19 work
drwxr-x--- 2 tomcat tomcat 6 Jun 5 04:19 logs
drwxr-x--- 7 tomcat tomcat 81 Jun 5 04:21 webapps
-rw-r----- 1 tomcat tomcat 16262 Jun 5 04:23 RUNNING.txt
-rw-r----- 1 tomcat tomcat 6852 Jun 5 04:23 RELEASE-NOTES
-rw-r----- 1 tomcat tomcat 3255 Jun 5 04:23 README.md
-rw-r----- 1 tomcat tomcat 2333 Jun 5 04:23 NOTICE
-rw-r----- 1 tomcat tomcat 57092 Jun 5 04:23 LICENSE
-rw-r----- 1 tomcat tomcat 5407 Jun 5 04:23 CONTRIBUTING.md
drwx------ 2 tomcat tomcat 238 Jun 5 04:23 conf
-rw-r----- 1 tomcat tomcat 18982 Jun 5 04:23 BUILDING.txt
drwxr-x--- 2 tomcat tomcat 4096 Jun 11 20:56 lib
drwxr-x--- 2 tomcat tomcat 30 Jun 11 20:56 temp
drwxr-x--- 2 tomcat tomcat 4096 Jun 11 20:56 bin
[tomcat@manan-inst-tomcat-base apache-tomcat-9.0.21]$
Set CATALINA_HOME and JAVA_HOME
[tomcat@manan-inst-tomcat-base ~]$ vi .bash_profile
export CATALINA_HOME=/tomcat/apache-tomcat-9.0.21
export JAVA_HOME=/usr/java/jdk-12.0.1
[tomcat@manan-inst-tomcat-base ~]$ . .bash_profile
[tomcat@manan-inst-tomcat-base ~]$ set | grep CATALINA
CATALINA_HOME=/tomcat/apache-tomcat-9.0.21
[tomcat@manan-inst-tomcat-base bin]$ set | grep JAVA
JAVA_HOME=/usr/java/jdk-12.0.1
Startup test
[tomcat@manan-inst-tomcat-base bin]$ ./startup.sh
Using CATALINA_BASE: /tomcat/apache-tomcat-9.0.21
Using CATALINA_HOME: /tomcat/apache-tomcat-9.0.21
Using CATALINA_TMPDIR: /tomcat/apache-tomcat-9.0.21/temp
Using JRE_HOME: /usr/java/jdk-12.0.1
Using CLASSPATH: /tomcat/apache-tomcat-9.0.21/bin/bootstrap.jar:/tomcat/apache-tomcat-9.0.21/bin/tomcat-juli.jar
Tomcat started.
[tomcat@manan-inst-tomcat-base bin]$
Home page access test
Stop the tomcat process
[tomcat@manan-inst-tomcat-base bin]$ ./shutdown.sh
Using CATALINA_BASE: /tomcat/apache-tomcat-9.0.21
Using CATALINA_HOME: /tomcat/apache-tomcat-9.0.21
Using CATALINA_TMPDIR: /tomcat/apache-tomcat-9.0.21/temp
Using JRE_HOME: /usr/java/jdk-12.0.1
Using CLASSPATH: /tomcat/apache-tomcat-9.0.21/bin/bootstrap.jar:/tomcat/apache-tomcat-9.0.21/bin/tomcat-juli.jar
NOTE: Picked up JDK_JAVA_OPTIONS: --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
[tomcat@manan-inst-tomcat-base bin]$
4.2 Creating an image of the application server
4.2.1 Stopping the cloud host prepared in 4.1
[root@manan-inst-tomcat-base ~]# shutdown -h now
Connection closing...Socket close.
Connection closed by foreign host.
Disconnected from remote host(114.67.69.210_manan) at 10:11:19.
Type `help' to learn how to use Xshell prompt.
[D:\~]$
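Confirm the stopped state
4.2.2 Creating the image
4.3 Creating the second application node from the image
Note that the second application node must be in the same VPC and sub_net as the first one, but in a different availability zone: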
VPC: manan_test
sub_net: manan_sub_lb
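AZ: Availability Zone B
4.4 Starting the two application nodes
Then start tomcat on both nodes as described in 4.1.2; the screenshots are not repeated here.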
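5 Configuring the Application Load Balancer
5.1 Creating an Application Load Balancer instance
Note:
- The new Application Load Balancer instance must be in the same VPC as the backend servers
- If the backend servers are in different availability zones, the load balancer instance should be created in each of those zones, as shown in lb_create_3.png
5.2 Configuring a listener for the load balancer instance
Configure the port the listener exposes externally
Configure the port, protocol, and policy the listener uses to forward to the backend; together these form one logical concept, the backend service
Configure the conditions the listener uses to decide whether the backend service is healthy
Add a virtual server group and add the servers that actually need to be load balanced to it
The listener, backend service, and virtual server group are now configured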
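The relationship between the listener, the backend service, and the virtual server group can be sketched in code. The Python below is only an illustration of the concepts, not JD Cloud's implementation: the class names, the round-robin policy, and the address 10.0.48.10 for the second node are assumptions made for the example. Port 8080 and the address 10.0.48.9 are the values used in the tests later in this article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    name: str
    address: str
    healthy: bool = True  # in a real setup the health check would update this


@dataclass
class BackendService:
    """Protocol, port and forwarding policy toward a virtual server group."""
    protocol: str
    port: int
    servers: List[Server] = field(default_factory=list)
    _cursor: int = 0

    def pick(self) -> Server:
        """Round-robin over the group, skipping servers marked unhealthy."""
        for _ in range(len(self.servers)):
            server = self.servers[self._cursor % len(self.servers)]
            self._cursor += 1
            if server.healthy:
                return server
        raise RuntimeError("no healthy backend server")


@dataclass
class Listener:
    """The externally visible port, bound to one backend service."""
    protocol: str
    port: int
    backend: BackendService

    def route(self) -> Server:
        return self.backend.pick()


# node2's address is hypothetical; only node1's address appears in this article.
group = [Server("node1", "10.0.48.9"), Server("node2", "10.0.48.10")]
listener = Listener("HTTP", 8080, BackendService("HTTP", 8080, group))
order = [listener.route().name for _ in range(4)]
```

With both servers healthy, `order` alternates between node1 and node2. Marking a server unhealthy makes `route()` skip it, which is the role the health check condition plays in the console configuration.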
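5.3 Checking that load balancing works
5.3.1 Adjusting the application on the backend servers
To confirm the load balancer is working, we modify a web application on both backend application servers and add a marker, so that the front-end page shows which server answered the request.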
[tomcat@manan-inst-tomcat-base simpletag]$ pwd
/tomcat/apache-tomcat-9.0.21/webapps/examples/jsp/jsp2/simpletag
[tomcat@manan-inst-tomcat-base simpletag]$ vi hello.jsp
<%@ taglib prefix="mytag" uri="/WEB-INF/jsp2/jsp2-example-taglib.tld" %>
<html>
<head>
<title>JSP 2.0 Examples - Hello World SimpleTag Handler</title>
</head>
<body>
<h1>JSP 2.0 Examples - Hello World SimpleTag Handler</h1>
<hr>
<p>This tag handler simply echos "Hello, World!" It's an example of
a very basic SimpleTag handler with no body.</p>
<br>
<b><u>Result:</u></b>
<mytag:helloWorld/> node1
</body>
</html>
[tomcat@manan-inst-tomcat simpletag]$ pwd
/tomcat/apache-tomcat-9.0.21/webapps/examples/jsp/jsp2/simpletag
[tomcat@manan-inst-tomcat simpletag]$ vi hello.jsp
<%@ taglib prefix="mytag" uri="/WEB-INF/jsp2/jsp2-example-taglib.tld" %>
<html>
<head>
<title>JSP 2.0 Examples - Hello World SimpleTag Handler</title>
</head>
<body>
<h1>JSP 2.0 Examples - Hello World SimpleTag Handler</h1>
<hr>
<p>This tag handler simply echos "Hello, World!" It's an example of
a very basic SimpleTag handler with no body.</p>
<br>
<b><u>Result:</u></b>
<mytag:helloWorld/> node2
</body>
</html>
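5.3.2 Accessing the application to verify load balancing
Repeatedly visit http://localhost:30002/examples/jsp/jsp2/simpletag/hello.jsp and the returned page switches back and forth between node1 and node2, which shows that the Application Load Balancer is working.
Note: the URL above uses localhost because I did not configure a public-facing load balancer; I reach the JD Cloud load balancer through SSH port forwarding. That is outside the scope of this article and is not covered here.
First visit:
After refreshing: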
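Refreshing by hand can also be replaced with a small script. The sketch below is an assumption about how one might automate the check, not part of the original procedure: it fetches the page several times and counts which marker (node1 or node2) each response contains. The function names are made up for the example, and the URL only works while the SSH port forwarding described above is active.

```python
import re
from collections import Counter
from urllib.request import urlopen

# URL from the article; reachable only through the author's SSH port forwarding.
URL = "http://localhost:30002/examples/jsp/jsp2/simpletag/hello.jsp"


def node_marker(html: str) -> str:
    """Return the first 'nodeN' marker found in the page, or 'unknown'."""
    match = re.search(r"\bnode\d+\b", html)
    return match.group(0) if match else "unknown"


def tally(pages) -> Counter:
    """Count how many of the given page bodies came from each node."""
    return Counter(node_marker(page) for page in pages)


def fetch(url: str, times: int) -> list:
    """Download the page `times` times and return the decoded bodies."""
    pages = []
    for _ in range(times):
        with urlopen(url, timeout=5) as response:
            pages.append(response.read().decode("utf-8", errors="replace"))
    return pages
```

Calling `tally(fetch(URL, 20))` should report both node1 and node2 if requests are being distributed; a result with only one marker would suggest that one backend is not receiving traffic.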
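6 Performance testing
Performance is a complex topic that is hard to pin down, because too many factors affect it: CPU, memory, disk, the OS and its parameters, middleware and its configuration, application architecture, code quality, and so on. All of this makes it hard to give a guarantee such as "this machine performing at this level is ideal".
The goal here is narrower: to check how much the sustainable number of concurrent users and requests per second improves when JD Cloud's Application Load Balancer is used, compared with a single application server. To that end, we adopt the following conventions:
- Only a single application server is compared with a load balancer plus two application servers
- The application servers are configured identically throughout the comparison, apart from host name, IP, and the marker in the returned page
- No special changes were made to the application servers beyond what is listed in this article
- The application is a very simple JSP, to avoid as far as possible the influence of database access, disk I/O, and other factors
- The public network is complex and bandwidth-limited, so every step of the performance test is run inside the JD Cloud internal network
- On the test node, the following change raises the number of files that can be opened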
[root@manan-inst-tomcat-base ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15085
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15085
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@manan-sub1-A-inst1 ~]# ulimit -n 65535
[root@manan-sub1-A-inst1 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31212
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31212
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
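The reason for raising the limit is that ab keeps one socket open per concurrent connection, so a concurrency of 2000 or 4000 does not fit under the default of 1024 open files. The sketch below shows one way to check this from Python with the standard `resource` module (Unix only). The function names and the reserve of 64 descriptors for stdin, stdout, logs and the like are assumptions for illustration.

```python
import resource


def current_soft_limit() -> int:
    """Soft limit on open files for this process (what `ulimit -n` prints)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def fd_limit_ok(soft_limit: int, concurrency: int, reserve: int = 64) -> bool:
    """True if the limit leaves room for `concurrency` sockets plus a reserve."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= concurrency + reserve
```

For the values in this article, `fd_limit_ok(1024, 2000)` is false and `fd_limit_ok(65535, 4000)` is true, which matches the need for `ulimit -n 65535` before running the tests.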
- On the application servers, the following change disables a system protection (TCP SYN cookies)
[root@manan-inst-tomcat-base ~]# vim /etc/sysctl.conf
net.ipv4.tcp_syncookies = 0
[root@manan-inst-tomcat-base ~]# sysctl -p
net.ipv4.tcp_syncookies = 0
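Because this setting turns off a protection, it is worth confirming afterwards what the file actually contains, and restoring it once testing is finished. The sketch below parses text in the `key = value` format of /etc/sysctl.conf; the function name is made up for the example, and reading the real file is left to the caller.

```python
def parse_sysctl(text: str) -> dict:
    """Parse sysctl.conf-style text into a {key: value} dict of strings."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (sysctl.conf allows '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


sample = "# test-only setting\nnet.ipv4.tcp_syncookies = 0\n"
syncookies = parse_sysctl(sample).get("net.ipv4.tcp_syncookies")
```

Here `syncookies` is the string "0", meaning SYN cookies are disabled. On a real host, pass the contents of /etc/sysctl.conf instead of `sample`.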
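6.1 Load testing a single application server
After repeated trials, when the test node issued 800000 requests at a concurrency of 2000, CPU usage on the application server exceeded 70% and the load average reached 14.73, which is already a considerable load.
Test node: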
[root@manan-sub1-A-inst1 ~]# ab -n 800000 -c 2000 -r http://10.0.48.9:8080/examples/jsp/jsp2/simpletag/hello.jsp
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.0.48.9 (be patient)
Completed 80000 requests
Completed 160000 requests
Completed 240000 requests
Completed 320000 requests
Completed 400000 requests
Completed 480000 requests
Completed 560000 requests
Completed 640000 requests
Completed 720000 requests
Completed 800000 requests
Finished 800000 requests
Server Software:
Server Hostname: 10.0.48.9
Server Port: 8080
Document Path: /examples/jsp/jsp2/simpletag/hello.jsp
Document Length: 379 bytes
Concurrency Level: 2000
Time taken for tests: 117.972 seconds
Complete requests: 800000
Failed requests: 0
Write errors: 0
Total transferred: 480000000 bytes
HTML transferred: 303200000 bytes
Requests per second: 6781.24 [#/sec] (mean)
Time per request: 294.931 [ms] (mean)
Time per request: 0.147 [ms] (mean, across all concurrent requests)
Transfer rate: 3973.39 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 2 197 1468.1 2 63155
Processing: 2 27 56.0 15 1659
Waiting: 2 27 55.5 15 1659
Total: 3 225 1470.6 18 63336
Percentage of the requests served within a certain time (ms)
50% 18
66% 21
75% 24
80% 26
90% 220
95% 1022
98% 3019
99% 3036
100% 63336 (longest request)
Application server node:
top - 16:10:23 up 4:28, 1 user, load average: 14.73, 5.15, 2.00
Tasks: 87 total, 3 running, 84 sleeping, 0 stopped, 0 zombie
%Cpu(s): 74.2 us, 9.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 16.7 si, 0.0 st
KiB Mem : 3881804 total, 2324396 free, 898560 used, 658848 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 2738864 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11380 tomcat 20 0 3377808 758332 22116 S 99.0 19.5 2:25.03 java
Performance monitoring charts for the application server
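To make the headline numbers in the ab report easier to read, the sketch below recomputes them from the three inputs of the run: total requests, concurrency, and elapsed time. The formulas reflect my understanding of how ab derives these figures; the function name is made up for the example.

```python
def ab_summary(total_requests: int, concurrency: int, seconds: float):
    """Recompute ab's three headline figures from the raw run parameters."""
    requests_per_second = total_requests / seconds
    # Mean time per request as seen by one of the concurrent clients.
    ms_per_request = concurrency * seconds * 1000 / total_requests
    # Mean time per request across all concurrent clients.
    ms_per_request_overall = seconds * 1000 / total_requests
    return requests_per_second, ms_per_request, ms_per_request_overall


# Single application server run: 800000 requests, concurrency 2000, 117.972 s.
rps, per_request, per_request_overall = ab_summary(800000, 2000, 117.972)
```

The results agree with the report above to within rounding of the elapsed time: about 6781 requests per second, about 294.93 ms per request, and about 0.147 ms per request across all concurrent requests.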
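6.2 Performance testing with the load balancer added
6.2.1 How the load balancer and application servers behave under the same load
The test node again issues 800000 requests at a concurrency of 2000, only this time the target is the load balancer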
[root@manan-sub1-A-inst1 ~]# ab -n 800000 -c 2000 -r http://10.0.32.4:8080/examples/jsp/jsp2/simpletag/hello.jsp
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.0.32.4 (be patient)
Completed 80000 requests
Completed 160000 requests
Completed 240000 requests
Completed 320000 requests
Completed 400000 requests
Completed 480000 requests
Completed 560000 requests
Completed 640000 requests
Completed 720000 requests
Completed 800000 requests
Finished 800000 requests
Server Software: nginx
Server Hostname: 10.0.32.4
Server Port: 8080
Document Path: /examples/jsp/jsp2/simpletag/hello.jsp
Document Length: 379 bytes
Concurrency Level: 2000
Time taken for tests: 88.741 seconds
Complete requests: 800000
Failed requests: 22547
(Connect: 0, Receive: 0, Length: 22547, Exceptions: 0)
Write errors: 0
Non-2xx responses: 22547
Total transferred: 485145712 bytes
HTML transferred: 298397489 bytes
Requests per second: 9014.96 [#/sec] (mean)
Time per request: 221.853 [ms] (mean)
Time per request: 0.111 [ms] (mean, across all concurrent requests)
Transfer rate: 5338.83 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 24 169.1 0 3009
Processing: 1 192 551.4 36 4101
Waiting: 1 192 551.4 36 4101
Total: 2 217 574.0 36 6029
Percentage of the requests served within a certain time (ms)
50% 36
66% 45
75% 52
80% 60
90% 1036
95% 1061
98% 3008
99% 3012
100% 6029 (longest request)
Two differences stand out:
- Total elapsed time (Time taken for tests) dropped from 117.972 seconds to 88.741 seconds, a reduction of 24.8%
- The longest request dropped from 63336 ms to 6029 ms, a reduction of 90.5%
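The two percentages can be checked with a one-line formula, (before - after) / before. The helper name below is made up for the example; the inputs are taken from the two ab reports.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


time_saved = pct_reduction(117.972, 88.741)   # total time taken, seconds
longest_saved = pct_reduction(63336, 6029)    # longest request, ms
```

Rounded to one decimal place, `time_saved` is 24.8 and `longest_saved` is 90.5.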
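Next, let us look at the server-side charts:
Load balancer:
Application server 1:
Application server 2:
Compared with application server 1 in single-server mode, CPU usage fell from above 80% to below 60%, and the peak CPU usage on the other application server also stayed below 80%. The result is acceptable.
6.2.2 Probing how much load the load-balanced setup can take
I did not initially raise the load straight to twice the single-server level, but in the end the load-balanced architecture did complete that test: 1600000 requests at a concurrency of 4000.
Test node: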
[root@manan-sub1-A-inst1 ~]# ab -n 1600000 -c 4000 -r http://10.0.32.4:8080/examples/jsp/jsp2/simpletag/hello.jsp
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.0.32.4 (be patient)
Completed 160000 requests
Completed 320000 requests
Completed 480000 requests
Completed 640000 requests
Completed 800000 requests
Completed 960000 requests
Completed 1120000 requests
Completed 1280000 requests
Completed 1440000 requests
Completed 1600000 requests
Finished 1600000 requests
Server Software: nginx
Server Hostname: 10.0.32.4
Server Port: 8080
Document Path: /examples/jsp/jsp2/simpletag/hello.jsp
Document Length: 379 bytes
Concurrency Level: 4000
Time taken for tests: 555.016 seconds
Complete requests: 1600000
Failed requests: 746305
(Connect: 0, Receive: 81, Length: 746143, Exceptions: 81)
Write errors: 0
Non-2xx responses: 746062
Total transferred: 757147337 bytes
HTML transferred: 447458095 bytes
Requests per second: 2882.80 [#/sec] (mean)
Time per request: 1387.539 [ms] (mean)
Time per request: 0.347 [ms] (mean, across all concurrent requests)
Transfer rate: 1332.22 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 21 145.2 0 3008
Processing: 1 1311 1788.3 67 106096
Waiting: 0 1311 1788.0 67 106096
Total: 1 1332 1805.0 68 106096
Percentage of the requests served within a certain time (ms)
50% 68
66% 3006
75% 3008
80% 3008
90% 3008
95% 3010
98% 3045
99% 4010
100% 106096 (longest request)
[root@manan-sub1-A-inst1 ~]#
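Although the test completed, Failed requests now includes errors in the Exceptions category and the longest request reached 106096 ms, which indicates that the application service was on the verge of collapse. By comparison, the run below with 1200000 requests at a concurrency of 4000 was more stable: there were no Exceptions and it handled 8629.85 requests per second. That is not a significant drop from the 9014.96 requests per second measured with 800000 requests at a concurrency of 2000, and it is a 27.3% improvement over the single application server (6781.24). In other words, the performance gain from the load-balanced architecture is:
- Processing speed up by roughly 27%
- Throughput up by roughly 70% (concurrency: 2000 --> 4000; total requests: 800000 --> 1200000)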
[root@manan-sub1-A-inst1 ~]# ab -n 1200000 -c 4000 -r http://10.0.32.4:8080/examples/jsp/jsp2/simpletag/hello.jsp
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.0.32.4 (be patient)
Completed 120000 requests
Completed 240000 requests
Completed 360000 requests
Completed 480000 requests
Completed 600000 requests
Completed 720000 requests
Completed 840000 requests
Completed 960000 requests
Completed 1080000 requests
Completed 1200000 requests
Finished 1200000 requests
Server Software: nginx
Server Hostname: 10.0.32.4
Server Port: 8080
Document Path: /examples/jsp/jsp2/simpletag/hello.jsp
Document Length: 379 bytes
Concurrency Level: 4000
Time taken for tests: 139.052 seconds
Complete requests: 1200000
Failed requests: 48759
(Connect: 0, Receive: 0, Length: 48759, Exceptions: 0)
Write errors: 0
Non-2xx responses: 48759
Total transferred: 723177264 bytes
HTML transferred: 444414333 bytes
Requests per second: 8629.85 [#/sec] (mean)
Time per request: 463.507 [ms] (mean)
Time per request: 0.116 [ms] (mean, across all concurrent requests)
Transfer rate: 5078.87 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 212 594.2 0 15041
Processing: 1 248 634.7 47 4649
Waiting: 1 248 634.7 47 4649
Total: 1 460 910.7 50 15488
Percentage of the requests served within a certain time (ms)
50% 50
66% 62
75% 248
80% 1044
90% 1242
95% 3014
98% 3185
99% 4024
100% 15488 (longest request)
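The comparison between runs can be recomputed from the ab reports. The sketch below takes the reported requests per second and failed request counts and derives the relative gain over the single server and the share of failed requests per run. The function names and dictionary layout are made up for the example; all numbers come from the reports in this article.

```python
def pct_gain(baseline: float, new: float) -> float:
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100


def failure_pct(failed: int, total: int) -> float:
    """Share of requests that ab counted as failed, in percent."""
    return failed / total * 100


# (requests per second, failed requests, total requests) from each ab report
runs = {
    "single server, c=2000": (6781.24, 0, 800000),
    "load balancer, c=2000": (9014.96, 22547, 800000),
    "load balancer, c=4000, n=1200000": (8629.85, 48759, 1200000),
    "load balancer, c=4000, n=1600000": (2882.80, 746305, 1600000),
}

baseline_rps = runs["single server, c=2000"][0]
report = {
    name: (pct_gain(baseline_rps, rps), failure_pct(failed, total))
    for name, (rps, failed, total) in runs.items()
}
```

Each entry in `report` is a pair: gain over the single server and failure share, both in percent. Relative to the single server, the load balancer delivers about 32.9% more requests per second at a concurrency of 2000 and about 27.3% more at a concurrency of 4000 with 1200000 requests. The failure share is about 2.8% and 4.1% in those two runs and about 46.6% in the 1600000 request run, so the failed requests deserve attention alongside the raw throughput.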
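Server charts:
Load balancer:
In the chart, the first peak is the test with 1200000 requests at a concurrency of 4000 and the second peak is the test with 1600000 requests at a concurrency of 4000. Many values in the second peak are far lower than in the first, which also confirms that the second test exceeded the maximum load the load balancer could handle.
Application server 1:
Application server 2:
On the application servers, the first peak is the test with 1200000 requests at a concurrency of 4000 and the second peak is the test with 1600000 requests at a concurrency of 4000; CPU was essentially fully loaded.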