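This performance tuning project involved many components, so ideally everything in the production environment would be monitored while keeping overhead as low as possible. We therefore monitored two kinds of servers, the application server and the database server: nmon provides the baseline monitoring data, and alongside it we monitor the JVM as well as database alerts and snapshots.
Monitoring is only a means. Performance optimization rests on extracting regular, meaningful data from the mass of monitoring logs. Once the data is available, the next step is analyzing it. This article first describes the data that needs to be collected, drawing on my experience from the project.
Base environment: two database servers configured as a database cluster.
Application server - JVM threads
The project mainly uses tongweb (the legacy system runs a very old version). The monitoring output looks like this:
Monitoring content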
...
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnCreated","10",
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnAcquired","111292",
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnNotSuccessfullyMatched","0",
"2018-01-11T02:26:25.670+0800","com.tongtech.tongweb:type=jvm,category=monitor,server=server","UpTime","222520621",
"2018-01-11T02:26:25.670+0800","com.tongtech.tongweb:type=jvm,category=monitor,server=server","HeapSize","2143485952",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnUsed","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnSuccessfullyMatched","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","WaitQueueLength","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnDestroyed","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","ConnRequestWaitTime","4",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnFailedValidation","0",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnReleased","111292",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnFree","10",
...
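What to look at
The tongweb monitoring data yields connection pool status and related information. Our approach is to use an Excel macro to convert the log contents into readable data and chart it. The details will be covered separately.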
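As an alternative to the Excel macro, the same conversion can be sketched in Python. This is not the macro used in the project; it only assumes the log keeps the four-column layout shown in the excerpt above (timestamp, MBean object name, metric name, value, with a trailing comma). The function name `parse_monitor_log` and the `SAMPLE` constant are illustrative.

```python
import csv
from collections import defaultdict
from datetime import datetime

# Three lines copied from the excerpt above.
SAMPLE = '''"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnCreated","10",
"2018-01-11T02:26:25.670+0800","com.tongtech.tongweb:type=jvm,category=monitor,server=server","HeapSize","2143485952",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnFree","10",
'''


def parse_monitor_log(text):
    """Return {(mbean_type, metric): [(timestamp, value), ...]}."""
    series = defaultdict(list)
    for row in csv.reader(text.splitlines()):
        if len(row) < 4:
            continue  # skips "..." elisions and blank lines
        stamp, mbean, metric, value = row[:4]
        if ":" not in mbean:
            continue
        try:
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")
            number = int(value)
        except ValueError:
            continue  # not a numeric sample in the assumed layout
        # Object name layout: "domain:key=value,key=value,..."
        pairs = (p.partition("=") for p in mbean.split(":", 1)[1].split(","))
        props = {key: val for key, _, val in pairs}
        series[(props.get("type"), metric)].append((when, number))
    return series


series = parse_monitor_log(SAMPLE)
```

Each list in `series` is a time series for one metric, which can then be charted with any plotting tool.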
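Notes on JVM thread monitoring
Why monitor
Monitoring the tongweb JVM gives a first indication of when load peaks and whether the connection pool is full, and from there whether the bottleneck during connection peaks lies in the application. This matters a great deal for later performance analysis: the main performance problems can be categorized, which avoids unnecessary work.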
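The "is the pool full" check can be sketched as a small filter over the pool metrics. The rule used here is my assumption, not something tongweb documents: a sample is flagged when NumConnFree is 0 or WaitQueueLength is above 0. Because metrics from one sampling round carry timestamps that differ by a few milliseconds, samples are grouped by the second. The function name `find_pool_pressure` is illustrative.

```python
from collections import defaultdict


def find_pool_pressure(samples):
    """samples: iterable of (timestamp_string, metric, value) for one pool.

    Returns the sampling rounds (timestamps truncated to the second)
    in which the pool looked exhausted under the assumed rule.
    """
    rounds = defaultdict(dict)
    for stamp, metric, value in samples:
        # "2018-01-11T02:26:25.671+0800" -> "2018-01-11T02:26:25"
        rounds[stamp[:19]][metric] = value
    flagged = []
    for second in sorted(rounds):
        metrics = rounds[second]
        no_free = metrics.get("NumConnFree", 1) == 0
        queued = metrics.get("WaitQueueLength", 0) > 0
        if no_free or queued:
            flagged.append(second)
    return flagged
```

The flagged timestamps can then be compared with the peak periods seen in the nmon data to judge whether the application side is the bottleneck.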
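Application server - netstat
netstat is a command-line program that reads network connection state and related information from the kernel. It reports TCP connections, TCP and UDP listeners, and related network statistics.
Monitoring content
The following is an excerpt from logs collected in the project: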
...
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:2049 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:139 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:427 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:427 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:58862 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:2544 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:21 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:445 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:669 0.0.0.0:* LISTEN
...
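To turn captures like the one above into numbers, one option is to count TCP sockets per state. The sketch below assumes the six-column layout shown in the excerpt (state in the sixth column); the function name `count_tcp_states` is illustrative.

```python
from collections import Counter

# Header plus three lines copied from the excerpt above.
SAMPLE = """Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:2049 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:139 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
"""


def count_tcp_states(text):
    """Count TCP sockets per connection state (LISTEN, ESTABLISHED, ...)."""
    states = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with the protocol; header rows do not.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


states = count_tcp_states(SAMPLE)
```

Running this over captures taken at regular intervals shows how the number of connections in each state changes over time.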
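Application server - nmon
nmon is the main analysis tool in this optimization effort and plays a particularly important role. The following is the description from the wiki, worth reading when you have time: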
nmon collects the following operating system statistics:
CPU and CPU threads Utilisation
CPU frequency for servers or virtual machines that can alter their clock rate
GPU stats including utilisation, MHz and temperatures
Physical and Virtual Memory use
Disk read & write and transfers
Disk Groups decided by the user
Swap and Paging
Network read & write and transfers
Local File-systems
Network File-system (NFS)
Top Processes by CPU use, Memory size and I/O rates
Kernel stats including Run Queue, context-switch, fork, Load Average & Uptime
Large and Huge memory pages
Virtual Machine stats (depending on the hardware) - useful for Linux running KVM to host virtual machines
Resources in the Server and virtual machine
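In short, nmon works like a snapshot of system resource usage; combined with an nmon analysis tool, it gives a clear picture of each system metric.
Download the analysis tool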
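For a quick look without the analysis tool, the CPU figures can be pulled straight out of an nmon capture file. The sketch below relies on my understanding of the file layout, which should be checked against a real capture: comma-separated lines, `ZZZZ` rows mapping a snapshot tag such as T0001 to a time and date, and `CPU_ALL` rows carrying User% and Sys% in the third and fourth columns. The sample data and the function name `cpu_busy_by_snapshot` are made up for illustration.

```python
import csv

# Made-up sample in the assumed nmon capture layout.
SAMPLE = """CPU_ALL,CPU Total host1,User%,Sys%,Wait%,Idle%,Busy,CPUs
ZZZZ,T0001,02:25:55,11-JAN-2018
CPU_ALL,T0001,12.5,3.5,0.5,83.5,,4
ZZZZ,T0002,02:26:25,11-JAN-2018
CPU_ALL,T0002,55.0,10.0,5.0,30.0,,4
"""


def cpu_busy_by_snapshot(text):
    """Return [(timestamp, user% + sys%), ...] in file order."""
    times = {}
    busy = []
    for row in csv.reader(text.splitlines()):
        if len(row) < 4:
            continue
        tag = row[1]
        is_snapshot = tag[:1] == "T" and tag[1:].isdigit()
        if row[0] == "ZZZZ" and is_snapshot:
            times[tag] = f"{row[3]} {row[2]}"  # date, then time
        elif row[0] == "CPU_ALL" and is_snapshot:
            # The header row fails is_snapshot, so it is skipped here.
            busy.append((times.get(tag, tag), float(row[2]) + float(row[3])))
    return busy


busy = cpu_busy_by_snapshot(SAMPLE)
```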
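Database server - alerts
Understanding the database alert log is also key to understanding current performance.
The log looks like this. If an error appears, analyze and resolve it case by case.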
2018-01-11-00.36.36.090562+480 I13363168A459 LEVEL: Error
PID : 2228842 TID : 142490 PROC : db2sysc
INSTANCE: db2 NODE : 000 DB : TRADE
EDUID : 142490 EDUNAME: db2agent (**) 0
FUNCTION: DB2 UDB, Query Gateway, sqlqg_fedstp_hook, probe:40
MESSAGE : Unexpected error returned from outer RC=
DATA #1 : Hexdump, 4 bytes
0x07000007053F28D0 : 8126 0012 .&..
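When the alert log is large, a first pass can count error records per reporting function so that recurring problems stand out. The sketch below assumes every record starts with a timestamp line ending in `LEVEL: <level>` and carries a `FUNCTION:` line, as in the record above. The default level filter and the function name `summarize_diag` are my own choices.

```python
import re
from collections import Counter

# First line of a record: timestamp, timezone offset, record id, level.
HEADER = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+\s+\S+\s+LEVEL:\s*(\w+)"
)

# Copied from the record above.
SAMPLE = """2018-01-11-00.36.36.090562+480 I13363168A459 LEVEL: Error
PID : 2228842 TID : 142490 PROC : db2sysc
INSTANCE: db2 NODE : 000 DB : TRADE
EDUID : 142490 EDUNAME: db2agent (**) 0
FUNCTION: DB2 UDB, Query Gateway, sqlqg_fedstp_hook, probe:40
MESSAGE : Unexpected error returned from outer RC=
"""


def summarize_diag(text, levels=("Error", "Severe")):
    """Count records per (level, function) for the selected levels."""
    counts = Counter()
    level = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            level = match.group(1)
        elif line.startswith("FUNCTION:") and level in levels:
            counts[(level, line.split(":", 1)[1].strip())] += 1
            level = None  # count each record at most once
    return counts


counts = summarize_diag(SAMPLE)
```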
Database server - snapshots
Database snapshots serve as the main basis for analysis; a snapshot shows where database time is being spent, for example:
...
Number of automatic storage paths = 1
Automatic storage path = /db2data
Node number = 0
State = In Use
File system ID = 9223372079804448776
Storage path free space (bytes) = 69730709504
File system used space (bytes) = 139648946176
File system total space (bytes) = 209379655680
...
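Snapshot sections like the one above are `name = value` pairs, so they are easy to turn into numbers. The sketch below parses such lines and derives file system usage from the used and total byte counts. The helper names are illustrative. If a snapshot lists several storage paths, later values overwrite earlier ones in this simple version.

```python
# Three lines copied from the snapshot excerpt above.
SAMPLE = """Storage path free space (bytes) = 69730709504
File system used space (bytes) = 139648946176
File system total space (bytes) = 209379655680
"""


def parse_snapshot_fields(text):
    """Return {field name: raw value string} for 'name = value' lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # lines without " = " (such as "...") are ignored
            fields[key.strip()] = value.strip()
    return fields


def used_percent(fields):
    """File system usage as a percentage of total space."""
    used = int(fields["File system used space (bytes)"])
    total = int(fields["File system total space (bytes)"])
    return 100.0 * used / total


fields = parse_snapshot_fields(SAMPLE)
print(f"file system used: {used_percent(fields):.1f}%")
```

For the excerpt above, free space plus used space equals total space, and usage comes to roughly two thirds of the file system.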
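This article only lists the analysis methods; I will write up the concrete steps as time permits.
Making good use of tools matters, but performance tuning is more than that: it has to proceed step by step, and you should be prepared for the long haul.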