Troubleshooting Memory Leaks in Java Applications

When a JVM leaks memory, the typical symptom is that Full GC runs more and more often, until the process finally dies with an OOM (out of memory) error.
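To watch for this symptom from inside the process, the standard java.lang.management API exposes per-collector counters. The following is only a minimal sketch: the class name GcStats is made up for illustration, and which collector name corresponds to Full GC depends on the garbage collector the JVM is running with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original case.
public class GcStats {
    // Returns one line per collector: name, cumulative collection count,
    // and cumulative collection time in milliseconds.
    public static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Taking two snapshots a minute apart and comparing the old-generation collector's count gives a rough idea of how often full collections are happening.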

When an application runs out of memory, or its memory usage stays high for a long time, analyzing a heap dump usually reveals the cause.

When CPU usage is very high, a thread dump shows which thread is doing what, and therefore which one is consuming too many resources.

A heap dump is the file written by jmap -dump:<dump-options> <pid>, and a thread dump is the text printed by jstack <pid>.

On Linux (with a JDK installed), run jps to list the process IDs of the Java programs that are currently running.

jps

Use top to see how much of the system's resources each running process is using.

top

First, the heap dump:

jmap -dump:live,format=b,file=heap.bin <pid>

Run jmap -h to see all the options:

where <option> is one of:
    <none>               to print same info as Solaris pmap
    -heap                to print java heap summary
    -histo[:live]        to print histogram of java object heap; if the "live"
                         suboption is specified, only count live objects
    -permstat            to print permanent generation statistics
    -finalizerinfo       to print information on objects awaiting finalization
    -dump:<dump-options> to dump java heap in hprof binary format
                         dump-options:
                           live         dump only live objects; if not specified,
                                        all objects in the heap are dumped.
                           format=b     binary format
                           file=<file>  dump heap to <file>
                         Example: jmap -dump:live,format=b,file=heap.bin <pid>
    -F                   force. Use with -dump:<dump-options> <pid> or -histo
                         to force a heap dump or histogram when <pid> does not
                         respond. The "live" suboption is not supported
                         in this mode.
    -h | -help           to print this help message
    -J<flag>             to pass <flag> directly to the runtime system

For details, see the official documentation: https://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html
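The same kind of dump can also be triggered from inside the application. The sketch below assumes a HotSpot-based JVM, because it uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean. The class name HeapDumper and the output file name are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; assumes a HotSpot JVM.
public class HeapDumper {
    // Writes an hprof heap dump of the current JVM to the given path.
    // live=true dumps only reachable objects, like the "live" suboption of jmap.
    public static File dump(String path, boolean live) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, live); // throws IOException if the file already exists
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new File(path);
    }

    public static void main(String[] args) {
        // A timestamped name avoids clashing with an earlier dump.
        File f = dump("heap-" + System.currentTimeMillis() + ".hprof", true);
        System.out.println("Wrote " + f.getAbsolutePath() + " (" + f.length() + " bytes)");
    }
}
```

The .hprof extension is used because newer JDKs reject other extensions for this call.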

Next, the thread dump. For example:

jstack -m <pid> >jvm_deadlocks.txt
jstack -l <pid> >jvm_listlocks.txt

Run jstack -h to see the details of the command:

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -m  to print both java and native frames (mixed mode)
    -l  long listing. Prints additional information about locks
    -h or -help to print this help message
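A thread dump can also be collected in-process through the standard ThreadMXBean, which is handy when jstack cannot be run on the host. This is a rough sketch rather than a replacement for jstack: the class name ThreadDumper is made up, and the deadlock check assumes the JVM supports synchronizer monitoring (HotSpot does).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the original case.
public class ThreadDumper {
    // Builds a text dump of all live threads with their full stack traces.
    public static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Request lock details only where the JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Returns how many threads are currently deadlocked (0 if none).
    public static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println(dump());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

The frames are printed manually because ThreadInfo.toString() only includes the first few frames of each stack.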

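However, dumping the heap takes a long time and the file is huge. You then have to pull it from the server to your local machine and load it into an analysis tool. The whole process is such a hassle that it is best avoided unless there is no other option.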

A lighter-weight alternative is to analyze the live process directly, using jmap to list the live objects (jmap -histo:live <pid>).

[Figure: output of jmap -histo:live]

In the figure above, for example, Hashtable takes up a large amount of memory. How do we find out what is causing this? BTrace can be used to dig further.

First, write a BTrace script, TracingHashTable.java:

import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;

@BTrace
public class TracingHashTable {
    // Fires every time java.util.Hashtable.put returns.
    @OnMethod(
        clazz="java.util.Hashtable",
        method="put",
        location=@Location(Kind.RETURN))
    public static void traceExecute(@Self java.util.Hashtable object) {
        println("Call stack:");
        // Print the stack of the thread that called put.
        jstack();
    }
}

Then run it (4947 is the process ID of the target JVM):

bin/btrace -cp build 4947 TracingHashTable.java

This prints the following stack:

[Figure: call stack printed by BTrace]

At this point the problem has been narrowed down to the exact line of code.

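To be honest, though, memory leaks of this kind are easy to avoid if you write code carefully and use static variables and containers properly.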
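To make the point about static variables and containers concrete, here is a small sketch that contrasts an unbounded static map with a size-capped one. All names and the cap of 1000 entries are illustrative and do not come from the case above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only; names and sizes are assumptions.
public class StaticCacheDemo {
    public static final int MAX_ENTRIES = 1000; // illustrative cap

    // Leak-prone: entries are added and never removed, and the static field
    // keeps the map reachable for as long as the class is loaded.
    public static final Map<String, byte[]> LEAKY = new HashMap<>();

    // Bounded: an access-ordered LinkedHashMap that evicts the eldest entry
    // once the cap is exceeded.
    public static final Map<String, byte[]> BOUNDED = Collections.synchronizedMap(
            new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > MAX_ENTRIES;
                }
            });

    public static void main(String[] args) {
        for (int i = 0; i < 5000; i++) {
            LEAKY.put("key-" + i, new byte[128]);
            BOUNDED.put("key-" + i, new byte[128]);
        }
        System.out.println("leaky size = " + LEAKY.size());     // prints 5000
        System.out.println("bounded size = " + BOUNDED.size()); // prints 1000
    }
}
```

The leaky map grows with every distinct key, while the bounded one stays at the cap no matter how many entries are put into it.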

Now consider a special case that is hard to spot and hard to avoid: a leak in the permanent generation (PermGen).

[Figure: detailed memory information]

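The figure above shows that Perm space is running out. Perm stores class and method metadata as well as runtime constants. If an application loads a large number of classes, the Perm area generally holds a lot of data. A large number of interned String objects also makes Perm keep growing. The size of this area is set with the -XX:MaxPermSize option. (The permanent generation exists only up to JDK 7; JDK 8 replaced it with Metaspace.)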
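The usage of each memory area can also be read from inside the JVM with the standard MemoryPoolMXBean. This is only a sketch with a made-up class name, MemoryPools. Pool names vary by JVM version and collector: on older HotSpot versions the permanent generation shows up as a pool whose name contains "Perm Gen", and on JDK 8 and later the corresponding pool is "Metaspace".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original case.
public class MemoryPools {
    // Returns one line per memory pool: name, type (HEAP or NON_HEAP),
    // bytes used, and the maximum in bytes (-1 when undefined).
    public static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(pool.getName() + " [" + pool.getType() + "] used=" + usage.getUsed()
                    + " max=" + usage.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```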

So far only two causes of this problem have been found: Groovy compiling classes dynamically, and String.intern.

The root cause is that ClassLoader.defineClass and java.lang.String.intern are called heavily in scenarios they are not suited to.
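A cheap first check for runaway defineClass calls is to watch the class-loading counters that the JVM already keeps. The sketch below uses the standard ClassLoadingMXBean; the class name ClassLoadStats is made up for illustration. If the total loaded count keeps climbing while the unloaded count stays flat, that hints that classes are being defined repeatedly, although it does not show who is defining them.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the original case.
public class ClassLoadStats {
    // Returns the currently loaded, total loaded, and unloaded class counts.
    public static String snapshot() {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        return "loaded=" + cl.getLoadedClassCount()
                + " totalLoaded=" + cl.getTotalLoadedClassCount()
                + " unloaded=" + cl.getUnloadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```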

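Use BTrace to print the stack of each call to ClassLoader.defineClass, and trace the problem back from the stack (the code is shown in the figure below). BTrace cannot trace native methods, however, so this approach does not work for String.intern, which is native.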

[Figure: BTrace script tracing ClassLoader.defineClass]

In addition, JProfiler can be used to trace the call stacks of String.intern, but that needs to be looked into separately.

Last edited: 2018.09.07 22:04:37