1. Symptom
For various reasons, the Kafka process had been down for some time (it may, for example, have been killed manually). When we later tried to start Kafka, startup failed and the process would not come up. The logs reported an OOM error similar to the following:
[2018-11-27 13:54:58,726] INFO Recovering unflushed segment 0 in log dcs_async_scr_info-4. (kafka.log.Log)
[2018-11-27 13:54:58,727] INFO Completed load of log dcs_async_scr_info-4 with log end offset 0 (kafka.log.Log)
[2018-11-27 13:54:58,731] INFO Recovering unflushed segment 92049 in log dcs_async_dev_sid_info-6. (kafka.log.Log)
[2018-11-27 13:54:58,737] INFO Completed load of log dcs_async_dev_sid_info-6 with log end offset 92146 (kafka.log.Log)
...skipping...
[2018-11-27 13:54:59,450] INFO Recovering unflushed segment 0 in log dcs_storm_collect_info-3. (kafka.log.Log)
[2018-11-27 13:54:59,552] ERROR There was an error in one of the threads during logs loading: java.io.IOException: Map failed (kafka.log.LogManager)
[2018-11-27 13:54:59,554] FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:919)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:286)
at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
at kafka.utils.Utils$.inLock(Utils.scala:535)
at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
at kafka.log.LogSegment.recover(LogSegment.scala:170)
at kafka.log.Log.recoverLog(Log.scala:205)
at kafka.log.Log.loadSegments(Log.scala:177)
at kafka.log.Log.<init>(Log.scala:67)
at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:916)
... 15 more
[2018-11-27 13:54:59,556] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
2. Resource monitoring
Since the log complained about running out of memory, we captured the process's resource consumption during startup and found the following.
2.1 Checking the memory state
Because the error was an OOM, the first thing we captured was the memory state (free -m):
total used free shared buffers cached
Mem: 7761 5794 1966 16 150 3048
-/+ buffers/cache: 2595 5165
Swap: 0 0 0
We captured this several times during startup, and every snapshot showed plenty of available memory (roughly 5 GB free once buffers/cache are accounted for).
2.2 Checking memory-mapped files
Kafka mmaps the index files of all partition log segments to speed up file reads and writes. We also know that the number of memory mappings a process can hold is limited, so we captured the mapping count to see whether it exceeded that limit.
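The raw data comes straight from the kernel. A minimal sketch for pulling out only the index mappings, assuming $PID holds the broker's pid (found the same way as in the capture script in section 4):

# each line of /proc/<pid>/maps is one region: address range, permissions, offset, device, inode, file
grep '\.index' /proc/$PID/maps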
Captured result:
2055 7f1549679000-7f154a079000 rw-s 00000000 fd:01 1704740 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-2/00000000000000000000.index
2056 7f154a079000-7f154aa79000 rw-s 00000000 fd:01 1704740 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-2/00000000000000000000.index
2057 7f154aa79000-7f154b479000 rw-s 00000000 fd:01 1704740 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-2/00000000000000000000.index
2058 7f154b479000-7f154be79000 rw-s 00000000 fd:01 1704505 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info-14/00000000000000000000.index
2059 7f154be79000-7f154c879000 rw-s 00000000 fd:01 1704505 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info-14/00000000000000000000.index
2060 7f154c879000-7f154d279000 rw-s 00000000 fd:01 1704505 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info-14/00000000000000000000.index
2061 7f154d279000-7f154dc79000 rw-s 00000000 fd:01 1704840 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-47/00000000000000000000.index
2062 7f154dd2b000-7f154e72b000 rw-s 00000000 fd:01 1704840 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-47/00000000000000000000.index
2063 7f154e72b000-7f154f12b000 rw-s 00000000 fd:01 1704840 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-47/00000000000000000000.index
2064 7f154f12b000-7f154fb2b000 rw-s 00000000 fd:01 1704530 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info_gray-6/00000000000000000000.index
2065 7f154fb2b000-7f155052b000 rw-s 00000000 fd:01 1704530 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info_gray-6/00000000000000000000.index
2066 7f15505d9000-7f1550fd9000 rw-s 00000000 fd:01 1704530 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info_gray-6/00000000000000000000.index
2067 7f1550fd9000-7f15519d9000 rw-s 00000000 fd:01 1704852 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-28/00000000000000000000.index
2068 7f15519d9000-7f15523d9000 rw-s 00000000 fd:01 1704852 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-28/00000000000000000000.index
2069 7f15523d9000-7f1552dd9000 rw-s 00000000 fd:01 1704852 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-28/00000000000000000000.index
2070 7f1552dd9000-7f15537d9000 rw-s 00000000 fd:01 1704783 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-15/00000000000000000000.index
2071 7f15537d9000-7f15541d9000 rw-s 00000000 fd:01 1704783 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-15/00000000000000000000.index
2072 7f15541d9000-7f1554bd9000 rw-s 00000000 fd:01 1704783 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-15/00000000000000000000.index
2073 7f1554bd9000-7f15555d9000 rw-s 00000000 fd:01 1704990 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_dev_info_batch_del-7/00000000000000057329.index
2074 7f15555d9000-7f15555da000 rw-s 00000000 fd:01 1704990 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_dev_info_batch_del-7/00000000000000057329.index
2075 7f15555da000-7f1555fda000 rw-s 00000000 fd:01 1704990 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_dev_info_batch_del-7/00000000000000057329.index
2076 7f1555fda000-7f15569da000 rw-s 00000000 fd:01 1704990 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_dev_info_batch_del-7/00000000000000057329.index
2077 7f15569da000-7f15573da000 rw-s 00000000 fd:01 1705044 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-10/00000000000000183949.index
2078 7f15573da000-7f15573db000 rw-s 00000000 fd:01 1705044 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-10/00000000000000183949.index
2079 7f15573db000-7f1557ddb000 rw-s 00000000 fd:01 1705044 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-10/00000000000000183949.index
2080 7f1557ddb000-7f15587db000 rw-s 00000000 fd:01 1705044 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-10/00000000000000183949.index
2081 7f15587db000-7f15591db000 rw-s 00000000 fd:01 1704654 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-5/00000000000000000000.index
2082 7f15591db000-7f1559bdb000 rw-s 00000000 fd:01 1704654 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-5/00000000000000000000.index
2083 7f1559bdb000-7f155a5db000 rw-s 00000000 fd:01 1704654 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-5/00000000000000000000.index
2084 7f155a5db000-7f155afdb000 rw-s 00000000 fd:01 1704682 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-9/00000000000000091638.index
2085 7f155afdb000-7f155afdc000 rw-s 00000000 fd:01 1704682 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-9/00000000000000091638.index
2086 7f155afdc000-7f155b9dc000 rw-s 00000000 fd:01 1704682 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-9/00000000000000091638.index
2087 7f155b9dc000-7f155c3dc000 rw-s 00000000 fd:01 1704682 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-9/00000000000000091638.index
2088 7f155c3dc000-7f155cddc000 rw-s 00000000 fd:01 1704870 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-13/00000000000000000000.index
2089 7f155cddc000-7f155d7dc000 rw-s 00000000 fd:01 1704870 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-13/00000000000000000000.index
2090 7f155d7dc000-7f155e1dc000 rw-s 00000000 fd:01 1704870 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-13/00000000000000000000.index
The listing above shows only the index-file mappings; other mapped files, such as loaded jars, are omitted here. Two details worth noting: most regions span 10 MiB, matching the default log.index.size.max.bytes of 10 MB to which index files are pre-allocated, and the same index file appears several times, apparently because OffsetIndex.resize remaps it during recovery.
Counting them, there are a little over 600 mappings in total. Compare that with the system limit:
cat /proc/sys/vm/max_map_count
65530
As you can see, the mapping count is far below the system limit.
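For reference, the whole check boils down to comparing two numbers; a sketch, assuming jps reports the broker as "Kafka":

PID=$(jps | awk '/Kafka/{print $1}')
wc -l < /proc/$PID/maps           # mappings currently held: a little over 600 here
cat /proc/sys/vm/max_map_count    # per-process limit: 65530 here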
2.3 Checking the process status
Next we captured the process status:
cat /proc/$PID/status
As the output below shows, VmPeak and VmSize are both quite large, close to 7 GB:
----------------------------------------Tue Nov 27 13:54:59 CST 2018 catch process status info start--------------------------------
Name: java
State: S (sleeping)
Tgid: 7492
Pid: 7492
PPid: 7491
TracerPid: 0
Uid: 1006 1006 1006 1006
Gid: 1000 1000 1000 1000
FDSize: 256
Groups: 16 33 1000
VmPeak: 6899700 kB
VmSize: 6899700 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 106168 kB
VmRSS: 106104 kB
VmData: 2416668 kB
VmStk: 140 kB
VmExe: 4 kB
VmLib: 14460 kB
VmPTE: 1248 kB
VmSwap: 0 kB
Threads: 17
SigQ: 0/127537
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000003
SigCgt: 2000000181005ccc
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 7
nonvoluntary_ctxt_switches: 3
----------------------------------------Tue Nov 27 13:54:59 CST 2018 catch process status info end--------------------------------
This pointed the suspicion at the virtual memory footprint reflected by VmPeak being too large.
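To track just those two fields over time instead of dumping the whole status file, something like the following will do (a sketch, finding the pid the same way as the capture script in section 4):

PID=$(jps | awk '/Kafka/{print $1}')
# VmPeak: highest virtual memory usage so far; VmSize: current virtual memory usage
grep -E 'VmPeak|VmSize' /proc/$PID/status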
2.4 Checking the ulimit limits
Check the limits with ulimit -a:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127537
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) 7331164
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127537
virtual memory (kbytes, -v) 6899920
file locks (-x) unlimited
Notice that the cap on virtual memory is also just under 7 GB, so it is very likely that the process ran out of virtual memory.
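Putting the cap and the observed peak side by side makes the diagnosis concrete (values from the captures above; $PID as before):

ulimit -v                        # 6899920 -- the shell's virtual memory cap, in kB
grep VmPeak /proc/$PID/status    # VmPeak: 6899700 kB -- the broker's peak usage

The peak sits a mere ~220 kB below the cap, so the next 10 MB index mmap during log recovery had no address space left, and FileChannelImpl.map failed with the "Map failed" OOM.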
3. The fix: raising the virtual memory limit
After obtaining root privileges, we adjusted ulimit -v:
ulimit -v unlimited
Starting Kafka again, it came up perfectly. Checking the status once more:
Name: java
State: S (sleeping)
Tgid: 19687
Pid: 19687
PPid: 1
TracerPid: 0
Uid: 1006 1006 1006 1006
Gid: 1000 1000 1000 1000
FDSize: 512
Groups: 16 33 1000
VmPeak: 7566500 kB
VmSize: 4704148 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 261668 kB
VmRSS: 261668 kB
VmData: 2453684 kB
VmStk: 140 kB
VmExe: 4 kB
VmLib: 14460 kB
VmPTE: 1788 kB
VmSwap: 0 kB
Threads: 59
SigQ: 0/127537
As you can see, the peak virtual memory (VmPeak: 7566500 kB) did indeed exceed 7 GB, corroborating that the earlier startup failures were caused by the virtual memory limit.
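One caveat: ulimit -v unlimited only applies to the shell session that starts the broker. To persist the change across logins, the usual Linux approach is an entry in /etc/security/limits.conf for the broker's user (a sketch; the user dcs_extas is inferred from the log paths above and may differ in your environment):

# /etc/security/limits.conf -- "as" is the address-space (virtual memory) limit, i.e. ulimit -v
dcs_extas    soft    as    unlimited
dcs_extas    hard    as    unlimited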
Summary
Kafka's design and implementation make heavy use of the page cache and of file mappings (mmap), and file mappings consume virtual memory. For every log segment of every partition of every topic, Kafka generates an index file, and all of these index files are mmapped into memory. If there are too many index files, virtual memory may run short. Two things can mitigate this:
- Raise the system's virtual memory limit, as done in this article with ulimit -v unlimited.
- Configure log.segment.bytes to a reasonable size rather than one that is too small; otherwise, in production, rolling can generate too many segment files and, with them, too many index files, until mmap eventually fails with the OOM error above. As a rule of thumb, set log.segment.bytes to no less than 200 MB, and log.retention.bytes to 1 GB or more (a multiple of log.segment.bytes); see the configuration sketch below.
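A minimal server.properties sketch following that rule of thumb (the values are illustrative, not prescriptive):

# server.properties
log.segment.bytes=209715200      # roll a new segment (and index file) every 200 MB
log.retention.bytes=1073741824   # keep ~1 GB per partition; a multiple of log.segment.bytes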
4. Capture scripts
catchKafkaMemInfo.sh
#!/bin/bash
# Snapshot the memory-related state of the running Kafka broker:
# free memory, /proc/<pid>/status and /proc/<pid>/maps.
PID=$(jps | grep Kafka | awk '{print $1}')
STATUS_FILE=process_status_$PID.log
MAPS_FILE=process_maps_$PID.log
FREEMEM_FILE=process_free_$PID.log
if [ -z "$PID" ] ; then
    echo "no pid found for kafka"
else
    echo "catch status of pid:$PID"
    NOW_TIME=$(date)
    echo "----------------------------------------$NOW_TIME catch process mem info start--------------------------------" >> "$FREEMEM_FILE"
    free -m >> "$FREEMEM_FILE"
    echo "----------------------------------------$NOW_TIME catch process mem info end--------------------------------" >> "$FREEMEM_FILE"
    echo "----------------------------------------$NOW_TIME catch process status info start--------------------------------" >> "$STATUS_FILE"
    cat /proc/$PID/status >> "$STATUS_FILE"
    echo "----------------------------------------$NOW_TIME catch process status info end--------------------------------" >> "$STATUS_FILE"
    echo "process status file has been created as file $STATUS_FILE"
    echo "----------------------------------------$NOW_TIME catch process map info start--------------------------------" >> "$MAPS_FILE"
    cat /proc/$PID/maps >> "$MAPS_FILE"
    echo "----------------------------------------$NOW_TIME catch process map info end--------------------------------" >> "$MAPS_FILE"
    # note: counts every line accumulated in the maps file so far, marker lines included
    MAP_SIZE=$(wc -l < "$MAPS_FILE")
    echo "process maps file has been created as file $MAPS_FILE, map size:$MAP_SIZE"
fi
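startKafkaAndCatch.sh below invokes this script in a loop, but it can also be run by hand for a one-off snapshot:

sh catchKafkaMemInfo.sh    # appends one snapshot to each of the three process_*_$PID.log files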
startKafkaAndCatch.sh
#!/bin/bash
# Start the Kafka server in the background
nohup bin/kafka-server-start.sh config/server.properties > /dev/null 2>&1 &
# then sample the resource state roughly every 100 ms
for i in {1..10}
do
    echo "catch now..."
    sh catchKafkaMemInfo.sh
    sleep 0.1
done
The scripts mainly capture the map count, memory usage information, and so on.
Explanation of the relevant fields in status:
VmPeak: Peak virtual memory usage
VmSize: Current virtual memory usage
VmLck: Current mlocked memory
VmHWM: Peak resident set size
VmRSS: Resident set size
VmData: Size of "data" segment
VmStk: Size of stack
VmExe: Size of "text" segment
VmLib: Shared library usage
VmPTE: Pagetable entries size
VmSwap: Swap space used