hbase full gc问题解决 - vah101的专栏 - 博客频道 - CSDN.NET http://blog.csdn.net/vah101/article/details/22923013
日志中打印了以下内容:
2014-01-30 20:53:19,137 WARN org.apache.Hadoop.Hbase.util.Sleeper: We slept 49779ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
...
2014-01-30 20:53:19,170 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 49014ms for sessionid 0x244cdbfa70019d4, closing socket connection and attempting reconnect
...
2014-01-30 20:53:23,685 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server Processor101,62110,1476920820701: Unhandled exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing Processor101,62110,1476920820701 as dead serverorg.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing Processor101,62110,1476920820701 as dead server
第一条日志,说明了HBase发生了full gc,造成region server睡眠了49秒,其后导致了region server与zookeeper连接中断,从而被zookeeper认为已经离线,触发了org.apache.hadoop.hbase.YouAreDeadException异常,造成了region server下线
优化的方法是在hbase-env.sh中加入配置项:
export HBASE_REGIONSERVER_OPTS="-Xmx16g -Xms16g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
其中,-Xmx16g -Xms16g的数值按照hbase-env.sh中的由export HBASE_HEAPSIZE=来配置,
“-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log”是用来配置gc日志的,生产环境下可以不用打印日志。
可以参考:
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
http://wiki.apache.org/hadoop/PerformanceTuning
http://www.cnblogs.com/cenyuhai/p/3235101.html
http://blog.csdn.NET/sunmenggmail/article/details/10172631