1 问题
近期,刚上线不久的生产系统的数据库连接池 C3P0 (版本为0.9.5.2)突然报出 APPARENT DEADLOCK!!! 错误。
1.1 错误日志
错误日志如下。
com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@7cf60134 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@7cf60134 -- APPARENT DEADLOCK!!! Complete Status:
Managed Threads: 3
Active Threads: 3
Active Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@3ce7f8aa
on thread: C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#1
com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@17f3c7a1
on thread: C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#2
com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@65dc473c
on thread: C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#0
Pending Tasks:
com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@744937e8
com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@77e8b363
com.mchange.v2.resourcepool.BasicResourcePool$1DestroyResourceTask@7eafacf0
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@79cde1be
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@44067f18
com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask@76c7bf50
Pool thread stack traces:
Thread[C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#0,5,main]
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
oracle.net.ns.Packet.receive(Unknown Source)
oracle.net.ns.DataPacket.receive(Unknown Source)
oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:213)
oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:796)
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1031)
oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:836)
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1124)
oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3285)
oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3329)
oracle.jdbc.OracleDatabaseMetaData.getTables(OracleDatabaseMetaData.java:2471)
com.mchange.v2.c3p0.impl.DefaultConnectionTester$1.activeCheckConnectionNoQuery(DefaultConnectionTester.java:84)
com.mchange.v2.c3p0.impl.DefaultConnectionTester$3.activeCheckConnectionNoQuery(DefaultConnectionTester.java:182)
com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(DefaultConnectionTester.java:275)
com.mchange.v2.c3p0.AbstractConnectionTester.activeCheckConnection(AbstractConnectionTester.java:79)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:504)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:464)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(C3P0PooledConnectionPool.java:436)
com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run(BasicResourcePool.java:2211)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Thread[C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#1,5,main]
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
oracle.net.ns.Packet.receive(Unknown Source)
oracle.net.ns.DataPacket.receive(Unknown Source)
oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:213)
oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:796)
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1031)
oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:836)
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1124)
oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3285)
oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3329)
oracle.jdbc.OracleDatabaseMetaData.getTables(OracleDatabaseMetaData.java:2471)
com.mchange.v2.c3p0.impl.DefaultConnectionTester$1.activeCheckConnectionNoQuery(DefaultConnectionTester.java:84)
com.mchange.v2.c3p0.impl.DefaultConnectionTester$3.activeCheckConnectionNoQuery(DefaultConnectionTester.java:182)
com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(DefaultConnectionTester.java:275)
com.mchange.v2.c3p0.AbstractConnectionTester.activeCheckConnection(AbstractConnectionTester.java:79)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:504)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:464)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(C3P0PooledConnectionPool.java:436)
com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run(BasicResourcePool.java:2211)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Thread[C3P0PooledConnectionPoolManager[identityToken->1hgepsj9u1w3a4je54sly0|70165722]-HelperThread-#2,5,main]
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
oracle.net.ns.Packet.receive(Unknown Source)
oracle.net.ns.DataPacket.receive(Unknown Source)
oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:213)
oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:796)
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1031)
oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:836)
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1124)
oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3285)
oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3329)
oracle.jdbc.OracleDatabaseMetaData.getTables(OracleDatabaseMetaData.java:2471)
com.mchange.v2.c3p0.impl.DefaultConnectionTester$1.activeCheckConnectionNoQuery(DefaultConnectionTester.java:84)
com.mchange.v2.c3p0.impl.DefaultConnectionTester$3.activeCheckConnectionNoQuery(DefaultConnectionTester.java:182)
com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(DefaultConnectionTester.java:275)
com.mchange.v2.c3p0.AbstractConnectionTester.activeCheckConnection(AbstractConnectionTester.java:79)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:504)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:464)
com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(C3P0PooledConnectionPool.java:436)
com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run(BasicResourcePool.java:2211)
com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
Task com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@65dc473c (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt().
Task com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@3ce7f8aa (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt().
Task com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask@17f3c7a1 (in deadlocked PoolThread) failed to complete in maximum time 60000ms. Trying interrupt().
上面的日志报出了 APPARENT DEADLOCK!!! 错误,然后输出了
ThreadPoolAsynchronousRunner 的状态,包括线程数,当前正在处理的任务,队列中等待处理的任务,以及 ThreadPoolAsynchronousRunner 任务线程的堆栈。
1.2 配置
生产上的配置如下。
<bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource" >
<property name="driverClass" value="oracle.jdbc.driver.OracleDriver"></property>
<property name="jdbcUrl" value="${datasource.jdbcUrl}"></property>
<property name="user" value="${datasource.user}"></property>
<property name="password" value="${datasource.pass}"></property>
<property name="maxPoolSize" value="${datasource.maxPoolSize}"></property>
<property name="maxIdleTime" value="180"></property>
<property name="idleConnectionTestPeriod" value="60"></property>
<property name="checkoutTimeout" value="30000"></property>
</bean>
基本相当于使用默认配置。
1.3 初步判断
根据日志,C3P0判断其线程池(ThreadPoolAsynchronousRunner)中的任务出现死锁(DEADLOCK),此时线程处于检查空闲数据链接状态是否可用。单从日志字面暂时无法判断问题所在。
查阅了网上的资料,对此问题原因分析较少,绝大部分不甚了了。比较多的文章是尝试通过调整参数来避免问题的出现的帖子。
秉着 “知其所以然” 的态度,再加上一点点的好奇心,我翻阅了一下C3P0的代码,试图从中发现端倪。
2 C3P0的死锁(DeadLock)机制分析
2.1 C3P0类结构
为了后面展示代码的需要,先附上一个我整理的C3P0的类结构图,有助于我们分析问题时确定问题的位置。如果下面的图太小,可以点击这里浏览
我们重点关注右下角的几个类。
2.2 ThreadPoolAsynchronousRunner线程池
在 C3P0PooledConnectionPoolManager 类在初始化时中创建了一个taskRunner,其类型为 ThreadPoolAsynchronousRunner , 是一个 C3P0 自定实现的线程池。
public final class C3P0PooledConnectionPoolManager{
...
ThreadPoolAsynchronousRunner taskRunner;
...
private synchronized void _poolsInit()
{
...
this.taskRunner = createTaskRunner(
num_task_threads, matt, timer, idStr + "-HelperThread" );
...
}
...
private ThreadPoolAsynchronousRunner createTaskRunner(
int num_threads,
int matt /* maxAdministrativeTaskTime */,
Timer timer,
String threadLabel )
{
...
out = new ThreadPoolAsynchronousRunner(
num_threads, true, timer, threadLabel );
return out;
...
}
...
}
ThreadPoolAsynchronousRunner 线程池被作为成员注入到多个对象之中,最终用于执行 BasicResourcePool 类中定义的 ScatteredAcquireTask, AcquireTask, RemoveTask, DestroyResourceTask, RefurbishCheckinResourceTask, AsyncTestIdleResourceTask 等任务。
我们来看一下 ThreadPoolAsynchronousRunner 的代码。
public final class ThreadPoolAsynchronousRunner implements AsynchronousRunner
{
...
TimerTask deadlockDetector = new DeadlockDetector();
...
public ThreadPoolAsynchronousRunner(
int num_threads, boolean daemon,
Timer sharedTimer, String threadLabel )
{
this( num_threads,
daemon,
DFLT_MAX_INDIVIDUAL_TASK_TIME,
DFLT_DEADLOCK_DETECTOR_INTERVAL,
DFLT_INTERRUPT_DELAY_AFTER_APPARENT_DEADLOCK,
sharedTimer,
false,
threadLabel );
}
...
private ThreadPoolAsynchronousRunner( int num_threads,
boolean daemon,
int max_individual_task_time,
int deadlock_detector_interval,
int interrupt_delay_after_apparent_deadlock,
Timer myTimer,
boolean should_cancel_timer,
String threadLabel )
{
this.num_threads = num_threads;
this.daemon = daemon;
this.max_individual_task_time = max_individual_task_time;
this.deadlock_detector_interval =
deadlock_detector_interval;
this.interrupt_delay_after_apparent_deadlock =
interrupt_delay_after_apparent_deadlock;
this.myTimer = myTimer;
this.should_cancel_timer = should_cancel_timer;
this.threadLabel = threadLabel;
recreateThreadsAndTasks();
myTimer.schedule( deadlockDetector,
deadlock_detector_interval,
deadlock_detector_interval );
}
...
}
可以看到, ThreadPoolAsynchronousRunner 线程池默认创建3个执行线程,并且创建了一个 死锁检测(DeadLockDetector) 的定时任务,默认情况下每10秒执行一次。
2.3 DeadLockDetector内部机制
public final class ThreadPoolAsynchronousRunner implements AsynchronousRunner
{
...
class DeadlockDetector extends TimerTask
{
LinkedList last = null;
LinkedList current = null;
public void run()
{
...
current = (LinkedList) pendingTasks.clone();
...
if ( current.equals( last ) )
{
//System.err.println(this + " -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!");
...
recreateThreadsAndTasks();
run_stray_tasks = true;
}
...
if (run_stray_tasks)
{
AsynchronousRunner ar = new ThreadPerTaskAsynchronousRunner( DFLT_MAX_EMERGENCY_THREADS, max_individual_task_time );
for ( Iterator ii = current.iterator(); ii.hasNext(); )
ar.postRunnable( (Runnable) ii.next() );
ar.close( false ); //tell the emergency runner to close itself when its tasks are complete
last = null;
}
else
last = current;
...
可以看到,DeadLockDetector是
ThreadPoolAsynchronousRunner 的内部类,是一个默认每10秒执行一次的定时任务。其工作原理是定时检查任务队列(pendingTasks)中的任务,如果前后两次检查发现待执行的任务没有变化,就认为可能产生了死锁,并额外创建线程执行等待任务。
3 问题分析
3.1 线程等待
从代码中可以看出,APPARENT DEADLOCK!!!报错原因主要是ThreadPoolAsynchronousRunner线程池的中的等待任务队列在经过一定时长(默认10秒)前后没有变化。
结合问题日志。我们可以初步判断问题的原因,是由于某些任务(如日志中显示AsyncTestIdleResourceTask)执行过程中失去响应(超过60s没有收到响应),导致后续的任务长时间等待,进而报错。
3.2 具体分析
进而观察执行线程的堆栈,可以发现三个任务都在执行检测连接有效性时,向服务端发送请求数据后,Oracle数据库未返回响应,导致线程长时间等待。
由于此问题无法通过构造场景重现,只能猜测,产生这种原因可能是连接在服务端已经超时,由服务端主动关闭连接,但由于TCP协议断开连接需要完成4次握手,而服务端只完成了前两次握手,这样就导致了客户端可以发送消息但接收不到任何内容,长时间等待直到超时。
3.3 解决方案
根据以上分析,我认为问题是maxIdleTime(连接最大空闲时间)与idleConnectionTestPeriod(空闲检测周期)参数配置与数据库不匹配引起的。
根据DBA的建议,将maxIdleTime调整为30秒,将idleConnectionTestPeriod调整为10秒。
4 后续
调整完后系统稳定运行,至今未再报出同样的错误。
在查看资料的过程中,发现有与我类似的情况,请看这里。问题出在同一块代码,但现象与原因均与我不同。我并不认为这是C3P0的缺陷,而是具体环境上的配置没有相匹配。
我再说一点废话,C3P0中大量使用了synchronized关键字进行加锁,由于单例没有使用double-check,导致正常代码中大量加锁,在高并发下效率很低。因此,如果性能对系统较为重要,还是推荐大家使用HikariCP替换C3P0。