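In our production service, Thrift clients intermittently throw org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe. The count is usually low, but it occasionally spikes.
Production setup:
Thrift 0.9.0, TBinaryProtocol, TFramedTransport; the server is accessed through a VIP.
The client wraps its connections in a connection pool built on common-pool.
Error seen in production: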
e=org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed)
at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:158)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at com.didichuxing.dp.its.service.RouteBrokerService$Client.send_BatchP2P(RouteBrokerService.java:526)
at com.didichuxing.dp.its.service.RouteBrokerService$Client.BatchP2P(RouteBrokerService.java:518)
at com.didichuxing.dexter.service.impl.ThriftServiceImpl.getEta(ThriftServiceImpl.java:120)
at com.didichuxing.dexter.consumers.BeginChargeMqConsumer.processJson(BeginChargeMqConsumer.java:109)
at com.didichuxing.dexter.consumers.BaseMqConsumer$1.process(BaseMqConsumer.java:54)
at com.xiaojukeji.carrera.consumer.thrift.client.SimpleCarreraConsumer.doProcessMessage(SimpleCarreraConsumer.java:155)
at com.xiaojukeji.carrera.consumer.thrift.client.SimpleCarreraConsumer.doProcessMessage(SimpleCarreraConsumer.java:22)
at com.xiaojukeji.carrera.consumer.thrift.client.BaseCarreraConsumer.processResponse(BaseCarreraConsumer.java:189)
at com.xiaojukeji.carrera.consumer.thrift.client.BaseCarreraConsumer.startConsume(BaseCarreraConsumer.java:128)
at com.xiaojukeji.carrera.consumer.thrift.client.BaseCarreraConsumerPool$2.run(BaseCarreraConsumerPool.java:157)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
... 15 more
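The call took 0 ms, so the request was never sent.
Root cause
A broken pipe usually means that the read end of a pipe is no longer reading while a thread on the write end keeps writing, which breaks the pipe. In this case, the Thrift client wrote to a socket that had already been closed, and the write threw. We have run into the same thing with redis before.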
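The failure mode can be reproduced without Thrift. The following is a minimal sketch using only plain JDK sockets on the loopback interface; the class and method names are made up for illustration. A local "server" accepts a connection and closes it at once, and the client keeps writing until a write fails. The exact exception message depends on the operating system (typically "Broken pipe" or "Connection reset"), and the first write after the peer closes often still succeeds, so the sketch retries a few times.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BrokenPipeDemo {

    /**
     * Connects to a local server that closes the connection right away,
     * then keeps writing. Returns the message of the first failed write,
     * or null if no write failed within the attempt budget.
     */
    static String writeToClosedPeer() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // The "server" accepts the connection and closes it immediately.
            server.accept().close();

            OutputStream out = client.getOutputStream();
            for (int attempt = 1; attempt <= 50; attempt++) {
                try {
                    out.write(new byte[] {1, 2, 3, 4});
                    out.flush();
                } catch (IOException e) {
                    // The write hit a connection the peer had already closed.
                    return e.getMessage();
                }
                Thread.sleep(20); // give the peer's reset time to arrive
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("write failed with: " + writeToClosedPeer());
    }
}
```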
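Specifically, the connection that Thrift got from the pool was no longer valid. The pool does validate a connection when it is borrowed and only then hands it out, but Thrift has no real way to check whether a connection is still valid.
Validity is checked with the isOpen() method of the TTransport class, and the isOpen() source relies on the isConnected() method of the JDK Socket class.
The JDK source: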
/**
* Returns the connection state of the socket.
*
* Note: Closing a socket doesn't clear its connection state, which means
* this method will return {@code true} for a closed socket
* (see {@link #isClosed()}) if it was successfuly connected prior
* to being closed.
*
* @return true if the socket was successfuly connected to a server
* @since 1.4
*/
public boolean isConnected() {
    // Before 1.3 Sockets were always connected during creation
    return connected || oldImpl;
}
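So isConnected() does not report the socket's current connection state: once the socket has connected successfully, isConnected() returns true from then on.
Thrift does not provide a method that returns the current connection state.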
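The behavior described in the Javadoc above is easy to confirm with plain JDK sockets. The sketch below (class and method names are illustrative, not from the original service) checks isConnected() and isClosed() in two situations: after the remote side has closed the connection, and after the local side has called close(). In both cases isConnected() still returns true.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

public class IsConnectedDemo {

    /** Returns {isConnected, isClosed} of the client after the peer closed the connection. */
    static boolean[] stateAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            server.accept().close(); // the remote side goes away
            return new boolean[] {client.isConnected(), client.isClosed()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {isConnected, isClosed} of the client after the client itself called close(). */
    static boolean[] stateAfterLocalClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            client.close(); // the local side closes
            return new boolean[] {client.isConnected(), client.isClosed()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stateAfterPeerClose()));  // [true, false]
        System.out.println(Arrays.toString(stateAfterLocalClose())); // [true, true]
    }
}
```

Neither flag notices that the peer has gone away, which is why a pool validation built only on these checks keeps handing out dead connections.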
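How the problem was tracked down
1. Broken pipe kept occurring with an elapsed time of 0 ms, which pointed to the client side: the connection being obtained was bad.
2. This kind of problem is usually caused by not enabling the pool's connection test, but here the test was already enabled.
3. With validation enabled, a bad socket was still being handed out, so the validation was probably not really validating anything.
4. Read the validation code and searched for the error name to see whether anyone else had been bitten by this.
5. Found it: the isConnected() behavior described above.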
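The original notes stop at the diagnosis. As one possible direction, and not necessarily what was deployed, a pool's validation hook (for example, the factory's validateObject in commons-pool) could probe the underlying java.net.Socket instead of trusting isOpen(). The sketch below is an assumption-laden illustration: SocketProbe, looksAlive and probe are hypothetical names, and the check assumes the pooled connection is idle, so any readable data or end-of-stream means the connection should be discarded. A short read timeout is used; a timeout means "nothing to read, peer has not closed".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketProbe {

    /**
     * Best-effort liveness check for an idle socket (hypothetical helper).
     * probeTimeoutMillis must be greater than 0, because 0 means "block forever".
     */
    static boolean looksAlive(Socket socket, int probeTimeoutMillis) {
        if (socket == null || socket.isClosed() || !socket.isConnected()) {
            return false;
        }
        try {
            int previousTimeout = socket.getSoTimeout();
            socket.setSoTimeout(probeTimeoutMillis);
            try {
                // -1 means the peer closed; a real byte means unexpected data
                // on a connection that should be idle. Both make it unusable.
                socket.getInputStream().read();
                return false;
            } catch (SocketTimeoutException idle) {
                // Nothing to read within the timeout: the peer has not closed.
                return true;
            } finally {
                socket.setSoTimeout(previousTimeout);
            }
        } catch (IOException e) {
            return false;
        }
    }

    /** Demo: probes a loopback connection, optionally after the peer closed it. */
    static boolean probe(boolean peerCloses) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            if (peerCloses) {
                accepted.close();
            }
            return looksAlive(client, 200);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("open connection looks alive: " + probe(false));
        System.out.println("closed-by-peer connection looks alive: " + probe(true));
    }
}
```

The probe costs up to the timeout on every borrow, and a connection can still die between the probe and the actual write, so catching the TTransportException, discarding the connection and retrying on a fresh one remains a sensible fallback.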