An Analysis of OkHttp's Retry Mechanism
OkHttp can retry when a network connection fails:
OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to TLS 1.0 if the handshake fails.
To understand OkHttp's retry mechanism, the class to look at is RetryAndFollowUpInterceptor: when a network exception occurs, all of OkHttp's exception-related retries are handled there. We start with RetryAndFollowUpInterceptor#intercept(Chain chain). The snippet below omits non-core logic:
//StreamAllocation init...
Response priorResponse = null;
while (true) {
if (canceled) {
streamAllocation.release();
throw new IOException("Canceled");
}
Response response;
boolean releaseConnection = true;
try {
response = realChain.proceed(request, streamAllocation, null, null);
releaseConnection = false;
} catch (RouteException e) {
// Failures during the socket connection phase are wrapped in a RouteException and thrown.
// The attempt via this route failed and the request was not sent, so recover() is tried.
// The attempt to connect via a route failed. The request will not have been sent.
if (!recover(e.getLastConnectException(), false, request)) {
throw e.getLastConnectException();
}
releaseConnection = false;
continue;
} catch (IOException e) {
// Network exceptions thrown during the request phase, after the socket has connected
// An attempt to communicate with a server failed. The request may have been sent.
boolean requestSendStarted = !(e instanceof ConnectionShutdownException);
if (!recover(e, requestSendStarted, request)) throw e;
releaseConnection = false;
continue;
} finally {
// We're throwing an unchecked exception. Release any resources.
if (releaseConnection) {
streamAllocation.streamFailed(null);
streamAllocation.release();
}
}
// ... follow-up handling (redirects, authentication challenges) omitted
}
Next, look at the core recover method:
/**
* Report and attempt to recover from a failure to communicate with a server. Returns true if
* {@code e} is recoverable, or false if the failure is permanent. Requests with a body can only
* be recovered if the body is buffered or if the failure occurred before the request has been
* sent.
*/
private boolean recover(IOException e, boolean requestSendStarted, Request userRequest) {
streamAllocation.streamFailed(e);
// The application layer has forbidden retries.
if (!client.retryOnConnectionFailure()) return false;
// We can't send the request body again: sending has started and the body is not repeatable.
if (requestSendStarted && userRequest.body() instanceof UnrepeatableRequestBody) return false;
// This exception is fatal.
if (!isRecoverable(e, requestSendStarted)) return false;
// No more routes to attempt.
if (!streamAllocation.hasMoreRoutes()) return false;
// For failure recovery, use the same route selector with a new connection.
return true;
}
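This method first calls streamAllocation.streamFailed(e) to record the failure, which in turn records the failed route in RouteDatabase so that it is deprioritized. Later requests to the same address then avoid starting with a route that has already failed. If no other routes are available, the connection cannot be retried.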
public final class RouteDatabase {
private final Set<Route> failedRoutes = new LinkedHashSet<>();
/** Records a failure connecting to {@code failedRoute}. */
public synchronized void failed(Route failedRoute) {
failedRoutes.add(failedRoute);
}
/** Records success connecting to {@code route}. */
public synchronized void connected(Route route) {
failedRoutes.remove(route);
}
/** Returns true if {@code route} has failed recently and should be avoided. */
public synchronized boolean shouldPostpone(Route route) {
return failedRoutes.contains(route);
}
}
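To make the effect of shouldPostpone concrete, here is a minimal JDK-only sketch of how a selector might move previously failed routes to the back of the candidate list. Plain strings stand in for Route objects, and RouteOrderingSketch and order are hypothetical names rather than OkHttp APIs; OkHttp's own route selection is more involved than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: routes are plain strings here, not okhttp3.Route objects.
class RouteOrderingSketch {
    private final Set<String> failedRoutes = new LinkedHashSet<>();

    // Records a failure connecting to the given route.
    synchronized void failed(String route) {
        failedRoutes.add(route);
    }

    // Records a successful connection, clearing any earlier failure.
    synchronized void connected(String route) {
        failedRoutes.remove(route);
    }

    synchronized boolean shouldPostpone(String route) {
        return failedRoutes.contains(route);
    }

    // Returns the candidates with recently failed routes moved to the end,
    // so that routes without a recorded failure are attempted first.
    synchronized List<String> order(List<String> candidates) {
        List<String> preferred = new ArrayList<>();
        List<String> postponed = new ArrayList<>();
        for (String route : candidates) {
            if (shouldPostpone(route)) {
                postponed.add(route);
            } else {
                preferred.add(route);
            }
        }
        preferred.addAll(postponed);
        return preferred;
    }
}
```

A failed route is not discarded, only tried last, which matters when every route has failed at least once.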
Next, look closely at the isRecoverable method:
private boolean isRecoverable(IOException e, boolean requestSendStarted) {
// If there was a protocol problem, don't recover.
if (e instanceof ProtocolException) {
return false;
}
// If there was an interruption don't recover, but if there was a timeout connecting to a route
// we should try the next route (if there is one)
if (e instanceof InterruptedIOException) {
return e instanceof SocketTimeoutException && !requestSendStarted;
}
// Look for known client-side or negotiation errors that are unlikely to be fixed by trying
// again with a different route.
if (e instanceof SSLHandshakeException) {
// If the problem was a CertificateException from the X509TrustManager,
// do not retry.
if (e.getCause() instanceof CertificateException) {
return false;
}
}
// HostnameVerifier checks whether the host is valid; if it is not, SSLPeerUnverifiedException is thrown.
// The exception is thrown from HandShake#getSession, which is one step of the handshake.
if (e instanceof SSLPeerUnverifiedException) {
// e.g. a certificate pinning error.
return false;
}
// An example of one we might want to retry with a different route is a problem connecting to a
// proxy and would manifest as a standard IOException. Unless it is one we know we should not
// retry, we return true and try a new route.
return true;
}
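The decision rules above can be exercised in isolation. The sketch below copies the same checks into a standalone class that uses only JDK exception types; RecoverableSketch is a hypothetical name for illustration and is not part of OkHttp.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ConnectException;
import java.net.ProtocolException;
import java.net.SocketTimeoutException;
import java.security.cert.CertificateException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLPeerUnverifiedException;

// Hypothetical standalone copy of the checks shown above, for experimentation only.
class RecoverableSketch {
    static boolean isRecoverable(IOException e, boolean requestSendStarted) {
        // Protocol problems are never retried.
        if (e instanceof ProtocolException) return false;
        // Interruptions are fatal, except a timeout before the request was sent.
        if (e instanceof InterruptedIOException) {
            return e instanceof SocketTimeoutException && !requestSendStarted;
        }
        // A certificate rejected by the trust manager will not improve on another route.
        if (e instanceof SSLHandshakeException && e.getCause() instanceof CertificateException) {
            return false;
        }
        // Hostname verification or pinning failures are fatal as well.
        if (e instanceof SSLPeerUnverifiedException) return false;
        // Anything else is assumed to be worth another route.
        return true;
    }

    public static void main(String[] args) {
        // Connection refused: retried on another route.
        System.out.println(isRecoverable(new ConnectException("refused"), false));
        // Connect timeout, nothing sent yet: retried.
        System.out.println(isRecoverable(new SocketTimeoutException("connect"), false));
        // Read timeout after the request started: not retried.
        System.out.println(isRecoverable(new SocketTimeoutException("read"), true));
    }
}
```

The third case is the one that surprises people most often: a timeout that occurs after sending has begun falls into the InterruptedIOException branch and is treated as permanent.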
Common network exceptions:
UnknownHostException
Causes:
- Network outage
- DNS server failure
- DNS hijacking
Remedies:
- HttpDNS
- A sensible fallback strategy
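One possible reading of a fallback strategy is: when the primary resolver fails, fall back to a known-good address list, for example one delivered earlier by an HttpDNS service. The sketch below is JDK-only, and FallbackDns, the Resolver interface, and the sample host and address are all hypothetical. In an OkHttp application the same logic would typically live in an implementation of the okhttp3.Dns interface supplied through OkHttpClient.Builder.dns(...).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tries a primary resolver, then a static fallback table.
class FallbackDns {
    // Hypothetical abstraction over the primary resolver so it can be swapped or faked.
    interface Resolver {
        List<InetAddress> lookup(String hostname) throws UnknownHostException;
    }

    // Resolver backed by the platform's DNS.
    static final Resolver SYSTEM =
            hostname -> Arrays.asList(InetAddress.getAllByName(hostname));

    private final Resolver primary;
    private final Map<String, List<InetAddress>> fallback;

    FallbackDns(Resolver primary, Map<String, List<InetAddress>> fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    List<InetAddress> lookup(String hostname) throws UnknownHostException {
        try {
            return primary.lookup(hostname);
        } catch (UnknownHostException e) {
            // Primary lookup failed: use the known-good addresses if we have any.
            List<InetAddress> known = fallback.get(hostname);
            if (known == null || known.isEmpty()) throw e;
            return known;
        }
    }
}
```

Hard-coded addresses go stale, so a fallback table like this is usually refreshed from a trusted source rather than shipped as a constant.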
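InterruptedIOException
Causes:
- The request thread was interrupted during the read/write phase
Remedies:
- Check whether the interruption is consistent with the business logic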
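SocketTimeoutException
Causes:
- Low bandwidth, high latency
- Congested path, heavily loaded server
- Transient failure of a routing node
Remedies:
- Configure retries sensibly
- Retry with a different IP
Note in particular: OkHttp does not internally retry a SocketTimeoutException caused by a read or write timeout during the request.
If the application layer cares about this exception, it should therefore define custom interceptors that handle it specifically.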
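At its core, such an interceptor wraps the downstream call in a bounded retry loop. The JDK-only sketch below shows that loop in isolation; TimeoutRetry, IoCall, and the attempt limit are hypothetical names and values chosen for illustration. Retrying is only safe for idempotent requests, because the timed-out request may already have reached the server.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical helper: retries an I/O call a bounded number of times on timeout.
class TimeoutRetry {
    // Hypothetical functional interface for an I/O call that may time out.
    interface IoCall<T> {
        T run() throws IOException;
    }

    // Runs the call up to maxAttempts times. Only SocketTimeoutException triggers
    // another attempt; any other IOException propagates immediately.
    static <T> T retryOnTimeout(IoCall<T> call, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (SocketTimeoutException e) {
                last = e; // remember the timeout and try again if attempts remain
            }
        }
        throw last; // every attempt timed out
    }
}
```

Inside an OkHttp application interceptor, the wrapped call would be the downstream chain, roughly `TimeoutRetry.retryOnTimeout(() -> chain.proceed(chain.request()), 3)`. A production version would likely add a delay between attempts and restrict itself to methods such as GET.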
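SSLHandshakeException
Causes:
- TLS protocol negotiation failed, or the handshake format is incompatible
- The CA that issued the server certificate is unknown
- The server certificate is self-signed rather than signed by a CA
- The server configuration is missing an intermediate CA (incomplete certificate chain)
- Server hostname mismatch (SNI)
- A man-in-the-middle attack
Remedies:
- Specify SNI
- Certificate pinning
- Downgrade to HTTP ...
- Contact the system administrator (SA)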
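SSLPeerUnverifiedException
Causes:
- The certificate's domain name failed verification
Remedies:
- Specify SNI
- Certificate pinning
- Downgrade to HTTP ...
- Contact the system administrator (SA)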