Overview
An HDFS client performs two kinds of interactions:
- RPC requests to the NameNode
- IO reads and writes against DataNodes

Whichever path an exception hits, it generally does not fail the job: both paths come with retry mechanisms, and in practice it is actually quite hard for an operation to fail outright. In real deployments, the RPC exchange between the client and the NN rarely produces errors; the vast majority of errors show up in the IO exchange with the DNs. This article summarizes the common DN IO errors. (A minimal sketch of the two code paths follows, for orientation.)
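Here is a minimal sketch of the write and read paths the errors below come from. It assumes fs.defaultFS points at a reachable cluster; the path /tmp/demo.txt is made up for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class HdfsIoDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            Path p = new Path("/tmp/demo.txt"); // hypothetical path
            // Write path: addBlock() RPCs go to the NN; the data itself flows
            // through a pipeline of DNs (where most of the errors below occur).
            try (FSDataOutputStream out = fs.create(p, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            // Read path: a getBlockLocations() RPC to the NN, then block data from DNs.
            try (FSDataInputStream in = fs.open(p)) {
                byte[] buf = new byte[1024];
                int n = in.read(buf);
                if (n > 0) System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}
```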
Common client IO errors
- During a write, the client may fail to establish the pipeline for various reasons. When that happens, it abandons the faulty DN, requests a fresh DN, and rebuilds the pipeline. A few typical cases follow (a sketch of the retry knobs appears after the last of these logs):
- The first DN in the pipeline is down, so pipeline setup fails. Because this DN is directly connected to the client, the client gets the concrete cause of the failure:
21/02/22 15:34:23 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741830_1006
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:254)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1740)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 15:34:23 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741830_1006
21/02/22 15:34:23 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
- The first DN in the pipeline is over its load limit, so pipeline setup fails; the logs look like this:
21/02/22 16:03:12 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741842_1019
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 16:03:12 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741842_1019
21/02/22 16:03:12 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
- Some other DN in the pipeline has a problem (down or overloaded), so pipeline setup fails. Because those DNs are not directly connected to the client, the client usually cannot get the concrete cause and only learns the IP of the failing DN:
21/02/22 15:51:21 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741835_1012
java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 192.168.202.12:9003
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:121)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1792)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 15:51:21 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741835_1012
21/02/22 15:51:21 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.12:9003,DS-b76f5779-927e-4f8c-b4fe-9db592ecadfa,DISK]
- For some reason (e.g. DN IO concurrency so high that lock contention becomes severe), pipeline setup times out (75s by default for a 3-replica write); the logs look like this:
21/06/17 15:51:28 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742830_2006
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.13:56994 remote=/192.168.202.13:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/06/17 15:51:28 WARN hdfs.DataStreamer: Abandoning BP-358940719-192.168.202.11-1623894544733:blk_1073742830_2006
21/06/17 15:51:28 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.13:50010,DS-5bfd7a2e-9963-40b0-9f5d-50ffecde15c1,DISK]
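When pipeline setup fails in any of the ways above, the client excludes the bad DN and retries block allocation a bounded number of times. A minimal sketch of the knobs involved, assuming the stock client configuration keys (the values shown are the usual defaults):

```java
import org.apache.hadoop.conf.Configuration;

public class PipelineRetryConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // How many extra attempts a block write gets (against fresh DNs) before the stream fails.
        conf.setInt("dfs.client.block.write.retries", 3); // default 3
        // Base socket timeout for pipeline setup; see the timeout arithmetic sketched later.
        conf.setInt("dfs.client.socket-timeout", 60_000); // default 60s
        return conf;
    }
}
```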
- During a write, one of the DNs in the pipeline suddenly dies; the client then runs a round of error recovery (the replacement policy is sketched after the logs):
21/02/22 15:47:39 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
21/02/22 15:47:39 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.
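Whether a replacement DN is requested during this recovery is governed by the replace-datanode-on-failure policy. A sketch, assuming the standard client keys:

```java
import org.apache.hadoop.conf.Configuration;

public class ReplaceDnPolicyConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Consider replacing a dead pipeline DN at all.
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        // DEFAULT replaces only when it matters (e.g. >= 3 replicas, or an hflushed/appended file);
        // the other values are NEVER and ALWAYS.
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
        // If replacement itself fails, continue with the surviving DNs instead of aborting.
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", false);
        return conf;
    }
}
```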
- During a write, if more than 30s passes between sending a packet and receiving its ack (the acknowledgement), the client prints a slow-pipeline warning (the threshold is configurable; see the sketch after the log):
[2021-06-17 15:22:58,929] WARN Slow ReadProcessor read fields took 37555ms (threshold=30000ms); ack: seqno: 343 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 16503757088 flag: 0 flag: 0 flag: 0, targets: [DatanodeInfoWithStorage[9.10.146.124:9003,DS-cdab7fb8-c6ec-4f6b-8b6a-2a0c92aed6b6,DISK], DatanodeInfoWithStorage[9.10.146.98:9003,DS-346a7f42-4b12-4bac-8e58-8b33d972eb79,DISK], DatanodeInfoWithStorage[9.180.22.26:9003,DS-ad6cbeb4-9ce8-495b-b978-5c7aac66686f,DISK]]
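The 30s in this warning is the client's slow-IO threshold; a one-key sketch, assuming the stock configuration key:

```java
import org.apache.hadoop.conf.Configuration;

public class SlowIoWarnConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Threshold (ms) above which the client logs "Slow ..." warnings like the one above.
        conf.setLong("dfs.client.slow.io.warning.threshold.ms", 30_000L); // default 30s
        return conf;
    }
}
```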
- During a write, if the ack for a packet still has not arrived after 75s (the threshold for 2-replica writes is 70s), the write times out and error recovery begins (the arithmetic behind these numbers is sketched after the logs):
21/02/22 16:09:35 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:44868 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
21/02/22 16:09:35 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.
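The 75s and 70s figures fall out of a simple rule: the base socket timeout plus a per-DN extension. A worked sketch (the 5s-per-node extension is the client's internal constant, stated here as an assumption):

```java
public class AckTimeoutMath {
    static final long BASE_MS = 60_000;    // dfs.client.socket-timeout, default 60s
    static final long PER_NODE_MS = 5_000; // extension per DN in the pipeline

    static long ackTimeout(int pipelineNodes) {
        return BASE_MS + PER_NODE_MS * pipelineNodes;
    }

    public static void main(String[] args) {
        System.out.println(ackTimeout(3)); // 75000 -> the 75s for a 3-replica write
        System.out.println(ackTimeout(2)); // 70000 -> the 70s for a 2-replica write
    }
}
```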
- When a file is closed (or hflush/hsync is called explicitly), the client flushes all data not yet persisted to the cluster; if the flush takes longer than 30s, a slow warning is printed (see the flush sketch after the log):
20/12/15 11:22:25 WARN DataStreamer: Slow waitForAckedSeqno took 45747ms (threshold=30000ms). File being written: /stage/interface/TEG/g_teg_common_teg_plan_bigdata/plan/exportBandwidth/origin/company/2020/1215/1059.parquet/_temporary/0/_temporary/attempt_20201215112121_0008_m_000021_514/part-00021-94e67782-be1b-48ae-b736-204624fa498c-c000.snappy.parquet, block: BP-1776336001-100.76.59.150-1482408994930:blk_16194984410_15220615717, Write pipeline datanodes: [DatanodeInfoWithStorage[100.76.29.36:9003,DS-4a301194-a232-46c6-b606-44b15a83ebed,DISK], DatanodeInfoWithStorage[100.76.60.168:9003,DS-24645191-aa52-4643-9c97-213b2a0bb41d,DISK], DatanodeInfoWithStorage[100.76.60.160:9003,DS-27ca6eb7-75b9-47a2-ae9d-de6d720f4d9a,DISK]].
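A minimal sketch of the flush calls this item refers to (the path is made up; the comments give the usual reading of hflush vs hsync):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class FlushDemo {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration());
             FSDataOutputStream out = fs.create(new Path("/tmp/demo.log"), true)) {
            out.write("one record\n".getBytes(StandardCharsets.UTF_8));
            out.hflush(); // block until every pipeline DN has acked the buffered packets
            out.hsync();  // additionally ask each DN to sync the data to disk
        } // close() performs the same full flush and waits for the final acks
    }
}
```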
- During a write, if DN incremental block reports are too slow, the client cannot be allocated a new block in time; it logs the exception and retries (the retry budget is sketched after the logs):
21/02/22 16:16:53 INFO hdfs.DFSOutputStream: Exception while adding a block
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException): Not replicated yet: /a.COPYING
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2572)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:885)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:540)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:806)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2286)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2541)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1520)
at org.apache.hadoop.ipc.Client.call(Client.java:1466)
at org.apache.hadoop.ipc.Client.call(Client.java:1376)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:472)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1074)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1880)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1683)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 16:16:53 WARN hdfs.DFSOutputStream: NotReplicatedYetException sleeping /a.COPYING retries left 4
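The number of these retries is configurable; a sketch, assuming the stock key (the log's "retries left 4" matches the default of 5, and the same budget appears to govern the complete() retries in the next item):

```java
import org.apache.hadoop.conf.Configuration;

public class AddBlockRetryConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Retries (with backoff) when addBlock()/complete() must wait for
        // DN incremental block reports to catch up with the NN.
        conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 5); // default 5
        return conf;
    }
}
```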
- During a write, slow DN incremental block reports can likewise keep the client from closing the file in time; it logs a line and retries:
2021-02-22 16:19:23,259 INFO hdfs.DFSClient: Could not complete /a.txt retrying...
- During a read, if the target DN is already down, the client logs the connection failure and then tries another DN (the failure budget is sketched after the logs):
21/02/22 16:29:33 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:29:33 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030, add to deadNodes and continue.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:29:34 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030
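How many of these whole-round failures a read tolerates before giving up is bounded by a client key; a sketch under that assumption:

```java
import org.apache.hadoop.conf.Configuration;

public class ReadFailoverConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // How many times the client may fail to acquire a block from every known
        // replica (refetching locations in between) before throwing BlockMissingException.
        conf.setInt("dfs.client.max.block.acquire.failures", 3); // default 3
        return conf;
    }
}
```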
- During a read, a TCP connect timeout against the target DN causes an error; the client logs it and tries another DN (the two read code paths are sketched after the log):
2021-02-25 23:57:11,000 WARN org.apache.hadoop.hdfs.DFSClient: Connection failure: Failed to connect to /9.10.34.27:9003 for file /data/SPARK/part-r-00320.tfr.gz for block BP-1815681714-100.76.60.19-1523177824331:blk_10324215185_9339836899:org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3450)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1173)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1094)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1449)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1412)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
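Note that this stack goes through fetchBlockByteRange, i.e. a positioned read (pread), unlike the blockSeekTo stacks elsewhere in this article. A minimal sketch of the two read styles (hypothetical path):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadStyles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/tmp/demo.txt"))) {
            byte[] buf = new byte[128];
            // Sequential read: goes through blockSeekTo(), as in most stacks above.
            int n = in.read(buf);
            // Positioned read (pread): goes through fetchBlockByteRange(), as in this
            // stack; it leaves the stream position untouched and fails over internally.
            int m = in.read(0L, buf, 0, buf.length);
        }
    }
}
```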
- During a read, while setting up the block-read channel the DN may fail to respond within the timeout (60s by default); the client logs it and tries another DN:
21/02/22 16:52:32 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:52:32 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069, add to deadNodes and continue.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:52:32 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069
- During a read that has already started transferring data, the transfer may be so slow that it times out (60s by default); the client logs it and tries another DN (a hedged-read mitigation is sketched after the log):
21/02/22 16:44:30 WARN hdfs.DFSClient: Exception while reading from BP-239523849-192.168.202.11-1613727437316:blk_1073741889_1067 of /a.txt from DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45254 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:256)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:207)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:183)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:142)
at org.apache.hadoop.hdfs.ByteArrayStrategy.readFromBlock(ReaderStrategy.java:118)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:703)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:764)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
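One client-side mitigation for chronically slow DN reads is hedged reads: after a threshold, fire the same read at a second replica and take whichever answers first. A sketch, assuming the stock keys (hedging applies to positioned reads; pool size and threshold here are illustrative, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;

public class HedgedReadConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // A non-zero pool size enables hedged reads (default 0 = disabled).
        conf.setInt("dfs.client.hedged.read.threadpool.size", 16);
        // Wait this long before hedging to another replica.
        conf.setLong("dfs.client.hedged.read.threshold.millis", 500L);
        return conf;
    }
}
```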
- During a read, the target block may be unavailable on every DN (a missing block); the error looks like this (a handling sketch follows):
2021-02-22 16:57:59,009 WARN hdfs.DFSClient: No live nodes contain block BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 after checking nodes = [], ignoredNodes = null
2021-02-22 16:57:59,009 WARN hdfs.DFSClient: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
2021-02-22 16:57:59,010 WARN hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1053)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1036)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1015)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:23)
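Unlike the previous cases, a missing block is not recoverable by client-side retries. A handling sketch (path and escalation step are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.BlockMissingException;

public class MissingBlockDemo {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration());
             FSDataInputStream in = fs.open(new Path("/a"))) {
            byte[] buf = new byte[4096];
            try {
                in.readFully(buf);
            } catch (BlockMissingException e) {
                // Every replica is gone or unreachable; retrying will not help.
                // Escalate on the cluster side, e.g.: hdfs fsck /a -list-corruptfileblocks
                System.err.println("missing block: " + e.getMessage());
            }
        }
    }
}
```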