Common HDFS Client Errors

Overview

When using the HDFS client, there are two phases of interaction:

  1. RPC requests to the NameNode
  2. IO (reads and writes) against the DataNodes

In either phase, an exception generally does not cause the job to fail: both phases have retry mechanisms, and in practice it is quite hard for a job to fail outright. In real-world usage, the RPC interaction between the client and the NN rarely produces errors; most errors occur during IO with the DNs. This article summarizes the common DN IO errors.
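
To make the two phases concrete, here is a minimal write sketch against the standard Java FileSystem API (the cluster URI and path are hypothetical, not from the original logs): the metadata calls go over NameNode RPC, while the bytes flow through a DataNode pipeline.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    public class WriteExample {
        public static void main(String[] args) throws Exception {
            // Phase 1: metadata operations (create, addBlock, complete) are NameNode RPCs.
            FileSystem fs = FileSystem.get(URI.create("hdfs://nn-host:9000"), new Configuration());
            try (FSDataOutputStream out = fs.create(new Path("/tmp/a.txt"))) {
                // Phase 2: the actual bytes are streamed through a DataNode pipeline.
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            } // close() completes the file via another NameNode RPC.
        }
    }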

Common client IO errors

  1. During a write, the client may fail to set up the pipeline for various reasons. In that case it abandons the failing DN, requests a replacement DN, and rebuilds the pipeline. A few typical cases:

    1. The first DN in the pipeline is down, so pipeline setup fails. Since this DN is directly connected to the client, the client can see the concrete cause of the error:

    21/02/22 15:34:23 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741830_1006
    java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
    at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:254)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1740)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/02/22 15:34:23 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741830_1006
    21/02/22 15:34:23 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]

    2. The first DN in the pipeline is over its load limit, so pipeline setup fails. The log looks like:

    21/02/22 16:03:12 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741842_1019
    java.io.EOFException: Unexpected EOF while trying to read response from server
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/02/22 16:03:12 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741842_1019
    21/02/22 16:03:12 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]

    3. A DN further down the pipeline has a problem (down or overloaded), so pipeline setup fails. Since those DNs are not directly connected to the client, the client usually cannot see the concrete cause and only learns the IP of the failing DN:

    21/02/22 15:51:21 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741835_1012
    java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 192.168.202.12:9003
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:121)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1792)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/02/22 15:51:21 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741835_1012
    21/02/22 15:51:21 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.12:9003,DS-b76f5779-927e-4f8c-b4fe-9db592ecadfa,DISK]

    4. For some reason (e.g. DN IO concurrency so high that lock contention becomes severe), pipeline setup times out (75 s by default for a 3-replica write; the arithmetic behind this number is sketched right after this list). The log looks like:

    21/06/17 15:51:28 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742830_2006
    java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.13:56994 remote=/192.168.202.13:9003]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/06/17 15:51:28 WARN hdfs.DataStreamer: Abandoning BP-358940719-192.168.202.11-1623894544733:blk_1073742830_2006
    21/06/17 15:51:28 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.13:50010,DS-5bfd7a2e-9963-40b0-9f5d-50ffecde15c1,DISK]
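
The 75 s figure in case 4 above is not a standalone setting: assuming stock defaults, it is the base client-to-DN socket timeout (dfs.client.socket-timeout, 60 s) plus a 5 s extension per DataNode in the pipeline. A minimal sketch of the arithmetic, under those assumed defaults:

    public class PipelineTimeout {
        // Assumed defaults: dfs.client.socket-timeout = 60 s base, plus a 5 s
        // extension per DataNode in the pipeline.
        static final int BASE_TIMEOUT_MS = 60_000;
        static final int EXTENSION_PER_NODE_MS = 5_000;

        static int timeoutMs(int pipelineLength) {
            return BASE_TIMEOUT_MS + EXTENSION_PER_NODE_MS * pipelineLength;
        }

        public static void main(String[] args) {
            System.out.println(timeoutMs(3)); // 75000 ms: the 75 s in the log above
            System.out.println(timeoutMs(2)); // 70000 ms: the 70 s threshold for 2-replica writes (item 4 below)
        }
    }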

  2. During a write, a DN in the pipeline dies suddenly; the client then performs error recovery. The log looks like:

21/02/22 15:47:39 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
21/02/22 15:47:39 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.
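
How aggressively the client swaps in a replacement DN during this recovery is governed by the replace-datanode-on-failure client settings. A hedged sketch of setting them programmatically (the values shown are the stock defaults, not recommendations):

    import org.apache.hadoop.conf.Configuration;

    public class PipelineRecoveryConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Whether a failed DN may be replaced at all during recovery (default: true).
            conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
            // DEFAULT replaces a DN only under certain conditions; ALWAYS and NEVER are the other policies.
            conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
            // If true, keep writing with the remaining DNs when replacement itself fails.
            conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", false);
        }
    }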

  3. During a write, if more than 30 s passes between the client sending a packet and receiving its ack (the response), a slow-IO warning is printed. The log looks like:

[2021-06-17 15:22:58,929] WARN Slow ReadProcessor read fields took 37555ms (threshold=30000ms); ack: seqno: 343 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 16503757088 flag: 0 flag: 0 flag: 0, targets: [DatanodeInfoWithStorage[9.10.146.124:9003,DS-cdab7fb8-c6ec-4f6b-8b6a-2a0c92aed6b6,DISK], DatanodeInfoWithStorage[9.10.146.98:9003,DS-346a7f42-4b12-4bac-8e58-8b33d972eb79,DISK], DatanodeInfoWithStorage[9.180.22.26:9003,DS-ad6cbeb4-9ce8-495b-b978-5c7aac66686f,DISK]]
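
The 30 s here is the client's slow-IO warning threshold, dfs.client.slow.io.warning.threshold.ms (default 30000 ms). A minimal sketch of raising it; note this only silences the warning, it does not make the DNs any faster:

    import org.apache.hadoop.conf.Configuration;

    public class SlowIoWarningConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Client-side threshold for "Slow ReadProcessor" / "Slow waitForAckedSeqno"
            // warnings (default 30000 ms).
            conf.setLong("dfs.client.slow.io.warning.threshold.ms", 60_000L);
        }
    }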

  4. During a write, if the client has not received an ack within 75 s of sending a packet (the threshold is 70 s for 2-replica writes), it times out with an error and starts error recovery. The log looks like:

21/02/22 16:09:35 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:44868 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
21/02/22 16:09:35 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.

  5. When closing a file (or explicitly calling hflush or hsync), the client flushes all data not yet written to the cluster; if the flush takes more than 30 s, a slow-flush warning is printed. The log looks like:

20/12/15 11:22:25 WARN DataStreamer: Slow waitForAckedSeqno took 45747ms (threshold=30000ms). File being written: /stage/interface/TEG/g_teg_common_teg_plan_bigdata/plan/exportBandwidth/origin/company/2020/1215/1059.parquet/_temporary/0/_temporary/attempt_20201215112121_0008_m_000021_514/part-00021-94e67782-be1b-48ae-b736-204624fa498c-c000.snappy.parquet, block: BP-1776336001-100.76.59.150-1482408994930:blk_16194984410_15220615717, Write pipeline datanodes: [DatanodeInfoWithStorage[100.76.29.36:9003,DS-4a301194-a232-46c6-b606-44b15a83ebed,DISK], DatanodeInfoWithStorage[100.76.60.168:9003,DS-24645191-aa52-4643-9c97-213b2a0bb41d,DISK], DatanodeInfoWithStorage[100.76.60.160:9003,DS-27ca6eb7-75b9-47a2-ae9d-de6d720f4d9a,DISK]].
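
For reference, hflush and hsync are methods on FSDataOutputStream. A minimal sketch of the calls that trigger this flush (the URI and path are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    public class FlushExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(URI.create("hdfs://nn-host:9000"), new Configuration());
            try (FSDataOutputStream out = fs.create(new Path("/tmp/a.txt"))) {
                out.write("row1\n".getBytes(StandardCharsets.UTF_8));
                out.hflush(); // push buffered packets to every DN in the pipeline (data visible to readers)
                out.hsync();  // additionally ask each DN to sync the data to disk
            } // close() flushes whatever is still buffered; that wait is what the warning above measures
        }
    }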

  6. During a write, if DN incremental block reports are too slow, the NameNode cannot allocate a new block to the client in time; the client prints some logs and retries:

21/02/22 16:16:53 INFO hdfs.DFSOutputStream: Exception while adding a block
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException): Not replicated yet: /a.COPYING
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2572)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:885)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:540)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:806)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2286)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2541)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1520)
at org.apache.hadoop.ipc.Client.call(Client.java:1466)
at org.apache.hadoop.ipc.Client.call(Client.java:1376)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:472)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1074)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1880)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1683)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
21/02/22 16:16:53 WARN hdfs.DFSOutputStream: NotReplicatedYetException sleeping /a.COPYING retries left 4
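
The "retries left 4" counter comes from the client's retry budget for allocating the next block; assuming stock defaults, that budget is dfs.client.block.write.locateFollowingBlock.retries (default 5, with backoff between attempts). A minimal sketch:

    import org.apache.hadoop.conf.Configuration;

    public class AddBlockRetryConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Retry budget for addBlock() when the NN answers NotReplicatedYetException
            // (default 5; the "retries left 4" above is this counter after one failure).
            conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 8);
        }
    }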

  7. During a write, if DN incremental block reports are too slow, the client cannot close the file in time; it prints some logs and retries:

2021-02-22 16:19:23,259 INFO hdfs.DFSClient: Could not complete /a.txt retrying...

  8. During a read, if the target DN is already down, the client prints connection-error logs and then tries another DN:

21/02/22 16:29:33 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:29:33 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030, add to deadNodes and continue.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:29:34 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030

  9. During a read, establishing the TCP connection to the target DN times out; the client prints some logs and then tries another DN:

2021-02-25 23:57:11,000 WARN org.apache.hadoop.hdfs.DFSClient: Connection failure: Failed to connect to /9.10.34.27:9003 for file /data/SPARK/part-r-00320.tfr.gz for block BP-1815681714-100.76.60.19-1523177824331:blk_10324215185_9339836899:org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3450)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1173)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1094)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1449)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1412)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)

  10. During a read, while setting up the read channel with the target DN, the DN fails to respond within the timeout (60 s by default); the client prints some logs and then tries another DN:

21/02/22 16:52:32 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:52:32 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069, add to deadNodes and continue.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
21/02/22 16:52:32 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069

  11. During a read, data transfer has already started but is so slow that it times out (60 s by default); the client prints some logs and then tries another DN:

21/02/22 16:44:30 WARN hdfs.DFSClient: Exception while reading from BP-239523849-192.168.202.11-1613727437316:blk_1073741889_1067 of /a.txt from DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45254 remote=/192.168.202.11:9003]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:256)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:207)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:183)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:142)
at org.apache.hadoop.hdfs.ByteArrayStrategy.readFromBlock(ReaderStrategy.java:118)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:703)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:764)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
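
The 60 s timeout in cases 9 through 11 appears, from the 60000 ms values in these traces, to be the client's DataNode socket timeout, dfs.client.socket-timeout (default 60000 ms). A hedged sketch of raising it for slow networks:

    import org.apache.hadoop.conf.Configuration;

    public class ReadTimeoutConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Client <-> DataNode socket timeout used on the read path (default 60000 ms).
            conf.setInt("dfs.client.socket-timeout", 120_000);
        }
    }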

  12. During a read, the target block cannot be found on any DN (i.e. a missing block). The error looks like:

2021-02-22 16:57:59,009 WARN hdfs.DFSClient: No live nodes contain block BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 after checking nodes = [], ignoredNodes = null
2021-02-22 16:57:59,009 WARN hdfs.DFSClient: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
2021-02-22 16:57:59,010 WARN hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a
at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1053)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1036)
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1015)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at a.a.TestWrite.main(TestWrite.java:23)
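
Before throwing BlockMissingException, the client re-fetches block locations from the NameNode a few times; assuming stock defaults, the retry count is dfs.client.max.block.acquire.failures (default 3). A minimal sketch:

    import org.apache.hadoop.conf.Configuration;

    public class BlockAcquireConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // How many times the reader re-fetches block locations from the NN
            // before throwing BlockMissingException (default 3).
            conf.setInt("dfs.client.max.block.acquire.failures", 5);
        }
    }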
