Frequent rename-failure warnings in the NameNode log, accompanied by frequent GC
2019-10-22 12:18:15,826 WARN hdfs.StateChange (FSDirRenameOp.java:validateRenameSource(559)) - DIR* FSDirectory.unprotectedRenameTo: rename source /apps/hive/warehouse/zs_db.db/umetrip_client_all/.hive-staging_hive_2019-10-21_15-05-03_662_6055882197773119796-53816/-ext-10000/000080_0 is not found.
2019-10-22 12:18:15,827 WARN hdfs.StateChange (FSDirRenameOp.java:validateRenameSource(559)) - DIR* FSDirectory.unprotectedRenameTo: rename source /apps/hive/warehouse/zs_db.db/umetrip_client_all/.hive-staging_hive_2019-10-21_15-05-03_662_6055882197773119796-53816/-ext-10000/000080_0 is not found.
2019-10-22 12:18:15,827 WARN hdfs.StateChange (FSDirRenameOp.java:validateRenameSource(559)) - DIR* FSDirectory.unprotectedRenameTo: rename source /apps/hive/warehouse/zs_db.db/umetrip_client_all/.hive-staging_hive_2019-10-21_15-05-03_662_6055882197773119796-53816/-ext-10000/000080_0 is not found.
2019-10-22 12:18:15,828 WARN hdfs.StateChange (FSDirRenameOp.java:validateRenameSource(559)) - DIR* FSDirectory.unprotectedRenameTo: rename source /apps/hive/warehouse/zs_db.db/umetrip_client_all/.hive-staging_hive_2019-10-21_15-05-03_662_6055882197773119796-53816/-ext-10000/000080_0 is not found.
2019-10-22 12:18:15,828 WARN hdfs.StateChange (FSDirRenameOp.java:validateRenameSource(559)) - DIR* FSDirectory.unprotectedRenameTo: rename source /apps/hive/warehouse/zs_db.db/umetrip_client_all/.hive-staging_hive_2019-10-21_15-05-03_662_6055882197773119796-53816/-ext-10000/000080_0 is not found.
The log shows repeated attempts to rename the file /apps/hive/warehouse/zs_db.db/umetrip_client_all/.hive-staging_hive_2019-10-21_15-05-03_662_6055882197773119796-53816/-ext-10000/000080_0, but that file does not exist.
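To confirm which path is stuck and how often the warning fires, the WARN lines can be tallied per source path. This is a minimal Python sketch, assuming the NameNode log is a plain text file whose name and location depend on the installation; the function name count_missing_rename_sources is hypothetical, and the regular expression only matches the message format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches the NameNode WARN message seen in the excerpt above and captures
# the missing rename source path (HDFS paths here contain no spaces).
RENAME_MISSING = re.compile(
    r"FSDirectory\.unprotectedRenameTo: rename source (\S+) is not found"
)


def count_missing_rename_sources(lines):
    """Count how often each missing rename source appears in log lines."""
    counts = Counter()
    for line in lines:
        match = RENAME_MISSING.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `with open("hadoop-hdfs-namenode.log") as f: print(count_missing_rename_sources(f).most_common(5))` lists the five most frequent paths; the file name is only a placeholder. A single path dominating the output, as here, points to one stuck writer rather than general load.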
The path suggested a Hive job, but no Hive job was running on the cluster. The issue resembles https://issues.apache.org/jira/browse/HIVE-7273
Solution:
Manually create the file /apps/hive/warehouse/zs_db.db/umetrip_client_all/.hive-staging_hive_2019-10-21_15-05-03_662_6055882197773119796-53816/-ext-10000/000080_0
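Creating an empty file at the missing path is enough. The sketch below wraps the standard `hdfs dfs -mkdir -p` and `hdfs dfs -touchz` commands, assuming the `hdfs` CLI is on PATH and runs as a user allowed to write to the staging directory. create_empty_hdfs_file is a hypothetical helper; by default it only returns the commands (dry run) so they can be reviewed before anything touches the cluster.

```python
import posixpath
import subprocess


def create_empty_hdfs_file(path, dry_run=True):
    """Build, and optionally run, the commands that create an empty HDFS file.

    Assumes the `hdfs` CLI is on PATH and has write access to the parent
    directory. With dry_run=True nothing is executed.
    """
    parent = posixpath.dirname(path)
    commands = [
        # -p makes this a no-op if the staging directories already exist.
        ["hdfs", "dfs", "-mkdir", "-p", parent],
        # -touchz creates a zero-length file at the missing rename source.
        ["hdfs", "dfs", "-touchz", path],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands
```

Calling it with the path above and `dry_run=False` runs the two commands in order; the zero-length file matches the size 0 shown for the resulting file in the listing below.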
After the file was created, the related entries stopped appearing in the NameNode log and GC frequency returned to normal.
A new file, 000080_0_copy_131756330, then appeared in /apps/hive/warehouse/zs_db.db/umetrip_client_all/:
drwxr-xr-x - umecron hadoop 0 2019-10-22 19:19 /apps/hive/warehouse/zs_db.db/umetrip_client_all/.hive-staging_hive_2019-10-21_15-05-03_662_6055882197773119796-53816
-rw-r--r-- 3 umecron hadoop 0 2019-10-22 19:21 /apps/hive/warehouse/zs_db.db/umetrip_client_all/000080_0_copy_131756330
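The likely cause: a Reducer thread from a Hive job that had run on the YARN cluster was left behind and got stuck trying to rename a file that did not exist. Once the file was created manually, the rename succeeded and the leftover thread finished.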
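The very large suffix in 000080_0_copy_131756330 suggests, though the source does not confirm it, a move loop that bumps a `_copy_N` counter after every failed rename and only exits when a rename succeeds. The following is a hypothetical in-memory model of such a loop, not Hive's actual code; FakeFS and move_with_copy_suffix are invented names. It shows why the loop spins while the source is missing and exits as soon as the source appears, with the final suffix equal to the number of failed attempts.

```python
class FakeFS:
    """Hypothetical in-memory stand-in for a file system.

    rename() fails when the source is missing or the destination exists;
    a missing source is what produced the WARN lines in the NameNode log.
    """

    def __init__(self, files=()):
        self.files = set(files)
        self.failed_renames = 0

    def rename(self, src, dst):
        if src not in self.files or dst in self.files:
            self.failed_renames += 1
            return False
        self.files.remove(src)
        self.files.add(dst)
        return True


def move_with_copy_suffix(fs, src, dest_dir, name, on_attempt=None):
    """Hypothetical loop: try name, then name_copy_1, name_copy_2, ... until rename succeeds."""
    dest = f"{dest_dir}/{name}"
    counter = 0
    while not fs.rename(src, dest):
        counter += 1
        if on_attempt is not None:
            on_attempt(counter)  # hook to simulate an outside fix mid-loop
        dest = f"{dest_dir}/{name}_copy_{counter}"
    return dest
```

In this model, creating the source after five failed attempts (via the on_attempt hook) ends the loop with a file named 000080_0_copy_5, just as the manual fix above ended the real loop with 000080_0_copy_131756330.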