Environment
OS: SUSE Linux Enterprise Server 12 SP2
DOCKER: 1.12.6
KERNEL: 4.4.59-92.20-default
RANCHER: v1.6.2
Problem
One day in January 2018, a server in the test environment started logging:
kernel:[1854773.108055] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
Temporary workaround: reboot 😅
Investigation
Searching for the error message quickly turned up this bug, opened on 6 May 2014:
https://github.com/moby/moby/issues/5618
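(The issue now appears to be resolved, but back in January it was not.) In the days that followed, the error message came with other "noise" as well: rising CPU load, the docker ps command hanging, and so on.
Someone published a reproduction specifically for this issue: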
https://github.com/fho/docker-samba-loop
It reproduces on the kernel version listed above:
kernel:[1598.704278] unregister_netdevice: waiting for lo to become free. Usage count = 1
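If the Dockerfile is modified to append the command
sleep 10
the kernel error message no longer appears, probably because the network connections get closed cleanly during the wait.
- This bug was fixed in kernel 4.4.114: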
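The sleep 10 workaround presumably helps because it gives connections time to finish their close sequence before the container's network namespace is torn down. A less arbitrary wait could poll for sockets that are still shutting down. The following is a minimal sketch that assumes the lingering connections are visible in the namespace's /proc/net/tcp; whether kernel sockets, such as the one behind the Samba mount in the reproduction, appear there has not been verified here. The names closing_sockets and wait_for_close are hypothetical.

```python
import time

# Hex TCP state codes as listed in the "st" column of /proc/net/tcp
# (values from the kernel's include/net/tcp_states.h).
# TIME_WAIT (06) is deliberately left out: assumption, it can linger for ~60 s
# and would dominate the wait.
CLOSING_STATES = {
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "09": "LAST_ACK",
    "0B": "CLOSING",
}


def closing_sockets(proc_net_tcp):
    """Return the state names of sockets still in their close sequence."""
    states = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3].upper()  # 4th column "st" holds the hex state
        if state in CLOSING_STATES:
            states.append(CLOSING_STATES[state])
    return states


def wait_for_close(path="/proc/net/tcp", timeout_s=10, interval_s=1.0):
    """Poll until no socket is closing (True) or the timeout expires (False).

    The 10 s default mirrors the "sleep 10" budget from the workaround.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with open(path) as f:
            if not closing_sockets(f.read()):
                return True
        time.sleep(interval_s)
    return False
```

The intended use would be calling wait_for_close() as the container's last step instead of a fixed sleep, so the container exits as soon as its connections have finished closing.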
https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.114
commit edaafa805e0f9d09560a4892790b8e19cab8bf09
Author: Dan Streetman <ddstreet@ieee.org>
Date: Thu Jan 18 16:14:26 2018 -0500
net: tcp: close sock if net namespace is exiting
[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]
When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
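The reference chain described in this commit message can be illustrated with a toy model. This is not kernel code: the class names and the single integer refcount are simplifications invented for illustration, and the dst starts out on lo directly, standing in for the transfer of references to lo once the real interfaces are taken down. A kernel TCP socket holds no reference on its namespace, but its dst pins the loopback device; unpatched, closing the socket leaves that reference in place, so unregistering lo can only report that it is waiting.

```python
from dataclasses import dataclass


@dataclass
class NetDevice:
    name: str
    refcnt: int = 0


@dataclass
class Namespace:
    lo: NetDevice
    exiting: bool = False


class KernelTcpSocket:
    """Toy kernel socket: no reference on the namespace, but its cached dst
    holds a reference on a device (lo, after real devices are taken down)."""

    def __init__(self, ns, fixed_kernel):
        self.ns = ns
        self.fixed_kernel = fixed_kernel
        self.dst_dev = ns.lo
        self.dst_dev.refcnt += 1
        self.open = True

    def close(self):
        if self.fixed_kernel and self.ns.exiting:
            self._release()  # patched behaviour: close immediately
        # Unpatched: the socket keeps trying to finish its FIN sequence, so the
        # dst reference is only dropped much later, when the socket times out.

    def _release(self):
        if self.open:
            self.dst_dev.refcnt -= 1
            self.open = False


def unregister_lo(ns):
    if ns.lo.refcnt > 0:
        return (f"unregister_netdevice: waiting for {ns.lo.name} "
                f"to become free. Usage count = {ns.lo.refcnt}")
    return f"{ns.lo.name} unregistered"


def simulate(fixed_kernel):
    ns = Namespace(lo=NetDevice("lo"))
    sock = KernelTcpSocket(ns, fixed_kernel)
    ns.exiting = True  # nothing keeps the namespace alive for a kernel socket
    sock.close()
    return unregister_lo(ns)
```

In this model, simulate(False) returns the same message seen in the reproduction, "unregister_netdevice: waiting for lo to become free. Usage count = 1", while simulate(True) returns "lo unregistered".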
After the upgrade, retrying step 3 of the reproduction no longer triggered the error.
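To check whether a machine already runs at least 4.4.114, the kernel release string (as printed by uname -r, e.g. 4.4.59-92.20-default in the environment above) can be compared numerically. A minimal sketch with a hypothetical helper name. Distribution kernels such as SUSE's often backport fixes without changing the upstream version number, so this comparison is only a rough indicator, not proof that the patch is present or absent.

```python
import platform
import re


def has_upstream_fix(release, fixed=(4, 4, 114)):
    """True/False for 4.4.x kernels, None for anything else.

    Only the leading "major.minor.patch" is compared; distro suffixes such as
    "-92.20-default" are ignored.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        return None
    version = tuple(int(x) for x in m.groups())
    if version[:2] != fixed[:2]:
        return None  # other series: this sketch does not know where the fix landed
    return version >= fixed


def running_kernel_has_fix():
    """Apply the check to the kernel this script is running on."""
    return has_upstream_fix(platform.release())
```

Returning None for other series is deliberate: the 4.4.114 threshold only applies to the 4.4 stable branch.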
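Verification
In production, the kernel of one machine was upgraded to 4.4.114, but the problem persisted.
So where does the problem lie: lo or eth0? (The reproduction above reports lo, while the original message from the test environment named eth0.)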
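One way to narrow down whether the problem involves lo, eth0, or both is to tally which interface names appear in the kernel messages, for example in the output of dmesg. A minimal sketch; the regex is written against the two message samples quoted in this post, and the function name tally_interfaces is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "unregister_netdevice: waiting for eth0 to become free. Usage count = 1"
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. Usage count = (\d+)"
)


def tally_interfaces(log_text):
    """Count how often the message appears for each interface name."""
    counts = Counter()
    for m in PATTERN.finditer(log_text):
        counts[m.group(1)] += 1
    return counts
```

Feeding it the kernel log of each upgraded production host would show whether the remaining messages name eth0, lo, or both.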
Follow-up
To be continued