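Problem description
On an NFV platform, one customer's traffic suddenly stopped. The VM sends and receives packets through SR-IOV, and the NIC is an Intel XL710. Restarting the application or the VM did not help; only rebooting the host running the VM restored traffic. During troubleshooting, dmesg on the host showed the following log entries for this NIC: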
[3703223.514901] i40e 0000:81:00.1: TX driver issue detected, PF reset issued
[3703223.514913] i40e 0000:81:00.1: TX driver issue detected on VF 1
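Judging from the log, some event on the VF was caught by the PF kernel driver, which then reset both the VF and the PF.
Which events can cause this? The NIC datasheet lists the TX-side events that are treated as malicious.
To cut to the chase: the investigation went less smoothly than expected, but in this customer's environment the cause was transmitted packets shorter than 17 bytes, which is the "wrong size" type in the datasheet's list.
The steps below reproduce the problem in a setup that mimics the customer's environment. The topology is as follows.
DPDK-pktgen sends packets from NIC 81:00.0, with the destination MAC set to the MAC of VF1 on the peer NIC 81:00.1. VF1 and VF2 are passed through to a VM. Inside the VM, the DPDK l2fwd example receives packets on VF1 and forwards them out of VF2. l2fwd is also modified to set the packet length to a value below 17 bytes, so the packets sent from VF2 trigger the VF and PF reset.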
VF passthrough configuration
Check the two X710 NICs on the host and their driver version
# lspci | grep net
81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
# ethtool -i enp129s0f1
driver: i40e
version: 2.7.29
firmware-version: 6.01 0x80003483 1.1747.0
expansion-rom-version:
bus-info: 0000:81:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Create two VFs on the enp129s0f1 NIC
//create 2 VFs on 81:00.1
echo 2 > /sys/bus/pci/devices/0000\:81\:00.1/sriov_numvfs
//bring the PF up, otherwise passing its VFs through to the VM fails
ip link set dev enp129s0f1 up
ip link set dev enp129s0f1 promisc on
//check that the two VFs were created
# lspci | grep net
81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
81:0a.0 Ethernet controller: Intel Corporation XL710/X710 Virtual Function (rev 02)
81:0a.1 Ethernet controller: Intel Corporation XL710/X710 Virtual Function (rev 02)
Add the following to the VM's XML configuration file. When the VM starts, the two VFs created above are passed through to it automatically.
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:04:94:dd'/>
<source>
<address type='pci' domain='0x0000' bus='0x81' slot='0x0a' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
</interface>
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:84:e4:34'/>
<source>
<address type='pci' domain='0x0000' bus='0x81' slot='0x0a' function='0x1'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
</interface>
After the VM starts, its interface list shows two additional interfaces of type hostdev
# virsh domiflist ubuntu18
Interface Type Source Model MAC
-------------------------------------------------------
vnet0 network default rtl8139 52:54:00:0e:e8:45
- hostdev - - 52:54:00:04:94:dd
- hostdev - - 52:54:00:84:e4:34
Setting up pktgen
//pktgen depends on the DPDK libraries, so build DPDK first
mount -t hugetlbfs none /mnt/huge -o pagesize=2MB
echo 500 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
cd /root/dpdk-stable-17.05.2
export RTE_TARGET=build
export RTE_SDK=`pwd`
make config T=x86_64-native-linuxapp-gcc
make
modprobe uio
insmod build/kmod/igb_uio.ko
//bind the NIC to igb_uio
./usertools/dpdk-devbind.py -b igb_uio 0000:81:00.0
//build pktgen
export RTE_SDK=/root/dpdk-stable-17.05.2
export RTE_TARGET=build
cd /root/pktgen-3.2.11/
make
//run pktgen
./app/build/pktgen -l 0-2 -n 3 -w 0000:81:00.0 -- -P -m "[1].0"
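//set the destination MAC of the generated packets to the MAC of the first VF
set 0 dst mac 52:54:00:04:94:dd
//set the number of packets to send
set 0 count 1
//set the size of the packets to send
set 0 size 64
//start sending on port 0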
start 0
Configuration inside the VM
Run the DPDK l2fwd example to receive and forward packets
mount -t hugetlbfs none /mnt/huge -o pagesize=2MB
echo 500 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
cd /root/dpdk-stable-18.11.2/
export RTE_TARGET=build
export RTE_SDK=`pwd`
make config T=x86_64-native-linuxapp-gcc
make
modprobe uio
insmod build/kmod/igb_uio.ko
./usertools/dpdk-devbind.py -b igb_uio 0000:00:0b.0
./usertools/dpdk-devbind.py -b igb_uio 0000:00:0c.0
cd examples/l2fwd
make
./build/l2fwd -cf -n4 -- -p3 -T 100
To shrink the transmitted packets, modify the l2fwd source to set mbuf->data_len to any value below 17. The driver uses data_len as the length when transmitting.
int count = 0;
static void
l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
{
	unsigned dst_port;
	int sent;
	struct rte_eth_dev_tx_buffer *buffer;

	dst_port = l2fwd_dst_ports[portid];
	if (mac_updating)
		l2fwd_mac_updating(m, dst_port);
	//force the packet length to 1 byte to trigger the problem
	m->data_len = 1;
	buffer = tx_buffer[dst_port];
	sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
	if (sent)
		port_statistics[dst_port].tx += sent;
}
Enabling PF kernel driver logging
Before reproducing the problem, you can also raise the PF kernel driver's message level to get more information
# ethtool -s enp129s0f1 msglvl 0x0080
...
# dmesg -c
[3704218.409935] i40e 0000:81:00.1: Malicious Driver Detection event 0x00 on TX queue 69 PF number 0x01 VF number 0x41
[3704218.409941] i40e 0000:81:00.1: TX driver issue detected, PF reset issued
[3704218.409947] i40e 0000:81:00.1: TX driver issue detected on VF 1
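When watching many hosts for this condition, it can help to pull the fields out of the "Malicious Driver Detection" line automatically. The following is a minimal sketch, assuming the log line has exactly the layout shown above; the struct mdd_event type and the parse_mdd_line function are hypothetical names introduced for illustration and are not part of the i40e driver or any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields printed in the MDD log line. */
struct mdd_event {
	unsigned event;  /* value after "event 0x"      */
	unsigned queue;  /* value after "TX queue"      */
	unsigned pf;     /* value after "PF number 0x"  */
	unsigned vf;     /* value after "VF number 0x"  */
};

/*
 * Parse one dmesg line. Returns 0 and fills *ev when the line is a TX
 * Malicious Driver Detection message in the layout shown above,
 * otherwise returns -1 and leaves *ev unspecified.
 */
static int parse_mdd_line(const char *line, struct mdd_event *ev)
{
	const char *p;
	int n;

	assert(line != NULL && ev != NULL);

	/* Skip the timestamp and device prefix by searching for the message. */
	p = strstr(line, "Malicious Driver Detection event");
	if (p == NULL)
		return -1;

	n = sscanf(p,
		   "Malicious Driver Detection event 0x%x on TX queue %u "
		   "PF number 0x%x VF number 0x%x",
		   &ev->event, &ev->queue, &ev->pf, &ev->vf);
	return n == 4 ? 0 : -1;
}
```

Each line read from dmesg can be passed to parse_mdd_line; the other two i40e lines in the output above do not match and are rejected with -1.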
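Solution
The fix is simple: the application must check the packet length before transmitting, treat any packet shorter than 17 bytes as invalid, and simply drop it.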
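The length check can be sketched as follows. This is a minimal, self-contained illustration and not the customer's actual fix: the 17-byte threshold comes from the analysis above, while MIN_TX_FRAME_LEN, tx_len_ok and drop_short_frames are hypothetical names, and a plain array of lengths stands in for the data_len field of real mbufs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold from the analysis above; the macro name is an assumption. */
#define MIN_TX_FRAME_LEN 17u

/* A frame is safe to hand to the NIC only if it is at least 17 bytes long. */
static bool tx_len_ok(uint16_t data_len)
{
	return data_len >= MIN_TX_FRAME_LEN;
}

/*
 * Compact an array of frame lengths in place, keeping only the frames
 * that pass tx_len_ok(). Returns the number of frames kept; the caller
 * treats the remaining n - kept frames as dropped.
 */
static size_t drop_short_frames(uint16_t *lens, size_t n)
{
	size_t kept = 0;
	size_t i;

	assert(lens != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (tx_len_ok(lens[i]))
			lens[kept++] = lens[i];
	}
	return kept;
}
```

In a DPDK application such as l2fwd, the same test would go just before the call to rte_eth_tx_buffer: if m->data_len fails the check, free the mbuf with rte_pktmbuf_free(m) and return instead of queuing it for transmission.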