I. Overview
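- This post describes the control-plane operations for inter-VNI communication (same tenant, different VNIs) in BGP EVPN VxLAN, as well as how hosts within the same VNI communicate;
- It also describes the data-plane forwarding process;
- The network topology and configuration in this post are entirely based on the previous two posts, "4 Cisco VxLAN Lab with BGP EVPN & Distributed Anycast Gateway" and "5 Cisco VxLAN Control Plane with BGP EVPN: MAC Learning";
- This post adds ARP suppression configuration. Also, unlike the previous posts, the VRF name has changed from "Tenant-A" to "ta".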
II. Topology
III. Control Plane Operations
3.1 MAC-IP Learning Process
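- This section explains in detail how the local VTEP switch learns the IP address of a locally attached host from the gratuitous ARP message that the end host generates. It also covers how the Host Mobility Manager (HMM) component installs this information into the L2RIB of the relevant VNI (the L2RIB database that holds the MAC-IP address information is also called the IP VRF);
- It shows how the route is exported from the L2RIB into the BGP Loc-RIB as a BGP EVPN Route Type 2 (MAC/MAC-IP advertisement route), and then advertised to the remote VTEP switches through the BGP Adj-RIB-Out;
- It shows how the routing information finally reaches the L2RIB of the remote VTEP.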
3.1.1 ARP Learning on the Local VTEP
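- After PC1 boots, it sends a Gratuitous ARP (GARP) to verify that its IP address is unique. VTEP switch Leaf-1 receives the GARP on interface E1/3. It then installs a MAC-IP binding into its ARP table, built from PC1's source MAC address and the PC1 IP address carried in the GARP payload;
- The ARP table of VRF ta is shown below.
  - In NX-OS, locally learned ARP entries age out after 1500 seconds by default. This is 300 seconds shorter than the MAC address aging timer (1800 seconds).
  - When the ARP aging timer expires, the switch sends an ARP request to check whether the host is still present. If the host replies, the switch resets the aging timer.
  - If the host does not reply, the entry is removed from the ARP table. It is still kept in the BGP EVPN table for another 1800 seconds (the MAC aging timer) before the withdraw message is sent.
  - The MAC aging timer should be longer than the ARP aging timer because the ARP refresh process also refreshes the MAC table, which avoids unnecessary flooding.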
Leaf-1# sh ip arp vrf ta
Flags: * - Adjacencies learnt on non-active FHRP router
+ - Adjacencies synced via CFSoE
# - Adjacencies Throttled for Glean
CP - Added via L2RIB, Control plane Adjacencies
PS - Added via L2RIB, Peer Sync
RO - Re-Originated Peer Sync Entry
D - Static Adjacencies attached to down interface
IP ARP Table for context ta
Total number of entries: 1
Address Age MAC Address Interface Flags
172.16.1.1 00:02:00 0050.7966.6806 Vlan10
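The aging behavior described above can be summarized with a small toy model. This is a sketch under stated assumptions, not NX-OS code. The timer values (1500 s ARP, 1800 s MAC) are the defaults quoted in the text. The function name `on_arp_timer_expiry` and the dictionary fields are hypothetical.

```python
# Toy model of the NX-OS ARP/MAC aging interplay described in the text.
# Not Cisco code: names and fields are hypothetical; the timer values are
# the defaults quoted above.
ARP_AGING_S = 1500  # default aging time of locally learned ARP entries
MAC_AGING_S = 1800  # default MAC address aging time


def on_arp_timer_expiry(entry: dict, host_replied: bool) -> dict:
    """Return the entry's new state once its ARP aging timer expires.

    The switch first probes the host with an ARP request:
    - reply received: the aging timer is reset and the entry stays;
    - no reply: the entry leaves the ARP table, but the BGP EVPN route is
      kept for another MAC_AGING_S seconds before the withdraw is sent.
    """
    if host_replied:
        return {**entry, "arp_age_s": 0, "in_arp_table": True}
    return {**entry, "in_arp_table": False, "evpn_withdraw_in_s": MAC_AGING_S}


# MAC aging must be longer than ARP aging, so each ARP refresh also
# refreshes the MAC entry before it can age out (avoiding flooding).
assert MAC_AGING_S > ARP_AGING_S

pc1 = {"ip": "172.16.1.1", "mac": "0050.7966.6806",
       "arp_age_s": ARP_AGING_S, "in_arp_table": True}
print(on_arp_timer_expiry(pc1, host_replied=True))
print(on_arp_timer_expiry(pc1, host_replied=False))
```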
3.1.2 MAC-IP on the Local VTEP
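- The Host Mobility Manager (HMM) component learns the MAC-IP information as a local route;
- HMM installs the information into the local host database and forwards the MAC-IP information to the L2RIB;
- The local host database contains the IP address (/32), MAC address, SVI and local interface. The L2RIB holds the same information except for the SVI;
- Part of the MAC-IP learning process on Leaf-1 is shown below;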
Leaf-1# show system internal l2rib event-history mac-ip
L2RIB MAC-IP Object Event Logs:
[10/12/20 14:25:31.870 CST 1 29704] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 12
[10/12/20 14:25:31.870 CST 2 29704] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6806, 172.16.1.1), l2 vni 0, l3 vni 13960,
[10/12/20 14:25:31.870 CST 3 29704] Rcvd MAC-IP ROUTE msg: flags , admin_dist 7, seq 0, soo 0, peerid 0,
[10/12/20 14:25:31.870 CST 4 29704] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 0, pc-ifindex 0
[10/12/20 14:25:31.871 CST 5 29704] (10,0050.7966.6806,172.16.1.1):MAC-IP entry created
[10/12/20 14:25:31.871 CST 6 29704] (10,0050.7966.6806,172.16.1.1,12):MAC-IP route created with flags 0, l3 vni 13960, seq 0
[10/12/20 14:25:31.871 CST 7 29704] (10,0050.7966.6806,172.16.1.1,12): admin dist 7, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:31.871 CST 8 29704] (10,0050.7966.6806,172.16.1.1,12): esi (F), pc-ifindex 0
[10/12/20 14:25:31.875 CST 9 29704] (10,0050.7966.6806,172.16.1.1,12):Encoding MAC-IP best route (ADD, client id 5), esi: (F)
- The output below shows the MAC-IP binding for PC1 in the local host database of VRF ta on Leaf-1;
Leaf-1# show fabric forwarding ip local-host-db vrf ta
HMM host IPv4 routing table information for VRF ta
Status: *-valid, x-deleted, D-Duplicate, DF-Duplicate and frozen,
c-cleaned in 00:01:49
Host MAC Address SVI Flags Physical Interface
* 172.16.1.1/32 0050.7966.6806 Vlan10 0x420201 Ethernet1/3
- The output below shows that PC1's MAC-IP information in the IP VRF of the L2RIB was produced by the HMM component;
Leaf-1# show l2route mac-ip topology 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated
Topology Mac Address Prod Flags Seq No Host IP Next-Hops
----------- -------------- ------ ---------- --------------- ---------------
10 0050.7966.6806 HMM -- 0 172.16.1.1 Local
Sent To: BGP
L3-Info: 13960
3.1.3 BGP Route Export on the Local VTEP
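- VTEP switch Leaf-1 installs the MAC-IP route from the L2RIB into the BGP Loc-RIB;
- The MAC-IP information is advertised as a separate BGP EVPN Route Type 2 update, because MAC-only and MAC-IP routes each use their own NLRI. The MAC-IP NLRI carries more than the host MAC address: it also includes the host IP address, its length and the MPLS Label2 field. Label2 defines the L3VNI used in VRF ta;
- The MAC-IP update also carries two additional extended communities: RT 65234:13960 and Router MAC 5000.0003.0007;
- The output below shows the internal process by which Leaf-1 receives the MAC-IP route information and installs it into the RIB and the BGP Loc-RIB. The prefix length is RD (8×8 bits) + MAC address (6×8 bits) + IP address (4×8 bits) = 18 bytes, i.e. 144 bits;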
Leaf-1# show bgp internal event-history events | in 6806
BRIB:
2020 Oct 12 17:36:36.317231: (default) BRIB: [L2VPN EVPN] Installing prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/144 (local) via 3.3.3.3 label 10010 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
RIB:
2020 Oct 12 17:36:36.319783: (default) RIB: [L2VPN EVPN] add prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1] (flags 0x1) : OK, total 1
EVENT:
2020 Oct 12 17:36:36.316899: EVT: Received from L2RIB MAC-IP route: Add ESI 0000.0000.0000.0000.0000 topo 10010 mac 0050.7966.6806 ip 172.16.1.1 L3 VNI 13960 flags 00000000 soo 0 seq 0, reorig :0
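The /144 in the BRIB log above comes from simple byte arithmetic. The sketch below reproduces the author's breakdown (RD 8 bytes + MAC 6 bytes + IPv4 4 bytes). It only restates that arithmetic and makes no claim about NX-OS internals. `prefix_length_bits` is an illustrative name.

```python
# Reproduces the prefix-length arithmetic quoted in the text for the
# "/144" BRIB prefix. Field sizes are the author's breakdown, in bytes.
BRIB_PREFIX_FIELDS = {
    "route_distinguisher": 8,
    "mac_address": 6,
    "ipv4_address": 4,
}


def prefix_length_bits(fields: dict) -> int:
    """Sum the field sizes (bytes) and convert them to bits."""
    return sum(fields.values()) * 8


print(prefix_length_bits(BRIB_PREFIX_FIELDS))  # 144
```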
- The BGP Loc-RIB entry for PC1's MAC-IP NLRI is shown below;
Leaf-1# sh bgp l2vpn evpn 172.16.1.1
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777 (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 969
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path
AS-Path: NONE, path locally originated
3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Path-id 1 advertised to peers:
1.1.1.1 2.2.2.2
- The prefix fields in the output above are explained in the table below;
Prefix field | Description | Notes |
---|---|---|
2 | BGP EVPN Route Type 2 | MAC/MAC-IP advertisement route |
0 | Ethernet Segment Identifier (ESI) | All zeros = single-homed site |
0 | Ethernet Tag ID | Must be 0 for VLAN-based service (one VLAN per EVI), as used here |
48 | MAC address length | / |
0050.7966.6806 | MAC address | / |
32 | IP address length | / |
172.16.1.1 | IP address | / |
/272 | Length of the MAC-IP VRF NLRI in bits | RD (8×8 bits) + MAC address (6×8 bits) + L2VNI ID (3×8 bits) + L3VNI ID (3×8 bits) + IP address (4×8 bits) + ESI (10×8 bits) = 34×8 bits, i.e. 272 bits |
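A small parser helps map a prefix string from the output onto the fields in the table. This sketch assumes the exact bracketed format printed by NX-OS in the outputs of this post. `parse_rt2` and `TABLE_BYTES` are illustrative names. The sketch also re-checks the table's /272 byte budget.

```python
import re

# Matches the bracketed EVPN Route Type 2 prefix format printed by NX-OS,
# e.g. "[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272".
# An optional leading "RD:" (as in BRIB logs) is skipped by re.search.
RT2_RE = re.compile(
    r"\[(?P<type>\d+)\]:\[(?P<esi>[^\]]+)\]:\[(?P<tag>\d+)\]:"
    r"\[(?P<mac_len>\d+)\]:\[(?P<mac>[0-9a-f.]+)\]:"
    r"\[(?P<ip_len>\d+)\]:\[(?P<ip>[0-9.]+)\]/(?P<bits>\d+)"
)

# Byte sizes from the table's "/272" row: RD, MAC, L2VNI, L3VNI, IPv4, ESI.
TABLE_BYTES = {"rd": 8, "mac": 6, "l2vni": 3, "l3vni": 3, "ipv4": 4, "esi": 10}


def parse_rt2(prefix: str) -> dict:
    """Split an NX-OS Route Type 2 prefix string into the table's fields."""
    m = RT2_RE.search(prefix)
    if m is None:
        raise ValueError(f"not an EVPN Route Type 2 prefix: {prefix!r}")
    fields = m.groupdict()
    for key in ("type", "tag", "mac_len", "ip_len", "bits"):
        fields[key] = int(fields[key])
    return fields


entry = parse_rt2("[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272")
print(entry)
print(sum(TABLE_BYTES.values()) * 8)  # 272
```

The same function also accepts the MAC-only form that appears in the remote VTEP logs (`[0]:[0.0.0.0]/112`), where the IP length is 0.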
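- In the output above, the L2VNI appears in the "Received label" field, followed by the L3VNI 13960. The route also carries four BGP extended communities;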
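BGP extended community | Description | Notes |
---|---|---|
RT:65234:10010 | Used by the export/import policy (L2VNI) | VNI 10010 maps to VLAN 10 |
RT:65234:13960 | Used by the export/import policy (L3VNI) | VNI 13960 maps to VLAN 3960 |
ENCAP:8 | Defines VxLAN as the data-plane encapsulation type | / |
Router MAC:5000.0003.0007 | Used as the source address in the inner MAC header of routed packets | Required because VxLAN is a MAC-in-UDP encapsulation and, at the L3 boundary, the payload no longer carries the source host's MAC address, so the RMAC is used instead. |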
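The two lines of the BGP output that carry this information can be split into parts mechanically. This sketch assumes the exact text layout of the `show bgp l2vpn evpn` outputs in this post. The function names are hypothetical, and the mapping of ENCAP 8 to VXLAN is the BGP tunnel-encapsulation type code.

```python
# Sketch: split the "Received label" and "Extcommunity" lines of the
# "show bgp l2vpn evpn" output into their parts. Assumes the exact text
# layout shown in this post; the function names are hypothetical.
ENCAP_TYPES = {8: "VXLAN"}  # BGP tunnel encapsulation type 8 = VXLAN


def parse_labels(line: str) -> tuple:
    """'Received label 10010 13960' -> (10010, 13960), i.e. (L2VNI, L3VNI)."""
    return tuple(int(x) for x in line.split()[2:])


def parse_extcommunity(line: str) -> dict:
    """Return the RTs, encapsulation and Router MAC from an Extcommunity line."""
    body = line.split("Extcommunity:", 1)[1]
    body = body.replace("Router MAC:", "RMAC:")  # 'Router MAC' contains a space
    result = {"rt": [], "encap": None, "rmac": None}
    for token in body.split():
        key, _, value = token.partition(":")
        if key == "RT":
            result["rt"].append(value)
        elif key == "ENCAP":
            result["encap"] = ENCAP_TYPES.get(int(value), value)
        elif key == "RMAC":
            result["rmac"] = value
    return result


print(parse_labels("Received label 10010 13960"))
print(parse_extcommunity(
    "Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007"))
```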
3.1.4 BGP Route Import on the Remote VTEP
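- VTEP switch Leaf-2 receives the BGP EVPN MAC-IP route advertisement and installs it, unmodified, into its BGP Adj-RIB-In;
- Leaf-2 imports the route from the BGP Adj-RIB-In into the BGP Loc-RIB and, after the best-path selection process, installs it into the L2RIB;
- When the remote VTEP switch Leaf-2 moves the route from the BGP Adj-RIB-In into the BGP Loc-RIB, it rewrites the RD to 4.4.4.4:32777. This RD is derived from Leaf-2's own BGP router ID and the VLAN ID (32767 + 10 = 32777). The process is the same as for MAC-only route import and is based on the same RT, 65234:10010;
- The internal import process is shown below. Leaf-2 installs the received route into the BGP Adj-RIB-In under RD 3.3.3.3:32777 and imports it under RD 4.4.4.4:32777. It then installs the route into the BGP Loc-RIB and finally passes it to the L2RIB. When reading the output, note the following:
  - the event history lists the newest events first;
  - the captured entries are for PC1's MAC-only NLRI ([0]:[0.0.0.0]/112), which goes through the same import steps;
  - the output also includes the RIB installation ("RIB:") events;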
Leaf-2# show bgp internal event-history events | i 6806
2020 Oct 12 21:52:48.495013: (default) RIB: [L2VPN EVPN]: Send to L2RIB 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112
2020 Oct 12 21:52:48.494399: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 0 next hops, suppress 0
2020 Oct 12 21:52:48.494371: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x210, in_rib: yes
2020 Oct 12 21:52:48.493006: (default) BRIB: [L2VPN EVPN] Marking imported path for dest 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 as deleted, path ibgp
2020 Oct 12 21:52:48.492893: EVT: [L2VPN EVPN] Deleting imported path [2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]
2020 Oct 12 21:52:48.492506: (default) RIB: [L2VPN EVPN] Add/delete 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, evi_ctx invalid, in_rib: no
2020 Oct 12 21:52:48.491786: (default) BRIB: [L2VPN EVPN] Marking path for dest 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 from peer 2.2.2.2 as deleted, pflags = 0x40000011, reeval=0
2020 Oct 12 21:52:48.474282: (default) RIB: [L2VPN EVPN] Suppressing 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 download to L2RIB
2020 Oct 12 21:52:48.474255: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 1 next hops, suppress 1
2020 Oct 12 21:52:48.474189: (default) RIB: [L2VPN EVPN] Adding 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 via 3.3.3.3 to NH list (flags2: 0x0)
2020 Oct 12 21:52:48.473909: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x210, in_rib: yes
2020 Oct 12 21:52:48.473593: (default) IMP: [L2VPN EVPN] Import of 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 (EVI: 0) to RD 4.4.4.4:65534 (0) inhibited, no Type2 for EAD-ES import
2020 Oct 12 21:52:48.472917: (default) IMP: [L2VPN EVPN] Importing prefix 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 to <default> RD 4.4.4.4:32777
2020 Oct 12 21:52:48.466435: (default) RIB: [L2VPN EVPN] Add/delete 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, evi_ctx invalid, in_rib: no
2020 Oct 12 21:52:48.465106: (default) BRIB: [L2VPN EVPN] Marking path for dest 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 from peer 1.1.1.1 as deleted, pflags = 0x40000011, reeval=0
2020 Oct 12 21:47:48.453800: (default) RIB: [L2VPN EVPN]: Send to L2RIB 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112
2020 Oct 12 21:47:48.451605: (default) RIB: [L2VPN EVPN] For 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, added 1 next hops, suppress 0
2020 Oct 12 21:47:48.451584: (default) RIB: [L2VPN EVPN] Adding 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112 via 3.3.3.3 to NH list (flags2: 0x0)
2020 Oct 12 21:47:48.451553: (default) RIB: [L2VPN EVPN] Add/delete 4.4.4.4:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/112, flags=0x200, in_rib: no
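- Part of the BGP RIB (BRIB) on Leaf-2, covering the Adj-RIB-In and the Loc-RIB, is shown below. The output has three parts:
  - The upper part shows the original, unmodified NLRI received from the spines (1.1.1.1 and 2.2.2.2) and installed in the Adj-RIB-In.
  - The middle part shows the same NLRI installed in the BGP Loc-RIB with the rewritten RD. This NLRI is imported based on RT 65234:10010.
  - The lower part shows the same NLRI as the middle part, installed under RD 4.4.4.4:3. It is used for inter-VNI (L3VNI) traffic and is imported into the corresponding L3VNI Loc-RIB based on RT 65234:13960, which is auto-generated from the VRF context configuration.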
Leaf-2# show bgp l2vpn evpn 172.16.1.1 vrf ta
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 801
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Path type: internal, path is valid, not best reason: Neighbor Address
AS-Path: NONE, path sourced internal to AS
3.3.3.3 (metric 81) from 2.2.2.2 (2.2.2.2)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Originator: 3.3.3.3 Cluster list: 2.2.2.2
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported to 3 destination(s)
AS-Path: NONE, path sourced internal to AS
3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Originator: 3.3.3.3 Cluster list: 1.1.1.1
Path-id 1 not advertised to any peer
Route Distinguisher: 4.4.4.4:32777 (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 824
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path, in rib
Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272
AS-Path: NONE, path sourced internal to AS
3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Originator: 3.3.3.3 Cluster list: 1.1.1.1
Path-id 1 not advertised to any peer
Route Distinguisher: 4.4.4.4:3 (L3VNI 13960)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 799
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272
AS-Path: NONE, path sourced internal to AS
3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Originator: 3.3.3.3 Cluster list: 1.1.1.1
Path-id 1 not advertised to any peer
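The import decision described above is essentially a set intersection between the route's RTs and each local table's import RTs. This minimal sketch assumes Leaf-2's import RTs are exactly those implied by the output:

- RT 65234:10010 for the MAC-VRF with RD 4.4.4.4:32777;
- RT 65234:13960 for VRF ta with RD 4.4.4.4:3.

`LEAF2_IMPORTS` and `import_targets` are hypothetical names. The output reports "Imported to 3 destination(s)"; the sketch models only the two destinations visible in it.

```python
# Hypothetical import configuration of Leaf-2, inferred from the output:
# local RD -> set of import route targets.
LEAF2_IMPORTS = {
    "4.4.4.4:32777": {"65234:10010"},  # MAC-VRF of L2VNI 10010 (VLAN 10)
    "4.4.4.4:3": {"65234:13960"},      # IP VRF ta (L3VNI 13960)
}


def import_targets(route_rts: set, local_imports: dict) -> list:
    """Return the local RDs whose import RTs share at least one RT with the route."""
    return sorted(rd for rd, rts in local_imports.items() if rts & route_rts)


pc1_rts = {"65234:10010", "65234:13960"}  # RTs carried by PC1's MAC-IP route
print(import_targets(pc1_rts, LEAF2_IMPORTS))
```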
3.1.5 IP VRF on the Remote VTEP
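- The remote VTEP Leaf-2 verifies that the next-hop IP address found in the NLRI is reachable, and the HMM component installs the MAC-IP route into the L2RIB. The local topology ID is 10 (based on VLAN 10) and the route source is BGP. The next hop points to the source IP address bound to Leaf-1's NVE1 interface (3.3.3.3);
- At this stage, both VTEP switches have PC1's MAC-IP information in their L2RIB and BGP tables, but only the local VTEP switch Leaf-1 has installed the MAC-IP binding into its ARP table;
- Part of the MAC-IP learning process on Leaf-2 is shown below;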
Leaf-2# sh system internal l2rib event-history mac-ip
L2RIB MAC-IP Object Event Logs:
[10/12/20 14:25:33.711 CST 1 29679] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 5
[10/12/20 14:25:33.711 CST 2 29679] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6806, 172.16.1.1), l2 vni 0, l3 vni 0,
[10/12/20 14:25:33.711 CST 3 29679] Rcvd MAC-IP ROUTE msg: flags , admin_dist 0, seq 0, soo 0, peerid 0,
[10/12/20 14:25:33.711 CST 4 29679] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 1, pc-ifindex 0
[10/12/20 14:25:33.711 CST 5 29679] NH: 3.3.3.3
[10/12/20 14:25:33.713 CST 6 29679] (10,0050.7966.6806,172.16.1.1):MAC-IP entry created
[10/12/20 14:25:33.713 CST 7 29679] (10,0050.7966.6806,172.16.1.1,5):MAC-IP route created with flags 0, l3 vni 0, seq 0
[10/12/20 14:25:33.713 CST 8 29679] (10,0050.7966.6806,172.16.1.1,5): admin dist 20, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:33.714 CST 9 29679] (10,0050.7966.6806,172.16.1.1,5): esi (F), pc-ifindex 0
[10/12/20 14:25:45.795 CST a 29679] Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 12
[10/12/20 14:25:45.795 CST b 29679] Rcvd MAC-IP ROUTE msg: (10, 0050.7966.6808, 172.16.1.3), l2 vni 0, l3 vni 13960,
[10/12/20 14:25:45.795 CST c 29679] Rcvd MAC-IP ROUTE msg: flags , admin_dist 7, seq 0, soo 0, peerid 0,
[10/12/20 14:25:45.795 CST d 29679] Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 0, pc-ifindex 0
[10/12/20 14:25:45.795 CST e 29679] (10,0050.7966.6808,172.16.1.3):MAC-IP entry created
[10/12/20 14:25:45.795 CST f 29679] (10,0050.7966.6808,172.16.1.3,12):MAC-IP route created with flags 0, l3 vni 13960, seq 0
[10/12/20 14:25:45.795 CST 10 29679] (10,0050.7966.6808,172.16.1.3,12): admin dist 7, soo 0, peerid 0, peer ifindex 0
[10/12/20 14:25:45.795 CST 11 29679] (10,0050.7966.6808,172.16.1.3,12): esi (F), pc-ifindex 0
[10/12/20 14:25:45.800 CST 12 29679] (10,0050.7966.6808,172.16.1.3,12):Encoding MAC-IP best route (ADD, client id 5), es
- The output below shows that the MAC-IP information in the L2RIB was produced by BGP;
Leaf-2# show l2route mac-ip topology 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated
Topology Mac Address Prod Flags Seq No Host IP Next-Hops
----------- -------------- ------ ---------- --------------- ---------------
10 0050.7966.6806 BGP -- 0 172.16.1.1 3.3.3.3
Sent To: ARP
- After the stages above, both VTEP switches have PC1's MAC-IP information.
3.2 ARP Suppression
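- Section 3.1 explained how MAC-IP address information is propagated in a BGP EVPN VxLAN fabric. This section describes how the ARP suppression mechanism of the VTEP switches uses the MAC-IP bindings to reduce unnecessary Layer 2 BUM (broadcast, unknown unicast, multicast) traffic in the VxLAN fabric.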
3.2.1 Leaf Switch Configuration: Enabling ARP Suppression
Leaf-1 configuration:
interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp
Leaf-2 configuration:
interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp
Leaf-3 configuration:
interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp
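Because the three leaves share the same ARP suppression stanza, it can also be rendered from a list of L2VNIs. This small sketch reproduces only the lines shown above; the rest of each leaf's `interface nve1` configuration from the earlier posts is not included. `render_nve_suppress_arp` is a hypothetical helper.

```python
def render_nve_suppress_arp(vnis: list) -> str:
    """Render an 'interface nve1' stanza enabling ARP suppression on each
    listed L2VNI, using NX-OS-style indentation."""
    lines = ["interface nve1"]
    for vni in vnis:
        lines.append(f"  member vni {vni}")
        lines.append("    suppress-arp")
    return "\n".join(lines)


for leaf in ("Leaf-1", "Leaf-2", "Leaf-3"):
    print(f"! {leaf}")  # "!" starts a comment line in NX-OS configuration
    print(render_nve_suppress_arp([10010, 10020]))
```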
3.2.2 Viewing the ARP Suppression Cache
- Starting from when PC1 boots: PC1 sends GARP/ARP messages to the network, and Leaf-1 installs the MAC-IP binding into the ARP table of VRF ta. Leaf-1's ARP table is shown below;
Leaf-1# show ip arp vrf ta
Flags: * - Adjacencies learnt on non-active FHRP router
+ - Adjacencies synced via CFSoE
# - Adjacencies Throttled for Glean
CP - Added via L2RIB, Control plane Adjacencies
PS - Added via L2RIB, Peer Sync
RO - Re-Originated Peer Sync Entry
D - Static Adjacencies attached to down interface
IP ARP Table for context ta
Total number of entries: 1
Address Age MAC Address Interface Flags
172.16.1.1 00:01:03 0050.7966.6806 Vlan10
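- When VNI-based ARP suppression is enabled on the local VTEP switch, the MAC-IP binding is also copied from the ARP table into the local ARP suppression cache. Leaf-1's ARP suppression cache is shown below;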
Leaf-1# show ip arp suppression-cache detail
Flags: + - Adjacencies synced via CFSoE
L - Local Adjacency
R - Remote Adjacency
L2 - Learnt over L2 interface
PS - Added via L2RIB, Peer Sync
RO - Dervied from L2RIB Peer Sync Entry
Ip Address Age Mac Address Vlan Physical-ifindex Flags Remote Vtep Addrs
172.16.1.1 00:03:55 0050.7966.6806 10 Ethernet1/3 L
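- With ARP suppression enabled on the remote VTEP switch (Leaf-2), the ARP suppression cache entries are taken from the L2RIB. Leaf-2's ARP suppression cache entry for PC1 is shown below;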
Leaf-2# show ip arp suppression-cache detail
Flags: + - Adjacencies synced via CFSoE
L - Local Adjacency
R - Remote Adjacency
L2 - Learnt over L2 interface
PS - Added via L2RIB, Peer Sync
RO - Dervied from L2RIB Peer Sync Entry
Ip Address Age Mac Address Vlan Physical-ifindex Flags Remote Vtep Addrs
172.16.1.1 05:01:11 0050.7966.6806 10 (null) R 3.3.3.3
3.2.3 Suppression Scenarios Compared
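- No suppression: every ARP request the local VTEP receives is sent to the multicast group associated with the VNI. Every VTEP switch that has joined that group receives the ARP request and forwards it to the ports of the broadcast domain defined by the VNI ID in the VxLAN header;
- ARP suppression: when an ARP request is received, the local VTEP switch checks whether the requested MAC-IP binding is stored in its local ARP suppression cache. On a hit, the switch sends the ARP reply directly to the requester instead of flooding the request into the network. On a miss, the ARP request is flooded into the network. It is recommended to enable ARP suppression only after intra-VNI reachability has been verified;
- ARP and unknown unicast suppression: on a cache hit, it works the same way as ARP suppression. On a miss, however, the ARP request is dropped, so this feature requires that there are no silent hosts in the VxLAN fabric.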
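The three scenarios differ only in what happens on a cache miss. A toy decision function makes the difference explicit. This is a sketch of the logic described in the list above, not the NX-OS implementation. The `Mode` enum, `handle_arp_request`, and the unknown address 172.16.1.9 are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NONE = "no suppression"
    ARP = "ARP suppression"
    ARP_AND_UNKNOWN_UNICAST = "ARP and unknown unicast suppression"


def handle_arp_request(target_ip: str, cache: dict, mode: Mode) -> str:
    """Toy decision logic of the local VTEP for one ARP request.

    `cache` maps IP -> MAC and stands in for the ARP suppression cache.
    """
    if mode is Mode.NONE:
        return "flood"  # sent to the multicast group of the VNI
    if target_ip in cache:
        return f"reply {cache[target_ip]}"  # answered locally, nothing flooded
    if mode is Mode.ARP:
        return "flood"  # cache miss: flooded as without suppression
    return "drop"  # cache miss: dropped, so silent hosts stay unreachable


cache = {"172.16.1.1": "0050.7966.6806"}  # PC1's binding
for mode in Mode:
    hit = handle_arp_request("172.16.1.1", cache, mode)
    miss = handle_arp_request("172.16.1.9", cache, mode)  # hypothetical unknown host
    print(f"{mode.value}: hit -> {hit}, miss -> {miss}")
```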
3.3 Host Route Advertisement: Inter-VNI Routing (L3VNI)
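The previous post and the first half of this post described how end-host MAC and MAC-IP information is propagated in the VxLAN fabric. They also showed how this information is used for intra-VNI switching and MAC address resolution, and how ARP suppression reduces BUM traffic. This section explains how host routes are installed into the L3RIB and how this information is used for inter-VNI routing.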
3.3.1 Host Routes in the Local VTEP RIB
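- Section 3.1 described how the local VTEP switch installs the MAC-IP binding into its ARP table and how the HMM component installs the information into the L2RIB. In addition, the HMM component installs the MAC-IP information from the ARP table into the L3RIB;
- The RIB of VRF ta on the local VTEP switch Leaf-1 is shown below. The route was learned from VLAN 10 and installed into the RIB by HMM;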
Leaf-1# show ip route 172.16.1.1 vrf ta
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
172.16.1.1/32, ubest/mbest: 1/0, attached
*via 172.16.1.1, Vlan10, [190/0], 1d05h, hmm
3.3.2 Host Routes in the BGP Process of the Local VTEP
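- Section 3.1 also described how the MAC-IP information is passed from the L2RIB to the Loc-RIB and then from the Loc-RIB to the Adj-RIB-Out. It is then advertised to the remote VTEP switches as a BGP EVPN Route Type 2;
- The BGP Loc-RIB entry for PC1's IP address is shown below;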
Leaf-1# show bgp l2vpn evpn 172.16.1.1
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777 (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 969
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path
AS-Path: NONE, path locally originated
3.3.3.3 (metric 0) from 0.0.0.0 (3.3.3.3)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Path-id 1 advertised to peers:
1.1.1.1 2.2.2.2
3.3.3 Host Routes in the BGP Process of the Remote VTEP
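- Section 3.1 did not explain how the MAC-IP routing information finally reaches the L3RIB of the remote VTEP switch;
- The BGP EVPN Route Type 2 update for PC1's MAC-IP NLRI also carries RT 65234:13960 (L3VNI);
- The received NLRI is passed through BGP's import policy engine, which imports it based on RT 65234:13960 and finally installs the L3VNI entry into the Loc-RIB;
- During import policy processing, the original RD 3.3.3.3:32777 is replaced with the VRF ta-specific RD 4.4.4.4:3 (3 = the VRF ID of VRF ta). The RD is used to distinguish overlapping IP addresses in different VRFs;
- Leaf-2's BGP table is shown below. It contains all the details described above: the original entry, the entry with the rewritten RD, the L3VNI entry, and so on;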
Leaf-2# show bgp l2vpn evpn 172.16.1.1
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 3.3.3.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 801
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Path type: internal, path is valid, not best reason: Neighbor Address
AS-Path: NONE, path sourced internal to AS
3.3.3.3 (metric 81) from 2.2.2.2 (2.2.2.2)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Originator: 3.3.3.3 Cluster list: 2.2.2.2
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported to 3 destination(s)
AS-Path: NONE, path sourced internal to AS
3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Originator: 3.3.3.3 Cluster list: 1.1.1.1
Path-id 1 not advertised to any peer
Route Distinguisher: 4.4.4.4:32777 (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 824
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path, in rib
Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272
AS-Path: NONE, path sourced internal to AS
3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Originator: 3.3.3.3 Cluster list: 1.1.1.1
Path-id 1 not advertised to any peer
Route Distinguisher: 4.4.4.4:3 (L3VNI 13960)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272, version 799
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported from 3.3.3.3:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[32]:[172.16.1.1]/272
AS-Path: NONE, path sourced internal to AS
3.3.3.3 (metric 81) from 1.1.1.1 (1.1.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010 13960
Extcommunity: RT:65234:10010 RT:65234:13960 ENCAP:8 Router MAC:5000.0003.0007
Originator: 3.3.3.3 Cluster list: 1.1.1.1
Path-id 1 not advertised to any peer
- The VRF information on Leaf-2, including the VRF ID, is shown below;
Leaf-2# show vrf
VRF-Name VRF-ID State Reason
default 1 Up --
management 2 Up --
ta 3 Up --
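The RD values seen in this post follow a simple pattern:

- 3.3.3.3:32777 and 4.4.4.4:32777 for VLAN 10;
- 4.4.4.4:3 for VRF ta, whose VRF ID is 3 in the output above.

The sketch below reproduces them. It assumes the auto-derived RD format commonly described for NX-OS: `<router-id>:<32767 + VLAN ID>` for an L2VNI and `<router-id>:<VRF ID>` for an IP VRF. Check the configuration guide for your release before relying on it. The function names are hypothetical.

```python
def auto_rd_mac_vrf(router_id: str, vlan_id: int) -> str:
    """RD of an L2VNI (MAC-VRF): <router-id>:<32767 + VLAN ID> (assumed format)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    return f"{router_id}:{32767 + vlan_id}"


def auto_rd_ip_vrf(router_id: str, vrf_id: int) -> str:
    """RD of an IP VRF (L3VNI): <router-id>:<VRF ID> (assumed format)."""
    return f"{router_id}:{vrf_id}"


print(auto_rd_mac_vrf("3.3.3.3", 10))  # 3.3.3.3:32777  Leaf-1, VLAN 10
print(auto_rd_mac_vrf("4.4.4.4", 10))  # 4.4.4.4:32777  Leaf-2, VLAN 10
print(auto_rd_ip_vrf("4.4.4.4", 3))    # 4.4.4.4:3      Leaf-2, VRF ta (VRF-ID 3)
```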
3.3.4 Installing Host Routes into the Remote VTEP RIB
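- The route has been installed from the BGP Loc-RIB into the L3 RIB. The RIB entry includes the next-hop address and tunnel ID, the encapsulation type (VxLAN), the segment ID and the route source (BGP);
- At this stage, both the local VTEP switch Leaf-1 and the remote VTEP switch Leaf-2 can route traffic from hosts in other L2VNIs (inter-VNI traffic) to PC1, which belongs to L2VNI 10010;
- The route entry for 172.16.1.1/32 in the VRF ta RIB on Leaf-2 is shown below;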
Leaf-2# show ip route 172.16.1.1 vrf ta
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
172.16.1.1/32, ubest/mbest: 1/0
*via 3.3.3.3%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn)
segid: 13960 tunnelid: 0x3030303 encap: VXLAN
- The BGP recursive next-hop (RNH) database is shown below. It shows 3.3.3.3 as the next hop used for destination 172.16.1.1;
Leaf-2# show nve internal bgp rnh database vni 13960
--------------------------------------------
Total peer-vni msgs recvd from bgp: 23
Peer add requests: 14
Peer update requests: 0
Peer delete requests: 9
Peer add/update requests: 14
Peer add ignored (peer exists): 0
Peer update ignored (invalid opc): 0
Peer delete ignored (invalid opc): 0
Peer add/update ignored (malloc error): 0
Peer add/update ignored (vni not cp): 0
Peer delete ignored (vni not cp): 0
--------------------------------------------
Showing BGP RNH Database, size : 5 vni 13960
Flag codes: 0 - ISSU Done/ISSU N/A 1 - ADD_ISSU_PENDING
2 - DEL_ISSU_PENDING 3 - UPD_ISSU_PENDING
VNI Peer-IP Peer-MAC Tunnel-ID Encap (A/S) Flags PT
13960 3.3.3.3 5000.0003.0007 0x3030303 vxlan (1/0) 0 FAB
13960 5.5.5.5 5000.0005.0007 0x5050505 vxlan (1/0) 0 FAB
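The tunnel IDs in these outputs appear to be the VTEP's IPv4 address written as a 32-bit hex number: 3.3.3.3 gives 0x3030303 and 5.5.5.5 gives 0x5050505. This pattern is inferred from the two outputs only, not from Cisco documentation. The sketch below checks it; `tunnel_id` and `OBSERVED` are illustrative names.

```python
import ipaddress

# Tunnel IDs as printed in the RNH database output above (VNI 13960).
OBSERVED = {"3.3.3.3": "0x3030303", "5.5.5.5": "0x5050505"}


def tunnel_id(vtep_ip: str) -> str:
    """Hex form of the VTEP's IPv4 address read as a 32-bit integer."""
    return hex(int(ipaddress.IPv4Address(vtep_ip)))


for peer, shown in OBSERVED.items():
    print(peer, tunnel_id(peer), tunnel_id(peer) == shown)
```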
- The complete routing table of VRF ta on Leaf-2 is shown below;
Leaf-2# show ip route vrf ta
IP Route Table for VRF "ta"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
172.16.1.0/24, ubest/mbest: 1/0, attached
*via 172.16.1.254, Vlan10, [0/0], 1d06h, direct
172.16.1.1/32, ubest/mbest: 1/0
*via 3.3.3.3%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn)
segid: 13960 tunnelid: 0x3030303 encap: VXLAN
172.16.1.3/32, ubest/mbest: 1/0, attached
*via 172.16.1.3, Vlan10, [190/0], 1d06h, hmm
172.16.1.5/32, ubest/mbest: 1/0
*via 5.5.5.5%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn)
segid: 13960 tunnelid: 0x5050505 encap: VXLAN
172.16.1.254/32, ubest/mbest: 1/0, attached
*via 172.16.1.254, Vlan10, [0/0], 1d06h, local
172.16.2.0/24, ubest/mbest: 1/0, attached
*via 172.16.2.254, Vlan20, [0/0], 1d06h, direct
172.16.2.2/32, ubest/mbest: 1/0, attached
*via 172.16.2.2, Vlan20, [190/0], 1d06h, hmm
172.16.2.4/32, ubest/mbest: 1/0
*via 5.5.5.5%default, [200/0], 1d02h, bgp-65234, internal, tag 65234 (evpn)
segid: 13960 tunnelid: 0x5050505 encap: VXLAN
172.16.2.254/32, ubest/mbest: 1/0, attached
*via 172.16.2.254, Vlan20, [0/0], 1d06h, local
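The complete table shows why the /32 host routes matter with a distributed anycast gateway. 172.16.1.0/24 is directly attached on Leaf-2 as well. So routed traffic to a remote host in that subnet (for example, from a VLAN 20 host) must follow the more specific /32 learned via EVPN rather than the attached /24. Below is a sketch of a longest-prefix-match lookup over a subset of the table (the `.254` gateway entries are omitted). `VRF_TA` and `lookup` are illustrative names, and 172.16.1.77 is a hypothetical address with no host route.

```python
import ipaddress

# Subset of Leaf-2's VRF ta routing table above: prefix -> where it points.
VRF_TA = {
    "172.16.1.0/24": "Vlan10 (direct)",
    "172.16.1.1/32": "VTEP 3.3.3.3, L3VNI 13960",
    "172.16.1.3/32": "Vlan10 (hmm)",
    "172.16.1.5/32": "VTEP 5.5.5.5, L3VNI 13960",
    "172.16.2.0/24": "Vlan20 (direct)",
    "172.16.2.2/32": "Vlan20 (hmm)",
    "172.16.2.4/32": "VTEP 5.5.5.5, L3VNI 13960",
}


def lookup(dst: str, table: dict) -> str:
    """Longest-prefix match: the most specific matching prefix wins."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in table
               if addr in ipaddress.ip_network(p)]
    if not matches:
        raise LookupError(f"no route to {dst}")
    best = max(matches, key=lambda net: net.prefixlen)
    return table[str(best)]


print(lookup("172.16.1.1", VRF_TA))   # the /32 beats the attached /24
print(lookup("172.16.1.77", VRF_TA))  # no host route: falls back to the /24
```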
IV. Data Plane Operations
4.1 ARP Suppression Process
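- When PC1 boots, its GARP is VxLAN-encapsulated and flooded to multicast group 239.0.0.1, even though ARP suppression is enabled under Leaf-1's NVE1 interface;
- This is because Leaf-1 has no information about PC1's IP/MAC address in either its ARP table or its ARP suppression cache;
- The debug output from Leaf-1 below shows this process. Leaf-1 receives the GARP from PC1 and has no cache entry for 172.16.1.1, so it must flood the frame. Leaf-1 then updates its ARP suppression cache and the L2RIB;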
Leaf-1# terminal monitor
Leaf-1# debug ip arp cache
Leaf-1# debug ip arp event
Leaf-1# debug ip arp suppression-event
Leaf-1#
Leaf-1# 2020 Oct 13 20:47:51.940670 arp: arp_process_receive_packet_msg: VINCI: Anycast Proxy mode
2020 Oct 13 20:47:51.940988 arp: arp_process_packet_in_l3_mode: GARP: Vlan: 10, Dest-ip: 172.16.1.1, Mac-Addr: 0050.7966.6806, ifindex: 0x0
2020 Oct 13 20:47:51.941107 arp: arp_cache_resolve_l3_addr: arp_cache_resolve_l3_addr
2020 Oct 13 20:47:51.941173 arp: arp_cache_resolve_l3_addr: mac: 0050.7966.6806, phy-ifindex:0x1a000400, is_local:TRUE
2020 Oct 13 20:47:51.941283 arp: arp_process_receive_packet_msg: GARP count on the interface Vlan10 is 1
2020 Oct 13 20:47:51.941696 arp: arp_process_receive_packet_msg: NO GARP storm on interface Vlan10
2020 Oct 13 20:47:51.941771 arp: arp_process_receive_packet_msg: Existing entry found for source 172.16.1.1 on Vlan10
2020 Oct 13 20:47:51.941839 arp: arp_add_adj: arp_add_adj: Updating MAC on interface Vlan10, phy-interface Ethernet1/3, flags:0x1
2020 Oct 13 20:47:51.941927 arp: arp_adj_update_state_get_action_on_add: Successful action on add Previous State:0x10, Current State:0x10 Received event:Data Plane Add, entry: 172.16.1.1, 0050.7966.6806, Vlan10, action to be taken send_to_am:FALSE, arp_aging:TRUE
2020 Oct 13 20:47:51.942079 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Create request for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 268, vlan_mode: 2, ifindex: 0x901000a, phyifindex 0x1a000400
2020 Oct 13 20:47:51.942191 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Post L2FM lookup MAC binding : for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 268, vlan_mode: 2, ifindex: 0x901000a, phyifindex 0x1a000400
2020 Oct 13 20:47:51.942251 arp: arp_cache_create_cache_node: create node for uuid:268, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x10 is_timer: 0
2020 Oct 13 20:47:51.942396 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Entry with same ip/vlan exists
2020 Oct 13 20:47:51.942472 arp: arp_add_adj: Entry added for 172.16.1.1, 0050.7966.6806, state 2 on interface Vlan10, physical interface Ethernet1/3, ismct 0. flags:0x10, Rearp (interval: 0, count: 0), TTL: 1500 seconds update_shm:TRUE
2020 Oct 13 20:47:51.942541 arp: arp_add_adj: Adj info: iod: 139, phy-iod: 9, ip: 172.16.1.1, mac: 0050.7966.6806, type: 0, sync: FALSE, suppress-mode: L2/L3 ARP Suppression flags:0x10
2020 Oct 13 20:47:51.942595 arp: arp_process_receive_packet_msg: VINCI: enhanced_proxy: 0, traditional_proxy: 1, adj_added: 0
2020 Oct 13 20:47:51.943681 arp: arp_cache_create_cache_node: create node for uuid:268, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x10 is_timer: 0
2020 Oct 13 20:47:51.944623 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Entry with same ip/vlan exists
2020 Oct 13 20:47:51.944702 arp: arp_add_adj: Entry added for 172.16.1.1, 0050.7966.6806, state 2 on interface Vlan10, physical interface Ethernet1/3, ismct 0. flags:0x10, Rearp (interval: 0, count: 0), TTL: 1500 seconds update_shm:TRUE
2020 Oct 13 20:47:51.945113 arp: arp_add_adj: Adj info: iod: 139, phy-iod: 9, ip: 172.16.1.1, mac: 0050.7966.6806, type: 0, sync: FALSE, suppress-mode: L2/L3 ARP Suppression flags:0x10
2020 Oct 13 20:47:51.945239 arp: arp_process_receive_packet_msg: Received ARP request on Vlan10 (Ethernet1/3)
2020 Oct 13 20:47:51.945375 arp: arp_process_receive_packet_msg: Gratuitous ARP request received on Vlan10 (Ethernet1/3).Proxy or Anycast Gateway enabled on Vlan10.Dropping the packet
- The ARP debug output on Leaf-2 related to PC1 is shown below;
Leaf-2# terminal monitor
Leaf-2# debug ip arp cache
Leaf-2# debug ip arp event
Leaf-2# debug ip arp suppression-event
Leaf-2#
2020 Oct 13 20:55:25.960139 arp: arp_l2rib_msg_cb: arp_l2rib_msg_cb: (Type: Route) Len: 184 Seq: 0, del: 0 (Prod: 5) , peer-id = 0
2020 Oct 13 20:55:25.960255 arp: arp_l2rib_msg_cb: MAC address: 0050.7966.6806 Remote Host IP: 172.16.1.1
2020 Oct 13 20:55:25.960564 arp: arp_l2rib_msg_cb: Host IP 172.16.1.1, Remote vtep addr count = 1
2020 Oct 13 20:55:25.960647 arp: arp_l2rib_msg_cb: RNHs : 3.3.3.3
2020 Oct 13 20:55:25.960752 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Create request for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 1290, vlan_mode: 2, ifindex: 0x0, phyifindex 0x0
2020 Oct 13 20:55:25.960893 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Failed to get phy_iod for ifindex 0x0 : Reason no such pss key
2020 Oct 13 20:55:25.960964 arp: arp_cache_add_entry_to_cache_and_upd_l2rib: Post L2FM lookup MAC binding : for sw-bd: 10, mac: 0050.7966.6806 ip: 172.16.1.1, uuid: 1290, vlan_mode: 2, ifindex: 0x0, phyifindex 0x0
2020 Oct 13 20:55:25.961034 arp: arp_cache_create_cache_node: create node for uuid:1290, sw-bd:10, ip:172.16.1.1, mac:0050.7966.6806, mode:2, flags:0x0 is_timer: 0
2020 Oct 13 20:55:25.961282 arp: arp_cache_create_cache_node: Host IP 172.16.1.1, Remote vtep addr count = 1
2020 Oct 13 20:55:25.961349 arp: arp_cache_create_cache_node: RNHs : 3.3.3.3
2020 Oct 13 20:55:25.961622 arp: arp_cache_create_cache_node: New entry: create node 0x6c13ea74 0x6c13ee1c, uuid: 1290, sw-bd: 10, ip:172.16.1.1, mac: 0050.7966.6806, is_local: FALSE, num-macs: 1
- Leaf-1's ARP suppression cache is shown below;
Leaf-1# show ip arp suppression-cache detail
Flags: + - Adjacencies synced via CFSoE
L - Local Adjacency
R - Remote Adjacency
L2 - Learnt over L2 interface
PS - Added via L2RIB, Peer Sync
RO - Dervied from L2RIB Peer Sync Entry
Ip Address Age Mac Address Vlan Physical-ifindex Flags Remote Vtep Addrs
172.16.1.1 00:03:44 0050.7966.6806 10 Ethernet1/3 L
- Leaf-2's ARP suppression cache is shown below;
Leaf-2# show ip arp suppression-cache detail
Flags: + - Adjacencies synced via CFSoE
L - Local Adjacency
R - Remote Adjacency
L2 - Learnt over L2 interface
PS - Added via L2RIB, Peer Sync
RO - Dervied from L2RIB Peer Sync Entry
Ip Address Age Mac Address Vlan Physical-ifindex Flags Remote Vtep Addrs
172.16.1.1 00:03:01 0050.7966.6806 10 (null) R 3.3.3.3
4.2 ARP Suppression Verification
- Ping PC1 (172.16.1.1) from PC3 (172.16.1.3);
PC3> ping 172.16.1.1
84 bytes from 172.16.1.1 icmp_seq=1 ttl=64 time=58.651 ms
84 bytes from 172.16.1.1 icmp_seq=2 ttl=64 time=52.082 ms
84 bytes from 172.16.1.1 icmp_seq=3 ttl=64 time=54.362 ms
84 bytes from 172.16.1.1 icmp_seq=4 ttl=64 time=67.275 ms
84 bytes from 172.16.1.1 icmp_seq=5 ttl=64 time=50.352 ms
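- Leaf-2, the local VTEP for PC3, can now answer the ARP request itself because the binding for 172.16.1.1 is stored in its ARP suppression cache. By contrast, when a host first joins the network, it sends a GARP message to make sure the IP address assigned to it is unique;
- At that point, neither the ARP table nor the ARP suppression cache has an entry for the requested IP-MAC binding, so the message is flooded to the other leaf VTEP switches. Once these tables have been updated, later communication between hosts no longer requires flooding ARP requests;
- The process of Leaf-2 sending the ARP reply is shown below;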
Leaf-2# 2020 Oct 13 21:02:00.100412 arp: arp_process_receive_packet_msg: VINCI: Anycast Proxy mode
2020 Oct 13 21:02:00.100797 arp: arp_cache_resolve_l3_addr: arp_cache_resolve_l3_addr
2020 Oct 13 21:02:00.101111 arp: arp_cache_resolve_l3_addr: mac: 0050.7966.6806, phy-ifindex:0x0, is_local:FALSE
2020 Oct 13 21:02:00.101405 arp: arp_process_packet_in_l3_mode: ARP request: iod: 139, Vlan: 10, Dest-ip: 172.16.1.1, Mac-Addr: 0050.7966.6806, ifindex: 0x0, is_local: FALSE
2020 Oct 13 21:02:00.101802 arp: arp_send_response_internal: ARP response from 172.16.1.1 to 172.16.1.3 on Vlan10, phy iod Ethernet1/4, vlan 10, svi_flag: 1
2020 Oct 13 21:02:00.101867 arp: arp_send_response_internal: arp_send_response_internal: VINCI: is_flood: 0, iod: 139 phyiod: 10
2020 Oct 13 21:02:00.101953 arp: arp_send_packet: Packet for 0050.7966.6808/172.16.1.3, iod 139(Vlan10), phy_iod 10(Ethernet1/4), phy_is_mct 0, flood_bd 0, flood port 1, skip_unnumbered_flood 0
4.3 Communication Between Hosts in the Same VRF but Different VNIs
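- Communication between hosts in the same VNI was shown in the previous post and is not repeated here;
- This section uses PC1 (172.16.1.1) pinging PC2 (172.16.2.2) as the example.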
4.3.1 Intra-VNI Switching on Leaf-1
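- Because the destination IP address is in a different subnet, PC1 uses the Anycast Gateway MAC (AGM) 1234.1234.1234 as the destination MAC address. It sends the ICMP request to its default gateway, Leaf-1 (see the figure below);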
4.3.2 Routing the Packet from L2VNI 10010 to L3VNI 13960 on Leaf-1
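- The local VTEP switch Leaf-1 receives the frame. The destination IP address 172.16.2.2 (host PC2) was learned via BGP. It is installed in the RIB with next-hop IP address 4.4.4.4 (Leaf-2), together with other information used in the data plane, such as the L3VNI and the encapsulation type;
- Leaf-1 performs a recursive route lookup for the next-hop address. It then encapsulates the original packet with a VxLAN header carrying VNI ID 13960 and routes it to Leaf-2 through Spine-1 and Spine-2. The outer destination MAC address is that of the spine chosen as the next hop;
- Because VxLAN is a MAC-in-UDP encapsulation, the packet must carry inner source and destination MAC addresses. The inner source MAC address is taken from the SVI used for inter-VNI routing (SVI VLAN 3960). The inner destination MAC address is the RMAC received in the BGP update as a BGP extended community.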
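For reference, the VxLAN header that carries VNI 13960 is only 8 bytes long. This sketch packs it according to the RFC 7348 layout:

- a flags byte with the I bit set;
- 24 reserved bits;
- the 24-bit VNI;
- 8 reserved bits.

It does not build the inner Ethernet header or the outer IP/UDP headers. `vxlan_header` is an illustrative name.

```python
import struct

# Outer UDP destination port for VxLAN is 4789 (IANA, RFC 7348); the outer
# headers are not built in this sketch.


def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VxLAN header: flags 0x08 (I bit), 24 reserved bits,
    24-bit VNI, 8 reserved bits, in network byte order."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


print(vxlan_header(13960).hex())  # 0800000000368800 (13960 = 0x003688)
```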
4.3.3 Routing the Packet from L3VNI 13960 to L2VNI 10020 on Leaf-2
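When VTEP switch Leaf-2 receives the VxLAN-encapsulated packet, it removes the VxLAN header. Because VNI 13960 is associated with VRF ta, the routing decision is based on the RIB of VRF ta;
Leaf-2 routes the original ICMP request into VLAN 20 and forwards it out of interface E1/3;
The process above is the symmetric Integrated Routing and Bridging (IRB) model. The packet is first bridged by the local VTEP and then routed across the VxLAN fabric using the common L3VNI in the VxLAN header. The receiving VTEP switch removes the VxLAN encapsulation and makes a routing decision based on the destination IP address of the original IP packet. After that routing decision, the packet is bridged to its destination (bridge-route-route-bridge). Return traffic follows the same model;
Symmetric IRB offers design flexibility because, unlike asymmetric IRB, it does not require every VNI to be configured on every VTEP switch. Asymmetric IRB is based on a "bridge-route-bridge" model with no common L3VNI for inter-VNI routing. For example, suppose this VxLAN fabric used asymmetric IRB:
- PC1 would send the packet to its default gateway (the "bridge" part), just as with symmetric IRB.
- Leaf-1 would make the routing decision but, instead of the common L3VNI, it would put VNI 10020 into the VxLAN header. VNI 10020 is associated with VLAN 20. This is the "route" part.
- The receiving VTEP switch Leaf-2 would remove the VxLAN header and forward the packet into VLAN 20 based on VNI 10020, where it finally reaches host PC2.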
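The practical difference between the two models is which VNI the ingress VTEP writes into the VxLAN header for routed traffic, and what that implies for VNI placement. Below is a minimal sketch of that rule as described in the text. `vni_in_vxlan_header` is a hypothetical function, and the VNI sets are taken from the Leaf-1 configuration in section 3.2.1.

```python
def vni_in_vxlan_header(model: str, dst_l2vni: int, l3vni: int,
                        ingress_vnis: set) -> int:
    """VNI carried in the VxLAN header for inter-subnet (routed) traffic.

    symmetric : the shared per-VRF L3VNI (bridge-route-route-bridge);
    asymmetric: the destination subnet's L2VNI (bridge-route-bridge), which
                therefore must also be configured on the ingress VTEP.
    """
    if model == "symmetric":
        return l3vni
    if model == "asymmetric":
        if dst_l2vni not in ingress_vnis:
            raise ValueError(f"ingress VTEP lacks L2VNI {dst_l2vni}")
        return dst_l2vni
    raise ValueError(f"unknown IRB model: {model}")


leaf1_vnis = {10010, 10020}  # L2VNIs configured on Leaf-1 in this lab
print(vni_in_vxlan_header("symmetric", 10020, 13960, leaf1_vnis))   # 13960
print(vni_in_vxlan_header("asymmetric", 10020, 13960, leaf1_vnis))  # 10020
```

With symmetric IRB, an ingress VTEP that only had VNI 10010 could still route to VLAN 20 hosts. With asymmetric IRB, the same call raises an error, which is the design constraint described above.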
- Test PC1 pinging PC2 and capture packets between the spine and leaf switches. The capture results are shown below;
- The above explains how host IP addresses are propagated in the VxLAN fabric and how they are installed into the L3RIB.
V. Summary
VI. References
With great respect to the master: Toni Pasanen
https://nwktimes.blogspot.com/2018/05/vxlan-part-vii-vxlan-bgp-evpn-control.html