H3C Data Center Switches M-LAG Configuration Guide-6W100

06-M-LAG+RDMA Configuration Example

Contents

Example: Deploying M-LAG and RDMA
Network configuration
Connectivity models
Applicable product matrix
Restrictions and guidelines
Configuring S6850 switches as leaf nodes
Procedure summary
Configuring global RoCE settings
Configuring the links towards the spine tier
Configuring RoCE settings for the links towards the spine tier
Configuring M-LAG
Configuring RoCE settings for the peer link
Configuring the links towards the servers
Configuring RoCE settings for the links towards the servers
Configuring an underlay BGP instance
Configuring VLAN interfaces and gateways (dual-active gateways)
Configuring VLAN interfaces and gateways (VRRP gateways)
Configuring spine devices
Spine device configuration tasks at a glance
Configuring global RoCE settings
Configuring the links towards the leaf tier
Configuring RoCE settings for the links towards the leaf tier
Configuring routing policies to replace the original AS numbers with the local AS number
Configuring an underlay BGP instance
Influence of adding or deleting commands on traffic
Traffic model
Convergence performance test results
Failure test results
Verifying the configuration
Verification commands
Procedure
Tuning the parameters
Recommended PFC settings
Recommended ECN settings
NIC settings for reference
Guidelines for tuning parameters
Restrictions and guidelines
Identifying whether packets are dropped
Identifying whether the latency meets the requirements
Upgrading the devices
Upgrading a leaf device
Upgrading a spine device
Expanding the network
Adding a leaf device
Replacing hardware
Replacing an interface module
Replacing a switching fabric module


Example: Deploying M-LAG and RDMA

Network configuration

As shown in Figure 1, build a Layer 2 network within the data center to implement traffic forwarding across leaf devices and spine devices.

The following is the network configuration:

·     Deploy the leaf devices as access switches. Use M-LAG to build two pairs of access switches as M-LAG systems. The M-LAG interfaces in an M-LAG system form a multichassis aggregate link for improved link usage. Configure dual-active gateways or VRRP gateways on the leaf devices to provide gateway services for the servers.

·     Deploy the spine devices as aggregation switches. Configure them as route reflectors (RRs) to reflect BGP routes among leaf devices.

·     Configure intelligent lossless network features such as PFC and data buffer to provide zero packet loss for RDMA. In this example, zero packet loss is implemented for packets with 802.1p priority 3.

·     Configure DHCP relay on leaf devices to help servers obtain IP addresses.

Figure 1 Network diagram

 

Device

Interface

IP address

Remarks

Leaf 1

WGE1/0/1

N/A

Member port of an M-LAG interface.

Connected to Server 1.

WGE1/0/2

N/A

Member port of an M-LAG interface.

Connected to Server 2.

HGE1/0/29

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/29 on Leaf 2.

HGE1/0/30

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/30 on Leaf 2.

HGE1/0/31

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/31 on Leaf 2.

HGE1/0/32

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/32 on Leaf 2.

WGE1/0/55

RAGG 1000: 1.1.1.1/30

Keepalive link between M-LAG member devices.

Connected to WGE 1/0/55 on Leaf 2.

WGE1/0/56

Keepalive link between M-LAG member devices.

Connected to WGE 1/0/56 on Leaf 2.

HGE1/0/25

172.16.2.154/30

Connected to HGE 1/1/1 on Spine 1.

HGE1/0/26

172.16.3.154/30

Connected to HGE 1/1/1 on Spine 2.

Loopback1

50.50.255.41/32

N/A

Leaf 2

WGE1/0/1

N/A

Member port of an M-LAG interface.

Connected to Server 1.

WGE1/0/2

N/A

Member port of an M-LAG interface.

Connected to Server 2.

HGE1/0/29

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/29 on Leaf 1.

HGE1/0/30

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/30 on Leaf 1.

HGE1/0/31

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/31 on Leaf 1.

HGE1/0/32

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/32 on Leaf 1.

WGE1/0/55

RAGG 1000: 1.1.1.2/30

Keepalive link between M-LAG member devices.

Connected to WGE 1/0/55 on Leaf 1.

WGE1/0/56

Keepalive link between M-LAG member devices.

Connected to WGE 1/0/56 on Leaf 1.

HGE1/0/25

172.16.2.158/30

Connected to HGE 1/1/2 on Spine 1.

HGE1/0/26

172.16.3.158/30

Connected to HGE 1/1/2 on Spine 2.

Loopback1

50.50.255.42/32

N/A

Leaf 3

WGE1/0/1

N/A

Interface connected to a single-homed device.

Connected to Server 3.

WGE1/0/2

N/A

Interface connected to a single-homed device.

Connected to Server 4.

HGE1/0/29

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/29 on Leaf 4.

HGE1/0/30

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/30 on Leaf 4.

HGE1/0/31

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/31 on Leaf 4.

HGE1/0/32

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/32 on Leaf 4.

WGE1/0/55

RAGG 1000: 1.1.1.1/30

Keepalive link between M-LAG member devices.

Connected to WGE 1/0/55 on Leaf 4.

WGE1/0/56

Keepalive link between M-LAG member devices.

Connected to WGE 1/0/56 on Leaf 4.

HGE1/0/25

172.16.2.82/30

Connected to HGE 1/1/3 on Spine 1.

HGE1/0/26

172.16.3.82/30

Connected to HGE 1/1/3 on Spine 2.

Loopback1

50.50.255.23/32

N/A

Leaf 4

WGE1/0/1

N/A

Interface connected to a single-homed device.

Connected to Server 3.

WGE1/0/2

N/A

Interface connected to a single-homed device.

Connected to Server 4.

HGE1/0/29

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/29 on Leaf 3.

HGE1/0/30

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/30 on Leaf 3.

HGE1/0/31

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/31 on Leaf 3.

HGE1/0/32

N/A

Member port of the peer-link interface.

Connected to HGE 1/0/32 on Leaf 3.

WGE1/0/55

RAGG 1000: 1.1.1.2/30

Keepalive link between M-LAG member devices.

Connected to WGE 1/0/55 on Leaf 3.

WGE1/0/56

Keepalive link between M-LAG member devices.

Connected to WGE 1/0/56 on Leaf 3.

HGE1/0/25

172.16.2.86/30

Connected to HGE 1/1/4 on Spine 1.

HGE1/0/26

172.16.3.86/30

Connected to HGE 1/1/4 on Spine 2.

Loopback1

50.50.255.24/32

N/A

Spine 1

HGE1/1/1

172.16.2.153/30

Connected to HGE 1/0/25 on Leaf 1.

HGE1/1/2

172.16.2.157/30

Connected to HGE 1/0/25 on Leaf 2.

HGE1/1/3

172.16.2.81/30

Connected to HGE 1/0/25 on Leaf 3.

HGE1/1/4

172.16.2.85/30

Connected to HGE 1/0/25 on Leaf 4.

Loopback1

50.50.255.1/32

N/A

Spine 2

HGE1/1/1

172.16.3.153/30

Connected to HGE 1/0/26 on Leaf 1.

HGE1/1/2

172.16.3.157/30

Connected to HGE 1/0/26 on Leaf 2.

HGE1/1/3

172.16.3.81/30

Connected to HGE 1/0/26 on Leaf 3.

HGE1/1/4

172.16.3.85/30

Connected to HGE 1/0/26 on Leaf 4.

Loopback1

50.50.255.2/32

N/A

 

Connectivity models

The following are the types of connectivity between servers:

·     Layer 2 connectivity between servers attached to the same M-LAG system at the leaf tier.

·     Layer 3 connectivity between servers attached to the same M-LAG system at the leaf tier.

·     Layer 3 connectivity between servers attached to different M-LAG systems at the leaf tier.

Applicable product matrix

Role

Devices

Software version

Spine

S12500X-AF

Not recommended.

S12500G-AF

Not recommended.

S9820-8C

This example uses S9820-8C switches as spine nodes.

R6710

Leaf

S6800, S6860

Not recommended.

S6812, S6813

Not recommended.

S6805, S6825, S6850, S9850

This example uses S6850 switches as leaf nodes.

R6710

S6890

Not recommended.

S9820-64H

R6710

Server NIC

Mellanox ConnectX-6 Lx

·     Driver version: MLNX_OFED_LINUX-5.4-3.2.7.2.3-rhel8.4-x86_64

·     Firmware version:

¡     Driver: mlx5_core

¡     Version: 5.4-3.2.7.2.3

¡     firmware-version: 26.31.2006 (MT_0000000531)

Mellanox ConnectX-5

·     Driver version: MLNX_OFED_LINUX-5.4-3.2.7.2.3-rhel8.4-x86_64

·     Firmware version:

¡     Driver: mlx5_core

¡     Version: 5.4-3.2.7.2.3

¡     firmware-version: 16.31.2006 (MT_0000000080)

Mellanox ConnectX-4 Lx

·     Driver version: MLNX_OFED_LINUX-5.4-3.2.7.2.3-rhel8.4-x86_64

·     Firmware version:

¡     Driver: mlx5_core

¡     Version: 5.4-3.2.7.2.3

¡     firmware-version: 14.31.2006 (MT_2420110034)

 

Restrictions and guidelines

Determine the appropriate convergence ratio according to actual business needs. As a best practice, use two or four high-speed interfaces for the peer link to meet the east-west traffic forwarding requirements among servers in the same leaf group. Use other high-speed interfaces as uplink interfaces. Use common-rate interfaces to connect to servers. If the convergence ratio cannot meet requirements, attach fewer servers to the leaf devices.

When you enable PFC for an 802.1p priority, the device will set a default value for each threshold. As a best practice, use the default threshold values. For more information, see "Tuning the parameters" and "Guidelines for tuning parameters."

You can configure WRED by using one of the following approaches (a minimal sketch contrasting the two approaches follows this list):

·     Interface configuration—Configure WRED parameters and enable WRED directly on an interface. With this approach, you configure WRED parameters for only the RoCE queue, and you can use different parameters on different interfaces. This approach is recommended.

·     WRED table configuration—Create a WRED table, configure WRED parameters for a queue, and then apply the WRED table to an interface. If you do not configure WRED parameters for a queue, the default values are used for that queue. As a best practice, configure WRED parameters for each queue because the default values are small and might not be applicable.
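
The following minimal sketch contrasts the two approaches, using the RoCE queue (queue 3) values and the 100G-WRED-Template table name that appear later in this example. Use only one approach on a given interface; the full per-queue table settings are shown in "Configuring WRED tables."

# Approach 1: WRED table—configure the queue in the table, then apply the table to an interface.
# (As a best practice, also configure the other queues in the table; only queue 3 is shown here.)
qos wred queue table 100G-WRED-Template
 queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20
 queue 3 weighting-constant 0
 queue 3 ecn
quit
interface HundredGigE1/0/25
 qos wred apply 100G-WRED-Template
quit
# Approach 2: interface configuration—configure WRED for the RoCE queue directly on the interface.
interface HundredGigE1/0/25
 qos wred queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20
 qos wred queue 3 weighting-constant 0
 qos wred queue 3 ecn
quit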

Configuring S6850 switches as leaf nodes

This example describes the procedure to deploy Leaf 1 and Leaf 2. The same procedure applies to Leaf 3 and Leaf 4, except for the single-homed server configuration.

Procedure summary

·     Configuring global RoCE settings

·     Configuring the links towards the spine tier

·     Configuring RoCE settings for the links towards the spine tier

Choose one option as needed

¡     Configuring a WRED table

¡     Configuring WRED on an interface

·     Configuring M-LAG

·     Configuring RoCE settings for the peer link

·     Configuring the links towards the servers

·     Configuring RoCE settings for the links towards the servers

Choose one option as needed

¡     Configuring a WRED table

¡     Configuring WRED on an interface

·     Configuring an underlay BGP instance

·     Configuring VLAN interfaces and gateways. Choose one option as needed:

¡     Configuring VLAN interfaces and gateways (dual-active gateways)

¡     Configuring VLAN interfaces and gateways (VRRP gateways)

Configuring global RoCE settings

Configuring global PFC settings and data buffer settings

Leaf 1

Leaf 2

Description

Remarks

priority-flow-control poolID 0 headroom 131072

priority-flow-control poolID 0 headroom 131072

Set the maximum number of cell resources that can be used in a headroom storage space.

Set the maximum value allowed.

priority-flow-control deadlock cos 3 interval 10

priority-flow-control deadlock cos 3 interval 10

Set the PFC deadlock detection interval for the specified CoS value.

N/A

priority-flow-control deadlock precision high

priority-flow-control deadlock precision high

Set the precision for the PFC deadlock detection timer.

Set the high precision for the PFC deadlock detection timer.

buffer egress cell queue 3 shared ratio 100

buffer egress cell queue 3 shared ratio 100

Set the maximum shared-area ratio to 100% for the RoCE queue.

N/A

buffer egress cell queue 6 shared ratio 100

buffer egress cell queue 6 shared ratio 100

Set the maximum shared-area ratio to 100% for the CNP queue.

N/A

buffer apply

buffer apply

Apply manually configured data buffer settings.

N/A
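
For reference, the global settings in the table above can be entered as the following snippet in system view. The commands are identical on Leaf 1 and Leaf 2 and are simply the table contents assembled in order.

# Global PFC headroom pool, PFC deadlock detection, and data buffer settings.
priority-flow-control poolID 0 headroom 131072
priority-flow-control deadlock cos 3 interval 10
priority-flow-control deadlock precision high
buffer egress cell queue 3 shared ratio 100
buffer egress cell queue 6 shared ratio 100
buffer apply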

 

Configuring WRED tables

Leaf 1

Leaf 2

Description

Remarks

qos wred queue table 100G-WRED-Template

qos wred queue table 100G-WRED-Template

Create a queue-based WRED table.

N/A

queue 0 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 0.

N/A

queue 1 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 1.

N/A

queue 2 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 2.

N/A

queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 drop-level 1 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 drop-level 2 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 weighting-constant 0

queue 3 ecn

queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 drop-level 1 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 drop-level 2 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 weighting-constant 0

queue 3 ecn

Configure WRED table settings for queue 3.

Set small values for the low limit and high limit for the RoCE queue and large values for other queues.

queue 4 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 4.

N/A

queue 5 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 5.

N/A

queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 ecn

queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 ecn

Configure WRED table settings for queue 6.

N/A

queue 7 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 7.

N/A

qos wred queue table 25G-WRED-Template

qos wred queue table 25G-WRED-Template

Create a queue-based WRED table.

N/A

queue 0 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 0.

N/A

queue 1 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 1.

N/A

queue 2 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 2.

N/A

queue 3 drop-level 0 low-limit 400 high-limit 1625 discard-probability 20

queue 3 drop-level 1 low-limit 400 high-limit 1625 discard-probability 20

queue 3 drop-level 2 low-limit 400 high-limit 1625 discard-probability 20

queue 3 weighting-constant 0

queue 3 ecn

queue 3 drop-level 0 low-limit 400 high-limit 1625 discard-probability 20

queue 3 drop-level 1 low-limit 400 high-limit 1625 discard-probability 20

queue 3 drop-level 2 low-limit 400 high-limit 1625 discard-probability 20

queue 3 weighting-constant 0

queue 3 ecn

Configure WRED table settings for queue 3.

Set small values for the low limit and high limit for the RoCE queue and large values for other queues.

queue 4 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 4.

N/A

queue 5 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 5.

N/A

queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 ecn

queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 ecn

Configure WRED table settings for queue 6.

N/A

queue 7 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 7.

N/A

 

Configuring priority mapping

 

NOTE:

·     To configure PFC on an interface connecting a leaf device to a server, you must configure the interface to trust the 802.1p or DSCP priority in packets. To configure PFC on a Layer 3 interface connecting a leaf device to a spine device, you must configure the interface to trust the DSCP priority in packets.

·     As a best practice, configure outgoing packets of servers to carry the DSCP priority and to be enqueued by DSCP priority. In this case, configure the interface connecting a leaf device to a server to trust the DSCP priority in packets. If the server does not support carrying the DSCP priority, configure the interface connecting a leaf device to a server to trust the 802.1p priority in packets and configure DSCP-to-802.1p mappings for all involved packets.

·     This section describes the procedure to configure the priority mappings used when an interface is configured to trust the 802.1p priority in packets. You do not need to perform this configuration if the interface is configured to trust the DSCP priority in packets.

 

Leaf 1

Leaf 2

Description

Remarks

qos map-table dot1p-lp

 import 0 export 0

 import 1 export 1

 import 2 export 2

qos map-table dot1p-lp

 import 0 export 0

 import 1 export 1

 import 2 export 2

Configure the dot1p-lp mapping table.

N/A

traffic classifier dot1p0 operator and

traffic classifier dot1p0 operator and

Create a traffic class with the AND operator.

If the downlink interface is configured with the command, the QoS policy does not need to be configured.

if-match service-dot1p 0

if-match service-dot1p 0

Configure an outer 802.1p priority match criterion.

N/A

traffic classifier dot1p1 operator and

traffic classifier dot1p1 operator and

Create a traffic class with the AND operator.

N/A

if-match service-dot1p 1

if-match service-dot1p 1

Configure an outer 802.1p priority match criterion.

N/A

traffic classifier dot1p2 operator and

traffic classifier dot1p2 operator and

Create a traffic class with the AND operator.

N/A

if-match service-dot1p 2

if-match service-dot1p 2

Configure an outer 802.1p priority match criterion.

N/A

traffic classifier dot1p3 operator and

traffic classifier dot1p3 operator and

Create a traffic class with the AND operator.

N/A

if-match service-dot1p 3

if-match service-dot1p 3

Configure an outer 802.1p priority match criterion.

N/A

traffic classifier dot1p4 operator and

traffic classifier dot1p4 operator and

Create a traffic class with the AND operator.

N/A

if-match service-dot1p 4

if-match service-dot1p 4

Configure an outer 802.1p priority match criterion.

N/A

traffic classifier dot1p5 operator and

traffic classifier dot1p5 operator and

Create a traffic class with the AND operator.

N/A

if-match service-dot1p 5

if-match service-dot1p 5

Configure an outer 802.1p priority match criterion.

N/A

traffic classifier dot1p6 operator and

traffic classifier dot1p6 operator and

Create a traffic class with the AND operator.

N/A

if-match service-dot1p 6

if-match service-dot1p 6

Configure an outer 802.1p priority match criterion.

N/A

traffic classifier dot1p7 operator and

traffic classifier dot1p7 operator and

Create a traffic class with the AND operator.

N/A

if-match service-dot1p 7

if-match service-dot1p 7

Configure an outer 802.1p priority match criterion.

N/A

traffic classifier dscp0 operator and

traffic classifier dscp0 operator and

Create a traffic class with the AND operator.

N/A

if-match dscp default

if-match dscp default

Configure a DSCP priority match criterion.

N/A

traffic classifier dscp10 operator and

traffic classifier dscp10 operator and

Create a traffic class with the AND operator.

N/A

if-match dscp af11

if-match dscp af11

Configure a DSCP priority match criterion.

N/A

traffic classifier dscp18 operator and

traffic classifier dscp18 operator and

Create a traffic class with the AND operator.

N/A

if-match dscp af21

if-match dscp af21

Configure a DSCP priority match criterion.

N/A

traffic classifier dscp26 operator and

traffic classifier dscp26 operator and

Create a traffic class with the AND operator.

N/A

if-match dscp af31

if-match dscp af31

Configure a DSCP priority match criterion.

N/A

traffic classifier dscp34 operator and

traffic classifier dscp34 operator and

Create a traffic class with the AND operator.

N/A

if-match dscp af41

if-match dscp af41

Configure a DSCP priority match criterion.

N/A

traffic classifier dscp40 operator and

traffic classifier dscp40 operator and

Create a traffic class with the AND operator.

N/A

if-match dscp cs5

if-match dscp cs5

Configure a DSCP priority match criterion.

N/A

traffic classifier dscp48 operator and

traffic classifier dscp48 operator and

Create a traffic class with the AND operator.

N/A

if-match dscp cs6

if-match dscp cs6

Configure a DSCP priority match criterion.

N/A

traffic classifier dscp56 operator and

traffic classifier dscp56 operator and

Create a traffic class with the AND operator.

N/A

if-match dscp cs7

if-match dscp cs7

Configure a DSCP priority match criterion.

N/A

traffic behavior dot1p0

traffic behavior dot1p0

Create a traffic behavior.

N/A

remark dot1p 0

remark dot1p 0

Configure an 802.1p priority marking action.

N/A

traffic behavior dot1p1

traffic behavior dot1p1

Create a traffic behavior.

N/A

remark dot1p 1

remark dot1p 1

Configure an 802.1p priority marking action.

N/A

traffic behavior dot1p2

traffic behavior dot1p2

Create a traffic behavior.

N/A

remark dot1p 2

remark dot1p 2

Configure an 802.1p priority marking action.

N/A

traffic behavior dot1p3

traffic behavior dot1p3

Create a traffic behavior.

N/A

remark dot1p 3

remark dot1p 3

Configure an 802.1p priority marking action.

N/A

traffic behavior dot1p4

traffic behavior dot1p4

Create a traffic behavior.

N/A

remark dot1p 4

remark dot1p 4

Configure an 802.1p priority marking action.

N/A

traffic behavior dot1p5

traffic behavior dot1p5

Create a traffic behavior.

N/A

remark dot1p 5

remark dot1p 5

Configure an 802.1p priority marking action.

N/A

traffic behavior dot1p6

traffic behavior dot1p6

Create a traffic behavior.

N/A

remark dot1p 6

remark dot1p 6

Configure an 802.1p priority marking action.

N/A

traffic behavior dot1p7

traffic behavior dot1p7

Create a traffic behavior.

N/A

remark dot1p 7

remark dot1p 7

Configure an 802.1p priority marking action.

N/A

traffic behavior dscp-0

traffic behavior dscp-0

Create a traffic behavior.

N/A

remark dscp default

remark dscp default

Configure a DSCP priority marking action.

N/A

traffic behavior dscp-af11

traffic behavior dscp-af11

Create a traffic behavior.

N/A

remark dscp af11

remark dscp af11

Configure a DSCP priority marking action.

N/A

traffic behavior dscp-af21

traffic behavior dscp-af21

Create a traffic behavior.

N/A

remark dscp af21

remark dscp af21

Configure a DSCP priority marking action.

N/A

traffic behavior dscp-af31

traffic behavior dscp-af31

Create a traffic behavior.

N/A

remark dscp af31

remark dscp af31

Configure a DSCP priority marking action.

N/A

traffic behavior dscp-af41

traffic behavior dscp-af41

Create a traffic behavior.

N/A

remark dscp af41

remark dscp af41

Configure a DSCP priority marking action.

N/A

traffic behavior dscp-cs5

traffic behavior dscp-cs5

Create a traffic behavior.

N/A

remark dscp cs5

remark dscp cs5

Configure a DSCP priority marking action.

N/A

traffic behavior dscp-cs6

traffic behavior dscp-cs6

Create a traffic behavior.

N/A

remark dscp cs6

remark dscp cs6

Configure a DSCP priority marking action.

N/A

traffic behavior dscp-cs7

traffic behavior dscp-cs7

Create a traffic behavior.

N/A

remark dscp cs7

remark dscp cs7

Configure a DSCP priority marking action.

N/A

qos policy dot1p-dscp

qos policy dot1p-dscp

Create a generic QoS policy.

N/A

classifier dot1p0 behavior dscp-0

classifier dot1p0 behavior dscp-0

Associate a traffic class with a traffic behavior.

N/A

classifier dot1p3 behavior dscp-af31

classifier dot1p3 behavior dscp-af31

Associate a traffic class with a traffic behavior.

N/A

classifier dot1p1 behavior dscp-af11

classifier dot1p1 behavior dscp-af11

Associate a traffic class with a traffic behavior.

N/A

classifier dot1p2 behavior dscp-af21

classifier dot1p2 behavior dscp-af21

Associate a traffic class with a traffic behavior.

N/A

classifier dot1p4 behavior dscp-af41

classifier dot1p4 behavior dscp-af41

Associate a traffic class with a traffic behavior.

N/A

classifier dot1p5 behavior dscp-cs5

classifier dot1p5 behavior dscp-cs5

Associate a traffic class with a traffic behavior.

N/A

classifier dot1p6 behavior dscp-cs6

classifier dot1p6 behavior dscp-cs6

Associate a traffic class with a traffic behavior.

N/A

classifier dot1p7 behavior dscp-cs7

classifier dot1p7 behavior dscp-cs7

Associate a traffic class with a traffic behavior.

N/A

qos policy dscptodot1p

qos policy dscptodot1p

Create a generic QoS policy.

N/A

classifier dscp0 behavior dot1p0

classifier dscp0 behavior dot1p0

Associate a traffic class with a traffic behavior.

N/A

classifier dscp26 behavior dot1p3

classifier dscp26 behavior dot1p3

Associate a traffic class with a traffic behavior.

N/A

classifier dscp10 behavior dot1p1

classifier dscp10 behavior dot1p1

Associate a traffic class with a traffic behavior.

N/A

classifier dscp18 behavior dot1p2

classifier dscp18 behavior dot1p2

Associate a traffic class with a traffic behavior.

N/A

classifier dscp34 behavior dot1p4

classifier dscp34 behavior dot1p4

Associate a traffic class with a traffic behavior.

N/A

classifier dscp40 behavior dot1p5

classifier dscp40 behavior dot1p5

Associate a traffic class with a traffic behavior.

N/A

classifier dscp48 behavior dot1p6

classifier dscp48 behavior dot1p6

Associate a traffic class with a traffic behavior.

N/A

classifier dscp56 behavior dot1p7

classifier dscp56 behavior dot1p7

Associate a traffic class with a traffic behavior.

N/A
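
To make the classifier, behavior, and policy relationships above easier to follow, the following abbreviated sketch shows one mapping in each direction for the RoCE priority used in this example (802.1p priority 3 and DSCP AF31) and where the two policies are applied. The interface and the policy application commands are the ones used in "Configuring RoCE settings for the links towards the servers."

# Inbound: map 802.1p priority 3 to DSCP AF31 (policy dot1p-dscp).
traffic classifier dot1p3 operator and
 if-match service-dot1p 3
traffic behavior dscp-af31
 remark dscp af31
qos policy dot1p-dscp
 classifier dot1p3 behavior dscp-af31
# Outbound: map DSCP AF31 back to 802.1p priority 3 (policy dscptodot1p).
traffic classifier dscp26 operator and
 if-match dscp af31
traffic behavior dot1p3
 remark dot1p 3
qos policy dscptodot1p
 classifier dscp26 behavior dot1p3
# Apply both policies on a server-facing interface that trusts the 802.1p priority.
interface Twenty-FiveGigE1/0/1
 qos trust dot1p
 qos apply policy dot1p-dscp inbound
 qos apply policy dscptodot1p outbound
quit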

 

Configuring the links towards the spine tier

Leaf 1

Leaf 2

Description

monitor-link group 1

monitor-link group 1

Configure a monitor link group.

interface HundredGigE1/0/25

interface HundredGigE1/0/25

Configure the interface connected to Spine 1.

port link-mode route

port link-mode route

Configure the interface to operate in route mode as a Layer 3 interface.

ip address 172.16.2.154 255.255.255.252

ip address 172.16.2.158 255.255.255.252

Configure an IP address.

port monitor-link group 1 uplink

port monitor-link group 1 uplink

Configure the interface as the uplink interface of the monitor link group.

interface HundredGigE1/0/26

interface HundredGigE1/0/26

Configure the interface connected to Spine 2.

port link-mode route

port link-mode route

Configure the interface to operate in route mode as a Layer 3 interface.

ip address 172.16.3.154 255.255.255.252

ip address 172.16.3.158 255.255.255.252

Configure an IP address.

port monitor-link group 1 uplink

port monitor-link group 1 uplink

Configure the interface as the uplink interface of the monitor link group.
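
Assembled from the table above, the Leaf 1 uplink configuration looks like the following snippet (Leaf 2 is identical except for the IP addresses):

# Monitor link group 1 tracks the uplinks; the server-facing downlinks join it later.
monitor-link group 1
quit
interface HundredGigE1/0/25
 port link-mode route
 ip address 172.16.2.154 255.255.255.252
 port monitor-link group 1 uplink
quit
interface HundredGigE1/0/26
 port link-mode route
 ip address 172.16.3.154 255.255.255.252
 port monitor-link group 1 uplink
quit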

 

Configuring RoCE settings for the links towards the spine tier

Configuring a WRED table

Leaf 1

Leaf 2

Description

Remarks

interface range HundredGigE1/0/25 HundredGigE1/0/26

interface range HundredGigE1/0/25 HundredGigE1/0/26

Configure the interfaces connected to the spine devices.

N/A

priority-flow-control deadlock enable

priority-flow-control deadlock enable

Enable PFC deadlock detection on an interface.

N/A

priority-flow-control enable

priority-flow-control enable

Enable PFC on the interfaces.

N/A

priority-flow-control no-drop dot1p 3

priority-flow-control no-drop dot1p 3

Enable PFC for the queue of RoCE packets.

N/A

priority-flow-control dot1p 3 headroom 491

priority-flow-control dot1p 3 headroom 491

Set the headroom buffer threshold to 491 for 802.1p priority 3.

After you enable PFC for the specified 802.1p priority, the device automatically deploys the PFC threshold settings. For more information, see "Recommended PFC settings."

priority-flow-control dot1p 3 reserved-buffer 17

priority-flow-control dot1p 3 reserved-buffer 17

Set the PFC reserved threshold.

priority-flow-control dot1p 3 ingress-buffer dynamic 5

priority-flow-control dot1p 3 ingress-buffer dynamic 5

Set the dynamic back pressure frame triggering threshold.

priority-flow-control dot1p 3 ingress-threshold-offset 12

priority-flow-control dot1p 3 ingress-threshold-offset 12

Set the offset between the back pressure frame stopping threshold and triggering threshold.

qos trust dscp

qos trust dscp

Configure the interface to trust the DSCP priority.

N/A

qos wfq byte-count

qos wfq byte-count

Enable byte-count WFQ.

N/A

qos wfq be group 1 byte-count 15

qos wfq be group 1 byte-count 15

Configure the weight for queue 0.

Adjust the weight according to your business needs.

qos wfq af1 group 1 byte-count 2

qos wfq af1 group 1 byte-count 2

Configure the weight for queue 1.

Adjust the weight according to your business needs.

qos wfq af2 group 1 byte-count 2

qos wfq af2 group 1 byte-count 2

Configure the weight for queue 2.

Adjust the weight according to your business needs.

qos wfq af3 group 1 byte-count 60

qos wfq af3 group 1 byte-count 60

Configure the weight for queue 3.

Adjust the weight according to your business needs.

This example configures the weights of the RoCE queue and other queues at a 4:1 ratio.

qos wfq cs6 group sp

qos wfq cs6 group sp

Assign queue 6 to the SP group.

Adjust the setting according to your business needs.

qos wfq cs7 group sp

qos wfq cs7 group sp

Assign queue 7 to the SP group.

Adjust the setting according to your business needs.

qos wred apply 100G-WRED-Template

qos wred apply 100G-WRED-Template

Apply the WRED table to the interface.

N/A

qos gts queue 6 cir 50000000 cbs 16000000

qos gts queue 6 cir 50000000 cbs 16000000

Configure GTS with a CIR of 50 Gbps for the CNP queue.

N/A
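
Assembled in order, the per-interface RoCE settings above (the WRED table approach) form the following snippet, identical on Leaf 1 and Leaf 2:

interface range HundredGigE1/0/25 HundredGigE1/0/26
 # PFC and PFC deadlock detection for the RoCE queue (802.1p priority 3).
 priority-flow-control deadlock enable
 priority-flow-control enable
 priority-flow-control no-drop dot1p 3
 priority-flow-control dot1p 3 headroom 491
 priority-flow-control dot1p 3 reserved-buffer 17
 priority-flow-control dot1p 3 ingress-buffer dynamic 5
 priority-flow-control dot1p 3 ingress-threshold-offset 12
 # Priority trust, queue scheduling, WRED table, and CNP queue shaping.
 qos trust dscp
 qos wfq byte-count
 qos wfq be group 1 byte-count 15
 qos wfq af1 group 1 byte-count 2
 qos wfq af2 group 1 byte-count 2
 qos wfq af3 group 1 byte-count 60
 qos wfq cs6 group sp
 qos wfq cs7 group sp
 qos wred apply 100G-WRED-Template
 qos gts queue 6 cir 50000000 cbs 16000000
quit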

 

Configuring WRED on an interface

Leaf 1

Leaf 2

Description

Remarks

interface range HundredGigE1/0/25 HundredGigE1/0/26

interface range HundredGigE1/0/25 HundredGigE1/0/26

Configure the interfaces connected to the spine devices.

N/A

priority-flow-control deadlock enable

priority-flow-control deadlock enable

Enable PFC deadlock detection on an interface.

N/A

priority-flow-control enable

priority-flow-control enable

Enable PFC on the interfaces.

N/A

priority-flow-control no-drop dot1p 3

priority-flow-control no-drop dot1p 3

Enable PFC for the queue of RoCE packets.

N/A

priority-flow-control dot1p 3 headroom 491

priority-flow-control dot1p 3 headroom 491

Set the headroom buffer threshold to 491 for 802.1p priority 3.

After you enable PFC for the specified 802.1p priority, the device automatically deploys the PFC threshold settings. For more information, see "Recommended PFC settings."

priority-flow-control dot1p 3 reserved-buffer 17

priority-flow-control dot1p 3 reserved-buffer 17

Set the PFC reserved threshold.

priority-flow-control dot1p 3 ingress-buffer dynamic 5

priority-flow-control dot1p 3 ingress-buffer dynamic 5

Set the dynamic back pressure frame triggering threshold.

priority-flow-control dot1p 3 ingress-threshold-offset 12

priority-flow-control dot1p 3 ingress-threshold-offset 12

Set the offset between the back pressure frame stopping threshold and triggering threshold.

qos trust dscp

qos trust dscp

Configure the interface to trust the DSCP priority.

N/A

qos wfq byte-count

qos wfq byte-count

Enable byte-count WFQ.

N/A

qos wfq be group 1 byte-count 15

qos wfq be group 1 byte-count 15

Configure the weight for queue 0.

Adjust the weight according to your business needs.

qos wfq af1 group 1 byte-count 2

qos wfq af1 group 1 byte-count 2

Configure the weight for queue 1.

Adjust the weight according to your business needs.

qos wfq af2 group 1 byte-count 2

qos wfq af2 group 1 byte-count 2

Configure the weight for queue 2.

Adjust the weight according to your business needs.

qos wfq af3 group 1 byte-count 60

qos wfq af3 group 1 byte-count 60

Configure the weight for queue 3.

Adjust the weight according to your business needs.

This example configures the weights of the RoCE queue and other queues at a 4:1 ratio.

qos wfq cs6 group sp

qos wfq cs6 group sp

Assign queue 6 to the SP group.

Adjust the setting according to your business needs.

qos wfq cs7 group sp

qos wfq cs7 group sp

Assign queue 7 to the SP group.

Adjust the setting according to your business needs.

qos wred queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 1 low-limit 1300 high-limit 2100 discard-probability 20

qos wred queue 3 drop-level 2 low-limit 1300 high-limit 2100 discard-probability 20

qos wred queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 1 low-limit 1300 high-limit 2100 discard-probability 20

qos wred queue 3 drop-level 2 low-limit 1300 high-limit 2100 discard-probability 20

Set the drop-related parameters for the RoCE queue.

Set small values for the low limit and high limit for the RoCE queue and large values for other queues.

qos wred queue 3 weighting-constant 0

qos wred queue 3 weighting-constant 0

Set the WRED exponent for average queue size calculation for the RoCE queue.

N/A

qos wred queue 3 ecn

qos wred queue 3 ecn

Enable ECN for the RoCE queue.

N/A

qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Set the drop-related parameters for the CNP queue.

N/A

qos wred queue 6 ecn

qos wred queue 6 ecn

Enable ECN for the CNP queue.

N/A

qos gts queue 6 cir 50000000 cbs 16000000

qos gts queue 6 cir 50000000 cbs 16000000

Configure GTS with a CIR of 50 Gbps for the CNP queue.

N/A

 

Configuring M-LAG

Leaf 1

Leaf 2

Description

Remarks

interface Bridge-Aggregation1000

interface Bridge-Aggregation1000

Create the Layer 2 aggregate interface to be used as the peer-link interface and enter its interface view.

N/A

link-aggregation mode dynamic

link-aggregation mode dynamic

Configure the aggregate interface to operate in dynamic mode.

N/A

quit

quit

Return to system view.

N/A

interface range HundredGigE1/0/29 to HundredGigE1/0/32

interface range HundredGigE1/0/29 to HundredGigE1/0/32

Enter interface range view.

N/A

port link-aggregation group 1000

port link-aggregation group 1000

Assign the ports to the link aggregation group for the peer-link interface (aggregation group 1000).

N/A

quit

quit

Return to system view.

N/A

interface Bridge-Aggregation1000

interface Bridge-Aggregation1000

Enter the interface view for the port to be used as the peer-link interface.

N/A

port m-lag peer-link 1

port m-lag peer-link 1

Specify the aggregate interface as the peer-link interface.

N/A

quit

quit

Return to system view.

N/A

ip vpn-instance M-LAGKeepalive

ip vpn-instance M-LAGKeepalive

Create a VPN instance used for M-LAG keepalive.

N/A

quit

quit

Return to system view.

N/A

interface Route-Aggregation1000

interface Route-Aggregation1000

Create the Layer 3 aggregate interface to be used as the keepalive interface.

N/A

ip binding vpn-instance M-LAGKeepalive

ip binding vpn-instance M-LAGKeepalive

Associate the interface with the VPN instance.

N/A

ip address 1.1.1.1 255.255.255.252

ip address 1.1.1.2 255.255.255.252

Assign an IP address to the keepalive interface.

N/A

link-aggregation mode dynamic

link-aggregation mode dynamic

Configure the aggregate interface to operate in dynamic mode.

N/A

quit

quit

Return to system view.

N/A

interface Twenty-FiveGigE1/0/55

interface Twenty-FiveGigE1/0/55

Configure the interface as a member port of the keepalive interface.

N/A

port link-mode route

port link-mode route

Configure the interface for keepalive detection to operate in route mode as a Layer 3 interface.

N/A

port link-aggregation group 1000

port link-aggregation group 1000

Assign the port to the link aggregation group for the keepalive interface (Route-Aggregation 1000).

N/A

quit

quit

Return to system view.

N/A

interface Twenty-FiveGigE1/0/56

interface Twenty-FiveGigE1/0/56

Configure the interface as a member port of the keepalive interface.

N/A

port link-mode route

port link-mode route

Configure the interface for keepalive detection to operate in route mode as a Layer 3 interface.

N/A

port link-aggregation group 1000

port link-aggregation group 1000

Assign the port to the link aggregation group for the keepalive interface (Route-Aggregation 1000).

N/A

quit

quit

Return to system view.

N/A

m-lag role priority 50

m-lag role priority 100

Set the M-LAG role priority.

N/A

m-lag system-mac 2001-0000-0018

m-lag system-mac 2001-0000-0018

Configure the M-LAG system MAC address.

You must configure the same M-LAG system MAC address for all M-LAG member devices.

m-lag system-number 1

m-lag system-number 2

Set the M-LAG system number.

You must assign different M-LAG system numbers to the member devices in an M-LAG system.

m-lag system-priority 110

m-lag system-priority 110

(Optional.) Set the M-LAG system priority.

You must set the same M-LAG system priority on the member devices in an M-LAG system.

The lower the value, the higher the priority.

m-lag keepalive ip destination 1.1.1.2 source 1.1.1.1 vpn-instance M-LAGKeepalive

m-lag keepalive ip destination 1.1.1.1 source 1.1.1.2 vpn-instance M-LAGKeepalive

Configure M-LAG keepalive packet parameters.

If the interface that owns the keepalive IP address is not excluded from the MAD shutdown action, configure it as an interface excluded from the MAD shutdown action.

m-lag mad exclude interface LoopBack1

m-lag mad exclude interface LoopBack1

Exclude Loopback1 from the shutdown action by M-LAG MAD.

N/A

m-lag mad exclude interface Route-Aggregation1000

m-lag mad exclude interface Route-Aggregation1000

Exclude Route-Aggregation 1000 from the shutdown action by M-LAG MAD.

N/A

m-lag mad exclude interface Twenty-FiveGigE1/0/55

m-lag mad exclude interface Twenty-FiveGigE1/0/55

Exclude Twenty-FiveGigE 1/0/55 from the shutdown action by M-LAG MAD.

N/A

m-lag mad exclude interface Twenty-FiveGigE1/0/56

m-lag mad exclude interface Twenty-FiveGigE1/0/56

Exclude Twenty-FiveGigE 1/0/56 from the shutdown action by M-LAG MAD.

N/A

m-lag mad exclude interface M-GigabitEthernet0/0/0

m-lag mad exclude interface M-GigabitEthernet0/0/0

Exclude M-GigabitEthernet 0/0/0 from the shutdown action by M-LAG MAD.

N/A
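
For a consolidated view, the M-LAG commands above map to the following snippet on Leaf 1. Leaf 2 uses role priority 100, system number 2, keepalive source 1.1.1.2, and destination 1.1.1.1, and is otherwise identical.

# Peer link: Layer 2 aggregation group 1000.
interface Bridge-Aggregation1000
 link-aggregation mode dynamic
quit
interface range HundredGigE1/0/29 to HundredGigE1/0/32
 port link-aggregation group 1000
quit
interface Bridge-Aggregation1000
 port m-lag peer-link 1
quit
# Keepalive link: Layer 3 aggregation group 1000 in a dedicated VPN instance.
ip vpn-instance M-LAGKeepalive
quit
interface Route-Aggregation1000
 ip binding vpn-instance M-LAGKeepalive
 ip address 1.1.1.1 255.255.255.252
 link-aggregation mode dynamic
quit
interface Twenty-FiveGigE1/0/55
 port link-mode route
 port link-aggregation group 1000
quit
interface Twenty-FiveGigE1/0/56
 port link-mode route
 port link-aggregation group 1000
quit
# M-LAG system settings, keepalive parameters, and MAD exclusions.
m-lag role priority 50
m-lag system-mac 2001-0000-0018
m-lag system-number 1
m-lag system-priority 110
m-lag keepalive ip destination 1.1.1.2 source 1.1.1.1 vpn-instance M-LAGKeepalive
m-lag mad exclude interface LoopBack1
m-lag mad exclude interface Route-Aggregation1000
m-lag mad exclude interface Twenty-FiveGigE1/0/55
m-lag mad exclude interface Twenty-FiveGigE1/0/56
m-lag mad exclude interface M-GigabitEthernet0/0/0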

 

Configuring RoCE settings for the peer link

Configuring a WRED table

Leaf 1

Leaf 2

Description

Remarks

interface range HundredGigE1/0/29 to HundredGigE1/0/32

interface range HundredGigE1/0/29 to HundredGigE1/0/32

Enter the view for bulk configuring member ports of the peer-link interface.

N/A

qos trust dscp

qos trust dscp

Configure the interface to trust the DSCP priority.

N/A

qos wfq byte-count

qos wfq byte-count

Enable byte-count WFQ.

N/A

qos wfq be group 1 byte-count 15

qos wfq be group 1 byte-count 15

Configure the weight for queue 0.

Adjust the weight according to your business needs.

qos wfq af1 group 1 byte-count 2

qos wfq af1 group 1 byte-count 2

Configure the weight for queue 1.

Adjust the weight according to your business needs.

qos wfq af2 group 1 byte-count 2

qos wfq af2 group 1 byte-count 2

Configure the weight for queue 2.

Adjust the weight according to your business needs.

qos wfq af3 group 1 byte-count 60

qos wfq af3 group 1 byte-count 60

Configure the weight for queue 3.

Adjust the weight according to your business needs.

This example configures the weights of the RoCE queue and other queues at a 4:1 ratio.

qos wfq cs6 group sp

qos wfq cs6 group sp

Assign queue 6 to the SP group.

Adjust the setting according to your business needs.

qos wfq cs7 group sp

qos wfq cs7 group sp

Assign queue 7 to the SP group.

Adjust the setting according to your business needs.

qos wred apply 100G-WRED-Template

qos wred apply 100G-WRED-Template

Apply the WRED table to the interface.

N/A

qos gts queue 6 cir 50000000 cbs 16000000

qos gts queue 6 cir 50000000 cbs 16000000

Configure GTS with a CIR of 50 Gbps for the CNP queue.

N/A

quit

quit

Return to system view.

N/A

 

Configuring WRED on an interface

Leaf 1

Leaf 2

Description

Remarks

interface range HundredGigE1/0/29 to HundredGigE1/0/32

interface range HundredGigE1/0/29 to HundredGigE1/0/32

Enter the view for bulk configuring member ports of the peer-link interface.

N/A

qos trust dscp

qos trust dscp

Configure the interface to trust the DSCP priority.

N/A

qos wfq byte-count

qos wfq byte-count

Enable byte-count WFQ.

N/A

qos wfq be group 1 byte-count 15

qos wfq be group 1 byte-count 15

Configure the weight for queue 0.

Adjust the weight according to your business needs.

qos wfq af1 group 1 byte-count 2

qos wfq af1 group 1 byte-count 2

Configure the weight for queue 1.

Adjust the weight according to your business needs.

qos wfq af2 group 1 byte-count 2

qos wfq af2 group 1 byte-count 2

Configure the weight for queue 2.

Adjust the weight according to your business needs.

qos wfq af3 group 1 byte-count 60

qos wfq af3 group 1 byte-count 60

Configure the weight for queue 3.

Adjust the weight according to your business needs.

This example configures the weights of the RoCE queue and other queues at a 4:1 ratio.

qos wfq cs6 group sp

qos wfq cs6 group sp

Assign queue 6 to the SP group.

Adjust the setting according to your business needs.

qos wfq cs7 group sp

qos wfq cs7 group sp

Assign queue 7 to the SP group.

Adjust the setting according to your business needs.

qos wred queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 1 low-limit 1300 high-limit 2100 discard-probability 20

qos wred queue 3 drop-level 2 low-limit 1300 high-limit 2100 discard-probability 20

qos wred queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 1 low-limit 1300 high-limit 2100 discard-probability 20

qos wred queue 3 drop-level 2 low-limit 1300 high-limit 2100 discard-probability 20

Set the drop-related parameters for the RoCE queue.

Set small values for the low limit and high limit for the RoCE queue and large values for other queues.

qos wred queue 3 weighting-constant 0

qos wred queue 3 weighting-constant 0

Set the WRED exponent for average queue size calculation for the RoCE queue.

N/A

qos wred queue 3 ecn

qos wred queue 3 ecn

Enable ECN for the RoCE queue.

N/A

qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Set the drop-related parameters for the CNP queue.

N/A

qos wred queue 6 ecn

qos wred queue 6 ecn

Enable ECN for the CNP queue.

N/A

qos gts queue 6 cir 50000000 cbs 16000000

qos gts queue 6 cir 50000000 cbs 16000000

Configure GTS with a CIR of 50 Gbps for the CNP queue.

N/A

quit

quit

Return to system view.

N/A

 

Configuring the links towards the servers

Configuring M-LAG links

 

NOTE:

This example describes the procedure to configure Twenty-FiveGigE 1/0/1 on Leaf 1 and Leaf 2. The same procedure applies to other interfaces.

 

Leaf 1

Leaf 2

Description

interface bridge-aggregation 100

interface bridge-aggregation 100

Create an aggregate interface to be used as an M-LAG interface.

link-aggregation mode dynamic

link-aggregation mode dynamic

Configure the aggregate interface to operate in dynamic mode.

port m-lag group 1

port m-lag group 1

Assign the aggregate interface to an M-LAG group.

quit

quit

N/A

interface Twenty-FiveGigE1/0/1

interface Twenty-FiveGigE1/0/1

Enter the view of the interface connected to the server.

port link-aggregation group 100

port link-aggregation group 100

Add the physical interface to the Layer 2 aggregation group.

broadcast-suppression 1

broadcast-suppression 1

Enable broadcast suppression and set the broadcast suppression threshold.

multicast-suppression 1

multicast-suppression 1

Enable multicast suppression and set the multicast suppression threshold.

stp edged-port

stp edged-port

Configure the interface as an edge port.

port monitor-link group 1 downlink

port monitor-link group 1 downlink

Configure the interface as the downlink interface of a monitor link group.

quit

quit

N/A

interface bridge-aggregation 100

interface bridge-aggregation 100

Enter the view of the M-LAG interface again.

port link-type trunk

port link-type trunk

Set the link type of the interface to trunk.

undo port trunk permit vlan 1

undo port trunk permit vlan 1

Remove the trunk port from VLAN 1.

port trunk permit vlan 1100 to 1500

port trunk permit vlan 1100 to 1500

Assign the trunk port to VLANs 1100 through 1500.

quit

quit

N/A
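
Assembled, the server-facing M-LAG interface configuration above is as follows and is identical on Leaf 1 and Leaf 2:

interface bridge-aggregation 100
 link-aggregation mode dynamic
 port m-lag group 1
quit
interface Twenty-FiveGigE1/0/1
 port link-aggregation group 100
 broadcast-suppression 1
 multicast-suppression 1
 stp edged-port
 port monitor-link group 1 downlink
quit
interface bridge-aggregation 100
 port link-type trunk
 undo port trunk permit vlan 1
 port trunk permit vlan 1100 to 1500
quit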

 

Configuring a single link

Leaf 3

Leaf 4

Description

Remarks

interface range Twenty-FiveGigE1/0/1 Twenty-FiveGigE1/0/2

interface range Twenty-FiveGigE1/0/1 Twenty-FiveGigE1/0/2

Configure the interfaces connected to servers.

N/A

port link-type trunk

port link-type trunk

Set the link type of the interface to trunk.

N/A

undo port trunk permit vlan 1

undo port trunk permit vlan 1

Remove the trunk port from VLAN 1.

N/A

port trunk permit vlan 1100 to 1500

port trunk permit vlan 1100 to 1500

Assign the trunk port to VLANs 1100 through 1500.

N/A

broadcast-suppression 1

broadcast-suppression 1

Enable broadcast suppression and set the broadcast suppression threshold.

N/A

multicast-suppression 1

multicast-suppression 1

Enable multicast suppression and set the multicast suppression threshold.

N/A

stp edged-port

stp edged-port

Configure the interface as an edge port.

N/A

port monitor-link group 1 downlink

port monitor-link group 1 downlink

Configure the interface as the downlink interface of a monitor link group.

N/A

quit

quit

N/A

N/A

interface bridge-aggregation 100

interface bridge-aggregation 100

Create a Layer 2 aggregate interface.

Use this step to create an empty M-LAG interface to act as a workaround to the restrictions in the section "Restrictions and guidelines for single-homed servers attached to non-M-LAG interfaces" in M-LAG Network Planning.

link-aggregation mode dynamic

link-aggregation mode dynamic

Configure the aggregate interface to operate in dynamic mode.

N/A

port m-lag group 1

port m-lag group 1

Assign the aggregate interface to an M-LAG group.

N/A

port link-type trunk

port link-type trunk

Set the link type of the interface to trunk.

N/A

undo port trunk permit vlan 1

undo port trunk permit vlan 1

Remove the trunk port from VLAN 1.

N/A

port trunk permit vlan 1100 to 1500

port trunk permit vlan 1100 to 1500

Assign the trunk port to VLANs 1100 through 1500.

N/A

quit

quit

N/A

N/A

 

Configuring RoCE settings for the links towards the servers

Configuring a WRED table

Leaf 1

Leaf 2

Description

Remarks

interface Twenty-FiveGigE1/0/1

interface Twenty-FiveGigE1/0/1

Configure the interface connected to servers.

N/A

priority-flow-control deadlock enable

priority-flow-control deadlock enable

Enable PFC deadlock detection on an interface.

N/A

priority-flow-control enable

priority-flow-control enable

Enable PFC on the interface.

N/A

priority-flow-control no-drop dot1p 3

priority-flow-control no-drop dot1p 3

Enable PFC for the queue of RoCE packets.

N/A

priority-flow-control dot1p 3 headroom 125

priority-flow-control dot1p 3 headroom 125

Set the headroom buffer threshold to 125 for 802.1p priority 3.

After you enable PFC for the specified 802.1p priority, the device automatically deploys the PFC threshold settings. For more information, see "Recommended PFC settings."

priority-flow-control dot1p 3 reserved-buffer 17

priority-flow-control dot1p 3 reserved-buffer 17

Set the PFC reserved threshold.

priority-flow-control dot1p 3 ingress-buffer dynamic 5

priority-flow-control dot1p 3 ingress-buffer dynamic 5

Set the dynamic back pressure frame triggering threshold.

priority-flow-control dot1p 3 ingress-threshold-offset 12

priority-flow-control dot1p 3 ingress-threshold-offset 12

Set the offset between the back pressure frame stopping threshold and triggering threshold.

qos trust dot1p

qos trust dot1p

Configure the interface to trust the 802.1p priority.

N/A

qos wfq byte-count

qos wfq byte-count

Enable byte-count WFQ.

N/A

qos wfq be group 1 byte-count 15

qos wfq be group 1 byte-count 15

Configure the weight for queue 0.

Adjust the weight according to your business needs.

qos wfq af1 group 1 byte-count 2

qos wfq af1 group 1 byte-count 2

Configure the weight for queue 1.

Adjust the weight according to your business needs.

qos wfq af2 group 1 byte-count 2

qos wfq af2 group 1 byte-count 2

Configure the weight for queue 2.

Adjust the weight according to your business needs.

qos wfq af3 group 1 byte-count 60

qos wfq af3 group 1 byte-count 60

Configure the weight for queue 3.

Adjust the weight according to your business needs.

This example configures the weights of the RoCE queue and other queues at a 4:1 ratio.

qos wfq cs6 group sp

qos wfq cs6 group sp

Assign queue 6 to the SP group.

Adjust the setting according to your business needs.

qos wfq cs7 group sp

qos wfq cs7 group sp

Assign queue 7 to the SP group.

Adjust the setting according to your business needs.

qos apply policy dot1p-dscp inbound

qos apply policy dot1p-dscp inbound

Configure a QoS policy in the inbound direction to perform dot1p-dscp mapping.

N/A

qos apply policy dscptodot1p outbound

qos apply policy dscptodot1p outbound

Configure a QoS policy in the outbound direction to perform dscp-dot1p mapping.

N/A

qos wred apply 25G-WRED-Template

qos wred apply 25G-WRED-Template

Apply the WRED table to the interface.

N/A

qos gts queue 6 cir 12500000 cbs 16000000

qos gts queue 6 cir 12500000 cbs 16000000

Configure GTS with a CIR of 12.5 Gbps for the CNP queue.

N/A
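
Assembled in order, the server-facing RoCE settings above (the WRED table approach) form the following snippet on Leaf 1 and Leaf 2:

interface Twenty-FiveGigE1/0/1
 # PFC and PFC deadlock detection for the RoCE queue (802.1p priority 3).
 priority-flow-control deadlock enable
 priority-flow-control enable
 priority-flow-control no-drop dot1p 3
 priority-flow-control dot1p 3 headroom 125
 priority-flow-control dot1p 3 reserved-buffer 17
 priority-flow-control dot1p 3 ingress-buffer dynamic 5
 priority-flow-control dot1p 3 ingress-threshold-offset 12
 # Trust 802.1p, map priorities with the QoS policies, and schedule the queues.
 qos trust dot1p
 qos wfq byte-count
 qos wfq be group 1 byte-count 15
 qos wfq af1 group 1 byte-count 2
 qos wfq af2 group 1 byte-count 2
 qos wfq af3 group 1 byte-count 60
 qos wfq cs6 group sp
 qos wfq cs7 group sp
 qos apply policy dot1p-dscp inbound
 qos apply policy dscptodot1p outbound
 qos wred apply 25G-WRED-Template
 qos gts queue 6 cir 12500000 cbs 16000000
quit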

 

Configuring WRED on an interface

Leaf 1

Leaf 2

Description

Remarks

interface Twenty-FiveGigE1/0/1

interface Twenty-FiveGigE1/0/1

Configure the interface connected to servers.

N/A

priority-flow-control deadlock enable

priority-flow-control deadlock enable

Enable PFC deadlock detection on an interface.

N/A

priority-flow-control enable

priority-flow-control enable

Enable PFC on the interface.

N/A

priority-flow-control no-drop dot1p 3

priority-flow-control no-drop dot1p 3

Enable PFC for the queue of RoCE packets.

N/A

priority-flow-control dot1p 3 headroom 125

priority-flow-control dot1p 3 headroom 125

Set the headroom buffer threshold to 125 for 802.1p priority 3.

After you enable PFC for the specified 802.1p priority, the device automatically deploys the PFC threshold settings. For more information, see "Recommended PFC settings."

priority-flow-control dot1p 3 reserved-buffer 17

priority-flow-control dot1p 3 reserved-buffer 17

Set the PFC reserved threshold.

priority-flow-control dot1p 3 ingress-buffer dynamic 5

priority-flow-control dot1p 3 ingress-buffer dynamic 5

Set the dynamic back pressure frame triggering threshold.

priority-flow-control dot1p 3 ingress-threshold-offset 12

priority-flow-control dot1p 3 ingress-threshold-offset 12

Set the offset between the back pressure frame stopping threshold and triggering threshold.

qos trust dscp

qos trust dscp

Configure the interface to trust the DSCP priority.

N/A

qos wfq byte-count

qos wfq byte-count

Enable byte-count WFQ.

N/A

qos wfq be group 1 byte-count 15

qos wfq be group 1 byte-count 15

Configure the weight for queue 0.

Adjust the weight according to your business needs.

qos wfq af1 group 1 byte-count 2

qos wfq af1 group 1 byte-count 2

Configure the weight for queue 1.

Adjust the weight according to your business needs.

qos wfq af2 group 1 byte-count 2

qos wfq af2 group 1 byte-count 2

Configure the weight for queue 2.

Adjust the weight according to your business needs.

qos wfq af3 group 1 byte-count 60

qos wfq af3 group 1 byte-count 60

Configure the weight for queue 3.

Adjust the weight according to your business needs.

This example configures the weights of the RoCE queue and other queues at a 4:1 ratio.

qos wfq cs6 group sp

qos wfq cs6 group sp

Assign queue 6 to the SP group.

Adjust the setting according to your business needs.

qos wfq cs7 group sp

qos wfq cs7 group sp

Assign queue 7 to the SP group.

Adjust the setting according to your business needs.

qos apply policy dot1p-dscp inbound

qos apply policy dot1p-dscp inbound

Configure a QoS policy in the inbound direction to perform dot1p-dscp mapping.

N/A

qos apply policy dscptodot1p outbound

qos apply policy dscptodot1p outbound

Configure a QoS policy in the outbound direction to perform dscp-dot1p mapping.

N/A

qos wred queue 3 drop-level 0 low-limit 400 high-limit 1625 discard-probability 20

qos wred queue 3 drop-level 1 low-limit 400 high-limit 1625 discard-probability 20

qos wred queue 3 drop-level 2 low-limit 400 high-limit 1625 discard-probability 20

qos wred queue 3 drop-level 0 low-limit 400 high-limit 1625 discard-probability 20

qos wred queue 3 drop-level 1 low-limit 400 high-limit 1625 discard-probability 20

qos wred queue 3 drop-level 2 low-limit 400 high-limit 1625 discard-probability 20

Set the drop-related parameters for the RoCE queue.

Set small values for the low limit and high limit for the RoCE queue and large values for other queues.

qos wred queue 3 ecn

qos wred queue 3 ecn

Enable ECN for the RoCE queue.

N/A

qos wred queue 3 weighting-constant 0

qos wred queue 3 weighting-constant 0

Set the WRED exponent for average queue size calculation for the RoCE queue.

N/A

qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Set the drop-related parameters for the CNP queue.

N/A

qos wred queue 6 ecn

qos wred queue 6 ecn

Enable ECN for the CNP queue.

N/A

qos gts queue 6 cir 12500000 cbs 16000000

qos gts queue 6 cir 12500000 cbs 16000000

Set the GTS parameters for the CNP queue.

N/A
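For reference, the following is a minimal sketch of the per-interface WRED and ECN commands on Leaf 1 (the PFC, trust-mode, WFQ, and QoS policy commands are the same as in the WRED-table method). Only drop level 0 is shown; drop levels 1 and 2 from the table are configured in the same way.

# Configure WRED and ECN for the RoCE queue (queue 3) and the CNP queue (queue 6) directly on the interface.
<Leaf1> system-view
[Leaf1] interface twenty-fivegige 1/0/1
[Leaf1-Twenty-FiveGigE1/0/1] qos wred queue 3 drop-level 0 low-limit 400 high-limit 1625 discard-probability 20
[Leaf1-Twenty-FiveGigE1/0/1] qos wred queue 3 ecn
[Leaf1-Twenty-FiveGigE1/0/1] qos wred queue 3 weighting-constant 0
[Leaf1-Twenty-FiveGigE1/0/1] qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20
[Leaf1-Twenty-FiveGigE1/0/1] qos wred queue 6 ecn
[Leaf1-Twenty-FiveGigE1/0/1] qos gts queue 6 cir 12500000 cbs 16000000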

 

Configuring an underlay BGP instance

Leaf 1

Leaf 2

Description

Remarks

bgp 64636

bgp 64636

Enable a BGP instance.

N/A

router-id 50.50.255.41

router-id 50.50.255.42

Specify a unique router ID for the BGP instance on each BGP device.

To run BGP, a BGP instance must have a router ID.

If you do not specify a router ID for the BGP instance on a device, it uses the global router ID.

group Spine external

group Spine external

Create an EBGP peer group.

N/A

peer Spine as-number 64637

peer Spine as-number 64637

Specify an AS number for a peer group.

N/A

peer 172.16.2.153 group Spine

peer 172.16.2.157 group Spine

Add the specified spine device to the peer group.

N/A

peer 172.16.3.153 group Spine

peer 172.16.3.157 group Spine

Add the specified spine device to the peer group.

N/A

address-family ipv4 unicast

address-family ipv4 unicast

Create the BGP IPv4 unicast address family and enter its view.

N/A

balance 32

balance 32

Enable load balancing and set the maximum number of BGP ECMP routes for load balancing.

N/A

network 55.50.138.0 255.255.255.128

network 55.50.138.0 255.255.255.128

Inject a network to the BGP routing table and configure BGP to advertise the network.

N/A

network 55.50.153.128 255.255.255.128

network 55.50.153.128 255.255.255.128

Inject a network to the BGP routing table and configure BGP to advertise the network.

N/A

network 55.50.250.0 255.255.255.192

network 55.50.250.0 255.255.255.192

Inject a network to the BGP routing table and configure BGP to advertise the network.

N/A

network 55.50.255.41 255.255.255.255

network 55.50.255.41 255.255.255.255

Inject a network to the BGP routing table and configure BGP to advertise the network.

N/A

peer Spine enable

peer Spine enable

Enable the device to exchange routes with the peer group.

N/A
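For reference, the following is a minimal CLI sketch of the BGP configuration above on Leaf 1 (BGP view prompts abbreviated and may vary by software version). Leaf 2 differs only in its router ID and peer addresses, as listed in the table, and the remaining network commands are entered in the same address family view.

# Create the EBGP peer group towards the spine tier and advertise the local networks.
<Leaf1> system-view
[Leaf1] bgp 64636
[Leaf1-bgp-default] router-id 50.50.255.41
[Leaf1-bgp-default] group Spine external
[Leaf1-bgp-default] peer Spine as-number 64637
[Leaf1-bgp-default] peer 172.16.2.153 group Spine
[Leaf1-bgp-default] peer 172.16.3.153 group Spine
[Leaf1-bgp-default] address-family ipv4 unicast
[Leaf1-bgp-default-ipv4] balance 32
[Leaf1-bgp-default-ipv4] network 55.50.138.0 255.255.255.128
[Leaf1-bgp-default-ipv4] peer Spine enable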

 

Configuring VLAN interfaces and gateways (dual-active gateways)

Leaf 1

Leaf 2

Description

Remarks

interface Vlan-interface1121

interface Vlan-interface1121

Create a VLAN interface.

N/A

ip address 55.50.138.124 255.255.255.128

ip address 55.50.138.124 255.255.255.128

Assign an IP address to the VLAN interface.

N/A

mac-address 0001-0001-0001

mac-address 0001-0001-0001

Configure a MAC address for the VLAN interface.

N/A

dhcp select relay

dhcp select relay

Enable the DHCP relay agent on the interface.

N/A

dhcp relay server-address 55.50.128.12

dhcp relay server-address 55.50.128.12

Specify DHCP server address 55.50.128.12 on the interface.

N/A

quit

quit

N/A

N/A

m-lag mad exclude interface Vlan-interface 1121

m-lag mad exclude interface Vlan-interface 1121

Exclude VLAN interface 1121 from the shutdown action by M-LAG MAD.

Execute this command on the downlink port.
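For reference, the following is a minimal CLI sketch for Leaf 1. With dual-active gateways, both M-LAG member devices use the same IP address and the same MAC address on the VLAN interface, so the gateway is active on both devices; Leaf 2 uses the identical configuration.

# Configure VLAN-interface 1121 as a dual-active gateway with DHCP relay.
<Leaf1> system-view
[Leaf1] interface vlan-interface 1121
[Leaf1-Vlan-interface1121] ip address 55.50.138.124 255.255.255.128
[Leaf1-Vlan-interface1121] mac-address 0001-0001-0001
[Leaf1-Vlan-interface1121] dhcp select relay
[Leaf1-Vlan-interface1121] dhcp relay server-address 55.50.128.12
[Leaf1-Vlan-interface1121] quit
[Leaf1] m-lag mad exclude interface vlan-interface 1121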

 

Configuring VLAN interfaces and gateways (VRRP gateways)

Leaf 1

Leaf 2

Description

Remarks

interface Vlan-interface1121

interface Vlan-interface1121

Create a VLAN interface.

N/A

ip address 55.50.138.124 255.255.255.128

ip address 55.50.138.123 255.255.255.128

Assign an IP address to the VLAN interface.

N/A

vrrp vrid 1 virtual-ip 55.50.138.125

vrrp vrid 1 virtual-ip 55.50.138.125

Create an IPv4 VRRP group and assign a virtual IP address to it.

N/A

vrrp vrid 1 priority 150

vrrp vrid 1 priority 120

Set the priority of the device in the IPv4 VRRP group.

N/A

dhcp select relay

dhcp select relay

Enable the DHCP relay agent on the interface

N/A

dhcp relay server-address 55.50.128.12

dhcp relay server-address 55.50.128.12

Specify DHCP server address 55.50.128.12 on the interface.

N/A

quit

quit

N/A

N/A

m-lag mad exclude interface Vlan-interface 1121

m-lag mad exclude interface Vlan-interface 1121

Exclude VLAN interface 1121 from the shutdown action by M-LAG MAD

Execute this command on the downlink port.
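For reference, the following is a minimal CLI sketch for Leaf 1 with the VRRP method. Leaf 2 uses IP address 55.50.138.123 and VRRP priority 120, as listed in the table, so Leaf 1 becomes the VRRP master.

# Configure VLAN-interface 1121 as a VRRP gateway with DHCP relay.
<Leaf1> system-view
[Leaf1] interface vlan-interface 1121
[Leaf1-Vlan-interface1121] ip address 55.50.138.124 255.255.255.128
[Leaf1-Vlan-interface1121] vrrp vrid 1 virtual-ip 55.50.138.125
[Leaf1-Vlan-interface1121] vrrp vrid 1 priority 150
[Leaf1-Vlan-interface1121] dhcp select relay
[Leaf1-Vlan-interface1121] dhcp relay server-address 55.50.128.12
[Leaf1-Vlan-interface1121] quit
[Leaf1] m-lag mad exclude interface vlan-interface 1121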

 

Configuring spine devices

Spine device configuration tasks at a glance

To configure spine devices, perform the following tasks:

·     Configuring global RoCE settings

·     Configuring the links towards the leaf tier

·     Configuring RoCE settings for the links towards the leaf tier

Choose one of the following tasks:

¡     Configuring a WRED table

¡     Configuring WRED on an interface

·     Configuring routing policies to replace the original AS numbers with the local AS number

·     Configuring an underlay BGP instance

Configuring global RoCE settings

Configuring global PFC settings and data buffer settings

Spine 1

Spine 2

Description

Remarks

priority-flow-control poolID 0 headroom 130000

priority-flow-control poolID 0 headroom 130000

Set the maximum number of cell resources that can be used in the headroom storage space.

Specify the maximum value.

priority-flow-control deadlock cos 3 interval 10

priority-flow-control deadlock cos 3 interval 10

Set the PFC deadlock detection interval for the specified CoS value.

N/A

priority-flow-control deadlock precision high

priority-flow-control deadlock precision high

Set the precision for the PFC deadlock detection timer.

Specify the high precision.

buffer egress cell queue 3 shared ratio 100

buffer egress cell queue 3 shared ratio 100

Set the maximum shared-area ratio for the RoCE queue in the egress buffer.

N/A

buffer egress cell queue 6 shared ratio 100

buffer egress cell queue 6 shared ratio 100

Set the maximum shared-area ratio for the CNP queue in the egress buffer.

N/A

buffer apply

buffer apply

Apply manually configured data buffer settings.

N/A
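For reference, the following is a minimal CLI sketch of the global settings above on Spine 1; Spine 2 uses the identical configuration. Note that executing buffer apply interrupts traffic briefly, as described in "Influence of adding or deleting commands on traffic."

# Configure global PFC deadlock detection, headroom pool, and egress buffer settings.
<Spine1> system-view
[Spine1] priority-flow-control poolID 0 headroom 130000
[Spine1] priority-flow-control deadlock cos 3 interval 10
[Spine1] priority-flow-control deadlock precision high
[Spine1] buffer egress cell queue 3 shared ratio 100
[Spine1] buffer egress cell queue 6 shared ratio 100
[Spine1] buffer apply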

 

Configuring WRED tables

Spine 1

Spine 2

Description

Remarks

qos wred queue table 100G-WRED-Template

qos wred queue table 100G-WRED-Template

Create a WRED table.

N/A

queue 0 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 0 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 0.

N/A

queue 1 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 1 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 1.

N/A

queue 2 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 2 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 2.

N/A

queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 drop-level 1 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 drop-level 2 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 weighting-constant 0

queue 3 ecn

queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 drop-level 1 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 drop-level 2 low-limit 1000 high-limit 2000 discard-probability 20

queue 3 weighting-constant 0

queue 3 ecn

Configure WRED table settings for queue 3.

Set small values for the low limit and high limit for the RoCE queue and large values for other queues.

queue 4 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 4 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 4.

N/A

queue 5 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 5 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 5.

N/A

queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 ecn

queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 6 ecn

Configure WRED table settings for queue 6.

N/A

queue 7 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

queue 7 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Configure WRED table settings for queue 7.

N/A
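For reference, the following is a minimal CLI sketch of how the table above is created on Spine 1. Only drop level 0 of the RoCE queue (queue 3) and the CNP queue (queue 6) is shown; the other queues and drop levels follow the same pattern with the values listed above. The WRED-table view prompt follows the convention shown later in "Tuning ECN parameters" and may vary by software version.

# Create the 100G WRED table with ECN enabled for the RoCE queue and the CNP queue.
<Spine1> system-view
[Spine1] qos wred queue table 100G-WRED-Template
[Spine1-wred-table-100G-WRED-Template] queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20
[Spine1-wred-table-100G-WRED-Template] queue 3 weighting-constant 0
[Spine1-wred-table-100G-WRED-Template] queue 3 ecn
[Spine1-wred-table-100G-WRED-Template] queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20
[Spine1-wred-table-100G-WRED-Template] queue 6 ecn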

 

Configuring the links towards the leaf tier

Spine 1

Spine 2

Description

Remarks

interface HundredGigE1/1/1

interface HundredGigE1/1/1

Enter interface view.

Configure a link connected to a leaf device.

port link-mode route

port link-mode route

Change the link mode of an Ethernet interface to the Layer 3 mode.

N/A

ip address 172.16.2.153 255.255.255.252

ip address 172.16.3.153 255.255.255.252

Configure an IP address.

N/A

interface HundredGigE1/1/2

interface HundredGigE1/1/2

Enter interface view.

Configure a link connected to leaf devices.

port link-mode route

port link-mode route

Change the link mode of an Ethernet interface to the Layer 3 mode.

N/A

ip address 172.16.2.157 255.255.255.252

ip address 172.16.3.157 255.255.255.252

Configure an IP address.

N/A

interface HundredGigE1/1/3

interface HundredGigE1/1/3

Enter interface view.

Configure a link connected to leaf devices.

port link-mode route

port link-mode route

Change the link mode of an Ethernet interface to the Layer 3 mode.

N/A

ip address 172.16.2.81 255.255.255.252

ip address 172.16.3.81 255.255.255.252

Configure an IP address.

N/A

interface HundredGigE1/1/4

interface HundredGigE1/1/4

Enter interface view.

Configure a link connected to leaf devices.

port link-mode route

port link-mode route

Change the link mode of an Ethernet interface to the Layer 3 mode.

N/A

ip address 172.16.2.85 255.255.255.252

ip address 172.16.3.85 255.255.255.252

Configure an IP address.

N/A

 

Configuring RoCE settings for the links towards the leaf tier

Configuring a WRED table

Spine 1

Spine 2

Description

Remarks

interface range HundredGigE1/1/1 to HundredGigE1/1/4

interface range HundredGigE1/1/1 to HundredGigE1/1/4

Specify a range of interfaces connected to leaf devices.

N/A

priority-flow-control deadlock enable

priority-flow-control deadlock enable

Enable PFC deadlock detection for the interface range.

N/A

priority-flow-control enable

priority-flow-control enable

Enable PFC on all Ethernet interfaces.

N/A

priority-flow-control no-drop dot1p 3

priority-flow-control no-drop dot1p 3

Enable PFC for the queue of RoCE packets.

N/A

priority-flow-control dot1p 3 headroom 491

priority-flow-control dot1p 3 headroom 491

Set the headroom buffer threshold to 491 for 802.1p priority 3.

After you enable PFC for the specified 802.1p priority, the device automatically deploys the PFC threshold settings. For more information, see "Recommended PFC settings."

priority-flow-control dot1p 3 reserved-buffer 20

priority-flow-control dot1p 3 reserved-buffer 20

Specify the PFC reserved threshold.

priority-flow-control dot1p 3 ingress-buffer dynamic 5

priority-flow-control dot1p 3 ingress-buffer dynamic 5

Set the dynamic back pressure frame triggering threshold.

priority-flow-control dot1p 3 ingress-threshold-offset 12

priority-flow-control dot1p 3 ingress-threshold-offset 12

Set the offset between the back pressure frame stopping threshold and triggering threshold.

qos trust dscp

qos trust dscp

Configure the interface to trust the DSCP priority

N/A

qos wfq byte-count

qos wfq byte-count

Enable byte-count WFQ.

N/A

qos wfq be group 1 byte-count 15

qos wfq be group 1 byte-count 15

Configure the weight for queue 0.

Adjust the weight according to your business needs.

qos wfq af1 group 1 byte-count 2

qos wfq af1 group 1 byte-count 2

Configure the weight for queue 1.

Adjust the weight according to your business needs.

qos wfq af2 group 1 byte-count 2

qos wfq af2 group 1 byte-count 2

Configure the weight for queue 2.

Adjust the weight according to your business needs.

qos wfq af3 group 1 byte-count 60

qos wfq af3 group 1 byte-count 60

Configure the weight for queue 3.

Adjust the weight according to your business needs.

This example configures the weights of the RoCE queue and other queues at a 4:1 ratio.

qos wfq cs6 group sp

qos wfq cs6 group sp

Assign queue 6 to the SP group.

Adjust the weight according to your business needs.

qos wfq cs7 group sp

qos wfq cs7 group sp

Assign queue 7 to the SP group.

Adjust the weight according to your business needs.

qos wred apply 100G-WRED-Template

qos wred apply 100G-WRED-Template

Apply the WRED table to the interface.

N/A

qos gts queue 6 cir 50000000 cbs 16000000

qos gts queue 6 cir 50000000 cbs 16000000

Configure GTS with a CIR of 50 Gbps for the CNP queue.

N/A
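For reference, the following is a minimal CLI sketch that applies the key commands above to the interface range on Spine 1 (the deadlock detection, remaining PFC threshold, and WFQ weight commands from the table are entered in the same view). The interface-range view prompt may vary by software version.

# Apply PFC, DSCP trust, the 100G WRED table, and CNP queue shaping on the leaf-facing links.
<Spine1> system-view
[Spine1] interface range hundredgige 1/1/1 to hundredgige 1/1/4
[Spine1-if-range] priority-flow-control enable
[Spine1-if-range] priority-flow-control no-drop dot1p 3
[Spine1-if-range] priority-flow-control dot1p 3 headroom 491
[Spine1-if-range] qos trust dscp
[Spine1-if-range] qos wred apply 100G-WRED-Template
[Spine1-if-range] qos gts queue 6 cir 50000000 cbs 16000000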

 

Configuring WRED on an interface

Spine 1

Spine 2

Description

Remarks

interface range HundredGigE1/1/1 to HundredGigE1/1/4

interface range HundredGigE1/1/1 to HundredGigE1/1/4

Specify a range of interfaces connected to leaf devices.

N/A

priority-flow-control deadlock enable

priority-flow-control deadlock enable

Enable PFC deadlock detection on an interface.

N/A

priority-flow-control enable

priority-flow-control enable

Enable PFC on all Ethernet interfaces.

N/A

priority-flow-control no-drop dot1p 3

priority-flow-control no-drop dot1p 3

Enable PFC for the queue of RoCE packets.

N/A

priority-flow-control dot1p 3 headroom 491

priority-flow-control dot1p 3 headroom 491

Set the headroom buffer threshold to 491 for 802.1p priority 3.

After you enable PFC for the specified 802.1p priority, the device automatically deploys the PFC threshold settings. For more information, see "Recommended PFC settings."

priority-flow-control dot1p 3 reserved-buffer 20

priority-flow-control dot1p 3 reserved-buffer 20

Set the PFC reserved threshold.

priority-flow-control dot1p 3 ingress-buffer dynamic 5

priority-flow-control dot1p 3 ingress-buffer dynamic 5

Set the dynamic back pressure frame triggering threshold.

priority-flow-control dot1p 3 ingress-threshold-offset 12

priority-flow-control dot1p 3 ingress-threshold-offset 12

Set the offset between the back pressure frame stopping threshold and triggering threshold.

qos trust dscp

qos trust dscp

Configure the interface to trust the DSCP priority.

N/A

qos wfq byte-count

qos wfq byte-count

Enable byte-count WFQ.

N/A

qos wfq be group 1 byte-count 15

qos wfq be group 1 byte-count 15

Configure the weight for queue 0.

Adjust the setting according to your business needs.

qos wfq af1 group 1 byte-count 2

qos wfq af1 group 1 byte-count 2

Configure the weight for queue 1.

Adjust the setting according to your business needs.

qos wfq af2 group 1 byte-count 2

qos wfq af2 group 1 byte-count 2

Configure the weight for queue 2.

Adjust the setting according to your business needs.

qos wfq af3 group 1 byte-count 60

qos wfq af3 group 1 byte-count 60

Configure the weight for queue 3.

Adjust the setting according to your business needs.

This example configures the weights of the RoCE queue and other queues at a 4:1 ratio.

qos wfq cs6 group sp

qos wfq cs6 group sp

Assign queue 6 to the SP group.

Adjust the setting according to your business needs.

qos wfq cs7 group sp

qos wfq cs7 group sp

Assign queue 7 to the SP group.

Adjust the setting according to your business needs.

qos wred queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 1 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 2 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 0 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 1 low-limit 1000 high-limit 2000 discard-probability 20

qos wred queue 3 drop-level 2 low-limit 1000 high-limit 2000 discard-probability 20

Set the drop-related parameters for the RoCE queue.

Set small values for the low limit and high limit for the RoCE queue and large values for other queues.

qos wred queue 3 weighting-constant 0

qos wred queue 3 weighting-constant 0

Set the WRED exponent for average queue size calculation for the RoCE queue.

N/A

qos wred queue 3 ecn

qos wred queue 3 ecn

Enable ECN for the RoCE queue.

N/A

qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 0 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

qos wred queue 6 drop-level 2 low-limit 37999 high-limit 38000 discard-probability 20

Set the drop-related parameters for the CNP queue.

N/A

qos wred queue 6 ecn

qos wred queue 6 ecn

Enable ECN for the CNP queue.

N/A

qos gts queue 6 cir 50000000 cbs 16000000

qos gts queue 6 cir 50000000 cbs 16000000

Set the GTS parameters for the CNP queue.

N/A

 

Configuring routing policies to replace the original AS numbers with the local AS number

Perform this task to configure routing policies on the spine devices to replace the original AS numbers of BGP routes received from leaf devices with the local AS number. If you do not do so, a leaf device cannot learn routes advertised by other leaf devices. This is because a device checks the AS_PATH attribute of each route received from a BGP peer and does not learn the route if the AS_PATH attribute contains its own AS number.

To configure a routing policy to replace the original AS numbers of routes received from leaf devices with the local AS number:

 

Spine1

Spine2

Description

Remarks

ip as-path leaf_aspath permit 64636

ip as-path leaf_aspath permit 64636

Configure an AS path list.

N/A

route-policy leaf_aspath_out permit node 10

route-policy leaf_aspath_out permit node 10

Create node 10 in permit mode for routing policy leaf_aspath_out.

N/A

if-match as-path leaf_aspath

if-match as-path leaf_aspath

Match BGP routes whose AS_PATH attribute matches the specified AS path list.

N/A

apply as-path 64637 replace

apply as-path 64637 replace

Set the AS_PATH attribute for BGP routes to replace the original AS numbers.

N/A

route-policy leaf_aspath_out permit node 1000

route-policy leaf_aspath_out permit node 1000

Create node 1000 in permit mode for routing policy leaf_aspath_out.

N/A
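For reference, the following is a minimal CLI sketch of the routing policy above on Spine 1; Spine 2 uses the identical configuration (route-policy view prompts abbreviated and may vary by software version).

# Match routes received from the leaf AS and replace their AS_PATH with the local AS number.
<Spine1> system-view
[Spine1] ip as-path leaf_aspath permit 64636
[Spine1] route-policy leaf_aspath_out permit node 10
[Spine1-route-policy-leaf_aspath_out-10] if-match as-path leaf_aspath
[Spine1-route-policy-leaf_aspath_out-10] apply as-path 64637 replace
[Spine1-route-policy-leaf_aspath_out-10] quit
[Spine1] route-policy leaf_aspath_out permit node 1000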

 

Configuring an underlay BGP instance

Spine1

Spine2

Description

Remarks

bgp 64637

bgp 64637

Enable a BGP instance.

N/A

router-id 50.50.255.1

router-id 50.50.255.2

Specify a unique router ID for the BGP instance on each BGP device.

To run BGP, a BGP instance must have a router ID.

If you do not specify a router ID for the BGP instance on a device, it uses the global router ID. In this situation, make sure a global router ID is set on the device.

group Leaf external

group Leaf external

Create an EBGP peer group.

N/A

peer Leaf as-number 64636

peer Leaf as-number 64636

Specify an AS number for the peer group.

N/A

peer 172.16.2.154 group Leaf

 peer 172.16.3.154 group Leaf

Add the specified leaf device to the peer group.

N/A

peer 172.16.2.158 group Leaf

 peer 172.16.3.158 group Leaf

Add the specified leaf device to the peer group.

N/A

peer 172.16.2.82 group Leaf

peer 172.16.3.82 group Leaf

Add the specified leaf device to the peer group.

N/A

peer 172.16.2.86 group Leaf

peer 172.16.3.86 group Leaf

Add the specified leaf device to the peer group.

N/A

address-family ipv4 unicast

address-family ipv4 unicast

Create the BGP IPv4 unicast address family and enter its view.

N/A

balance 32

balance 32

Enable load balancing and set the maximum number of BGP ECMP routes for load balancing.

N/A

peer Leaf enable

peer Leaf enable

Enable the device to exchange IPv4 unicast routing information with the peer group.

N/A

peer Leaf route-policy leaf_aspath_out export

peer Leaf route-policy leaf_aspath_out export

Apply the routing policy to routes outgoing to the IPv4 peer group.

N/A
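For reference, the following is a minimal CLI sketch of the BGP configuration above on Spine 1, showing one leaf peer (the remaining peers are added in the same way). The export policy is the leaf_aspath_out routing policy configured in the previous task.

# Create the EBGP peer group towards the leaf tier and apply the AS_PATH replacement policy on export.
<Spine1> system-view
[Spine1] bgp 64637
[Spine1-bgp-default] router-id 50.50.255.1
[Spine1-bgp-default] group Leaf external
[Spine1-bgp-default] peer Leaf as-number 64636
[Spine1-bgp-default] peer 172.16.2.154 group Leaf
[Spine1-bgp-default] address-family ipv4 unicast
[Spine1-bgp-default-ipv4] balance 32
[Spine1-bgp-default-ipv4] peer Leaf enable
[Spine1-bgp-default-ipv4] peer Leaf route-policy leaf_aspath_out export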

 

Influence of adding or deleting commands on traffic

Packet loss might occur when you edit data buffer and PFC settings on spine and leaf devices. For example, Table 1 shows the packet loss of an S6850 switch running version R6710. Packet loss varies by network scale and configuration.

Table 1 Influence of command adding and deleting on traffic

Command

Description

Traffic downtime

buffer apply

Apply manually configured data buffer settings.

Traffic downtime is 480 ms for a single flow and up to several seconds device-wide.

priority-flow-control no-drop dot1p 3

Enable or disable PFC for 802.1p priorities on an Ethernet interface.

Traffic downtime about 30 ms.

priority-flow-control dot1p 5 headroom 125

Edit the headroom setting of PFC.

Traffic downtime within 10 ms.

priority-flow-control dot1p 5 reserved-buffer 17

Edit the reserved-buffer setting of PFC.

Traffic downtime within 10 ms.

priority-flow-control dot1p 5 ingress-buffer dynamic 5

Edit the dynamic back pressure frame triggering threshold of PFC.

Traffic downtime within 10 ms.

priority-flow-control dot1p 5 ingress-threshold-offset 12

Edit the offset between the back pressure frame stopping threshold and triggering threshold.

Traffic downtime within 10 ms.

undo stp edged-port

Configure a port as a non-edge port in interface view.

The MAC entries learned by the port will be deleted and the Layer 2 traffic will be broadcast.

qos wfq byte-count

Enable byte-count WFQ on the interface.

Traffic downtime about 40 ms.

qos wfq af3 group 1 byte-count 50

Assign queue af3 to WFQ group 1 with scheduling weight 50.

Traffic downtime about 30 ms.

qos map-table

 import 1 export 1

Configure a flexible priority map

No traffic downtime.

qos wred apply 100G-WRED-Template

Apply a WRED table to an interface.

No traffic downtime.

queue 3 drop-level 1 low-limit 37999 high-limit 38000 discard-probability 20

Edit the lower limit, upper limit, and the drop probability for ECN.

No traffic downtime.

queue 5 weighting-constant 0

Edit the WRED exponent for average queue length calculation for a queue.

No traffic downtime.

qos gts queue 3 cir 4000000 cbs 16000000

Configure the GTS parameters.

No traffic downtime.

stp edged-port

Configure a port as an edge port.

No traffic downtime.

priority-flow-control enable

Enable PFC on an interface.

No traffic downtime.

priority-flow-control deadlock enable

Enable PFC deadlock detection on an interface.

No traffic downtime.

buffer threshold alarm

Enable threshold-crossing alarms.

No traffic downtime.

Commands for configuring a QoS policy with accounting action specified.

Configure a QoS policy and specify the accounting action.

No traffic downtime.

Commands for configuring a QoS policy, including specifying the dot1p port priority type and configuring a DSCP marking action in a traffic behavior.

Apply a QoS policy to the incoming traffic of an interface.

No traffic downtime.

Commands for configuring a QoS policy, including specifying the dscp port priority type and configuring an 802.1p priority marking action or an inner-to-outer tag priority copying action in a traffic behavior.

Apply a QoS policy to the outgoing traffic of an interface.

No traffic downtime.

priority-flow-control poolID 0 headroom 9928

Set the maximum number of cell resources that can be used in a headroom storage space

No traffic downtime.

dldp enable

Enable DLDP on a port.

No traffic downtime.

unicast-suppression 1

Enable or disable unknown unicast storm suppression.

No traffic downtime.

broadcast-suppression

Enable or disable broadcast suppression

No traffic downtime.

multicast-suppression

Enable or disable multicast storm suppression

No traffic downtime.

stp root-protection

Enable or disable root guard on a port.

No traffic downtime.

gRPC-related settings

Configure global gRPC-related settings.

No traffic downtime.

 

Traffic model

ID

Type

Traffic direction

Traffic path

Simulation method

Load

Description

1

Unicast/L2

East-west traffic between servers attached to the same leaf M-LAG system.

M-LAG access: Server1 > Leaf1/Leaf2 > Server2

Non-M-LAG access: Server3 > Leaf3 > Leaf4 > Server4

Tester

Light/Congested

Layer 2 connectivity between servers attached to the same M-LAG system at the leaf tier.

2

Unicast/L2

East-west traffic between servers attached to the same leaf M-LAG system.

M-LAG access: Server2 > Leaf2/Leaf1 > Server1

Non-M-LAG access: Server4 > Leaf4 > Leaf3 > Server3

Tester

Light/Congested

Layer 2 connectivity between servers attached to the same M-LAG system at the leaf tier.

3

Known unicast/IPv4

East-west traffic between servers attached to the same leaf M-LAG system.

M-LAG access: Server1 > Leaf1/Leaf2 > Server2

Non-M-LAG access: Server3 > Leaf3 > Leaf4 > Server4

Tester

Light/Congested

Layer 3 connectivity between servers attached to the same M-LAG system at the leaf tier.

4

Known unicast/IPv4

East-west traffic between servers attached to the same leaf M-LAG system.

M-LAG access: Server2 > Leaf2/Leaf1 > Server1

Non-M-LAG access: Server4 > Leaf4 > Leaf3 > Server3

Tester

Light/Congested

Layer 3 connectivity between servers attached to the same M-LAG system at the leaf tier.

5

Known unicast/IPv4

East-west traffic between servers attached to different leaf M-LAG systems.

Server1 > Leaf1/Leaf2 > Spine 1/Spine 2 > Leaf3 > Server3

Tester

Light/Congested

Layer 3 connectivity between servers attached to different M-LAG systems at the leaf tier.

6

Known unicast/IPv4

East-west traffic between servers attached to different leaf M-LAG systems.

Server3 > Leaf3 > Spine 1/Spine 2 > Leaf1/Leaf2 > Server1

Tester

Light/Congested

Layer 3 connectivity between servers attached to different M-LAG systems at the leaf tier.

 

Convergence performance test results

Failure test results

Table 2 Link failure test results

Device

Failure cause

Traffic downtime upon failure

Recovery event

Traffic downtime upon recovery

Leaf

Single uplink failure

420 ms

Recovery from a single uplink failure

0 ms

Leaf dual-uplink failure

330 ms

Recovery from leaf dual-uplink failure

0 ms

Single-armed downlink failure

120 ms

Recovery from single-armed downlink failure

0 ms

Peer link failure (secondary device in M-LAG MAD DOWN state)

740 ms

Recovery from peer link failure

0 ms

Peer link member interface failure

10 ms

Recovery from peer link member interface failure

0 ms

M-LAG keepalive link failure

0 ms

Recovery from M-LAG keepalive link failure

0 ms

Restart of the primary device in the M-LAG system

170 ms

Restart of the primary device in the M-LAG system upon recovery

0 ms (Packet loss occurs if re-selection is performed.)

Restart of the secondary device in the M-LAG system

170 ms

Restart of the secondary device in the M-LAG system upon recovery

0 ms

Spine

Device restart

30 ms

Device restart upon recovery

0 ms

Single ECMP downlink failure

160 ms

Recovery from a single ECMP downlink failure

0 ms

 

Verifying the configuration

Verification commands

Leaf 1

Leaf 2

Description

display m-lag summary

display m-lag summary

Display brief information about the peer-link interface and M-LAG interfaces.

display m-lag system

display m-lag system

Display the M-LAG system settings.

display m-lag keepalive

display m-lag keepalive

Display M-LAG keepalive packet statistics.

display m-lag role

display m-lag role

Display M-LAG role information.

display m-lag consistency-check status

display m-lag consistency-check status

Display the configuration consistency check status.

display vrrp verbose

display vrrp verbose

Display detailed VRRP group information.

display priority-flow-control interface

display priority-flow-control interface

Display PFC information on an interface.

display qos wred interface

display qos wred interface

Display ECN information on an interface.

display qos map-table dot1p-lp

display qos map-table dot1p-lp

Display the configuration of the 802.1p-local priority map.

display qos map-table dscp-dot1p

display qos map-table dscp-dot1p

Display the configuration of the DSCP-802.1p priority map.

 

Procedure

# Verify that the M-LAG system that contains Leaf 1 and Leaf 2 is operating correctly.

[leaf1] display m-lag summary

Flags: A -- Aggregate interface down, B -- No peer M-LAG interface configured

       C -- Configuration consistency check failed

 

Peer-link interface: BAGG1000

Peer-link interface state (cause): UP

Keepalive link state (cause): UP

 

                     M-LAG interface information

M-LAG IF      M-LAG group  Local state (cause)  Peer state  Remaining down time(s)

BAGG100       1              UP                       UP           -

# Verify the M-LAG system settings on Leaf 1.

[leaf1] display m-lag system

                     System information

Local system number: 1                      Peer system number: 2

Local system MAC: 2001-0000-0018            Peer system MAC: 2001-0000-0018

Local system priority: 110                  Peer system priority: 110

Local bridge MAC: 0068-5716-5701            Peer bridge MAC: 90e7-10b2-f8aa

Local effective role: Primary               Peer effective role: Secondary

Health level: 0

Standalone mode on split: Disabled

In standalone mode: No

 

                     System timer information

Timer                      State       Value (s)    Remaining time (s)

Auto recovery              Disabled    -            -

Restore delay              Disabled    30           -

Consistency-check delay    Disabled    15           -

Standalone delay           Disabled    -            -

Role to None delay         Disabled    60           -

# Verify information about M-LAG keepalive packets on Leaf 1.

[leaf1] display m-lag keepalive

Neighbor keepalive link status (cause): Up

Neighbor is alive for: 451077 s 86 ms

Keepalive packet transmission status:

  Sent: Successful

  Received: Successful

Last received keepalive packet information:

  Source IP address: 1.1.1.2

  Time: 2022/06/07 16:19:43

  Action: Accept

 

M-LAG keepalive parameters:

Destination IP address: 1.1.1.2

Source IP address: 1.1.1.1

Keepalive UDP port : 6400

Keepalive interval : 1000 ms

Keepalive timeout  : 5 sec

Keepalive hold time: 3 sec

# Verify the roles of leaf devices in the M-LAG system on Leaf 1.

[leaf1] display m-lag role

                    Effective role information

Factors                  Local                    Peer

Effective role           Primary                  Secondary

Initial role             None                     None

MAD DOWN state           Yes                      Yes

Health level             0                        0

Role priority            50                       100

Bridge MAC               0068-5716-5701           90e7-10b2-f8aa

Effective role trigger: Peer link calculation

Effective role reason: Role priority

 

                    Configured role information

Factors                  Local                    Peer

Configured role          Primary                  Secondary

Role priority            50                       100

Bridge MAC               0068-5716-5701           90e7-10b2-f8aa

# Verify the configuration consistency status on Leaf 1.

[leaf1] display m-lag consistency-check status

                 Global Consistency Check Configuration

Local status     : Enabled           Peer status     : Enabled

Local check mode : Strict            Peer check mode : Strict

 

                 Consistency Check on Modules

Module           Type1           Type2

LAGG             Check           Check

VLAN             Check           Check

STP              Check           Check

MAC              Not Check       Check

L2VPN            Not Check       Check

PORTSEC          Not Check       Not Check

DOT1X            Not Check       Not Check

MACA             Not Check       Not Check

WEBAUTH          Not Check       Not Check

NETANALYSIS      Not Check       Check

 

                 Type1 Consistency Check Result

Global consistency check result: SUCCESS

Inconsistent global modules: -

 

M-LAG IF         M-LAG group ID     Check Result      Inconsistency modules

BAGG100          1                     SUCCESS            -

Tuning the parameters

Typically, set up the network according to the best practices and configuration guides, and use the recommended PFC and ECN settings. The recommended settings are optimal for general network environments. Do not tune the parameters unless otherwise required.

In some cases, special characteristics of the devices and servers in your network require you to tune the parameters. For more information, see "Guidelines for tuning parameters."

Recommended PFC settings

After you enable PFC for an 802.1p priority on an S6805, S6825, S6850, S9850, S9820-64H, S9820-8C, S9825, or S9855 switch, the switch sets default values for the PFC thresholds. For more information, see Table 3, Table 4, and Table 5.

The default settings are optimal for general network environments. Do not tune the parameters unless otherwise required.

Table 3 Default PFC thresholds on the S6805 & S6825 & S6850 & S9850 switches

PFC threshold (right)

Interface type (below)

Headroom buffer threshold (cell)

Dynamic back pressure frame triggering threshold (%)

Offset between the back pressure frame stopping threshold and triggering threshold (cell)

PFC reserved threshold (cell)

1GE/10GE

100

5

12

17

25GE

125

5

12

17

40GE

200

5

12

17

50GE

308

5

12

17

100GE

491

5

12

17

 

Table 4 Default PFC thresholds on the S9820-64H switches

PFC threshold (right)

Interface type (below)

Headroom buffer threshold (cell)

Dynamic back pressure frame triggering threshold (%)

Offset between the back pressure frame stopping threshold and triggering threshold (cell)

PFC reserved threshold (cell)

25GE

125

5

12

20

100GE

491

5

12

20

 

Table 5 Default PFC thresholds on the S9820-8C switches

PFC threshold (right)

Interface type (below)

Headroom buffer threshold (cell)

Dynamic back pressure frame triggering threshold (%)

Offset between the back pressure frame stopping threshold and triggering threshold (cell)

PFC reserved threshold (cell)

100GE

491

5

12

20

200GE

705

5

12

20

400GE

1000

5

12

20

 

Recommended ECN settings

For the ECN feature, Table 6 shows the recommended settings for the RoCE queues. You can set the lower limit and upper limit for the average queue length to values greater than the default settings.

Table 6 Recommended ECN thresholds

PFC threshold (right)

Interface type (below)

Lower limit for the average queue length

Upper limit for the average queue length

Drop probability

Exponent for average queue length calculation

25GE

400

1625

20

0

50GE

600

1500

20

0

100GE

1000

2000

20

0

200GE

1500

3000

20

0

400GE

2100

5000

20

0

 

NIC settings for reference

Make sure the server NICs support PFC and ECN and have PFC and ECN enabled. For more information, see the documents for the servers. Identify whether packets in the RoCE queues of the servers carry the ECN flags.

This document uses the Mellanox ConnectX-6 Lx NICs as an example.

NOTE: NIC settings are lost after the server or the NIC reboots, because they are not written to the configuration file. You must reconfigure the NIC settings after each reboot of the server or the NIC.

NIC model and version

Item

Information

NIC model

Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]

NIC driver version

MLNX_OFED_LINUX-5.4-3.2.7.2.3-rhel8.4-x86_64

NIC firmware version

driver: mlx5_core

version: 5.4-3.2.7.2.3

firmware-version: 26.31.2006 (MT_0000000531)

 

Prerequisites

1.     Execute the mst start command to enable the Mellanox Software Tools (MST) service.

2.     Execute the mst status command to view the Mellanox device status.

3.     (Optional.) Execute the show_gids command to view the name, GID, and IP address of the NIC.

Configuring the NIC interface to trust the DSCP priority values in packets

1.     Execute the mlnx_qos -i ifname --trust dscp command to configure the interface to trust the DSCP priority values in packets.

The ifname argument represents the NIC interface name.

2.     Additionally, make sure the DSCP priority configured for packets on the NIC corresponds to the DSCP priority of RoCE packets on the device. That is, the 802.1p priority for which PFC and ECN are enabled on the device must be mapped to the DSCP priority value in the packets sent out of the NIC.

Configuring the DSCP priority of CNP packets

1.     Use the ethtool -i ifname bus-info command to view the bus-info of an interface.

For example, to view the bus-info of ens1f0, execute the ethtool -i ens1f0 bus-info command. In the command output, you can see that the bus-info of ens1f0 is 0000:86:00.0.

2.     Enter the DSCP priority setting path: cd /sys/kernel/debug/mlx5/0000:86:00.0/cc_params

3.     Execute the echo priority_value > np_cnp_dscp command to set the DSCP priority of CNP packets.

For example, to set DSCP priority 48 for CNP packets, execute the echo 48 > np_cnp_dscp command.

4.     Execute the cat np_cnp_dscp command to identify whether the DSCP priority is successfully set for CNP packets.

Enabling PFC for the RoCE queues

1.     Execute the mlnx_qos -i ifname --pfc 0,1,2,3,4,5,6,7 command to enable PFC for the RoCE queues.

The ifname argument represents the NIC interface name. If you specify 0 in the position of an 802.1p priority in the 0,1,2,3,4,5,6,7 parameter, PFC is disabled for the corresponding 802.1p priority. If you specify 1, PFC is enabled. For example, to enable PFC for packets with 802.1p priority 5 on ens1f0, execute the mlnx_qos -i ens1f0 --pfc 0,0,0,0,0,1,0,0 command.

2.     Execute the mlnx_qos -i ifname command to view the PFC enabling state on an interface.

The value of 1 means that PFC is enabled for packets with the specified priority.

Enabling ECN for the RoCE queues

1.     Execute the following commands to enable ECN for packets with the specified priority:

¡     echo 1 > /sys/class/net/ifname/ecn/roce_np/enable/priority_value

¡     echo 1 > /sys/class/net/ifname/ecn/roce_rp/enable/priority_value

For example, to enable ECN for packets with priority 5 on ens1f0, execute the following command:

¡     echo 1 > /sys/class/net/ens1f0/ecn/roce_np/enable/5

¡     echo 1 > /sys/class/net/ens1f0/ecn/roce_rp/enable/5

2.     Execute the following command to identify whether ECN is successfully enabled.

The value of 1 means ECN is successfully enabled, and the value of 0 means ECN is not enabled.

¡     cat /sys/class/net/ifname/ecn/roce_np/enable/priority_value

¡     cat /sys/class/net/ifname/ecn/roce_rp/enable/priority_value

Guidelines for tuning parameters

Restrictions and guidelines

When tuning parameters, port traffic will be interrupted if the following commands related to PFC, QoS, and data buffer are executed:

·     buffer apply

·     buffer egress cell queue shared (executing this command does not cause packet loss, but executing the buffer apply command to apply this configuration will cause packet loss)

·     qos wred apply

·     qos wrr weight

·     qos wrr group weight

·     qos wfq byte-count

·     qos wfq queue-id group { 1 | 2 } byte-count

·     priority-flow-control no-drop dot1p

·     priority-flow-control dot1p headroom

·     priority-flow-control dot1p ingress-buffer dynamic

·     priority-flow-control dot1p ingress-buffer static

·     priority-flow-control dot1p ingress-threshold-offset

·     priority-flow-control dot1p reserved-buffer

Identifying whether packets are dropped

Identifying whether packets are dropped on an interface

# View the dropped packets on interface HundredGigE 1/0/25.

<Sysname> display packet-drop interface hundredgige 1/0/25

HundredGigE1/0/25:

  Packets dropped due to Fast Filter Processor (FFP): 0

  Packets dropped due to Egress Filter Processor (EFP): 0

  Packets dropped due to STP non-forwarding state: 0

  Packets dropped due to insufficient data buffer. Input dropped: 0 Output dropped: 0

  Packets of ECN marked: 0

  Packets of WRED dropped: 0

Troubleshooting packet drops caused by insufficient data buffer

1.     If input dropped packets exist, you can increase the headroom buffer threshold. As a best practice, increase the threshold by the current value each time until no input dropped packets exist. For example:

# Set the headroom buffer threshold to 982.

<sysname> system-view

[Sysname] interface hundredgige 1/0/25

[Sysname-HundredGigE1/0/25] priority-flow-control dot1p 5 headroom 982

If packets are still dropped when you have increased the threshold to the maximum value, identify whether the server NIC supports PFC and has PFC enabled.

2.     If output dropped packets exist, identify whether the maximum shared-area ratio is configured as 100% and whether the buffer apply command is applied. For example:

# Configure queue 5 to use up to 100% shared-area space of cell resources in the egress buffer and apply the configuration.

<sysname> system-view

[Sysname] buffer egress cell queue 5 shared ratio 100

[Sysname] buffer apply

If output dropped packets still exist after the maximum shared-area ratio is set to 100% for the queue in the egress buffer, possible causes are network or configuration issues. Troubleshoot the network and configuration issues, or contact Technical Support.

Troubleshooting WRED dropped packets

1.     If WRED dropped packets exist, identify whether ECN is enabled for the RoCE queues.

<Sysname> display qos wred table

Table name: 1

Table type: Queue based WRED

QID   gmin  gmax  gprob  ymin  ymax  yprob  rmin  rmax  rprob  exponent  ECN

----------------------------------------------------------------------------

0     100   1000  10      100   1000  10      100   1000  10      9         N

1     100   1000  10      100   1000  10      100   1000  10      9         N

2     100   1000  10      100   1000  10      100   1000  10      9         N

3     100   1000  10      100   1000  10      100   1000  10      9         N

4     100   1000  10      100   1000  10      100   1000  10      9         N

5     100   1000  10      100   1000  10      100   1000  10      9         N

6     100   1000  10      100   1000  10      100   1000  10      9         N

7     100   1000  10      100   1000  10      100   1000  10      9         N

2.     If ECN is not enabled for an RoCE queue (N is displayed in the ECN column for the RoCE queue), you must enable ECN for that RoCE queue.

<Sysname> system-view

[Sysname] interface hundredgige 1/0/25

[Sysname-HundredGigE1/0/25] qos wred queue 5 ecn

3.     If the number of WRED dropped packets keeps increasing after you enable ECN for the RoCE queue, identify whether the packets from the RoCE queue carry the ECN flag. If they do not carry the ECN flag, identify whether the server has PFC and ECN enabled.

4.     If packet loss persists, contact Technical Support.

Identifying whether the latency meets the requirements

To meet the latency requirements, you can tune some ECN and PFC parameters to reduce the latency on the condition that packets are not lost. Tuning the latency affects the throughput as follows when the network is congested:

·     The less buffer is used, the lower the latency and the lower the throughput.

·     The more buffer is used, the higher the latency and the higher the throughput.

Balance the relationship between latency and throughput according to the network requirements.

When the network is not congested, PFC and ECN do not take effect. After ECN and PFC are triggered, the packet forwarding rate decreases. When you tune the parameters, try to trigger ECN and PFC with as little buffer usage as possible while ensuring that no packets are lost. As a best practice, tune the ECN parameters first.

Additionally, when you tune parameters, consider the server NIC capabilities, including how the NICs respond to PFC and ECN packets and which PFC and ECN features they support. For example, some NICs automatically decrease their transmit rate when the latency is high. In this case, parameters tuned to increase the throughput might instead increase the latency, cause the NICs to decrease the rate, and reduce the throughput, leading to unexpected results. As a best practice, before you tune parameters, read the related server documentation, learn how the server NICs support PFC and ECN, and verify during tuning that the tuned parameters are appropriate.

Tuning ECN parameters

You can tune the following parameters for queues in the WRED table to control the latency and throughput:

·     Low-limit (lower limit for the average queue length) and high-limit (upper limit for the average queue length)—By tuning down the low-limit and high-limit values, you can trigger the ECN flag faster to decrease the latency. However, this operation might decrease the throughput.

·     Weighting-constant (exponent for average queue length calculation)—Specify the method of calculating the average queue length. The value of 0 indicates the average queue length is the real-time queue length and more sensitive to ECN flags. The greater the exponent for average queue length calculation, the less sensitive the average queue length is to real-time queue length changes. As a best practice, use the recommended value. Try to tune this parameter only when other tuning attempts do not take effect.

·     Discard-probability (drop probability)—After ECN is enabled, this parameter represents the ECN flag probability. A greater value means more packets between the lower limit and upper limit are marked with ECN flags. As a best practice, use the recommended value. Try to tune this parameter only when other tuning attempts do not take effect. As a best practice, tune the drop probability by 20% each time.

Example:

# Configure WRED parameters for queue 5: set the lower limit and upper limit for the average queue length to 800 and 1800, respectively.

<Sysname> system-view

[Sysname] qos wred queue table queue-table1

[Sysname-wred-table-queue-table1] queue 5 drop-level 0 low-limit 800 high-limit 1800

[Sysname-wred-table-queue-table1] queue 5 drop-level 1 low-limit 800 high-limit 1800

[Sysname-wred-table-queue-table1] queue 5 drop-level 2 low-limit 800 high-limit 1800

Tuning PFC parameters

CAUTION:

Tuning the PFC thresholds when traffic is being received or sent might cause packet loss.

 

PFC acts as a safeguard behind ECN to ensure that packets are not lost. Typically, PFC is not triggered and does not seriously affect the latency. Additionally, lower PFC thresholds will decrease the throughput. As a best practice, do not tune PFC parameters unless required.

To further decrease the latency, try to tune the ingress-buffer parameter (back pressure frame trigger threshold).

1.     Decrease the ingress-buffer threshold.

As a best practice, tune the ingress-buffer threshold together with the high-limit parameter of WRED (tune down the value by 10% of the current value each time). However, you must make sure the ingress-buffer threshold is greater than the high-limit of WRED, so that ECN is preferentially triggered on the device.

# Set the dynamic back pressure frame triggering threshold to 4.

<sysname> system-view

[Sysname] interface hundredgige 1/0/25

[Sysname-HundredGigE1/0/25] priority-flow-control dot1p 5 ingress-buffer dynamic 4

2.     After tuning this parameter value, execute the following command multiple times to view the PFC information of interfaces. Make sure as few PFC frames as possible are received and sent on the device (PFC is not triggered or occasionally triggered).

If you see that PFC packets are received and sent multiple times, the ingress-buffer threshold is too low. As a best practice, increase the threshold.

# Display the PFC information of interfaces.

<Sysname> display priority-flow-control interface

Conf -- Configured mode   Ne -- Negotiated mode   P -- Priority

Interface     Conf Ne  Dot1pList   P Recv       Sent       Inpps      Outpps

HGE1/0/25     Auto On  0,2-3,5-6   0 178         43         12         15

Upgrading the devices

Upgrading a leaf device

Verifying that all upgrade requirements are met

Execute the commands in "Verification commands" and the following commands to verify that all upgrade requirements are met.

 

Leaf 1

Leaf 2

Description

display device

display device

Displays device information.

display boot-loader

display boot-loader

Displays the current software images and startup software images.

display version

display version

Displays system version information.

 

Upgrading the device

See H3C Switches M-LAG System Upgrade & Replacement & Expansion Guide.

Estimating upgrade downtime

To minimize the impact on services, use information provided in "Convergence performance test results" to estimate downtime when you schedule an upgrade.

When you upgrade the M-LAG member devices one by one while the traffic volume is light, traffic downtime of a member device is less than 500 ms upon failover and 50 ms upon fallback.

Verifying the upgrade result

Execute the commands in "Verification commands" and the following commands to verify that the device is upgraded successfully.

 

Leaf 1

Leaf 2

Description

display device

display device

Displays device information.

display boot-loader

display boot-loader

Displays the current software images and startup software images.

display version

display version

Displays system version information.

 

Upgrading a spine device

Verifying that all upgrade requirements are met

Execute the commands in "Verification commands" and the following commands to verify that all upgrade requirements are met.

 

Spine 1

Spine 2

Description

display device

display device

Displays device information.

display boot-loader

display boot-loader

Displays the current software images and startup software images.

display version

display version

Displays system version information.

 

Upgrading the device

1.     Execute the display version command to verify the current BootWare image version and startup software version.

2.     Use the release notes for the upgrade software version to evaluate the upgrade impact on your network and verify the following items:

¡     Software and hardware compatibility.

¡     Version and size of the upgrade software.

¡     Compatibility of the upgrade software with the current BootWare image and startup software image.

3.     Use the release notes to verify whether the upgrade software images require a license. If licenses are required, check the system for availability of valid licenses. If no valid licenses are available, register and activate licenses for each license-based software image.

4.     Use the dir command to verify that the device has sufficient storage space for the upgrade images. If the storage space is not sufficient, delete unused files by using the delete command. Make sure that all MPUs in the system have sufficient storage space.

5.     Use FTP or TFTP to transfer the upgrade image file to the root directory of a file system.

6.     Upgrade the software. A hedged command sketch follows these steps. For more information about the software upgrade procedure, see the fundamentals configuration guide for the device.
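
The following is a minimal sketch of steps 4 through 6, assuming the upgrade image is named spine.ipe and is downloaded from a TFTP server at 192.168.0.100. The file names, server address, and slot number are hypothetical; replace them with the actual values for your device.

# Verify that the device has sufficient storage space for the upgrade image.

<Sysname> dir

# If the storage space is insufficient, delete unused files (hypothetical file name).

<Sysname> delete /unreserved flash:/old-image.ipe

# Download the upgrade image file to the root directory of the flash memory (hypothetical server address and file name).

<Sysname> tftp 192.168.0.100 get spine.ipe

# Specify the image as the main startup software image (hypothetical slot number).

<Sysname> boot-loader file flash:/spine.ipe slot 1 main

# Verify the startup software image settings, and then reboot to run the new version.

<Sysname> display boot-loader

<Sysname> reboot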

Estimating upgrade downtime

To minimize the impact on services, use information provided in "Convergence performance test results" to estimate upgrade downtime when you schedule an upgrade.

When you upgrade the spine devices one at a time during a period of light traffic, traffic downtime is less than 200 ms upon failover and less than 50 ms upon fallback.

Verifying the upgrade result

Execute the commands in "Verification commands" and the following commands to verify that the device is upgraded successfully.

 

Spine 1                 Spine 2                 Description
display device          display device          Displays device information.
display boot-loader     display boot-loader     Displays the current software images and startup software images.
display version         display version         Displays system version information.

 

Expanding the network

An expansion operation adds two leaf devices.

Adding a leaf device

Verifying that all expansion requirements are met

Execute the commands in "Verification commands" and the following commands to verify that all requirements are met for an expansion.

 

Leaf 1                  Leaf 2                  Description
display device          display device          Displays device information.
display boot-loader     display boot-loader     Displays the current software images and startup software images.
display version         display version         Displays system version information.

 

Adding the expansion device to the network

1.     Make sure the expansion device is not connected to network management systems.

2.     Upgrade the device to the target software version. A hedged command sketch follows this list.

3.     Preconfigure the device.

4.     Connect the device to the network management systems.
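
The following is a minimal sketch of step 2, assuming the target image newleaf.ipe has already been transferred to the flash memory of the new leaf device. The file name and slot number are hypothetical; replace them with your actual values.

# Specify the target image as the main startup software image, and then reboot.

<Sysname> boot-loader file flash:/newleaf.ipe slot 1 main

<Sysname> reboot

# After the reboot, verify that the device runs the target version.

<Sysname> display version

<Sysname> display boot-loader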

Estimating expansion downtime

To minimize the impact on services, use information provided in "Convergence performance test results" to estimate the downtime when you schedule a node expansion.

Verifying the expansion result

Execute the following commands to verify that the device has been added successfully.

 

Leaf 1                  Leaf 2                  Description
display device          display device          Displays device information.
display boot-loader     display boot-loader     Displays the current software images and startup software images.
display version         display version         Displays system version information.

 

Replacing hardware

Replacing an interface module

Verifying that all replacement requirements are met

Execute the commands in "Verification commands" and the following commands to verify that all requirements are met for a replacement.

 

Leaf 1                  Leaf 2                  Description
display device          display device          Displays device information.
display boot-loader     display boot-loader     Displays the current software images and startup software images.
display version         display version         Displays system version information.

 

Replacing hardware

Before you replace an interface module, make sure the service and management traffic has been switched over to other interface modules that are operating correctly.

Depending on your evaluation of the conditions, replace the interface module online while the system is operating, or power off the system before you perform the replacement.

For details, see H3C Switches M-LAG System Upgrade & Replacement & Expansion Guide.
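
As a hedged example, one way to move traffic off the module before pulling it is to shut down its member ports so that link aggregation and M-LAG shift traffic to the remaining links. The interface range below is hypothetical; use the ports that actually reside on the module to be replaced.

# Shut down the member ports on the interface module to be replaced so traffic switches to the remaining links (hypothetical interface range).

<Sysname> system-view

[Sysname] interface range hundredgige 2/0/1 to hundredgige 2/0/36

[Sysname-if-range] shutdown

[Sysname-if-range] quit

# Verify that the ports are down and that traffic has moved before removing the module.

[Sysname] display interface brief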

Estimating replacement downtime

To minimize the impact on services, use information provided in "Convergence performance test results" to estimate the downtime when you schedule a hardware replacement.

Verifying the replacement result

Use the same commands for pre-replacement verification to verify that the system can operate correctly after the hardware replacement.

Replacing a switching fabric module

Verifying that all replacement requirements are met

Execute the commands in "Verification commands" and the following commands to verify that all requirements are met for a replacement.

 

Leaf 1                  Leaf 2                  Description
display device          display device          Displays device information.
display boot-loader     display boot-loader     Displays the current software images and startup software images.
display version         display version         Displays system version information.

 

Replacing hardware

Depending on your evaluation of the conditions, replace the switching fabric module online while the system is operating, or power off the system before you perform the replacement.
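
As a hedged example, before hot-swapping a switching fabric module on a modular device, confirm that the remaining switching fabric modules are present and in normal state so that forwarding capacity is preserved during the replacement. Slot numbering and module names vary by device.

# Check the state of all cards, including the switching fabric modules, before removing the target module.

<Sysname> display device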

Estimating replacement downtime

To minimize the impact on services, use information provided in "Convergence performance test results" to estimate the downtime when you schedule a hardware replacement.

Verifying the replacement result

Use the same commands for pre-replacement verification to verify that the system can operate correctly after the hardware replacement.

 

 

 
