LACP, Yeah You Know Me

Traditionally, when configuring vCenter and ESXi, most customers take the default networking route: Management, vMotion, and vSAN port groups sitting on a vSphere Distributed Switch (vDS).  Uplinks are usually configured either active/active or active/standby.  With active/standby, I typically see Management and vMotion using Uplink 1 active / Uplink 2 standby, and vSAN using Uplink 2 active / Uplink 1 standby. 

Active/active sounds better than it is: with the default load balancing policy (route based on originating virtual port), each VMkernel adapter or VM is pinned to a single uplink, so traffic is rarely spread evenly across both NICs.  The active/standby layout described above usually works well, but the vSAN uplink tends to carry significantly more traffic than the Management/vMotion uplink. 

What if I told you that there was a better way?  Something that would allow for using all the available bandwidth while providing redundancy in case of failure.  Well Hello Nur…errr…LACP.

First, I think it is critical to understand the acronyms and terminology:

LACP – Link Aggregation Control Protocol – Bundles multiple ports into one combined logical interface, providing both load balancing and redundancy.  An example would be 2x25G ports being bundled together to form a 50G pipe.  LACP is the underlying protocol; different vendors use different names for how they implement the bundling.

  • LAG = Port-Channel (PC) – One vendor uses Link Aggregation Groups (LAGs) and another uses Port-Channels.  These are the same thing: multiple ports on the same switch bundled together to form the higher-bandwidth interface. 

  • MLAG = vPC – Multi-Chassis Link Aggregation (MLAG) and Virtual Port Channel (vPC) are likewise the same thing.  They let you combine ports from two separate switches into the higher-bandwidth interface.  This increases redundancy: losing a switch only takes down one of the two bonded links. 
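To make the LAG/MLAG distinction concrete, here is a minimal Cisco-style sketch (the interface and vPC numbers are made up for illustration).  The only difference between a plain port-channel and a vPC is the one "vpc" line tying the two peer switches' port-channels together:

```
! Plain port-channel (LAG): all member ports live on one switch
interface port-channel101
  description single-switch-LAG

! vPC (MLAG): the same port-channel ID is configured on both peer
! switches, and the "vpc" line joins them into one logical link
interface port-channel102
  description multi-chassis-LAG
  vpc 102
```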

Now that we understand the acronyms, let’s get this configured.  We are going to begin where many customers are today: a production system up and running on a vDS.  Teaming and failover for the Management and vMotion port groups is configured with Uplink 1 active and Uplink 2 standby, while the vSAN port group is configured with Uplink 2 active and Uplink 1 standby. 

The first change is to move all the port groups to the same active/standby model: Uplink 1 active and Uplink 2 standby.  The easiest way to do this is to right-click your vDS, choose “Distributed Port Group”, and then click “Manage Distributed Port Groups”.

Select “Teaming and failover” and then click “NEXT”


Select all your port groups and click “NEXT”.


Make sure that Uplink 1 is active, and Uplink 2 is standby then click “NEXT”.  Review your changes and click “FINISH”.

In my lab, two 25G cables run between two Cisco Nexus 93180 switches configured for vPC; ports 47/48 on each switch are bundled together to form a 50G peer link between the switches.  The vPC peer link must be set up and configured before you can use MLAG/vPC. 

For information around Cisco and vPC please see Configuring vPCs.
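For reference, the peer-link side of a lab like mine looks roughly like the following.  This is a hedged sketch, not a complete vPC configuration: the domain ID and keepalive addresses are placeholders, while port-channel 161 on ports 47-48 matches my lab.

```
feature lacp
feature vpc

vpc domain 1
  peer-keepalive destination 10.0.0.2 source 10.0.0.1  ! mgmt IPs (placeholders)

interface eth1/47-48
  channel-group 161 mode active

interface po161
  switchport mode trunk
  vpc peer-link
```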

Using the command “sh port-channel summary” we can see that the vPC peer link (port-channel 161) is up and the switches are communicating properly. 

Now that we have verified the two switches are talking to each other, let’s look at the host networking.  Each ESXi host has two 25G connections, one running to each switch.  In my lab the hosts use ports 1-8 on each switch.  The configuration is nearly identical on both switches; the only difference is my descriptions, which I use to show that there are two ports tied to each LAG.

On switch A we are going to create the port-channels, but we are not going to add eth1/1-8 to them yet, because that would break communication: these switchports are cabled to each hypervisor’s vmnic associated with Uplink 1, which is currently active on our vDS port groups and handling all traffic in and out of the host.  You would run the same commands on both switches, changing the descriptions if desired.

interface po201
  description esx01-LAG1-p1
  switchport mode trunk
  mtu 9216
  vpc 201

interface eth1/1
  switchport mode trunk
  mtu 9216
  description esx01-LAG1-p1

interface po202
  description esx02-LAG1-p1
  switchport mode trunk
  mtu 9216
  vpc 202

interface eth1/2
  switchport mode trunk
  mtu 9216
  description esx02-LAG1-p1

interface po203
  description esx03-LAG1-p1
  switchport mode trunk
  mtu 9216
  vpc 203

interface eth1/3
  switchport mode trunk
  mtu 9216
  description esx03-LAG1-p1

interface po204
  description esx04-LAG1-p1
  switchport mode trunk
  mtu 9216
  vpc 204

interface eth1/4
  switchport mode trunk
  mtu 9216
  description esx04-LAG1-p1

interface po205
  description esx05-LAG1-p1
  switchport mode trunk
  mtu 9216
  vpc 205

interface eth1/5
  switchport mode trunk
  mtu 9216
  description esx05-LAG1-p1

interface po206
  description esx06-LAG1-p1
  switchport mode trunk
  mtu 9216
  vpc 206

interface eth1/6
  switchport mode trunk
  mtu 9216
  description esx06-LAG1-p1

interface po207
  description esx07-LAG1-p1
  switchport mode trunk
  mtu 9216
  vpc 207

interface eth1/7
  switchport mode trunk
  mtu 9216
  description esx07-LAG1-p1

interface po208
  description esx08-LAG1-p1
  switchport mode trunk
  mtu 9216
  vpc 208

interface eth1/8
  switchport mode trunk
  mtu 9216
  description esx08-LAG1-p1

Running “sh port-channel summary” shows that I created all my port channels, but none contain member ports yet.

On switch B only, where the switchports are cabled to each hypervisor’s vmnic associated with Uplink 2 (currently standby on our vDS port groups), we now add the Ethernet interfaces to the port-channel groups. 

SWITCH B ONLY

Interface eth1/1
channel-group 201 mode active

Interface eth1/2
channel-group 202 mode active

Interface eth1/3
channel-group 203 mode active

Interface eth1/4
channel-group 204 mode active

Interface eth1/5
channel-group 205 mode active

Interface eth1/6
channel-group 206 mode active

Interface eth1/7
channel-group 207 mode active

Interface eth1/8
channel-group 208 mode active

Running “sh port-channel summary” now shows member ports in the vPC on switch B, but they are in a suspended state; the host side isn’t speaking LACP yet.  In vCenter, we are now going to move Uplink 2 to a LAG on the vDS.

Now we are going to create the LACP group.  Click your vDS, click “Configure”, and then click “LACP” in the settings pane.  In the “New Link Aggregation Group” window, change the name as needed, take the defaults for the rest, and click “OK”.
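One thing worth knowing while you are in this dialog: the LAG’s load balancing mode only controls how the host hashes egress traffic; the switches hash return traffic according to their own global setting.  On Nexus, a comparable per-flow hash can be set with a single global command (a sketch; the available hash options vary by platform):

```
port-channel load-balance src-dst ip-l4port
```

Matching the two sides isn’t required for the LAG to come up, but similar hashes keep flow distribution predictable in both directions.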

Click “MIGRATING NETWORK TRAFFIC TO LAGS”, then click “MANAGE DISTRIBUTED PORT GROUPS”.

Select “Teaming and failover” and click “NEXT”.

Select your port groups then click “NEXT”.

Move Uplink 2 to unused and LAG1 to standby and click “NEXT”.  You will get a notification that using standalone uplinks and a standby LAG should only be temporary while migrating to the LAG.  Click “OK”

Now we are going to migrate our Uplink 2 to LAG1.  Click “MIGRATING NETWORK TRAFFIC TO LAGS” then click “ADD AND MANAGE HOSTS”.

Click “Manage host networking” then click “NEXT”.


Select your hosts and then click “NEXT”

Select the vmnic3 uplink (your vmnic numbering may differ) and assign it to “LAG1-1”, then click “NEXT”.

We don’t need to make changes to our VMkernel adapters because they already live on the vDS, and the LAG is just another uplink on that same switch.  Click “NEXT”.

We are not migrating any VM networking so click “NEXT” then click “FINISH” on the Ready to complete screen.

At this point vmnic2 (Uplink 1) is still using the standard vDS configuration, and vmnic3 is attached to LAG1.  Next, we need to move vmnic2 to the LAG as well.  Click “ADD AND MANAGE HOSTS”.

Click “Manage host networking” then click “NEXT”.

Select your hosts and then click “NEXT”

Select the vmnic2 (Uplink 1) uplink and assign it to “LAG1-0”, then click “NEXT”.

We don’t need to make changes to our VMkernel adapters because they already live on the vDS, and the LAG is just another uplink on that same switch.  Click “NEXT”.

We are not migrating any VM networking so click “NEXT” then click “FINISH” on the Ready to complete screen.

We now have both uplinks on the LAG.  Now we need to configure our ports on switch A to be part of the port-channel groups.  SSH into switch A and run the following. 

SWITCH A ONLY

Interface eth1/1
channel-group 201 mode active

Interface eth1/2
channel-group 202 mode active

Interface eth1/3
channel-group 203 mode active

Interface eth1/4
channel-group 204 mode active

Interface eth1/5
channel-group 205 mode active

Interface eth1/6
channel-group 206 mode active

Interface eth1/7
channel-group 207 mode active

Interface eth1/8
channel-group 208 mode active

Running “sh port-channel summary” we now see both sides of our vPC with the status of “P” which is “Up in port-channel”.
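You can also confirm convergence from the host side.  On recent ESXi builds, an SSH session should let you check LACP state for the vDS (a sketch; the exact command namespace can vary by version, and the output will show per-vmnic partner information and state flags):

```
esxcli network vswitch dvs vmware lacp status get
esxcli network vswitch dvs vmware lacp stats get
```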

One final step remains: make the LAG the only active uplink in teaming and failover.  Click “MANAGE DISTRIBUTED PORT GROUPS”.

Select “Teaming and failover” and click “NEXT”.

Select your port groups then click “NEXT”.

Move both Uplink 1 and Uplink 2 to Unused and move LAG1 to Active and click “NEXT” then “FINISH”.

For those with short attention spans: TL;DR.

Steps:

  1. Make sure that the vPC peer link is configured based on best practice from your switch vendor.
  2. Change vDS port groups to use Uplink 1 active and Uplink 2 standby.
  3. Configure port-channels on both switch A and switch B.
  4. Configure ethernet ports on switch A and switch B for ESXi hosts.
  5. Add only switch B ethernet ports to port-channel groups.
  6. Create LAG in vCenter.
  7. Edit port groups and make LAG1 standby and make Uplink 2 unused.
  8. Move vmnic3 from Uplink 2 to LAG1-1 and validate port-channel configuration.
  9. Move vmnic2 from Uplink 1 to LAG1-0
  10. Add switch A ethernet ports to port-channel groups.
  11. Move LAG1 to active and Uplink 1 and Uplink 2 to unused for all port groups.
  12. Profit.

