Cumulus Linux non-disruptive upgrade procedure on MLAG pairs

I thought it would be useful to document the exact procedure for a non-disruptive upgrade of Cumulus Linux MLAG (CLAG) pairs. I find the online documentation, Upgrading Cumulus Linux, a bit short on CLAG. In particular, it does not say in what order to upgrade the switches so that traffic is disrupted as little as possible.

The following procedure worked for me on Dell S4048-ON and Dell S3048-ON switches:

  • On both switches, refresh the package index of the apt repository:
sudo apt-get update
  • Determine which switch is the CLAG primary and which is the CLAG secondary:
sudo net show clag

Start by upgrading the secondary CLAG member:

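  • Shut down all interfaces except the peerlink member ports using the command below. Adjust the swp range to match the ports of your switch and make sure it leaves out the peerlink members. This forces all traffic through the other switch:
echo swp{1..52} | tr ' ' '\n' | sudo xargs -I{} ip link set {} down

A continuous ping to 8.8.8.8 during this step lost a single packet:

64 bytes from 8.8.8.8: icmp_seq=8903 ttl=59 time=1.106 ms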
64 bytes from 8.8.8.8: icmp_seq=8904 ttl=59 time=0.974 ms
64 bytes from 8.8.8.8: icmp_seq=8905 ttl=59 time=1.643 ms
64 bytes from 8.8.8.8: icmp_seq=8906 ttl=59 time=0.869 ms
Request timeout for icmp_seq 8907
64 bytes from 8.8.8.8: icmp_seq=8908 ttl=59 time=1.256 ms
64 bytes from 8.8.8.8: icmp_seq=8909 ttl=59 time=0.769 ms

(Rollback) If you see problems, revert the change with:

echo swp{1..52} | tr ' ' '\n' | sudo xargs -I{} ip link set {} up
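
The fixed swp{1..52} range has to match the port count of each switch model and must not include the peerlink member ports. A minimal sketch that builds the port list from sysfs instead, assuming the peerlink is a Linux bond named peerlink (so its members are listed in /sys/class/net/peerlink/bonding/slaves). The function name front_panel_ports and the SYSFS_NET override are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every swp port that exists on this switch,
# skipping VLAN subinterfaces and the members of the peerlink bond so
# the peerlink stays up.
# SYSFS_NET can be overridden for testing; it defaults to the real sysfs.
front_panel_ports() {
    local sysfs="${SYSFS_NET:-/sys/class/net}"
    local bond="${1:-peerlink}"
    local members=""
    # The Linux bonding driver lists bond members, space-separated,
    # in bonding/slaves.
    if [ -r "$sysfs/$bond/bonding/slaves" ]; then
        members=$(cat "$sysfs/$bond/bonding/slaves")
    fi
    local path port
    for path in "$sysfs"/swp*; do
        [ -e "$path" ] || continue      # glob matched nothing
        port=${path##*/}
        case "$port" in
            *.*) continue ;;            # skip subinterfaces like swp1.100
        esac
        case " $members " in
            *" $port "*) continue ;;    # skip peerlink members
        esac
        echo "$port"
    done
}
```

Usage on the switch, for the shutdown and the rollback respectively:
front_panel_ports | sudo xargs -I{} ip link set {} down
front_panel_ports | sudo xargs -I{} ip link set {} up
Deriving the list from the running system avoids hard-coding ranges that differ between models, but review its output once before piping it into ip link.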

Wait one minute for CLAG to stabilise and verify network communication with the remaining switch.

  • Perform a clean shutdown of clagd on this switch:
sudo systemctl stop clagd

(Rollback) If you see problems, start clagd again:

sudo systemctl start clagd

Wait one minute for CLAG to shut down cleanly.
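
Rather than waiting a fixed minute, you could poll until clagd has actually stopped. A minimal sketch; wait_until is a hypothetical helper, and the one-second poll interval and the timeout are arbitrary choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per second until it succeeds,
# giving up after TIMEOUT seconds. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout=$1
    shift
    local elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

Usage on the switch:
sudo systemctl stop clagd
wait_until 60 bash -c '! systemctl is-active --quiet clagd' && echo 'clagd stopped'
systemctl is-active --quiet exits with status 0 while the unit is active, so the negated check succeeds once clagd is down.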

  • Shut down the peerlink bond:
sudo ip link set peerlink down

(Rollback) If you see problems, bring the peerlink up again:

sudo ip link set peerlink up
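
Before starting the upgrade, it is worth confirming that the peerlink really is down. A minimal sketch that reads the kernel's operational state from sysfs; link_state, link_is_down and the SYSFS_NET override are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read an interface's operational state from
# sysfs. SYSFS_NET can be overridden for testing; it defaults to the
# real /sys/class/net.

# Print the operstate of interface $1, e.g. "up" or "down".
link_state() {
    local sysfs="${SYSFS_NET:-/sys/class/net}"
    cat "$sysfs/$1/operstate"
}

# Succeed only if interface $1 reports operstate "down".
link_is_down() {
    [ "$(link_state "$1")" = "down" ]
}
```

Usage on the switch:
link_is_down peerlink && echo 'peerlink is down, safe to upgrade'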
  • Perform the upgrade using the command:
sudo apt-get upgrade

It is important to shut down all the ports cleanly first, because the bridge and the peerlink bounce during the package upgrade, which could disrupt network communication if it happened uncontrolled.
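
To see what apt-get upgrade is going to change before committing to it, you could run it in simulation mode first (apt-get -s upgrade). This needs no root and prints one line starting with Inst for each package it would install. A minimal sketch that summarises that output; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that summarise the output of "apt-get -s upgrade"
# (simulation mode) read on stdin. In simulation mode apt-get prints one
# line starting with "Inst " for every package it would install.

# Print the number of packages that would be upgraded.
count_pending_upgrades() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
    # the function from reporting failure in that case.
    grep -c '^Inst ' || true
}

# Print the name of each package that would be upgraded, one per line.
pending_upgrade_names() {
    awk '/^Inst / { print $2 }'
}
```

Usage on the switch:
apt-get -s upgrade | count_pending_upgrades
apt-get -s upgrade | pending_upgrade_names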

  • Reboot the switch:
sudo reboot

Wait for the upgraded switch to come back up. This causes a short traffic outage; the continuous ping lost three packets:

64 bytes from 8.8.8.8: icmp_seq=9443 ttl=59 time=1.069 ms
64 bytes from 8.8.8.8: icmp_seq=9444 ttl=59 time=1.150 ms
64 bytes from 8.8.8.8: icmp_seq=9445 ttl=59 time=0.993 ms
64 bytes from 8.8.8.8: icmp_seq=9446 ttl=59 time=1.331 ms
Request timeout for icmp_seq 9447
Request timeout for icmp_seq 9448
Request timeout for icmp_seq 9449
64 bytes from 8.8.8.8: icmp_seq=9450 ttl=59 time=1.539 ms
64 bytes from 8.8.8.8: icmp_seq=9451 ttl=59 time=0.908 ms
64 bytes from 8.8.8.8: icmp_seq=9452 ttl=59 time=1.166 ms
64 bytes from 8.8.8.8: icmp_seq=9453 ttl=59 time=1.261 ms
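
To put a number on the outage, you can count the gaps in icmp_seq among the reply lines of a continuous ping. A minimal sketch; count_lost_pings is a hypothetical name, and it assumes replies arrive in order and the sequence number does not wrap around:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ping output on stdin and print how many
# sequence numbers are missing between consecutive reply lines.
# Only "icmp_seq=N" reply lines are used, so it works whether or not the
# ping implementation prints "Request timeout" lines for lost packets.
# Duplicate replies are ignored; assumes in-order replies and no
# icmp_seq wrap-around.
count_lost_pings() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # Strip the 9-character "icmp_seq=" prefix to get the number.
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (!seen) { prev = seq; seen = 1; next }
            if (seq > prev) {
                lost += seq - prev - 1
                prev = seq
            }
        }
        END { print lost + 0 }
    '
}
```

Usage: run ping 8.8.8.8 | tee ping.log during the upgrade, stop it with Ctrl-C, then run count_lost_pings < ping.log. With ping's default one-second interval the count roughly equals the outage in seconds; for the reboot output above it reports 3.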

Wait until the network is functioning normally again.

On the upgraded secondary, run the following command so that it takes over the primary role (the switch with the lower priority value becomes primary):

sudo clagctl priority 0

Wait one minute for CLAG to fail over.

Verify that the CLAG handover has occurred:

sudo net show clag
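
If you script the procedure, you can automate the handover check by extracting the local role from the net show clag output. This is a minimal sketch. It assumes the output contains a line of the form "Our Priority, ID, and Role: <priority> <system-mac> <role>", so check that against the output on your release. clag_role and clag_is_primary are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read "net show clag" output on stdin.
# Assumes the local role is the last field of a line like
#   Our Priority, ID, and Role: <priority> <system-mac> <role>

# Print this switch's CLAG role, e.g. "primary" or "secondary".
clag_role() {
    awk '/Our Priority, ID, and Role:/ { print $NF; exit }'
}

# Succeed only if this switch currently holds the primary role.
clag_is_primary() {
    [ "$(clag_role)" = "primary" ]
}
```

Usage on the upgraded switch:
net show clag | clag_is_primary && echo 'this switch is now primary'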

Repeat the steps above on the new secondary (the old primary), from shutting down all interfaces through the reboot.

Once finished, reset the CLAG priority on the primary (the switch you upgraded first) to its configured value with sudo clagctl priority, using the clagd-priority value from its configuration.
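
To avoid looking up the value by hand, here is a minimal sketch that reads the clagd-priority from /etc/network/interfaces. If no priority is configured, it falls back to 32768, which is assumed here to be the clagd default. configured_clag_priority is a hypothetical name, and the sketch only reads that one file, not files pulled in with source:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the clagd-priority configured in an ifupdown2
# interfaces file, or 32768 (assumed clagd default) if none is set.
# Only the given file is read; files pulled in with "source" are ignored.
configured_clag_priority() {
    local file="${1:-/etc/network/interfaces}"
    local prio
    # awk's default field splitting drops the leading indentation,
    # so $1 is the keyword and $2 its value.
    prio=$(awk '$1 == "clagd-priority" { print $2; exit }' "$file")
    echo "${prio:-32768}"
}
```

Usage on the primary:
sudo clagctl priority "$(configured_clag_priority)"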

Read my next post on how to roll back a failed upgrade: Cumulus Linux Snapshot Rollback