Continuous Integration and Delivery for Networking with Cumulus Linux

Continuous Integration – Continuous Delivery (CICD) is becoming more and more popular for network automation but the problem is how to validate your scripts and stage the configuration because you don’t want to deploy untested code to a production system. Especially in networking that could be pretty destructive if you made a mistake which could cause a loss in connectivity.

I spend some days working on a Cumulus Linux lab using Vagrant which I use to stage configuration. You find the basic Ansible playbook and the gitlab-ci configuration for the Cumulus lab in my Github repo: cumulus-lab-provision

For the continuous integration and delivery (CI/CD) pipeline I am using Gitlab.com and their Gitlab-runner which is running on my server. I will not get into too much detail what is needed on the server, basically it runs vargant, libvirt (kvm), virtualbox, ansible and the gitlab-runner.

  • You need to register your Gitlab-runner with the Gitlab repository.

  • The next step is to create your .gitlab-ci.yml which defines your CI-pipeline.
---
stages:
    - validate ansible
    - staging
    - production
validate:
    stage: validate ansible
    script:
        - bash ./linter.sh
staging:
    before_script:
        - git clone https://github.com/berndonline/cumulus-lab-vagrant.git
        - cd cumulus-lab-vagrant/
        - python ./topology_converter.py ./topology-staging.dot
          -p libvirt --ansible-hostfile
    stage: staging
    script:
        - bash ../staging.sh
production:
    before_script:
        - git clone https://github.com/berndonline/cumulus-lab-vagrant.git
        - cd cumulus-lab-vagrant/
        - python ./topology_converter.py ./topology-production.dot
          -p libvirt --ansible-hostfile
    stage: production
    when: manual
    script:
        - bash ../production.sh
    only:
        - master

In the gitlab-ci you see that I clone the cumulus vagrant lab which I use to spin-up a virtual staging environment and run the Ansible playbook against the virtual lab. The production stage is in my example also a vagrant environment because I had no physical switches for testing.

  • Basically any commit or merge in the Gitlab repo triggers the pipeline which I define in the gitlab-ci.

  • You can see the details in the running job. The first stage is only to validate that the YAML files have the correct syntax.

  • Here the details of the running job of staging and when everything goes well the job succeeded.

  • The last stage is production which needs to be triggered manually.

  • After the changes run through all defined stages you see that you successfully validate, staged and deployed your configuration to a cumulus production system.

This is a complete different way of working for a network engineer but the way it goes in fully automated datacenter network environments. It gets very powerful when you combine this with the Cumulus NetQ server to validate the state of your switch fabric after you run changes in production.

The next topic I am working on, is using Cumulus NetQ to validate configuration changes.

Here again my two repositories I use:

https://github.com/berndonline/cumulus-lab-vagrant

https://github.com/berndonline/cumulus-lab-provision

Read my new posts about an Ansible Playbook for Cumulus Linux BGP IP-Fabric and Cumulus NetQ Validation and BGP EVPN and VXLAN with Cumulus Linux.

Ansible Playbook for Cumulus NetQ Agent Installation

Here a short Ansible script to install the Cumulus NetQ agent on Cumulus Linux switches.

---
- hosts: spine leaf
  remote_user: cumulus
  gather_facts: no
  become: yes
  vars:
    ansible_become_pass: "CumulusLinux!"
  tasks:
    - name: Install cumulus-netq
      apt: name=cumulus-netq update_cache=yes state=present
      register: result

    - name: Restart Syslog service
      service: name=rsyslog state=restarted
      when: result.stdout is defined

    - pause: seconds=5

    - name: Add netq server IP addr
      command: netq config add server 192.168.100.133
      when: result.stdout is defined

    - name: Start netq-agent
      service: name=netq-agent state=restarted
      when: result.stdout is defined

Your NetQ VM needs to be reachable from the switches otherwise the command “netq add server…” will fail.

You find more information in the official Cumulus NetQ documentation:  https://docs.cumulusnetworks.com/display/NETQ/Getting+Started+with+NetQ

Cumulus Networks NetQ telemetry-based validation system

I had some time to play around with the new NetQ tool from Cumulus which checks your Cumulus Linux switch fabric.

I did some testing with my Cumulus Layer 2 Fabric example: Ansible Playbook for Cumulus Linux (Layer 2 Fabric)

You need to download the NetQ VM from Cumulus as VMware or VirtualBox template: here

It is a great tool to centrally check your Cumulus switches and keep history about changes in your environment. NetQ can send out notification about changes in your fabric which is nice because you are always up-to-date what is going on in your network.

Installing NetQ agent on a Cumulus Linux Switch:

cumulus@spine-1:~$ sudo apt-get update
cumulus@spine-1:~$ sudo apt-get install cumulus-netq -y

Configuring the NetQ Agent on a switch:

cumulus@spine-1:~$ sudo systemctl restart rsyslog
cumulus@spine-1:~$ netq add server 192.168.100.133
cumulus@spine-1:~$ netq agent restart

I will write a small Ansible script in the next days to automate the agent installation and configuration.

Connect to Cumulus NetQ VM and check agent connectivity

admin@cumulus:~$ netq-shell

Welcome to Cumulus (R) NetQ Command Line Interface
TIP: Type `netq help` to get started.

netq@dc9163c7044e:/$ netq show agents
Node     Status    Sys Uptime    Agent Uptime
-------  --------  ------------  --------------
leaf-1   Fresh     1h ago        1h ago
leaf-2   Fresh     1h ago        1h ago
spine-1  Fresh     1h ago        1h ago
spine-2  Fresh     1h ago        1h ago
netq@dc9163c7044e:/$

Basic Show Commands:

netq@dc9163c7044e:/$ netq show clag
Matching CLAG session records are:
Node             Peer             SysMac            State Backup #Links #Dual Last Changed
---------------- ---------------- ----------------- ----- ------ ------ ----- --------------
leaf-1           leaf-2(P)        44:38:39:ff:40:93 up    up     1      1     8m ago
leaf-2(P)        leaf-1           44:38:39:ff:40:93 up    up     1      1     8m ago
spine-1(P)       spine-2          44:38:39:ff:40:94 up    up     1      1     8m ago
spine-2          spine-1(P)       44:38:39:ff:40:94 up    up     1      1     9m ago
netq@dc9163c7044e:/$
 
netq@dc9163c7044e:/$ netq show lldp
LLDP peer info for *:*
Node     Interface    LLDP Peer    Peer Int    Last Changed
-------  -----------  -----------  ----------  --------------
leaf-1   eth0         cumulus      eth0        1h ago
leaf-1   eth0         leaf-2       eth0        1h ago
leaf-1   eth0         spine-1      eth0        1h ago
leaf-1   eth0         spine-2      eth0        1h ago
leaf-1   swp1         spine-1      swp1        1h ago
leaf-1   swp11        leaf-2       swp11       9m ago
leaf-1   swp2         spine-2      swp1        1h ago
leaf-2   eth0         cumulus      eth0        1h ago
leaf-2   eth0         leaf-1       eth0        1h ago
leaf-2   eth0         spine-1      eth0        1h ago
leaf-2   eth0         spine-2      eth0        1h ago
leaf-2   swp1         spine-2      swp2        1h ago
leaf-2   swp11        leaf-1       swp11       8m ago
leaf-2   swp2         spine-1      swp2        1h ago
spine-1  eth0         cumulus      eth0        1h ago
spine-1  eth0         leaf-1       eth0        1h ago
spine-1  eth0         leaf-2       eth0        1h ago
spine-1  eth0         spine-2      eth0        1h ago
spine-1  swp1         leaf-1       swp1        1h ago
spine-1  swp11        spine-2      swp11       1h ago
spine-1  swp2         leaf-2       swp2        8m ago
spine-2  eth0         cumulus      eth0        1h ago
spine-2  eth0         leaf-1       eth0        1h ago
spine-2  eth0         leaf-2       eth0        1h ago
spine-2  eth0         spine-1      eth0        1h ago
spine-2  swp1         leaf-1       swp2        1h ago
spine-2  swp11        spine-1      swp11       1h ago
spine-2  swp2         leaf-2       swp1        8m ago
netq@dc9163c7044e:/$
 
netq@dc9163c7044e:/$ netq show interfaces type bond
Matching interface records are:
Node             Interface        Type     State Details                     Last Changed
---------------- ---------------- -------- ----- --------------------------- --------------
leaf-1           bond1            bond     up    Slave: swp1(spine-1:swp1),  10m ago
                                                 Slave: swp2(spine-2:swp1),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
leaf-1           peerlink         bond     up    Slave: swp11(leaf-2:swp11), 10m ago
                                                 VLANs: , PVID: 0,
                                                 Master: peerlink, MTU: 1500
leaf-2           bond1            bond     up    Slave: swp1(spine-2:swp2),  10m ago
                                                 Slave: swp2(spine-1:swp2),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
leaf-2           peerlink         bond     up    Slave: swp11(leaf-1:swp11), 10m ago
                                                 VLANs: , PVID: 0,
                                                 Master: peerlink, MTU: 1500
spine-1          bond1            bond     up    Slave: swp1(leaf-1:swp1),   10m ago
                                                 Slave: swp2(leaf-2:swp2),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
spine-1          peerlink         bond     up    Slave: swp11(spine-2:swp11) 1h ago
                                                 , VLANs:  100-199,
                                                 PVID: 1, Master: bridge,
                                                 MTU: 1500
spine-2          bond1            bond     up    Slave: swp1(leaf-1:swp2),   10m ago
                                                 Slave: swp2(leaf-2:swp1),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
spine-2          peerlink         bond     up    Slave: swp11(spine-1:swp11) 1h ago
                                                 , VLANs:  100-199,
                                                 PVID: 1, Master: bridge,
                                                 MTU: 1500
netq@dc9163c7044e:/$
 
netq@dc9163c7044e:/$ netq show ip routes
Matching IP route records are:
Origin Table            IP               Node             Nexthops                   Last Changed
------ ---------------- ---------------- ---------------- -------------------------- ----------------
1      default          169.254.1.0/30   leaf-1           peerlink.4093              11m ago
1      default          169.254.1.0/30   leaf-2           peerlink.4093              11m ago
1      default          169.254.1.0/30   spine-1          peerlink.4094              1h ago
1      default          169.254.1.0/30   spine-2          peerlink.4094              1h ago
1      default          169.254.1.1/32   leaf-1           peerlink.4093              11m ago
1      default          169.254.1.1/32   spine-1          peerlink.4094              1h ago
1      default          169.254.1.2/32   leaf-2           peerlink.4093              11m ago
1      default          169.254.1.2/32   spine-2          peerlink.4094              1h ago
1      default          192.168.100.0/24 leaf-1           eth0                       1h ago
1      default          192.168.100.0/24 leaf-2           eth0                       1h ago
1      default          192.168.100.0/24 spine-1          eth0                       1h ago
1      default          192.168.100.0/24 spine-2          eth0                       1h ago
1      default          192.168.100.205/ spine-1          eth0                       1h ago
                        32
1      default          192.168.100.206/ spine-2          eth0                       1h ago
                        32
1      default          192.168.100.207/ leaf-1           eth0                       1h ago
                        32
1      default          192.168.100.208/ leaf-2           eth0                       1h ago
                        32
0      vrf-prod         0.0.0.0/0        spine-1          Blackhole                  1h ago
0      vrf-prod         0.0.0.0/0        spine-2          Blackhole                  1h ago
1      vrf-prod         10.1.0.0/24      spine-1          bridge.100                 1h ago
1      vrf-prod         10.1.0.0/24      spine-2          bridge.100                 1h ago
1      vrf-prod         10.1.0.252/32    spine-1          bridge.100                 1h ago
1      vrf-prod         10.1.0.253/32    spine-2          bridge.100                 1h ago
1      vrf-prod         10.1.0.254/32    spine-1          bridge.100                 1h ago
1      vrf-prod         10.1.0.254/32    spine-2          bridge.100                 1h ago
1      vrf-prod         10.1.1.0/24      spine-1          bridge.101                 1h ago
1      vrf-prod         10.1.1.0/24      spine-2          bridge.101                 1h ago
1      vrf-prod         10.1.1.252/32    spine-1          bridge.101                 1h ago
1      vrf-prod         10.1.1.253/32    spine-2          bridge.101                 1h ago
1      vrf-prod         10.1.1.254/32    spine-1          bridge.101                 1h ago
1      vrf-prod         10.1.1.254/32    spine-2          bridge.101                 1h ago
1      vrf-prod         10.1.2.0/24      spine-1          bridge.102                 1h ago
1      vrf-prod         10.1.2.0/24      spine-2          bridge.102                 1h ago
1      vrf-prod         10.1.2.252/32    spine-1          bridge.102                 1h ago
1      vrf-prod         10.1.2.253/32    spine-2          bridge.102                 1h ago
1      vrf-prod         10.1.2.254/32    spine-1          bridge.102                 1h ago
1      vrf-prod         10.1.2.254/32    spine-2          bridge.102                 1h ago
netq@dc9163c7044e:/$

See Changes in Switch Fabric:

netq@dc9163c7044e:/$ netq leaf-1 show interfaces type bond
Matching interface records are:
Node             Interface        Type     State Details                     Last Changed
---------------- ---------------- -------- ----- --------------------------- --------------
leaf-1           bond1            bond     up    Slave: swp1(spine-1:swp1),  2s ago
                                                 Slave: swp2(spine-2:swp1),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
leaf-1           peerlink         bond     up    Slave: swp11(leaf-2:swp11), 21m ago
                                                 VLANs: , PVID: 0,
                                                 Master: peerlink, MTU: 1500
netq@dc9163c7044e:/$
 
cumulus@leaf-1:~$ sudo ifdown bond1
cumulus@leaf-1:~$
 
netq@dc9163c7044e:/$ netq leaf-1 show interfaces type bond
Matching interface records are:
Node             Interface        Type     State Details                     Last Changed
---------------- ---------------- -------- ----- --------------------------- --------------
leaf-1           peerlink         bond     up    Slave: swp11(leaf-2:swp11), 22m ago
                                                 VLANs: , PVID: 0,
                                                 Master: peerlink, MTU: 1500
netq@dc9163c7044e:/$
 
netq@dc9163c7044e:/$ netq leaf-1 show interfaces type bond changes
Matching interface records are:
Node             Interface        Type     State Details                     DbState Last Changed
---------------- ---------------- -------- ----- --------------------------- ------- --------------
leaf-1           bond1            bond     down  VLANs: , PVID: 0,           Del     21s ago
                                                 Master: bridge, MTU: 1500
leaf-1           bond1            bond     down  Slave: swp1(),              Add     21s ago
                                                 Slave: swp2(),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
leaf-1           bond1            bond     up    Slave: swp1(spine-1:swp1),  Add     1m ago
                                                 Slave: swp2(spine-2:swp1),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500 

More information you can find in the Cumulus NetQ documentation: https://docs.cumulusnetworks.com/display/NETQ/NetQ

New Cumulus Linux leaf-switches

Just added another pair of Cumulus Linux leaf (top of rack) switches and more VMware compute nodes to our production system.

We use a Layer 2 network fabric, see my recent post about an example: Ansible Playbook for Cumulus Linux (Layer 2 Fabric)

Got a bit tight in the rack for the DA cables, basically each server has 2x 10Gbit/s data- and 2x 10Gbit/s storage-network connections.

Front rack view:

Another two racks will follow soon in the same configuration.

 

Cumulus Networks (Open Networking)

In my recent data centre network redesign project I used Cumulus Linux on Open Networking switches from Dell (S3048-ON and S4048-ON). I heard from Cumulus Networks around 2 1/2 years ago and did some testing with the VX appliance back then but I was waiting for the official release of the VRF feature. Ones I got the go ahead for the data centre redesign project, it was clear to replace Cisco and use Cumulus Linux on the all the data centre switches.

I was one of the first Cumulus users of finding a bug with VRF in Quagga because of extensive use of VRFs:

RN-518 (CM-13328) - Quagga sometimes installs a duplicate static route

After a Cumulus Linux switch with VRF routes installed was 
rebooted, the VRF routes were present in the kernel but 
not installed into hardware. Restarting Quagga resolved 
the issue.

This issue has been fixed in the update 
to the 3.1.2 repository.

Like in my latest post the idea was to use a converged network stack which needed some more advanced configuration on the switch itself with different VRFs to split the data centre into Corporate (office) and Production.

So far I have a very good experience with Cumulus Networks, the support is awesome and very skilled!