Using Cumulus NetQ fabric validation with Ansible

Here is a new post about Cumulus NetQ: I built a small Ansible playbook that validates the state of MLAG within a Cumulus Linux fabric.

In this case I use the command “netq check clag json” to check for nodes in failed or warning state. This example can be used when doing automated changes to MLAG and to validate the configuration afterwards, or as a pre-check before I execute the main playbook.

---
- hosts: spine leaf
  gather_facts: False
  user: cumulus

  tasks:
     - name: Gather Clag info in JSON
       command: netq check clag json
       register: result
       run_once: true
       failed_when: "'ERROR' in result.stdout"

     - name: stdout string into json
       set_fact: json_output="{{result.stdout | from_json }}"
       run_once: true

     - name: output of json_output variable
       debug:
         var: json_output
       run_once: true

     - name: check failed clag members
       debug: msg="Check failed clag members"
       when: json_output["failedNodes"]|length == 0
       run_once: true

     - name: clag members status failed
       fail: msg="Device {{item['node']}}, Why node is in failed state? {{item['reason']}}"
       with_items:  "{{json_output['failedNodes']}}"
       run_once: true

     - name: clag members status warning
       fail: msg="Device {{item['node']}}, Why node is in warning state? {{item['reason']}}"
       when: json_output["warningNodes"] is defined
       with_items:  "{{json_output['warningNodes']}}"
       run_once: true

Here is the output when MLAG is healthy:

PLAY [spine leaf] *********************************************************************************************************************************************************************************************************************

TASK [Gather Clag info in JSON] *******************************************************************************************************************************************************************************************************
Friday 20 October 2017  17:56:35 +0200 (0:00:00.017)       0:00:00.017 ********
changed: [spine-1]

TASK [stdout string into json] ********************************************************************************************************************************************************************************************************
Friday 20 October 2017  17:56:35 +0200 (0:00:00.325)       0:00:00.343 ********
ok: [spine-1]

TASK [output of json_output variable] *************************************************************************************************************************************************************************************************
Friday 20 October 2017  17:56:35 +0200 (0:00:00.010)       0:00:00.353 ********
ok: [spine-1] => {
    "json_output": {
        "failedNodes": [],
        "summary": {
            "checkedNodeCount": 4,
            "failedNodeCount": 0,
            "warningNodeCount": 0
        }
    }
}

TASK [check failed clag members] ******************************************************************************************************************************************************************************************************
Friday 20 October 2017  17:56:35 +0200 (0:00:00.010)       0:00:00.363 ********
ok: [spine-1] => {
    "msg": "Check failed clag members"
}

TASK [clag members status failed] *****************************************************************************************************************************************************************************************************
Friday 20 October 2017  17:56:35 +0200 (0:00:00.011)       0:00:00.374 ********

TASK [clag members status warning] ****************************************************************************************************************************************************************************************************
Friday 20 October 2017  17:56:35 +0200 (0:00:00.007)       0:00:00.382 ********
skipping: [spine-1]

PLAY RECAP ****************************************************************************************************************************************************************************************************************************
spine-1                    : ok=4    changed=1    unreachable=0    failed=0

Friday 20 October 2017  17:56:35 +0200 (0:00:00.008)       0:00:00.391 ********
===============================================================================
Gather Clag info in JSON ------------------------------------------------ 0.33s
check failed clag members ----------------------------------------------- 0.01s
stdout string into json ------------------------------------------------- 0.01s
output of json_output variable ------------------------------------------ 0.01s
clag members status warning --------------------------------------------- 0.01s
clag members status failed ---------------------------------------------- 0.01s

In the following example, leaf-1 is in warning state because the “clagd-backup-ip” is missing. Another possible warning is a singly attached bond interface:

PLAY [spine leaf] *********************************************************************************************************************************************************************************************************************

TASK [Gather Clag info in JSON] *******************************************************************************************************************************************************************************************************
Friday 20 October 2017  18:02:05 +0200 (0:00:00.016)       0:00:00.016 ********
changed: [spine-1]

TASK [stdout string into json] ********************************************************************************************************************************************************************************************************
Friday 20 October 2017  18:02:05 +0200 (0:00:00.225)       0:00:00.241 ********
ok: [spine-1]

TASK [output of json_output variable] *************************************************************************************************************************************************************************************************
Friday 20 October 2017  18:02:05 +0200 (0:00:00.010)       0:00:00.251 ********
ok: [spine-1] => {
    "json_output": {
        "failedNodes": [],
        "summary": {
            "checkedNodeCount": 4,
            "failedNodeCount": 0,
            "warningNodeCount": 1
        },
        "warningNodes": [
            {
                "node": "leaf-1",
                "reason": "Backup IP Failed"
            }
        ]
    }
}

TASK [check failed clag members] ******************************************************************************************************************************************************************************************************
Friday 20 October 2017  18:02:05 +0200 (0:00:00.010)       0:00:00.261 ********
ok: [spine-1] => {
    "msg": "Check failed clag members"
}

TASK [clag members status failed] *****************************************************************************************************************************************************************************************************
Friday 20 October 2017  18:02:05 +0200 (0:00:00.011)       0:00:00.273 ********

TASK [clag members status warning] ****************************************************************************************************************************************************************************************************
Friday 20 October 2017  18:02:05 +0200 (0:00:00.007)       0:00:00.281 ********
failed: [spine-1] (item={u'node': u'leaf-1', u'reason': u'Backup IP Failed'}) => {"failed": true, "item": {"node": "leaf-1", "reason": "Backup IP Failed"}, "msg": "Device leaf-1, Why node is in warning state? Backup IP Failed"}

NO MORE HOSTS LEFT ********************************************************************************************************************************************************************************************************************
	to retry, use: --limit @/home/berndonline/cumulus-lab-vagrant/netq_check_clag.retry

PLAY RECAP ****************************************************************************************************************************************************************************************************************************
spine-1                    : ok=4    changed=1    unreachable=0    failed=1

Friday 20 October 2017  18:02:05 +0200 (0:00:00.015)       0:00:00.297 ********
===============================================================================
Gather Clag info in JSON ------------------------------------------------ 0.23s
clag members status warning --------------------------------------------- 0.02s
check failed clag members ----------------------------------------------- 0.01s
output of json_output variable ------------------------------------------ 0.01s
stdout string into json ------------------------------------------------- 0.01s
clag members status failed ---------------------------------------------- 0.01s

In another example, NetQ reports that leaf-1 has no matching clagid on its peer. In this case the interface bond1 is missing from the configuration on leaf-2:

PLAY [spine leaf] ***********************************************************************************************************************************************************************************************************************

TASK [Gather Clag info in JSON] *********************************************************************************************************************************************************************************************************
Monday 23 October 2017  18:49:15 +0200 (0:00:00.016)       0:00:00.016 ********
changed: [spine-1]

TASK [stdout string into json] **********************************************************************************************************************************************************************************************************
Monday 23 October 2017  18:49:15 +0200 (0:00:00.223)       0:00:00.240 ********
ok: [spine-1]

TASK [output of json_output variable] ***************************************************************************************************************************************************************************************************
Monday 23 October 2017  18:49:15 +0200 (0:00:00.010)       0:00:00.250 ********
ok: [spine-1] => {
    "json_output": {
        "failedNodes": [
            {
                "node": "leaf-1",
                "reason": "Conflicted Bonds: bond1:matching clagid not configured on peer"
            }
        ],
        "summary": {
            "checkedNodeCount": 4,
            "failedNodeCount": 1,
            "warningNodeCount": 1
        },
        "warningNodes": [
            {
                "node": "leaf-1",
                "reason": "Singly Attached Bonds: bond1"
            }
        ]
    }
}

TASK [check failed clag members] ********************************************************************************************************************************************************************************************************
Monday 23 October 2017  18:49:15 +0200 (0:00:00.010)       0:00:00.260 ********
skipping: [spine-1]

TASK [clag members status failed] *******************************************************************************************************************************************************************************************************
Monday 23 October 2017  18:49:15 +0200 (0:00:00.009)       0:00:00.269 ********
failed: [spine-1] (item={u'node': u'leaf-1', u'reason': u'Conflicted Bonds: bond1:matching clagid not configured on peer'}) => {"failed": true, "item": {"node": "leaf-1", "reason": "Conflicted Bonds: bond1:matching clagid not configured on peer"}, "msg": "Device leaf-1, Why node is in failed state? Conflicted Bonds: bond1:matching clagid not configured on peer"}

NO MORE HOSTS LEFT **********************************************************************************************************************************************************************************************************************
	to retry, use: --limit @/home/berndonline/cumulus-lab-vagrant/netq_check_clag.retry

PLAY RECAP ******************************************************************************************************************************************************************************************************************************
spine-1                    : ok=3    changed=1    unreachable=0    failed=1

Monday 23 October 2017  18:49:15 +0200 (0:00:00.014)       0:00:00.284 ********
===============================================================================
Gather Clag info in JSON ------------------------------------------------ 0.22s
clag members status failed ---------------------------------------------- 0.02s
stdout string into json ------------------------------------------------- 0.01s
output of json_output variable ------------------------------------------ 0.01s
check failed clag members ----------------------------------------------- 0.01s

This is just an example to show what possibilities I have with Cumulus NetQ when I use automation to validate my changes.
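The same check can also be sketched outside Ansible. The following Python snippet is only a minimal sketch and not part of the playbook: it takes the JSON text printed by “netq check clag json” (the sample is copied from the last run above) and collects the failed and warning nodes. The function name summarize_clag is made up for this example; the keys failedNodes, warningNodes, node and reason are the ones visible in the output above.

```python
import json

# Sample text copied from the last playbook run above.
SAMPLE = """
{
  "failedNodes": [
    {"node": "leaf-1",
     "reason": "Conflicted Bonds: bond1:matching clagid not configured on peer"}
  ],
  "summary": {"checkedNodeCount": 4, "failedNodeCount": 1, "warningNodeCount": 1},
  "warningNodes": [
    {"node": "leaf-1", "reason": "Singly Attached Bonds: bond1"}
  ]
}
"""


def summarize_clag(raw):
    """Return (healthy, messages) for the JSON text of `netq check clag json`.

    summarize_clag is a hypothetical helper for this sketch.
    """
    data = json.loads(raw)
    messages = []
    # warningNodes is absent in the healthy output above, so default to [].
    for state, key in (("failed", "failedNodes"), ("warning", "warningNodes")):
        for entry in data.get(key, []):
            messages.append("%s: %s (%s)" % (entry["node"], state, entry["reason"]))
    return (not messages, messages)


if __name__ == "__main__":
    healthy, messages = summarize_clag(SAMPLE)
    for line in messages:
        print(line)
    print("MLAG healthy" if healthy else "MLAG needs attention")
```

Using data.get(key, []) covers the case where NetQ leaves out the warningNodes key, which the playbook handles with the “is defined” condition.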

The Cumulus NetQ documentation has more information about taking preventive steps with your network: https://docs.cumulusnetworks.com/display/NETQ/Taking+Preventative+Steps+with+Your+Network

Continuous Integration and Delivery for Networking with Cumulus Linux

Continuous integration and continuous delivery (CI/CD) is becoming more and more popular for network automation. The problem is how to validate your scripts and stage the configuration, because you don’t want to deploy untested code to a production system. In networking a mistake can be especially destructive, because it can cause a loss of connectivity.

I spent some days working on a Cumulus Linux lab using Vagrant, which I use to stage configuration. You can find the basic Ansible playbook and the gitlab-ci configuration for the Cumulus lab in my GitHub repo: cumulus-lab-provision

For the continuous integration and delivery (CI/CD) pipeline I am using Gitlab.com and their Gitlab-runner, which is running on my server. I will not go into too much detail about what is needed on the server; basically it runs vagrant, libvirt (kvm), virtualbox, ansible and the gitlab-runner.

  • You need to register your Gitlab-runner with the Gitlab repository.

  • The next step is to create your .gitlab-ci.yml which defines your CI-pipeline.
---
stages:
    - validate ansible
    - staging
    - production
validate:
    stage: validate ansible
    script:
        - bash ./linter.sh
staging:
    before_script:
        - git clone https://github.com/berndonline/cumulus-lab-vagrant.git
        - cd cumulus-lab-vagrant/
        - python ./topology_converter.py ./topology-staging.dot
          -p libvirt --ansible-hostfile
    stage: staging
    script:
        - bash ../staging.sh
production:
    before_script:
        - git clone https://github.com/berndonline/cumulus-lab-vagrant.git
        - cd cumulus-lab-vagrant/
        - python ./topology_converter.py ./topology-production.dot
          -p libvirt --ansible-hostfile
    stage: production
    when: manual
    script:
        - bash ../production.sh
    only:
        - master

In the gitlab-ci you can see that I clone the Cumulus Vagrant lab, which I use to spin up a virtual staging environment, and then run the Ansible playbook against the virtual lab. In my example the production stage is also a Vagrant environment because I had no physical switches for testing.

  • Basically any commit or merge in the Gitlab repo triggers the pipeline which I define in the gitlab-ci.

  • You can see the details in the running job. The first stage is only to validate that the YAML files have the correct syntax.

  • Here are the details of the running staging job; when everything goes well the job succeeds.

  • The last stage is production which needs to be triggered manually.

  • After the changes have run through all defined stages, you can see that you have successfully validated, staged and deployed your configuration to a Cumulus production system.
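The gating in this pipeline can be modelled in a few lines. This is a rough Python sketch of the rules written in the .gitlab-ci.yml above, not of how the Gitlab scheduler is implemented. The function stages_to_run and its results argument are made up for illustration; results simply stands in for the outcome of each job.

```python
# Stages in the order they appear under `stages:` in the .gitlab-ci.yml above.
STAGES = ["validate ansible", "staging", "production"]


def stages_to_run(branch, results, manual_trigger=False):
    """Return the stages that would run, stopping at the first failure.

    results maps a stage name to True (passed) or False (failed); it is an
    assumption of this sketch and replaces the real job outcome.
    """
    ran = []
    for stage in STAGES:
        if stage == "production":
            # `only: master` and `when: manual` both have to be satisfied.
            if branch != "master" or not manual_trigger:
                break
        ran.append(stage)
        if not results.get(stage, False):
            break  # later stages do not start after a failed stage
    return ran


if __name__ == "__main__":
    all_green = {stage: True for stage in STAGES}
    print(stages_to_run("master", all_green))
    print(stages_to_run("master", all_green, manual_trigger=True))
```

The point of the manual step is that a commit to master only ever reaches staging on its own; production needs a person to press the button.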

This is a completely different way of working for a network engineer, but it is how things are done in fully automated data centre network environments. It gets very powerful when you combine this with the Cumulus NetQ server to validate the state of your switch fabric after you run changes in production.

The next topic I am working on is using Cumulus NetQ to validate configuration changes.

Here again are the two repositories I use:

https://github.com/berndonline/cumulus-lab-vagrant

https://github.com/berndonline/cumulus-lab-provision

Read my new posts about an Ansible Playbook for Cumulus Linux BGP IP-Fabric and Cumulus NetQ Validation and BGP EVPN and VXLAN with Cumulus Linux.

Cumulus Linux network simulation using Vagrant

I was using GNS3 for quite some time but it was not very flexible if you quickly wanted to test something and even more complicated if you used a different computer or shared your projects.

I spent some time with Vagrant to build a virtual Cumulus Linux lab environment which can run on basically every computer. Simulating network environments is the future when you want to test and validate your automation scripts.

My lab diagram:

I created different topology.dot files and used the Cumulus topology converter on GitHub to create my lab with VirtualBox or libvirt (KVM). I made some modifications to the initialisation scripts for the switches and the management server. You can find everything in my GitHub repo https://github.com/berndonline/cumulus-lab-vagrant.

The topology file basically defines your network and the converter creates the Vagrantfile.
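To make this a bit more concrete, here is a small sketch of what reading such a file involves. It is not the topology converter's code. It only assumes that links are written as "device":"port" -- "device":"port", which is the form used in the Cumulus topology converter examples, and the device and port names in the sample are made up.

```python
import re

# One link per line: "device":"port" -- "device":"port"
LINK = re.compile(r'"([^"]+)":"([^"]+)"\s*--\s*"([^"]+)":"([^"]+)"')

# Made-up sample in the assumed topology.dot style.
SAMPLE_DOT = '''
graph lab {
 "spine-1" [function="spine"]
 "leaf-1" [function="leaf"]
 "leaf-2" [function="leaf"]
 "leaf-1":"swp1" -- "spine-1":"swp1"
 "leaf-2":"swp1" -- "spine-1":"swp2"
}
'''


def parse_links(text):
    """Return a list of ((device, port), (device, port)) tuples."""
    links = []
    for match in LINK.finditer(text):
        a_dev, a_port, b_dev, b_port = match.groups()
        links.append(((a_dev, a_port), (b_dev, b_port)))
    return links


def ports_per_device(links):
    """Group the ports by device, in the order the links were listed."""
    ports = {}
    for end_a, end_b in links:
        for device, port in (end_a, end_b):
            ports.setdefault(device, []).append(port)
    return ports


if __name__ == "__main__":
    for device, ports in ports_per_device(parse_links(SAMPLE_DOT)).items():
        print(device, ports)
```

From a per-device list like this, a converter has enough information to write one VM definition per device with one interface per port.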

In the management topology file you have all servers (incl. management) as in the network diagram above. You can only access the Cumulus switches via the management server.

The next topology file is very similar to topology-mgmt.dot, but in this one the management server is running Cumulus NetQ, which you first need to import into Vagrant. Here is the link to the Cumulus NetQ demo on GitHub.

In this topology file you find a basic staging lab without servers where you can access the Cumulus switches directly via their Vagrant IP. I mainly use this to quickly test something like updating Cumulus switches or validating Ansible playbooks.

In this topology file you find a basic production lab where you can access the Cumulus switches directly via their Vagrant IP and have Cumulus NetQ as management server.

Basically to convert a topology into a Vagrantfile you just need to run the following command:

python topology_converter.py topology-staging.dot -p libvirt --ansible-hostfile

I use KVM in my example and want Vagrant to create an Ansible inventory file so that I can run playbooks directly against the switches.

Check the status of the vagrant environment:

[email protected]:~/cumulus-lab-vagrant$ vagrant status
Current machine states:

spine-1                   not created (libvirt)
spine-2                   not created (libvirt)
leaf-1                    not created (libvirt)
leaf-3                    not created (libvirt)
leaf-2                    not created (libvirt)
leaf-4                    not created (libvirt)
mgmt-1                    not created (libvirt)
edge-2                    not created (libvirt)
edge-1                    not created (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
[email protected]:~/cumulus-lab-vagrant$

To start the devices run:

vagrant up

If you use the topology files with a management server, you need to start the management server first and then the management switch before you boot the rest of the switches:

vagrant up mgmt-server
vagrant up mgmt-1
vagrant up

The switches will pull some part of their configuration from the management server.
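If you script the start-up, the ordering can be sketched like this. It is a minimal Python sketch that only builds the command lists and does not call Vagrant. The function boot_order is made up for this example, and the machine names mgmt-server and mgmt-1 are taken from the commands above.

```python
def boot_order(machines, first=("mgmt-server", "mgmt-1")):
    """Return `vagrant up` commands with the management nodes first.

    boot_order is a hypothetical helper; machines is the list of VM names
    defined in the Vagrantfile.
    """
    commands = [["vagrant", "up", name] for name in first if name in machines]
    rest = [name for name in machines if name not in first]
    if rest:
        # A bare `vagrant up` brings up all remaining machines.
        commands.append(["vagrant", "up"])
    return commands


if __name__ == "__main__":
    lab = ["spine-1", "spine-2", "leaf-1", "leaf-2", "mgmt-1", "mgmt-server"]
    for command in boot_order(lab):
        print(" ".join(command))
```

Each list could be handed to subprocess.run one after the other, so that the switches only boot once the management server is up.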

Output if you start the environment:

[email protected]:~/cumulus-lab-vagrant$ vagrant up spine-1
Bringing machine 'spine-1' up with 'libvirt' provider...
==> spine-1: Creating image (snapshot of base box volume).
==> spine-1: Creating domain with the following settings...
==> spine-1:  -- Name:              cumulus-lab-vagrant_spine-1
==> spine-1:  -- Domain type:       kvm
==> spine-1:  -- Cpus:              1
==> spine-1:  -- Feature:           acpi
==> spine-1:  -- Feature:           apic
==> spine-1:  -- Feature:           pae
==> spine-1:  -- Memory:            512M
==> spine-1:  -- Management MAC:
==> spine-1:  -- Loader:
==> spine-1:  -- Base box:          CumulusCommunity/cumulus-vx
==> spine-1:  -- Storage pool:      default
==> spine-1:  -- Image:             /var/lib/libvirt/images/cumulus-lab-vagrant_spine-1.img (4G)
==> spine-1:  -- Volume Cache:      default
==> spine-1:  -- Kernel:
==> spine-1:  -- Initrd:
==> spine-1:  -- Graphics Type:     vnc
==> spine-1:  -- Graphics Port:     5900
==> spine-1:  -- Graphics IP:       127.0.0.1
==> spine-1:  -- Graphics Password: Not defined
==> spine-1:  -- Video Type:        cirrus
==> spine-1:  -- Video VRAM:        9216
==> spine-1:  -- Sound Type:
==> spine-1:  -- Keymap:            en-us
==> spine-1:  -- TPM Path:
==> spine-1:  -- INPUT:             type=mouse, bus=ps2
==> spine-1: Creating shared folders metadata...
==> spine-1: Starting domain.
==> spine-1: Waiting for domain to get an IP address...
==> spine-1: Waiting for SSH to become available...
    spine-1:
    spine-1: Vagrant insecure key detected. Vagrant will automatically replace
    spine-1: this with a newly generated keypair for better security.
    spine-1:
    spine-1: Inserting generated public key within guest...
    spine-1: Removing insecure key from the guest if it's present...
    spine-1: Key inserted! Disconnecting and reconnecting using new SSH key...
==> spine-1: Setting hostname...
==> spine-1: Configuring and enabling network interfaces...
....
==> spine-1: #################################
==> spine-1:   Running Switch Post Config (config_vagrant_switch.sh)
==> spine-1: #################################
==> spine-1:  ###Creating SSH keys for cumulus user ###
==> spine-1: #################################
==> spine-1:    Finished
==> spine-1: #################################
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: a0:00:00:00:00:21 --> eth0
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: 44:38:39:00:00:30 --> swp1
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: 44:38:39:00:00:04 --> swp2
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: 44:38:39:00:00:26 --> swp3
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: 44:38:39:00:00:0a --> swp4
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: 44:38:39:00:00:22 --> swp51
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: 44:38:39:00:00:0d --> swp52
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: 44:38:39:00:00:10 --> swp53
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: 44:38:39:00:00:23 --> swp54
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1:   INFO: Adding UDEV Rule: Vagrant interface = eth1
==> spine-1: #### UDEV Rules (/etc/udev/rules.d/70-persistent-net.rules) ####
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="a0:00:00:00:00:21", NAME="eth0", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:30", NAME="swp1", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:04", NAME="swp2", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:26", NAME="swp3", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:0a", NAME="swp4", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:22", NAME="swp51", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:0d", NAME="swp52", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:10", NAME="swp53", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{address}=="44:38:39:00:00:23", NAME="swp54", SUBSYSTEMS=="pci"
==> spine-1: ACTION=="add", SUBSYSTEM=="net", ATTR{ifindex}=="2", NAME="eth1", SUBSYSTEMS=="pci"
==> spine-1: Running provisioner: shell...
    spine-1: Running: inline script
==> spine-1: ### RUNNING CUMULUS EXTRA CONFIG ###
==> spine-1:   INFO: Detected a 3.x Based Release
==> spine-1: ### Disabling default remap on Cumulus VX...
==> spine-1: ### Disabling ZTP service...
==> spine-1: Removed symlink /etc/systemd/system/multi-user.target.wants/ztp.service.
==> spine-1: ### Resetting ZTP to work next boot...
==> spine-1: Created symlink from /etc/systemd/system/multi-user.target.wants/ztp.service to /lib/systemd/system/ztp.service.
==> spine-1:   INFO: Detected Cumulus Linux v3.3.2 Release
==> spine-1: ### Fixing ONIE DHCP to avoid Vagrant Interface ###
==> spine-1:      Note: Installing from ONIE will undo these changes.
==> spine-1: ### Giving Vagrant User Ability to Run NCLU Commands ###
==> spine-1: ### DONE ###
==> spine-1: ### Rebooting Device to Apply Remap...

At the end you are able to connect to the Cumulus switch:

[email protected]:~/cumulus-lab-vagrant$ vagrant ssh spine-1

Welcome to Cumulus VX (TM)

Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com

The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
[email protected]:~$

To destroy the Vagrant environment:

[email protected]:~/cumulus-lab-vagrant$ vagrant destroy spine-1
==> spine-2: Remove stale volume...
==> spine-2: Domain is not created. Please run `vagrant up` first.
==> spine-1: Removing domain...

My goal is to adopt some NetDevOps practices and use them in networking (NetOps). I am currently working on a Continuous Integration and Delivery (CI/CD) pipeline for Cumulus Linux network environments. The Vagrant lab was one of the prerequisites for simulating changes before deploying them to production, but more will follow in my next blog post.

Read my new post about an Ansible Playbook for Cumulus Linux BGP IP-Fabric and Cumulus NetQ Validation.

Ansible Playbook for Cumulus NetQ Agent Installation

Here is a short Ansible playbook to install the Cumulus NetQ agent on Cumulus Linux switches.

---
- hosts: spine leaf
  remote_user: cumulus
  gather_facts: no
  become: yes
  vars:
    ansible_become_pass: "CumulusLinux!"
  tasks:
    - name: Install cumulus-netq
      apt: name=cumulus-netq update_cache=yes state=present
      register: result

    - name: Restart Syslog service
      service: name=rsyslog state=restarted
      when: result.stdout is defined

    - pause: seconds=5

    - name: Add netq server IP addr
      command: netq config add server 192.168.100.133
      when: result.stdout is defined

    - name: Start netq-agent
      service: name=netq-agent state=restarted
      when: result.stdout is defined

Your NetQ VM needs to be reachable from the switches, otherwise the command “netq config add server…” will fail.

You can find more information in the official Cumulus NetQ documentation: https://docs.cumulusnetworks.com/display/NETQ/Getting+Started+with+NetQ

Ansible Playbook for Cumulus Linux (Layer 3 Fabric)

As promised, here is a basic Ansible Playbook for a Cumulus Linux Layer 3 Fabric running BGP, a design you see in large-scale data centre deployments.

You keep the layer 2 network as close as possible to the servers and use ECMP (equal-cost multi-path) routing to distribute your traffic across multiple uplinks.
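To illustrate the idea behind ECMP, here is a minimal Python sketch of hash-based next-hop selection. It is only an illustration: a real switch uses its own hash function in the forwarding path, and the function pick_next_hop, the flow tuple and the addresses are made up for this example.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Pick one next hop for a flow by hashing its 5-tuple.

    All packets of the same flow hash to the same value, so they take the
    same uplink and stay in order. pick_next_hop is a hypothetical helper.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Example values only: two equal-cost uplinks and one TCP flow
# (source IP, destination IP, protocol, source port, destination port).
UPLINKS = ["10.0.1.1", "10.0.1.5"]
FLOW = ("10.100.0.10", "10.200.0.20", 6, 49152, 443)

if __name__ == "__main__":
    print(pick_next_hop(FLOW, UPLINKS))
```

Because the choice depends only on the flow's fields, different flows spread over the uplinks while a single flow never gets split across them.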

These kinds of network designs are highly scalable. My example is a 2-tier deployment, but you can easily use 3 tiers, where the leaf switches become the distribution layer and you add additional ToR (Top of Rack) switches.

Here is some interesting information about Facebook’s next-generation data centre fabric: Introducing data center fabric, the next-generation Facebook data center network

I use the same hosts file as in my previous blog post, Ansible Playbook for Cumulus Linux (Layer 2 Fabric).

Hosts file:

[spine]
spine-1
spine-2
[leaf]
leaf-1
leaf-2

Ansible Playbook:

---
- hosts: all
  remote_user: cumulus
  gather_facts: no
  become: yes
  vars:
    ansible_become_pass: "CumulusLinux!"
    spine_interfaces:
      - { port: swp1, desc: leaf-1, address: "{{ swp1_address}}" }
      - { port: swp2, desc: leaf-2, address: "{{ swp2_address}}" }
      - { port: swp6, desc: layer3_peerlink, address: "{{ peer_address}}" }
    leaf_interfaces:
      - { port: swp1, desc: spine-1, address: "{{ swp1_address}}" }
      - { port: swp2, desc: spine-2, address: "{{ swp2_address}}" }      
  handlers:
    - name: ifreload
      command: ifreload -a
    - name: restart quagga
      service: name=quagga state=restarted
  tasks:
    - name: deploys spine interface configuration
      template: src=templates/spine_routing_interfaces.j2 dest=/etc/network/interfaces
      when: "'spine' in group_names"
      notify: ifreload
    - name: deploys leaf interface configuration
      template: src=templates/leaf_routing_interfaces.j2 dest=/etc/network/interfaces
      when: "'leaf' in group_names"
      notify: ifreload
    - name: deploys quagga configuration
      template: src=templates/quagga.conf.j2 dest=/etc/quagga/Quagga.conf
      notify: restart quagga

Let’s run the Playbook and see the output:

[[email protected] cumulus]$ ansible-playbook routing.yml -i hosts

PLAY [all] *********************************************************************

TASK [deploys spine interface configuration] ***********************************
skipping: [leaf-2]
skipping: [leaf-1]
changed: [spine-2]
changed: [spine-1]

TASK [deploys leaf interface configuration] ************************************
skipping: [spine-1]
skipping: [spine-2]
changed: [leaf-2]
changed: [leaf-1]

TASK [deploys quagga configuration] ********************************************
changed: [leaf-2]
changed: [spine-2]
changed: [spine-1]
changed: [leaf-1]

RUNNING HANDLER [ifreload] *****************************************************
changed: [leaf-2]
changed: [leaf-1]
changed: [spine-2]
changed: [spine-1]

RUNNING HANDLER [restart quagga] ***********************************************
changed: [leaf-1]
changed: [leaf-2]
changed: [spine-1]
changed: [spine-2]

PLAY RECAP *********************************************************************
leaf-1                     : ok=4    changed=4    unreachable=0    failed=0
leaf-2                     : ok=4    changed=4    unreachable=0    failed=0
spine-1                    : ok=4    changed=4    unreachable=0    failed=0
spine-2                    : ok=4    changed=4    unreachable=0    failed=0

[[email protected] cumulus]$

To verify the configuration let’s look at the BGP routes on the leaf switches:

[email protected]:/home/cumulus# net show route bgp
RIB entry for bgp
=================
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, T - Table, v - VNC,
       V - VPN,
       > - selected route, * - FIB route

B>* 10.0.0.0/30 [20/0] via 10.0.1.1, swp1, 00:02:14
  *                    via 10.0.1.5, swp2, 00:02:14
B   10.0.1.0/30 [20/0] via 10.0.1.1 inactive, 00:02:14
                       via 10.0.1.5, swp2, 00:02:14
B   10.0.1.4/30 [20/0] via 10.0.1.5 inactive, 00:02:14
                       via 10.0.1.1, swp1, 00:02:14
B>* 10.0.2.0/30 [20/0] via 10.0.1.5, swp2, 00:02:14
  *                    via 10.0.1.1, swp1, 00:02:14
B>* 10.0.2.4/30 [20/0] via 10.0.1.1, swp1, 00:02:14
  *                    via 10.0.1.5, swp2, 00:02:14
B>* 10.200.0.0/24 [20/0] via 10.0.1.1, swp1, 00:02:14
  *                      via 10.0.1.5, swp2, 00:02:14
[email protected]:/home/cumulus#
[email protected]:/home/cumulus# net show route bgp
RIB entry for bgp
=================
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, T - Table, v - VNC,
       V - VPN,
       > - selected route, * - FIB route

B>* 10.0.0.0/30 [20/0] via 10.0.2.5, swp1, 00:02:22
  *                    via 10.0.2.1, swp2, 00:02:22
B>* 10.0.1.0/30 [20/0] via 10.0.2.5, swp1, 00:02:22
  *                    via 10.0.2.1, swp2, 00:02:22
B>* 10.0.1.4/30 [20/0] via 10.0.2.1, swp2, 00:02:22
  *                    via 10.0.2.5, swp1, 00:02:22
B   10.0.2.0/30 [20/0] via 10.0.2.1 inactive, 00:02:22
                       via 10.0.2.5, swp1, 00:02:22
B   10.0.2.4/30 [20/0] via 10.0.2.5 inactive, 00:02:22
                       via 10.0.2.1, swp2, 00:02:22
B>* 10.100.0.0/24 [20/0] via 10.0.2.5, swp1, 00:02:22
  *                      via 10.0.2.1, swp2, 00:02:22
[email protected]:/home/cumulus#

Have fun!
