Internet Edge and WAN Routing with Cumulus Linux

With this article I wanted to focus on something different than the usual spine and leaf topology and talk about datacenter edge routing.

I was using Cisco routers for many years for Internet Edge and WAN connectivity. The problem with using a vendor like Cisco is the price tag you have to pay and there still might a reason for it to spend the money. But nowadays you get leased-lines handed over as normal Ethernet connection and using a dedicated routers maybe not always necessary if you are not getting too crazy with BGP routing or quality of service.

I was experimenting over the last weeks if I could use a Cumulus Linux switch as an Internet Edge and Wide Area Network router with running different VRFs for internet and WAN connectivity. I came up with the following edge network layout you see below:

For this network, I build an Vagrant topology with Cumulus VX to simulate the edge routing and being able to test the connectivity. Below you see a more detailed view of the Vagrant topology:

Everything is running on Cumulus VX even the firewalls because I just wanted to simulate the traffic flow and see if the network communication is functioning. Also having separate WAN switches might be useful because 1Gbit/s switches are cheaper then 40Gbit/s switches and you need additional SFP for 1Gbit/s connections, another point is to separate your layer 2 WAN connectivity from your internal datacenter network.

Here the assigned IP addresses for this lab:

wan-1 VLAN801 PIP: VIP:
wan-2 VLAN801 PIP: VIP:
wan-1 VLAN802 PIP: 
wan-2 VLAN802 PIP:
wan-1 VLAN904 PIP: VIP:
wan-2 VLAN904 PIP: VIP:
fw-1 VLAN904 PIP:
wan-1 VLAN903 PIP: VIP:
wan-2 VLAN903 PIP: VIP:
fw-2 VLAN903 PIP:
edge-1 VLAN901 PIP: VIP:
edge-2 VLAN901 PIP: VIP:
fw-1 VLAN901 PIP:
fw-2 VLAN901 PIP:
edge-1 VLAN902 PIP: VIP:
edge-2 VLAN902 PIP: VIP:
fw-1 VLAN902 PIP:

You can find the Github repository for the Vagrant topology here:

[email protected]:~/cumulus-edge-vagrant$ vagrant status
Current machine states:

fw-2                      running (libvirt)
fw-1                      running (libvirt)
mgmt-1                    running (libvirt)
edge-2                    running (libvirt)
edge-1                    running (libvirt)
wan-1                     running (libvirt)
wan-2                     running (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
[email protected]:~/cumulus-edge-vagrant$

I wrote as well an Ansible Playbook to deploy the initial configuration which you can find here:

Let’s execute the playbook:

[email protected]:~/cumulus-edge-vagrant$ ansible-playbook ../cumulus-edge-provision/site.yml

PLAY [edge] ********************************************************************************************************************************************************

TASK [switchgroups : create switch groups based on clag_pairs] *****************************************************************************************************
skipping: [edge-2] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [edge-1] => (item=(u'wan', [u'wan-1', u'wan-2']))
ok: [edge-2] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [wan-1] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [wan-1] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [edge-1] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [wan-2] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [wan-2] => (item=(u'edge', [u'edge-1', u'edge-2']))

TASK [switchgroups : include switch group variables] ***************************************************************************************************************
skipping: [edge-2] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [edge-1] => (item=(u'wan', [u'wan-1', u'wan-2']))
ok: [wan-1] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [wan-1] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [wan-2] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [wan-2] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [edge-2] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [edge-1] => (item=(u'edge', [u'edge-1', u'edge-2']))


RUNNING HANDLER [interfaces : reload networking] *******************************************************************************************************************
changed: [edge-2] => (item=ifreload -a)
changed: [edge-1] => (item=ifreload -a)
changed: [wan-1] => (item=ifreload -a)
changed: [wan-2] => (item=ifreload -a)
changed: [edge-2] => (item=sleep 10)
changed: [edge-1] => (item=sleep 10)
changed: [wan-2] => (item=sleep 10)
changed: [wan-1] => (item=sleep 10)

RUNNING HANDLER [routing : reload frr] *****************************************************************************************************************************
changed: [edge-2]
changed: [wan-1]
changed: [wan-2]
changed: [edge-1]

RUNNING HANDLER [ptm : restart ptmd] *******************************************************************************************************************************
changed: [edge-2]
changed: [edge-1]
changed: [wan-2]
changed: [wan-1]

RUNNING HANDLER [ntp : restart ntp] ********************************************************************************************************************************
changed: [wan-1]
changed: [edge-1]
changed: [wan-2]
changed: [edge-2]

RUNNING HANDLER [ifplugd : restart ifplugd] ************************************************************************************************************************
changed: [edge-1]
changed: [wan-1]
changed: [edge-2]
changed: [wan-2]

PLAY RECAP *********************************************************************************************************************************************************
edge-1                     : ok=21   changed=17   unreachable=0    failed=0
edge-2                     : ok=21   changed=17   unreachable=0    failed=0
wan-1                      : ok=21   changed=17   unreachable=0    failed=0
wan-2                      : ok=21   changed=17   unreachable=0    failed=0

[email protected]:~/cumulus-edge-vagrant$

At last but not least I wrote a simple Ansible Playbook for connectivity testing using ping what you can find here:

[email protected]:~/cumulus-edge-vagrant$ ansible-playbook ../cumulus-edge-provision/check_icmp.yml

PLAY [exit edge] *********************************************************************************************************************************************************************************************************************

TASK [connectivity check from frontend firewall] *************************************************************************************************************************************************************************************
skipping: [fw-2] => (item=
skipping: [fw-2] => (item=
skipping: [fw-2] => (item=
skipping: [fw-2] => (item=
skipping: [edge-2] => (item=
skipping: [edge-2] => (item=
skipping: [edge-2] => (item=
skipping: [edge-1] => (item=
skipping: [edge-2] => (item=
skipping: [edge-1] => (item=
skipping: [edge-1] => (item=
skipping: [wan-1] => (item=
skipping: [edge-1] => (item=
skipping: [wan-1] => (item=
skipping: [wan-1] => (item=
skipping: [wan-1] => (item=
skipping: [wan-2] => (item=
skipping: [wan-2] => (item=
skipping: [wan-2] => (item=
skipping: [wan-2] => (item=
changed: [fw-1] => (item=
changed: [fw-1] => (item=
changed: [fw-1] => (item=
changed: [fw-1] => (item=
PLAY RECAP ***************************************************************************************************************************************************************************************************************************
edge-1                     : ok=2    changed=2    unreachable=0    failed=0
edge-2                     : ok=2    changed=2    unreachable=0    failed=0
fw-1                       : ok=1    changed=1    unreachable=0    failed=0
fw-2                       : ok=1    changed=1    unreachable=0    failed=0
wan-1                      : ok=2    changed=2    unreachable=0    failed=0
wan-2                      : ok=2    changed=2    unreachable=0    failed=0

[email protected]:~/cumulus-edge-vagrant$

The icmp check shows that in general the edge routing is working but I need to do some further testing with this if this can be used in a production environment.

If using switch hardware is not the right fit you can still install and use Free Range Routing (FRR) from Cumulus Networks on other Linux distributions and pick server hardware for your own custom edge router. I would only recommend checking Linux kernel support for VRF when choosing another Linux OS. Also have a look at my article about Open Source Routing GRE over IPSec with StrongSwan and Cisco IOS-XE where I build a Debian software router.

Please share your feedback and leave a comment.

Getting started with Jenkins for Network Automation

As I have mentioned my previous post about Getting started with Gitlab-CI for Network Automation, Jenkins is another continuous integration pipelining tool you can use for network automation. Have a look about how to install Jenkins:

To use the Jenkins with Vagrant and KVM (libvirt) there are a few changes needed on the linux server similar with the Gitlab-Runner. The Jenkins user account needs to be able to control KVM and you need to install the vagrant-libvirt plugin:

usermod -aG libvirtd jenkins
sudo su jenkins
vagrant plugin install vagrant-libvirt

Optional: you may need to copy custom Vagrant boxes into the users vagrant folder ‘/var/lib/jenkins/.vagrant.d/boxes/*’. Note that the Jenkins home directory is not located under /home.

Now lets start configuring a Jenkins CI-pipeline, click on ‘New item’:

This creates an empty pipeline where you need to add the different stages  of what needs to be executed:

Below is an example Jenkins pipeline script which is very similar to the Gitlab-CI pipeline I have used with my Cumulus Linux Lab in the past.

pipeline {
    agent any
    stages {
        stage('Clean and prep workspace') {
            steps {
                sh 'rm -r *'
                git ''
                sh 'git clone --origin master'
        stage('Validate Ansible') {
            steps {
                sh 'bash ./'
        stage('Staging') {
            steps {
                sh 'cd ./cumulus-lab-vagrant/ && ./'
                sh 'cd ./cumulus-lab-vagrant/ && bash ../'
        stage('Deploy production approval') {
            steps {
                input 'Deploy to prod?'
        stage('Production') {
            steps {
                sh 'cd ./cumulus-lab-vagrant/ && ./'
                sh 'cd ./cumulus-lab-vagrant/ && bash ../'

Let’s run the build pipeline:

The stages get executed one by one and, as you can see below, the production stage has an manual approval build-in that nothing gets deployed to production without someone to approve before, for a controlled production deployment:

Finished pipeline:

This is just a simple example of a network automation pipeline, this can of course be more complex if needed. It should just help you a bit on how to start using Jenkins for network automation.

Please share your feedback and leave a comment.

Ansible Automation with Cisco ASA Multi-Context Mode

I thought I’d share my experience using Ansible and Cisco ASA firewalls in multi-context mode. Right from the beginning I had a few issues deploying the configuration and the switch between the different security context didn’t work well. I got the error you see below when I tried to run a playbook. Other times the changeto context didn’t work well and applied the wrong config:

[email protected]:~$ ansible-playbook -i inventory site.yml --ask-vault-pass
Vault password:

PLAY [all] ***************************************************************************************************************************************************************************

TASK [hostname : set dns and hostname] ***********************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: error: [Errno 61] Connection refused
fatal: [fwcontext01]: FAILED! => {"changed": false, "err": "[Errno 61] Connection refused", "msg": "unable to connect to socket"}
ok: [fwcontext02]

TASK [interfaces : write interfaces config] ******************************************************************************************************************************************
ok: [fwcontext02]


After a bit of troubleshooting I found a workaround to limit the amount of processes Ansible use and set this limit to one in the Ansible.cfg. The default is five processes if forks is not defined as far as I remember.

inventory = ./inventory
forks = 1

In the example inventory file, the “inventory_hostname” variable represents the security context and as you see the “ansible_ssh_host” is set to the IP address of the admin context:

fwcontext01 ansible_ssh_host= ansible_ssh_port=22 ansible_ssh_user='ansible' ansible_ssh_pass='cisco'
fwcontext02 ansible_ssh_host= ansible_ssh_port=22 ansible_ssh_user='ansible' ansible_ssh_pass='cisco'

When you run the playbook again you can see that the playbook runs successfully but deploys the changes one by one to each firewall security context, the disadvantage is that the playbook takes much longer to run:

[email protected]:~$ ansible-playbook site.yml

PLAY [all] ***************************************************************************************************************************************************************************

TASK [hostname : set dns and hostname] ***********************************************************************************************************************************************
ok: [fwcontext01]
ok: [fwcontext02]

TASK [interfaces : write interfaces config] ******************************************************************************************************************************************
ok: [fwcontext01]
ok: [fwcontext02]

Example site.yml


- hosts: all
  connection: local
  gather_facts: 'no'

      username: "{{ ansible_ssh_user }}"
      password: "{{ ansible_ssh_pass }}"
      host: "{{ ansible_ssh_host }}"

    - interfaces

In the example Interface role you see that the context is set to “inventory_hostname” variable:


- name: write interfaces config
    src: "templates/interfaces.j2"
    provider: "{{ cli }}"
    context: "{{ inventory_hostname }}"
  register: result

- name: enable interfaces
    parents: "interface {{ item.0 }}"
    lines: "no shutdown"
    match: none
    provider: "{{ cli }}"
    context: "{{ inventory_hostname }}"
  when: result.changed
    - "{{ interfaces.items() }}"

After modifying the forks, the Ansible playbook runs well with Cisco ASA in multi-context mode, like mentioned before it is a bit slow to deploy the configuration if I compare this to Cumulus Linux or any other Linux system.

Please share your feedback.

Leave a comment

Deploying OpenShift Origin Cluster using Ansible

Something completely different to my more network related posts, this time it is about Platform as a Service with OpenShift Origin. There is a big push for containerized platform services from development.

I was testing the official OpenShift Origin Ansible Playbook to install a small 5 node cluster and created an OpenShift Vagrant environment for this.

Cluster overview:

I recommend having a look at the official RedHat OpenShift documentation to understand the architecture because it is quite a complex platform.

As a pre-requisite, you need to install the vagrant hostmanager because Openshift needs to resolve hostnames and I don’t want to install a separate DNS server. Here you find more information:

vagrant plugin install vagrant-hostmanager

sudo bash -c 'cat << EOF > /etc/sudoers.d/vagrant_hostmanager2
Cmnd_Alias VAGRANT_HOSTMANAGER_UPDATE = /bin/cp <your-home-folder>/.vagrant.d/tmp/hosts.local /etc/hosts

Next, clone my Vagrant repository and the official OpenShift Origin ansible:

git clone [email protected]:berndonline/openshift-origin-vagrant.git
git clone [email protected]:openshift/openshift-ansible.git

Let’s start first by booting the OpenShift vagrant environment:

cd openshift-origin-vagrant/

The vagrant host manager will update dynamically the /etc/hosts file on both the Guest and the Host machine:

## vagrant-hostmanager-start id: 55ed9acf-25e9-4b19-bfab-e0812a292dc0	origin-master	origin-etcd	origin-infra	origin-node-1	origin-node-2

## vagrant-hostmanager-end

Let’s have a quick look at the OpenShift inventory file. This has settings for the different node types and custom OpenShift and Vagrant variables. You need to modify a few things like public hostname and default subdomain:




# use htpasswd authentication with demo/demo
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'demo': '$apr1$.MaA77kd$Rlnn6RXq9kCjnEfh5I3w/.'}

# put the router on dedicated infra node

# put the image registry on dedicated infra node

# project pods should be placed on primary nodes

# Vagrant variables

origin-master  openshift_public_hostname=""


# master needs to be included in the node to be configured in the SDN
origin-infra openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
origin-node-[1:2] openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

Now that we are ready, we need to check out the latest release and execute the Ansible Playbook:

cd openshift-ansible/
git checkout release-3.7
ansible-playbook ./playbooks/byo/config.yml -i ../openshift-origin-vagrant/inventory

The playbook takes forever to run, so do something else for the next 10 to 15 mins.


PLAY RECAP **********************************************************************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0
origin-etcd                : ok=147  changed=47   unreachable=0    failed=0
origin-infra               : ok=202  changed=61   unreachable=0    failed=0
origin-master              : ok=561  changed=224  unreachable=0    failed=0
origin-node                : ok=201  changed=61   unreachable=0    failed=0

INSTALLER STATUS ****************************************************************************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
Hosted Install             : Complete
Service Catalog Install    : Complete

Sunday 21 January 2018  20:55:16 +0100 (0:00:00.011)       0:11:56.549 ********
etcd : Pull etcd container ---------------------------------------------------------------------------------------------------------------------------------- 79.51s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------------------------------------------------------------------- 31.54s
openshift_node : Pre-pull node image when containerized ----------------------------------------------------------------------------------------------------- 31.28s
template_service_broker : Verify that TSB is running -------------------------------------------------------------------------------------------------------- 30.87s
docker : Install Docker ------------------------------------------------------------------------------------------------------------------------------------- 30.41s
docker : Install Docker ------------------------------------------------------------------------------------------------------------------------------------- 26.32s
openshift_cli : Pull CLI Image ------------------------------------------------------------------------------------------------------------------------------ 23.03s
openshift_service_catalog : wait for api server to be ready ------------------------------------------------------------------------------------------------- 21.32s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------------------------------------------------------------------- 16.27s
restart master api ------------------------------------------------------------------------------------------------------------------------------------------ 10.69s
restart master controllers ---------------------------------------------------------------------------------------------------------------------------------- 10.62s
openshift_node : Start and enable node ---------------------------------------------------------------------------------------------------------------------- 10.42s
openshift_node : Start and enable node ---------------------------------------------------------------------------------------------------------------------- 10.30s
openshift_master : Start and enable master api on first master ---------------------------------------------------------------------------------------------- 10.21s
openshift_master : Start and enable master controller service ----------------------------------------------------------------------------------------------- 10.19s
os_firewall : Install iptables packages --------------------------------------------------------------------------------------------------------------------- 10.15s
os_firewall : Wait 10 seconds after disabling firewalld ----------------------------------------------------------------------------------------------------- 10.07s
os_firewall : need to pause here, otherwise the iptables service starting can sometimes cause ssh to fail --------------------------------------------------- 10.05s
openshift_node : Pre-pull node image when containerized ------------------------------------------------------------------------------------------------------ 7.85s
openshift_service_catalog : oc_process ----------------------------------------------------------------------------------------------------------------------- 7.44s

To publish both the openshift_public_hostname and openshift_master_default_subdomain, I have a Nginx reverse proxy running and publish 8443 from the origin-master and 80, 443 from the origin-infra nodes.

Here a Nginx example:

server {
  listen 8443 ssl;
  listen [::]:8443 ssl;

  ssl on;
  ssl_certificate /etc/nginx/ssl/;
  ssl_certificate_key /etc/nginx/ssl/;

  access_log  /var/log/nginx/openshift-console_access.log;
  error_log   /var/log/nginx/openshift-console_error.log;

location / {
  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection 'upgrade';
  proxy_set_header Host $host;
  proxy_cache_bypass $http_upgrade;


I will try to write more about OpenShift and Platform as a Service and how to deploy small applications like WordPress.

Have fun testing OpenShift and please share your feedback.

Leave a comment

BGP EVPN and VXLAN with Cumulus Linux

I did some updates on my Cumulus Linux Vagrant topology and added new functions to my post about an Ansible Playbook for the Cumulus Linux BGP IP-Fabric.

To the Vagrant topology, I added 6x servers and per clag-pair, each server is connected to a VLAN and the second server is connected to a VXLAN.

Here are the links to the repositories where you find the Ansible Playbook and the Vagrantfile

In the Ansible Playbook, I added BGP EVPN and one VXLAN which spreads over all Leaf and Edge switches. VXLAN routing is happening on the Edge switches into the rest of the virtual data centre network.

Here is an example of the additional variables I added to edge-1 for BGP EVPN and VXLAN:



The VXLAN anycast IP is needed in BGP for EVPN and the same IP is shared between edge-1 and edge-2. The same is for the other leaf switches, per clag pair they share the same anycast IP address.




  asn: 65001
    - swp51
    - swp52
  evpn: true
  advertise_vni: true

  bond_slaves: swp53 swp54
  mtu: 9216
  vlan: 4094
  clagd_sys_mac: 44:38:39:FF:40:94
  clagd_priority: 4096

  ports: peerlink vxlan10201
  vids: 901 201

    alias: edge-transit-901
    vmac: 00:00:5e:00:09:01
    alias: prod-server-10201
    vmac: 00:00:00:00:02:01
    vlan_id: 201
    vlan_raw_device: bridge

    alias: prod-server-10201
    bridge_access: 201
    bridge_learning: 'off'
    bridge_arp_nd_suppress: 'on'

On the Edge switches, because of VXLAN routing, you find a mapping between VXLAN 10201 to VLAN 201 which has VRR running.

I needed to do some modifications to the interfaces template interfaces_config.j2:

{% if loopback is defined %}
auto lo
iface lo inet loopback
    address {{ loopback }}
{% if clagd_vxlan_anycast_ip is defined %}
    clagd-vxlan-anycast-ip {{ clagd_vxlan_anycast_ip }}
{% endif %}
{% endif %}
{% if bridge is defined %}
{% for vxlan_id, value in vxlans.items() %}
auto vxlan{{ vxlan_id }}
iface vxlan{{ vxlan_id }}
    alias {{ value.alias }}
    vxlan-id {{ vxlan_id }}
    vxlan-local-tunnelip {{ value.vxlan_local_tunnelip }}
    bridge-access {{ value.bridge_access }}
    bridge-learning {{ value.bridge_learning }}
    bridge-arp-nd-suppress {{ value.bridge_arp_nd_suppress }}
    mstpctl-bpduguard yes
    mstpctl-portbpdufilter yes

{% endfor %}
{% endif %}

There were also some modifications needed to the FRR template frr.j2 to add EVPN to the BGP configuration:

{% if bgp_fabric.evpn is defined %}
 address-family ipv6 unicast
  neighbor fabric activate
 address-family l2vpn evpn
  neighbor fabric activate
{% if bgp_fabric.advertise_vni is defined %}
{% endif %}
{% endif %}
{% endif %}

For more detailed information about EVPN and VXLAN routing on Cumulus Linux, I recommend reading the documentation Ethernet Virtual Private Network – EVPN and VXLAN Routing.

Have fun testing the new features in my Ansible Playbook and please share your feedback.

Leave a comment