Ansible Playbook for deploying AVI Controller nodes and Service Engines

After my first blog post about Software defined Load Balancing with AVI Networks, here is how to automatically deploy AVI controller and services engines via Ansible.

Here are the links to my repositories; AVI Vagrant environment: https://github.com/berndonline/avi-lab-vagrant and AVI Ansible Playbook: https://github.com/berndonline/avi-lab-provision

Make sure that your vagrant environment is running,

[email protected]:~/avi-lab-vagrant$ vagrant status
Current machine states:

avi-controller-1          running (libvirt)
avi-controller-2          running (libvirt)
avi-controller-3          running (libvirt)
avi-se-1                  running (libvirt)
avi-se-2                  running (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.

I needed to modify the ansible.cfg to integrate a filter plugin:

[defaults]
inventory = ./.vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory
host_key_checking=False

library = /home/berndonline/avi-lab-provision/lib
filter_plugins = /home/berndonline/avi-lab-provision/lib/filter_plugins

The controller installation is actually very simple and I got it from the official AVI ansible role they created, I added a second role to check ones the controller nodes are successfully booted:

---
- hosts: avi-controller
  user: '{{ ansible_ssh_user }}'
  gather_facts: "true"
  roles:
    - {role: ansible-role-avicontroller, become: true}
    - {role: avi-post-controller, become: false}

There’s one important thing to know before we run the playbook. When you have an AVI subscription you get custom container images with a predefined default password which makes it easier for you to do the cluster setup fully automated. You find the default password variable in group_vars/all.yml there you set as well if the password should be changed.

Let’s execute the ansible playbook, it takes a bit time for the three nodes to boot up:

[email protected]:~/avi-lab-vagrant$ ansible-playbook ../avi-lab-provision/playbooks/avi-controller-install.yml

PLAY [avi-controller] *********************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************
ok: [avi-controller-3]
ok: [avi-controller-2]
ok: [avi-controller-1]

TASK [ansible-role-avicontroller : Avi Controller | Deployment] ***************************************************************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avicontroller/tasks/docker/main.yml for avi-controller-1, avi-controller-2, avi-controller-3

TASK [ansible-role-avicontroller : Avi Controller | Services | systemd | Check if Avi Controller installed] *******************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avicontroller/tasks/docker/services/systemd/check.yml for avi-controller-1, avi-controller-2, avi-controller-3

TASK [ansible-role-avicontroller : Avi Controller | Check if Avi Controller installed] ****************************************************************************
ok: [avi-controller-3]
ok: [avi-controller-2]
ok: [avi-controller-1]

TASK [ansible-role-avicontroller : Avi Controller | Services | init.d | Check if Avi Controller installed] ********************************************************
skipping: [avi-controller-1]
skipping: [avi-controller-2]
skipping: [avi-controller-3]

TASK [ansible-role-avicontroller : Avi Controller | Check minimum requirements] ***********************************************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avicontroller/tasks/docker/requirements.yml for avi-controller-1, avi-controller-2, avi-controller-3

TASK [ansible-role-avicontroller : Avi Controller | Requirements | Check for docker] ******************************************************************************
ok: [avi-controller-2]
ok: [avi-controller-3]
ok: [avi-controller-1]

...

TASK [avi-post-controller : wait for cluster nodes up] ************************************************************************************************************
FAILED - RETRYING: wait for cluster nodes up (30 retries left).
FAILED - RETRYING: wait for cluster nodes up (30 retries left).
FAILED - RETRYING: wait for cluster nodes up (30 retries left).

...

FAILED - RETRYING: wait for cluster nodes up (7 retries left).
FAILED - RETRYING: wait for cluster nodes up (8 retries left).
FAILED - RETRYING: wait for cluster nodes up (7 retries left).
FAILED - RETRYING: wait for cluster nodes up (7 retries left).
ok: [avi-controller-2]
ok: [avi-controller-3]
ok: [avi-controller-1]

PLAY RECAP ********************************************************************************************************************************************************
avi-controller-1           : ok=36   changed=6    unreachable=0    failed=0
avi-controller-2           : ok=35   changed=5    unreachable=0    failed=0
avi-controller-3           : ok=35   changed=5    unreachable=0    failed=0

[email protected]:~/avi-lab-vagrant$

We are not finished yet and need to set basic settings like NTP and DNS, and need to configure the AVI three node controller cluster with another playbook:

---
- hosts: localhost
  connection: local
  roles:
    - {role: avi-cluster-setup, become: false}
    - {role: avi-change-password, become: false, when: avi_change_password == true}

The first role uses the REST API to do the configuration changes and requires the AVI ansible sdk role and for these reason it is very useful using the custom subscription images because you know the default password otherwise you need to modify the main setup.json file.

Let’s run the AVI cluster setup playbook:

[email protected]:~/avi-lab-vagrant$ ansible-playbook ../avi-lab-provision/playbooks/avi-cluster-setup.yml

PLAY [localhost] **************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************
ok: [localhost]

TASK [ansible-role-avisdk : Checking if avisdk python library is present] *****************************************************************************************
ok: [localhost] => {
    "msg": "Please make sure avisdk is installed via pip. 'pip install avisdk --upgrade'"
}

TASK [avi-cluster-setup : set AVI dns and ntp facts] **************************************************************************************************************
ok: [localhost]

TASK [avi-cluster-setup : set AVI cluster facts] ******************************************************************************************************************
ok: [localhost]

TASK [avi-cluster-setup : configure ntp and dns controller nodes] *************************************************************************************************
changed: [localhost]

TASK [avi-cluster-setup : configure AVI cluster] ******************************************************************************************************************
changed: [localhost]

TASK [avi-cluster-setup : wait for cluster become active] *********************************************************************************************************
FAILED - RETRYING: wait for cluster become active (30 retries left).
FAILED - RETRYING: wait for cluster become active (29 retries left).
FAILED - RETRYING: wait for cluster become active (28 retries left).

...

FAILED - RETRYING: wait for cluster become active (14 retries left).
FAILED - RETRYING: wait for cluster become active (13 retries left).
FAILED - RETRYING: wait for cluster become active (12 retries left).
ok: [localhost]

TASK [avi-change-password : change default admin password on cluster build when subscription] *********************************************************************
skipping: [localhost]

PLAY RECAP ********************************************************************************************************************************************************
localhost                  : ok=7    changed=2    unreachable=0    failed=0

[email protected]:~/avi-lab-vagrant$

We can check in the web console to see if the cluster is booted and correctly setup:

Last but not least we need the ansible playbook for the AVI service engines installation which relies on the official AVI ansible se role:

---
- hosts: avi-se
  user: '{{ ansible_ssh_user }}'
  gather_facts: "true"
  roles:
    - {role: ansible-role-avise, become: true}

Let’s run the playbook for the service engines installation:

[email protected]lab:~/avi-lab-vagrant$ ansible-playbook ../avi-lab-provision/playbooks/avi-se-install.yml

PLAY [avi-se] *****************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************
ok: [avi-se-2]
ok: [avi-se-1]

TASK [ansible-role-avisdk : Checking if avisdk python library is present] *****************************************************************************************
ok: [avi-se-1] => {
    "msg": "Please make sure avisdk is installed via pip. 'pip install avisdk --upgrade'"
}
ok: [avi-se-2] => {
    "msg": "Please make sure avisdk is installed via pip. 'pip install avisdk --upgrade'"
}

TASK [ansible-role-avise : Avi SE | Set facts] ********************************************************************************************************************
skipping: [avi-se-1]
skipping: [avi-se-2]

TASK [ansible-role-avise : Avi SE | Deployment] *******************************************************************************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avise/tasks/docker/main.yml for avi-se-1, avi-se-2

TASK [ansible-role-avise : Avi SE | Check minimum requirements] ***************************************************************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avise/tasks/docker/requirements.yml for avi-se-1, avi-se-2

TASK [ansible-role-avise : Avi SE | Requirements | Check for docker] **********************************************************************************************
ok: [avi-se-2]
ok: [avi-se-1]

TASK [ansible-role-avise : Avi SE | Requirements | Set facts] *****************************************************************************************************
ok: [avi-se-1]
ok: [avi-se-2]

TASK [ansible-role-avise : Avi SE | Requirements | Validate Parameters] *******************************************************************************************
ok: [avi-se-1] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [avi-se-2] => {
    "changed": false,
    "msg": "All assertions passed"
}

...

TASK [ansible-role-avise : Avi SE | Services | systemd | Start the service since it's not running] ****************************************************************
changed: [avi-se-1]
changed: [avi-se-2]

RUNNING HANDLER [ansible-role-avise : Avi SE | Services | systemd | Daemon reload] ********************************************************************************
ok: [avi-se-2]
ok: [avi-se-1]

RUNNING HANDLER [ansible-role-avise : Avi SE | Services | Restart the avise service] ******************************************************************************
changed: [avi-se-2]
changed: [avi-se-1]

PLAY RECAP ********************************************************************************************************************************************************
avi-se-1                   : ok=47   changed=7    unreachable=0    failed=0
avi-se-2                   : ok=47   changed=7    unreachable=0    failed=0

[email protected]:~/avi-lab-vagrant$

After a few minutes you see the AVI service engines automatically register on the controller cluster and you are ready start configuring the detailed load balancing configuration:

Please share your feedback and leave a comment.

Ansible Automation with Cisco ASA Multi-Context Mode

I thought I’d share my experience using Ansible and Cisco ASA firewalls in multi-context mode. Right from the beginning I had a few issues deploying the configuration and the switch between the different security context didn’t work well. I got the error you see below when I tried to run a playbook. Other times the changeto context didn’t work well and applied the wrong config:

[email protected]:~$ ansible-playbook -i inventory site.yml --ask-vault-pass
Vault password:

PLAY [all] ***************************************************************************************************************************************************************************

TASK [hostname : set dns and hostname] ***********************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: error: [Errno 61] Connection refused
fatal: [fwcontext01]: FAILED! => {"changed": false, "err": "[Errno 61] Connection refused", "msg": "unable to connect to socket"}
ok: [fwcontext02]

TASK [interfaces : write interfaces config] ******************************************************************************************************************************************
ok: [fwcontext02]

....

After a bit of troubleshooting I found a workaround to limit the amount of processes Ansible use and set this limit to one in the Ansible.cfg. The default is five processes if forks is not defined as far as I remember.

[defaults]
inventory = ./inventory
host_key_checking=False
jinja2_extensions=jinja2.ext.do
forks = 1

In the example inventory file, the “inventory_hostname” variable represents the security context and as you see the “ansible_ssh_host” is set to the IP address of the admin context:

fwcontext01 ansible_ssh_host=192.168.0.1 ansible_ssh_port=22 ansible_ssh_user='ansible' ansible_ssh_pass='cisco'
fwcontext02 ansible_ssh_host=192.168.0.1 ansible_ssh_port=22 ansible_ssh_user='ansible' ansible_ssh_pass='cisco'

When you run the playbook again you can see that the playbook runs successfully but deploys the changes one by one to each firewall security context, the disadvantage is that the playbook takes much longer to run:

[email protected]:~$ ansible-playbook site.yml

PLAY [all] ***************************************************************************************************************************************************************************

TASK [hostname : set dns and hostname] ***********************************************************************************************************************************************
ok: [fwcontext01]
ok: [fwcontext02]

TASK [interfaces : write interfaces config] ******************************************************************************************************************************************
ok: [fwcontext01]
ok: [fwcontext02]

Example site.yml

---

- hosts: all
  connection: local
  gather_facts: 'no'

  vars:
    cli:
      username: "{{ ansible_ssh_user }}"
      password: "{{ ansible_ssh_pass }}"
      host: "{{ ansible_ssh_host }}"

  roles:
    - interfaces

In the example Interface role you see that the context is set to “inventory_hostname” variable:

---

- name: write interfaces config
  asa_config:
    src: "templates/interfaces.j2"
    provider: "{{ cli }}"
    context: "{{ inventory_hostname }}"
  register: result

- name: enable interfaces
  asa_config:
    parents: "interface {{ item.0 }}"
    lines: "no shutdown"
    match: none
    provider: "{{ cli }}"
    context: "{{ inventory_hostname }}"
  when: result.changed
  with_items:
    - "{{ interfaces.items() }}"

After modifying the forks, the Ansible playbook runs well with Cisco ASA in multi-context mode, like mentioned before it is a bit slow to deploy the configuration if I compare this to Cumulus Linux or any other Linux system.

Please share your feedback.

Leave a comment

Deploying OpenShift Origin Cluster using Ansible

Something completely different to my more network related posts, this time it is about Platform as a Service with OpenShift Origin. There is a big push for containerized platform services from development.

I was testing the official OpenShift Origin Ansible Playbook to install a small 5 node cluster and created an OpenShift Vagrant environment for this.

Cluster overview:

I recommend having a look at the official RedHat OpenShift documentation to understand the architecture because it is quite a complex platform.

As a pre-requisite, you need to install the vagrant hostmanager because Openshift needs to resolve hostnames and I don’t want to install a separate DNS server. Here you find more information: https://github.com/devopsgroup-io/vagrant-hostmanager

vagrant plugin install vagrant-hostmanager

sudo bash -c 'cat << EOF > /etc/sudoers.d/vagrant_hostmanager2
Cmnd_Alias VAGRANT_HOSTMANAGER_UPDATE = /bin/cp <your-home-folder>/.vagrant.d/tmp/hosts.local /etc/hosts
%sudo ALL=(root) NOPASSWD: VAGRANT_HOSTMANAGER_UPDATE
EOF'

Next, clone my Vagrant repository and the official OpenShift Origin ansible:

git clone [email protected]:berndonline/openshift-origin-vagrant.git
git clone [email protected]:openshift/openshift-ansible.git

Let’s start first by booting the OpenShift vagrant environment:

cd openshift-origin-vagrant/
./vagrant_up.sh

The vagrant host manager will update dynamically the /etc/hosts file on both the Guest and the Host machine:

...
## vagrant-hostmanager-start id: 55ed9acf-25e9-4b19-bfab-e0812a292dc0
10.255.1.81	origin-master

10.255.1.231	origin-etcd

10.255.1.182	origin-infra

10.255.1.72	origin-node-1

10.255.1.145	origin-node-2

## vagrant-hostmanager-end
...

Let’s have a quick look at the OpenShift inventory file. This has settings for the different node types and custom OpenShift and Vagrant variables. You need to modify a few things like public hostname and default subdomain:

OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=vagrant
ansible_become=yes

deployment_type=origin
openshift_release=v3.7.0
containerized=true
openshift_install_examples=true
enable_excluders=false
openshift_check_min_host_memory_gb=4
openshift_disable_check=docker_image_availability,docker_storage,disk_availability

# use htpasswd authentication with demo/demo
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'demo': '$apr1$.MaA77kd$Rlnn6RXq9kCjnEfh5I3w/.'}

# put the router on dedicated infra node
openshift_hosted_router_selector='region=infra'
openshift_master_default_subdomain=origin.paas.domain.com

# put the image registry on dedicated infra node
openshift_hosted_registry_selector='region=infra'

# project pods should be placed on primary nodes
osm_default_node_selector='region=primary'

# Vagrant variables
ansible_port='22' 
ansible_user='vagrant'
ansible_ssh_private_key_file='/home/berndonline/.vagrant.d/insecure_private_key'

[masters]
origin-master  openshift_public_hostname="console.paas.domain.com"

[etcd]
origin-etcd

[nodes]
# master needs to be included in the node to be configured in the SDN
origin-master
origin-infra openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
origin-node-[1:2] openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

Now that we are ready, we need to check out the latest release and execute the Ansible Playbook:

cd openshift-ansible/
git checkout release-3.7
ansible-playbook ./playbooks/byo/config.yml -i ../openshift-origin-vagrant/inventory

The playbook takes forever to run, so do something else for the next 10 to 15 mins.

...

PLAY RECAP **********************************************************************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0
origin-etcd                : ok=147  changed=47   unreachable=0    failed=0
origin-infra               : ok=202  changed=61   unreachable=0    failed=0
origin-master              : ok=561  changed=224  unreachable=0    failed=0
origin-node                : ok=201  changed=61   unreachable=0    failed=0


INSTALLER STATUS ****************************************************************************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
Hosted Install             : Complete
Service Catalog Install    : Complete

Sunday 21 January 2018  20:55:16 +0100 (0:00:00.011)       0:11:56.549 ********
===============================================================================
etcd : Pull etcd container ---------------------------------------------------------------------------------------------------------------------------------- 79.51s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------------------------------------------------------------------- 31.54s
openshift_node : Pre-pull node image when containerized ----------------------------------------------------------------------------------------------------- 31.28s
template_service_broker : Verify that TSB is running -------------------------------------------------------------------------------------------------------- 30.87s
docker : Install Docker ------------------------------------------------------------------------------------------------------------------------------------- 30.41s
docker : Install Docker ------------------------------------------------------------------------------------------------------------------------------------- 26.32s
openshift_cli : Pull CLI Image ------------------------------------------------------------------------------------------------------------------------------ 23.03s
openshift_service_catalog : wait for api server to be ready ------------------------------------------------------------------------------------------------- 21.32s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------------------------------------------------------------------- 16.27s
restart master api ------------------------------------------------------------------------------------------------------------------------------------------ 10.69s
restart master controllers ---------------------------------------------------------------------------------------------------------------------------------- 10.62s
openshift_node : Start and enable node ---------------------------------------------------------------------------------------------------------------------- 10.42s
openshift_node : Start and enable node ---------------------------------------------------------------------------------------------------------------------- 10.30s
openshift_master : Start and enable master api on first master ---------------------------------------------------------------------------------------------- 10.21s
openshift_master : Start and enable master controller service ----------------------------------------------------------------------------------------------- 10.19s
os_firewall : Install iptables packages --------------------------------------------------------------------------------------------------------------------- 10.15s
os_firewall : Wait 10 seconds after disabling firewalld ----------------------------------------------------------------------------------------------------- 10.07s
os_firewall : need to pause here, otherwise the iptables service starting can sometimes cause ssh to fail --------------------------------------------------- 10.05s
openshift_node : Pre-pull node image when containerized ------------------------------------------------------------------------------------------------------ 7.85s
openshift_service_catalog : oc_process ----------------------------------------------------------------------------------------------------------------------- 7.44s

To publish both the openshift_public_hostname and openshift_master_default_subdomain, I have a Nginx reverse proxy running and publish 8443 from the origin-master and 80, 443 from the origin-infra nodes.

Here a Nginx example:

server {
  listen 8443 ssl;
  listen [::]:8443 ssl;
  server_name console.paas.domain.com;

  ssl on;
  ssl_certificate /etc/nginx/ssl/paas.domain.com-cert.pem;
  ssl_certificate_key /etc/nginx/ssl/paas.domain.com-key.pem;

  access_log  /var/log/nginx/openshift-console_access.log;
  error_log   /var/log/nginx/openshift-console_error.log;

location / {
  proxy_pass https://10.255.1.81:8443;
  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection 'upgrade';
  proxy_set_header Host $host;
  proxy_cache_bypass $http_upgrade;

  }
}

I will try to write more about OpenShift and Platform as a Service and how to deploy small applications like WordPress.

Have fun testing OpenShift and please share your feedback.

Leave a comment

Ansible Playbook for VyOS and BGP Routing

I am currently looking into different possibilities for Open Source alternatives to commercial routers from Cisco or Juniper to use in Amazon AWS Transit VPCs. One option is to completely build the software router by myself with a Debian Linux, FRR (Free Range Routing) and StrongSwan, read my post about the self-build software router: Open Source Routing GRE over IPSec with StrongSwan and Cisco IOS-XE

A few years back I was working with Juniper JunOS routers and I thought I’d give VyOS a try because the command line which is very similar.

I replicated the same Vagrant topology for my Ansible Playbook for Cisco BGP Routing Topology but used VyOS instead of Cisco.

Network overview:

Here are the repositories for the Vagrant topology https://github.com/berndonline/vyos-lab-vagrant and the Ansible Playbook https://github.com/berndonline/vyos-lab-provision.

The Ansible Playbook site.yml is very simple, using the Ansible vyos_system for changing the hostname and the module vyos_config for interface and routing configuration:

---

- hosts: all

  connection: local
  user: '{{ ansible_ssh_user }}'
  gather_facts: 'no'

  roles:
    - hostname
    - interfaces
    - routing

Here is an example from host_vars rtr-1.yml:

---

hostname: rtr-1
domain_name: lab.local

loopback:
  dum0:
    alias: dummy loopback0
    address: 10.255.0.1
    mask: /32

interfaces:
  eth1:
    alias: connection rtr-2
    address: 10.0.255.1
    mask: /30

  eth2:
    alias: connection rtr-3
    address: 10.0.255.5
    mask: /30

bgp:
  asn: 65001
  neighbor:
    - {address: 10.0.255.2, remote_as: 65000}
    - {address: 10.0.255.6, remote_as: 65000}
  networks:
    - {network: 10.0.255.0, mask: /30}
    - {network: 10.0.255.4, mask: /30}
    - {network: 10.255.0.1, mask: /32}
  maxpath: 2

The template interfaces.j2 for the interface configuration:

{% if loopback is defined %}
{% for port, value in loopback.items() %}
set interfaces dummy {{ port }} address '{{ value.address }}{{ value.mask }}'
set interfaces dummy {{ port }} description '{{ value.alias }}'
{% endfor %}
{% endif %}

{% if interfaces is defined %}
{% for port, value in interfaces.items() %}
set interfaces ethernet {{ port }} address '{{ value.address }}{{ value.mask }}'
set interfaces ethernet {{ port }} description '{{ value.alias }}'
{% endfor %}
{% endif %}

This is the template routing.j2 for the routing configuration:

{% if bgp is defined %}
{% if bgp.maxpath is defined %}
set protocols bgp {{ bgp.asn }} maximum-paths ebgp '{{ bgp.maxpath }}'
{% endif %}
{% for item in bgp.neighbor %}
set protocols bgp {{ bgp.asn }} neighbor {{ item.address }} ebgp-multihop '2'
set protocols bgp {{ bgp.asn }} neighbor {{ item.address }} remote-as '{{ item.remote_as }}'
{% endfor %}
{% for item in bgp.networks %}
set protocols bgp {{ bgp.asn }} network '{{ item.network }}{{ item.mask }}'
{% endfor %}
set protocols bgp {{ bgp.asn }} parameters router-id '{{ loopback.dum0.address }}'
{% endif %}

The output of the running Ansible Playbook:

PLAY [all] *********************************************************************

TASK [hostname : write hostname and domain-name] *******************************
changed: [rtr-3]
changed: [rtr-2]
changed: [rtr-4]
changed: [rtr-1]

TASK [interfaces : write interfaces config] ************************************
changed: [rtr-4]
changed: [rtr-1]
changed: [rtr-3]
changed: [rtr-2]

TASK [routing : write routing config] ******************************************
changed: [rtr-2]
changed: [rtr-4]
changed: [rtr-3]
changed: [rtr-1]

PLAY RECAP *********************************************************************
rtr-1                      : ok=3    changed=3    unreachable=0    failed=0   
rtr-2                      : ok=3    changed=3    unreachable=0    failed=0   
rtr-3                      : ok=3    changed=3    unreachable=0    failed=0   
rtr-4                      : ok=3    changed=3    unreachable=0    failed=0   

Like in all my other Ansible Playbooks I use some kind of validation, a simple ping check vyos_check_icmp.yml to see if the configuration is correctly deployed:

---

- hosts: all

  connection: local
  user: '{{ ansible_ssh_user }}'
  gather_facts: 'no'

  tasks:
    - name: validate connection from rtr-1
      vyos_command:
        commands: 'ping {{ item }} count 4'
      when: "'rtr-1' in inventory_hostname"
      with_items:
        - '10.0.255.2'
        - '10.0.255.6'

    - name: validate connection from rtr-2
      vyos_command:
        commands: 'ping {{ item }} count 4'
      when: "'rtr-2' in inventory_hostname"
      with_items:
        - '10.0.255.1'
        - '10.0.254.1'
        - '10.0.253.2'
...

The output of the icmp validation Playbook:

PLAY [all] *********************************************************************

TASK [validate connection from rtr-1] ******************************************
skipping: [rtr-3] => (item=10.0.255.2) 
skipping: [rtr-3] => (item=10.0.255.6) 
skipping: [rtr-2] => (item=10.0.255.2) 
skipping: [rtr-2] => (item=10.0.255.6) 
skipping: [rtr-4] => (item=10.0.255.2) 
skipping: [rtr-4] => (item=10.0.255.6) 
ok: [rtr-1] => (item=10.0.255.2)
ok: [rtr-1] => (item=10.0.255.6)

TASK [validate connection from rtr-2] ******************************************
skipping: [rtr-3] => (item=10.0.255.1) 
skipping: [rtr-3] => (item=10.0.254.1) 
skipping: [rtr-1] => (item=10.0.255.1) 
skipping: [rtr-3] => (item=10.0.253.2) 
skipping: [rtr-1] => (item=10.0.254.1) 
skipping: [rtr-1] => (item=10.0.253.2) 
skipping: [rtr-4] => (item=10.0.255.1) 
skipping: [rtr-4] => (item=10.0.254.1) 
skipping: [rtr-4] => (item=10.0.253.2) 
ok: [rtr-2] => (item=10.0.255.1)
ok: [rtr-2] => (item=10.0.254.1)
ok: [rtr-2] => (item=10.0.253.2)

TASK [validate connection from rtr-3] ******************************************
skipping: [rtr-1] => (item=10.0.255.5) 
skipping: [rtr-1] => (item=10.0.254.5) 
skipping: [rtr-2] => (item=10.0.255.5) 
skipping: [rtr-1] => (item=10.0.253.1) 
skipping: [rtr-2] => (item=10.0.254.5) 
skipping: [rtr-2] => (item=10.0.253.1) 
skipping: [rtr-4] => (item=10.0.255.5) 
skipping: [rtr-4] => (item=10.0.254.5) 
skipping: [rtr-4] => (item=10.0.253.1) 
ok: [rtr-3] => (item=10.0.255.5)
ok: [rtr-3] => (item=10.0.254.5)
ok: [rtr-3] => (item=10.0.253.1)

TASK [validate connection from rtr-4] ******************************************
skipping: [rtr-3] => (item=10.0.254.2) 
skipping: [rtr-3] => (item=10.0.254.6) 
skipping: [rtr-1] => (item=10.0.254.2) 
skipping: [rtr-1] => (item=10.0.254.6) 
skipping: [rtr-2] => (item=10.0.254.2) 
skipping: [rtr-2] => (item=10.0.254.6) 
ok: [rtr-4] => (item=10.0.254.2)
ok: [rtr-4] => (item=10.0.254.6)

TASK [validate bgp connection from rtr-1] **************************************
skipping: [rtr-3] => (item=10.255.0.4) 
skipping: [rtr-2] => (item=10.255.0.4) 
skipping: [rtr-4] => (item=10.255.0.4) 
ok: [rtr-1] => (item=10.255.0.4)

TASK [validate bgp connection from rtr-4] **************************************
skipping: [rtr-3] => (item=10.255.0.1) 
skipping: [rtr-1] => (item=10.255.0.1) 
skipping: [rtr-2] => (item=10.255.0.1) 
ok: [rtr-4] => (item=10.255.0.1)

PLAY RECAP *********************************************************************
rtr-1                      : ok=2    changed=0    unreachable=0    failed=0   
rtr-2                      : ok=1    changed=0    unreachable=0    failed=0   
rtr-3                      : ok=1    changed=0    unreachable=0    failed=0   
rtr-4                      : ok=2    changed=0    unreachable=0    failed=0   

As you see, the configuration is successfully deployed and BGP connectivity between the nodes.

Leave a comment

BGP EVPN and VXLAN with Cumulus Linux

I did some updates on my Cumulus Linux Vagrant topology and added new functions to my post about an Ansible Playbook for the Cumulus Linux BGP IP-Fabric.

To the Vagrant topology, I added 6x servers and per clag-pair, each server is connected to a VLAN and the second server is connected to a VXLAN.

Here are the links to the repositories where you find the Ansible Playbook https://github.com/berndonline/cumulus-lab-provision and the Vagrantfile https://github.com/berndonline/cumulus-lab-vagrant

In the Ansible Playbook, I added BGP EVPN and one VXLAN which spreads over all Leaf and Edge switches. VXLAN routing is happening on the Edge switches into the rest of the virtual data centre network.

Here is an example of the additional variables I added to edge-1 for BGP EVPN and VXLAN:

group_vars/edge.yml:

clagd_vxlan_anycast_ip: 10.255.100.1

The VXLAN anycast IP is needed in BGP for EVPN and the same IP is shared between edge-1 and edge-2. The same is for the other leaf switches, per clag pair they share the same anycast IP address.

host_vars/edge-1.yml:

---

loopback: 10.255.0.3/32

bgp_fabric:
  asn: 65001
  router_id: 10.255.0.3
  neighbor:
    - swp51
    - swp52
  networks:
    - 10.0.4.0/24
    - 10.255.0.3/32
    - 10.255.100.1/32
    - 10.0.255.0/28
  evpn: true
  advertise_vni: true

peerlink:
  bond_slaves: swp53 swp54
  mtu: 9216
  vlan: 4094
  address: 169.254.1.1/30
  clagd_peer_ip: 169.254.1.2
  clagd_backup_ip: 192.168.100.4
  clagd_sys_mac: 44:38:39:FF:40:94
  clagd_priority: 4096

bridge:
  ports: peerlink vxlan10201
  vids: 901 201

vlans:
  901:
    alias: edge-transit-901
    vipv4: 10.0.255.14/28
    vmac: 00:00:5e:00:09:01
    pipv4: 10.0.255.12/28
  201:
    alias: prod-server-10201
    vipv4: 10.0.4.254/24
    vmac: 00:00:00:00:02:01
    pipv4: 10.0.4.252/24
    vlan_id: 201
    vlan_raw_device: bridge

vxlans:
  10201:
    alias: prod-server-10201
    vxlan_local_tunnelip: 10.255.0.3
    bridge_access: 201
    bridge_learning: 'off'
    bridge_arp_nd_suppress: 'on'

On the Edge switches, because of VXLAN routing, you find a mapping between VXLAN 10201 to VLAN 201 which has VRR running.

I needed to do some modifications to the interfaces template interfaces_config.j2:

{% if loopback is defined %}
auto lo
iface lo inet loopback
    address {{ loopback }}
{% if clagd_vxlan_anycast_ip is defined %}
    clagd-vxlan-anycast-ip {{ clagd_vxlan_anycast_ip }}
{% endif %}
{% endif %}
...
{% if bridge is defined %}
{% for vxlan_id, value in vxlans.items() %}
auto vxlan{{ vxlan_id }}
iface vxlan{{ vxlan_id }}
    alias {{ value.alias }}
    vxlan-id {{ vxlan_id }}
    vxlan-local-tunnelip {{ value.vxlan_local_tunnelip }}
    bridge-access {{ value.bridge_access }}
    bridge-learning {{ value.bridge_learning }}
    bridge-arp-nd-suppress {{ value.bridge_arp_nd_suppress }}
    mstpctl-bpduguard yes
    mstpctl-portbpdufilter yes

{% endfor %}
{% endif %}

There were also some modifications needed to the FRR template frr.j2 to add EVPN to the BGP configuration:

...
{% if bgp_fabric.evpn is defined %}
 address-family ipv6 unicast
  neighbor fabric activate
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor fabric activate
{% if bgp_fabric.advertise_vni is defined %}
  advertise-all-vni
{% endif %}
 exit-address-family
{% endif %}
{% endif %}
...

For more detailed information about EVPN and VXLAN routing on Cumulus Linux, I recommend reading the documentation Ethernet Virtual Private Network – EVPN and VXLAN Routing.

Have fun testing the new features in my Ansible Playbook and please share your feedback.

Leave a comment