Using HashiCorp Terraform to deploy Amazon AWS VPC

Before I start deploying the AWS VPC with HashCorp’s Terraform I want to explain the design of the Virtual Private Cloud. The main focus here is primarily for redundancy to ensure that if one Availability Zone (AZ) becomes unavailable that it is not interrupting the traffic and causing outages in your network, the NAT Gateway for example run per AZ so you need to make sure that these services are spread over multiple AZs.

AWS VPC network overview:

Before you start using Terraform you need to install the binary and it is also very useful to install the AWS command line interface. Please don’t forget to register the AWS CLI and add access and secure key.

pip install awscli --upgrade --user
wget https://releases.hashicorp.com/terraform/0.11.7/terraform_0.11.7_linux_amd64.zip
unzip terraform_0.11.7_linux_amd64.zip
sudo mv terraform /usr/local/bin/

Terraform is a great product and creates infrastructure as code, and is independent from any cloud provider so there is no need to use AWS CloudFormation like in my example. My repository for the Terraform files can be found here: https://github.com/berndonline/aws-terraform

Let’s start with the variables file, which defines the needed settings for deploying the VPC. Basically you only need to change the variables to deploy the VPC to another AWS region:

...
variable "aws_region" {
  description = "AWS region to launch servers."
  default     = "eu-west-1"
}
...
variable "vpc_cidr" {
    default = "10.0.0.0/20"
  description = "the vpc cdir range"
}
variable "public_subnet_a" {
  default = "10.0.0.0/24"
  description = "Public subnet AZ A"
}
variable "public_subnet_b" {
  default = "10.0.4.0/24"
  description = "Public subnet AZ A"
}
variable "public_subnet_c" {
  default = "10.0.8.0/24"
  description = "Public subnet AZ A"
}
...

The vpc.tf file is the Terraform template which deploys the private and public subnets, the internet gateway, multiple NAT gateways and the different routing tables and adds the needed routes towards the internet:

# Create a VPC to launch our instances into
resource "aws_vpc" "default" {
    cidr_block = "${var.vpc_cidr}"
    enable_dns_support = true
    enable_dns_hostnames = true
    tags {
      Name = "VPC"
    }
}

resource "aws_subnet" "PublicSubnetA" {
  vpc_id = "${aws_vpc.default.id}"
  cidr_block = "${var.public_subnet_a}"
  tags {
        Name = "Public Subnet A"
  }
 availability_zone = "${data.aws_availability_zones.available.names[0]}"
}
...

In the main.tf you define which provider to use:

# Specify the provider and access details
provider "aws" {
  region = "${var.aws_region}"
}

# Declare the data source
data "aws_availability_zones" "available" {}

Now let’s start deploying the environment, first you need to initialise Terraform “terraform init“:

[email protected]:~/aws-terraform$ terraform init

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "aws" (1.25.0)...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.aws: version = "~> 1.25"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
[email protected]:~/aws-terraform$

Next, let’s do a dry run “terraform plan” to see all changes Terraform would apply:

[email protected]:~/aws-terraform$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.aws_availability_zones.available: Refreshing state...

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + aws_eip.natgw_a
      id:                                          
      allocation_id:                               
      association_id:                              
      domain:                                      
      instance:                                    
      network_interface:                           
      private_ip:                                  
      public_ip:                                   
      vpc:                                         "true"

...

  + aws_vpc.default
      id:                                          
      assign_generated_ipv6_cidr_block:            "false"
      cidr_block:                                  "10.0.0.0/20"
      default_network_acl_id:                      
      default_route_table_id:                      
      default_security_group_id:                   
      dhcp_options_id:                             
      enable_classiclink:                          
      enable_classiclink_dns_support:              
      enable_dns_hostnames:                        "true"
      enable_dns_support:                          "true"
      instance_tenancy:                            "default"
      ipv6_association_id:                         
      ipv6_cidr_block:                             
      main_route_table_id:                         
      tags.%:                                      "1"
      tags.Name:                                   "VPC"


Plan: 27 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

[email protected]:~/aws-terraform$

Because nothing is deployed, Terraform would apply 27 changes, so let’s do this by running “terraform apply“. Terraform will check the state and will ask you to confirm and then apply the changes:

[email protected]:~/aws-terraform$ terraform apply
data.aws_availability_zones.available: Refreshing state...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + aws_eip.natgw_a
      id:                                          
      allocation_id:                               
      association_id:                              
      domain:                                      
      instance:                                    
      network_interface:                           
      private_ip:                                  
      public_ip:                                   
      vpc:                                         "true"

...

  + aws_vpc.default
      id:                                          
      assign_generated_ipv6_cidr_block:            "false"
      cidr_block:                                  "10.0.0.0/20"
      default_network_acl_id:                      
      default_route_table_id:                      
      default_security_group_id:                   
      dhcp_options_id:                             
      enable_classiclink:                          
      enable_classiclink_dns_support:              
      enable_dns_hostnames:                        "true"
      enable_dns_support:                          "true"
      instance_tenancy:                            "default"
      ipv6_association_id:                         
      ipv6_cidr_block:                             
      main_route_table_id:                         
      tags.%:                                      "1"
      tags.Name:                                   "VPC"


Plan: 27 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_eip.natgw_c: Creating...
  allocation_id:     "" => ""
  association_id:    "" => ""
  domain:            "" => ""
  instance:          "" => ""
  network_interface: "" => ""
  private_ip:        "" => ""
  public_ip:         "" => ""
  vpc:               "" => "true"
aws_eip.natgw_a: Creating...
  allocation_id:     "" => ""
  association_id:    "" => ""
  domain:            "" => ""
  instance:          "" => ""
  network_interface: "" => ""
  private_ip:        "" => ""
  public_ip:         "" => ""
  vpc:               "" => "true"

...

aws_route_table_association.PrivateSubnetB: Creation complete after 0s (ID: rtbassoc-174ba16c)
aws_nat_gateway.public_nat_c: Still creating... (1m40s elapsed)
aws_nat_gateway.public_nat_c: Still creating... (1m50s elapsed)
aws_nat_gateway.public_nat_c: Creation complete after 1m56s (ID: nat-093319a1fa62c3eda)
aws_route_table.private_route_c: Creating...
  propagating_vgws.#:                         "" => ""
  route.#:                                    "" => "1"
  route.4170986711.cidr_block:                "" => "0.0.0.0/0"
  route.4170986711.egress_only_gateway_id:    "" => ""
  route.4170986711.gateway_id:                "" => ""
  route.4170986711.instance_id:               "" => ""
  route.4170986711.ipv6_cidr_block:           "" => ""
  route.4170986711.nat_gateway_id:            "" => "nat-093319a1fa62c3eda"
  route.4170986711.network_interface_id:      "" => ""
  route.4170986711.vpc_peering_connection_id: "" => ""
  tags.%:                                     "" => "1"
  tags.Name:                                  "" => "Private Route C"
  vpc_id:                                     "" => "vpc-fdffb19b"
aws_route_table.private_route_c: Creation complete after 1s (ID: rtb-d64632af)
aws_route_table_association.PrivateSubnetC: Creating...
  route_table_id: "" => "rtb-d64632af"
  subnet_id:      "" => "subnet-17da194d"
aws_route_table_association.PrivateSubnetC: Creation complete after 1s (ID: rtbassoc-35749e4e)

Apply complete! Resources: 27 added, 0 changed, 0 destroyed.
[email protected]:~/aws-terraform$

Terraform successfully applied all the changes so let’s have a quick look in the AWS web console:

You can change the environment and run “terraform apply” again and Terraform would deploy the changes you have made. In my example below I didn’t, so Terraform would do nothing because it tracks the state that is deployed and that I have defined in the vpc.tf:

[email protected]:~/aws-terraform$ terraform apply
aws_eip.natgw_c: Refreshing state... (ID: eipalloc-7fa0eb42)
aws_vpc.default: Refreshing state... (ID: vpc-fdffb19b)
aws_eip.natgw_a: Refreshing state... (ID: eipalloc-3ca7ec01)
aws_eip.natgw_b: Refreshing state... (ID: eipalloc-e6bbf0db)
data.aws_availability_zones.available: Refreshing state...
aws_subnet.PublicSubnetC: Refreshing state... (ID: subnet-d6e4278c)
aws_subnet.PrivateSubnetC: Refreshing state... (ID: subnet-17da194d)
aws_subnet.PrivateSubnetA: Refreshing state... (ID: subnet-6ea62708)
aws_subnet.PublicSubnetA: Refreshing state... (ID: subnet-1ab0317c)
aws_network_acl.all: Refreshing state... (ID: acl-c75f9ebe)
aws_internet_gateway.gw: Refreshing state... (ID: igw-27652940)
aws_subnet.PrivateSubnetB: Refreshing state... (ID: subnet-ab59c8e3)
aws_subnet.PublicSubnetB: Refreshing state... (ID: subnet-4a51c002)
aws_route_table.public_route_b: Refreshing state... (ID: rtb-a45d29dd)
aws_route_table.public_route_a: Refreshing state... (ID: rtb-5b423622)
aws_route_table.public_route_c: Refreshing state... (ID: rtb-0453277d)
aws_nat_gateway.public_nat_b: Refreshing state... (ID: nat-0376fc652d362a3b1)
aws_nat_gateway.public_nat_a: Refreshing state... (ID: nat-073ed904d4cf2d30e)
aws_route_table_association.PublicSubnetA: Refreshing state... (ID: rtbassoc-b14ba1ca)
aws_route_table_association.PublicSubnetB: Refreshing state... (ID: rtbassoc-277d975c)
aws_route_table.private_route_a: Refreshing state... (ID: rtb-0745317e)
aws_route_table.private_route_b: Refreshing state... (ID: rtb-a15a2ed8)
aws_route_table_association.PrivateSubnetB: Refreshing state... (ID: rtbassoc-174ba16c)
aws_route_table_association.PrivateSubnetA: Refreshing state... (ID: rtbassoc-60759f1b)
aws_nat_gateway.public_nat_c: Refreshing state... (ID: nat-093319a1fa62c3eda)
aws_route_table_association.PublicSubnetC: Refreshing state... (ID: rtbassoc-307e944b)
aws_route_table.private_route_c: Refreshing state... (ID: rtb-d64632af)
aws_route_table_association.PrivateSubnetC: Refreshing state... (ID: rtbassoc-35749e4e)

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
[email protected]:~/aws-terraform$

To remove the environment use run “terraform destroy“:

[email protected]:~/aws-terraform$ terraform destroy
aws_eip.natgw_c: Refreshing state... (ID: eipalloc-7fa0eb42)
data.aws_availability_zones.available: Refreshing state...
aws_eip.natgw_a: Refreshing state... (ID: eipalloc-3ca7ec01)
aws_vpc.default: Refreshing state... (ID: vpc-fdffb19b)
aws_eip.natgw_b: Refreshing state... (ID: eipalloc-e6bbf0db)

...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  - aws_eip.natgw_a

  - aws_eip.natgw_b

  - aws_eip.natgw_c

...

Plan: 0 to add, 0 to change, 27 to destroy.

Do you really want to destroy?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

aws_network_acl.all: Destroying... (ID: acl-c75f9ebe)
aws_route_table_association.PrivateSubnetA: Destroying... (ID: rtbassoc-60759f1b)
aws_route_table_association.PublicSubnetC: Destroying... (ID: rtbassoc-307e944b)
aws_route_table_association.PublicSubnetA: Destroying... (ID: rtbassoc-b14ba1ca)
aws_route_table_association.PublicSubnetB: Destroying... (ID: rtbassoc-277d975c)
aws_route_table_association.PrivateSubnetC: Destroying... (ID: rtbassoc-35749e4e)
aws_route_table_association.PrivateSubnetB: Destroying... (ID: rtbassoc-174ba16c)
aws_route_table_association.PrivateSubnetB: Destruction complete after 0s

...

aws_internet_gateway.gw: Destroying... (ID: igw-27652940)
aws_eip.natgw_c: Destroying... (ID: eipalloc-7fa0eb42)
aws_subnet.PrivateSubnetC: Destroying... (ID: subnet-17da194d)
aws_subnet.PrivateSubnetC: Destruction complete after 1s
aws_eip.natgw_c: Destruction complete after 1s
aws_internet_gateway.gw: Still destroying... (ID: igw-27652940, 10s elapsed)
aws_internet_gateway.gw: Destruction complete after 11s
aws_vpc.default: Destroying... (ID: vpc-fdffb19b)
aws_vpc.default: Destruction complete after 0s

Destroy complete! Resources: 27 destroyed.
[email protected]:~/aws-terraform$

I hope this article was informative and explains how to deploy a VPC with Terraform. In the coming weeks I will add additional functions like deploying EC2 Instances and Load Balancing.

Please share your feedback and leave a comment.

Ansible Playbook for deploying AVI Controller nodes and Service Engines

After my first blog post about Software defined Load Balancing with AVI Networks, here is how to automatically deploy AVI controller and services engines via Ansible.

Here are the links to my repositories; AVI Vagrant environment: https://github.com/berndonline/avi-lab-vagrant and AVI Ansible Playbook: https://github.com/berndonline/avi-lab-provision

Make sure that your vagrant environment is running,

[email protected]:~/avi-lab-vagrant$ vagrant status
Current machine states:

avi-controller-1          running (libvirt)
avi-controller-2          running (libvirt)
avi-controller-3          running (libvirt)
avi-se-1                  running (libvirt)
avi-se-2                  running (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.

I needed to modify the ansible.cfg to integrate a filter plugin:

[defaults]
inventory = ./.vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory
host_key_checking=False

library = /home/berndonline/avi-lab-provision/lib
filter_plugins = /home/berndonline/avi-lab-provision/lib/filter_plugins

The controller installation is actually very simple and I got it from the official AVI ansible role they created, I added a second role to check ones the controller nodes are successfully booted:

---
- hosts: avi-controller
  user: '{{ ansible_ssh_user }}'
  gather_facts: "true"
  roles:
    - {role: ansible-role-avicontroller, become: true}
    - {role: avi-post-controller, become: false}

There’s one important thing to know before we run the playbook. When you have an AVI subscription you get custom container images with a predefined default password which makes it easier for you to do the cluster setup fully automated. You find the default password variable in group_vars/all.yml there you set as well if the password should be changed.

Let’s execute the ansible playbook, it takes a bit time for the three nodes to boot up:

[email protected]:~/avi-lab-vagrant$ ansible-playbook ../avi-lab-provision/playbooks/avi-controller-install.yml

PLAY [avi-controller] *********************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************
ok: [avi-controller-3]
ok: [avi-controller-2]
ok: [avi-controller-1]

TASK [ansible-role-avicontroller : Avi Controller | Deployment] ***************************************************************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avicontroller/tasks/docker/main.yml for avi-controller-1, avi-controller-2, avi-controller-3

TASK [ansible-role-avicontroller : Avi Controller | Services | systemd | Check if Avi Controller installed] *******************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avicontroller/tasks/docker/services/systemd/check.yml for avi-controller-1, avi-controller-2, avi-controller-3

TASK [ansible-role-avicontroller : Avi Controller | Check if Avi Controller installed] ****************************************************************************
ok: [avi-controller-3]
ok: [avi-controller-2]
ok: [avi-controller-1]

TASK [ansible-role-avicontroller : Avi Controller | Services | init.d | Check if Avi Controller installed] ********************************************************
skipping: [avi-controller-1]
skipping: [avi-controller-2]
skipping: [avi-controller-3]

TASK [ansible-role-avicontroller : Avi Controller | Check minimum requirements] ***********************************************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avicontroller/tasks/docker/requirements.yml for avi-controller-1, avi-controller-2, avi-controller-3

TASK [ansible-role-avicontroller : Avi Controller | Requirements | Check for docker] ******************************************************************************
ok: [avi-controller-2]
ok: [avi-controller-3]
ok: [avi-controller-1]

...

TASK [avi-post-controller : wait for cluster nodes up] ************************************************************************************************************
FAILED - RETRYING: wait for cluster nodes up (30 retries left).
FAILED - RETRYING: wait for cluster nodes up (30 retries left).
FAILED - RETRYING: wait for cluster nodes up (30 retries left).

...

FAILED - RETRYING: wait for cluster nodes up (7 retries left).
FAILED - RETRYING: wait for cluster nodes up (8 retries left).
FAILED - RETRYING: wait for cluster nodes up (7 retries left).
FAILED - RETRYING: wait for cluster nodes up (7 retries left).
ok: [avi-controller-2]
ok: [avi-controller-3]
ok: [avi-controller-1]

PLAY RECAP ********************************************************************************************************************************************************
avi-controller-1           : ok=36   changed=6    unreachable=0    failed=0
avi-controller-2           : ok=35   changed=5    unreachable=0    failed=0
avi-controller-3           : ok=35   changed=5    unreachable=0    failed=0

[email protected]:~/avi-lab-vagrant$

We are not finished yet and need to set basic settings like NTP and DNS, and need to configure the AVI three node controller cluster with another playbook:

---
- hosts: localhost
  connection: local
  roles:
    - {role: avi-cluster-setup, become: false}
    - {role: avi-change-password, become: false, when: avi_change_password == true}

The first role uses the REST API to do the configuration changes and requires the AVI ansible sdk role and for these reason it is very useful using the custom subscription images because you know the default password otherwise you need to modify the main setup.json file.

Let’s run the AVI cluster setup playbook:

[email protected]:~/avi-lab-vagrant$ ansible-playbook ../avi-lab-provision/playbooks/avi-cluster-setup.yml

PLAY [localhost] **************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************
ok: [localhost]

TASK [ansible-role-avisdk : Checking if avisdk python library is present] *****************************************************************************************
ok: [localhost] => {
    "msg": "Please make sure avisdk is installed via pip. 'pip install avisdk --upgrade'"
}

TASK [avi-cluster-setup : set AVI dns and ntp facts] **************************************************************************************************************
ok: [localhost]

TASK [avi-cluster-setup : set AVI cluster facts] ******************************************************************************************************************
ok: [localhost]

TASK [avi-cluster-setup : configure ntp and dns controller nodes] *************************************************************************************************
changed: [localhost]

TASK [avi-cluster-setup : configure AVI cluster] ******************************************************************************************************************
changed: [localhost]

TASK [avi-cluster-setup : wait for cluster become active] *********************************************************************************************************
FAILED - RETRYING: wait for cluster become active (30 retries left).
FAILED - RETRYING: wait for cluster become active (29 retries left).
FAILED - RETRYING: wait for cluster become active (28 retries left).

...

FAILED - RETRYING: wait for cluster become active (14 retries left).
FAILED - RETRYING: wait for cluster become active (13 retries left).
FAILED - RETRYING: wait for cluster become active (12 retries left).
ok: [localhost]

TASK [avi-change-password : change default admin password on cluster build when subscription] *********************************************************************
skipping: [localhost]

PLAY RECAP ********************************************************************************************************************************************************
localhost                  : ok=7    changed=2    unreachable=0    failed=0

[email protected]:~/avi-lab-vagrant$

We can check in the web console to see if the cluster is booted and correctly setup:

Last but not least we need the ansible playbook for the AVI service engines installation which relies on the official AVI ansible se role:

---
- hosts: avi-se
  user: '{{ ansible_ssh_user }}'
  gather_facts: "true"
  roles:
    - {role: ansible-role-avise, become: true}

Let’s run the playbook for the service engines installation:

[email protected]:~/avi-lab-vagrant$ ansible-playbook ../avi-lab-provision/playbooks/avi-se-install.yml

PLAY [avi-se] *****************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************
ok: [avi-se-2]
ok: [avi-se-1]

TASK [ansible-role-avisdk : Checking if avisdk python library is present] *****************************************************************************************
ok: [avi-se-1] => {
    "msg": "Please make sure avisdk is installed via pip. 'pip install avisdk --upgrade'"
}
ok: [avi-se-2] => {
    "msg": "Please make sure avisdk is installed via pip. 'pip install avisdk --upgrade'"
}

TASK [ansible-role-avise : Avi SE | Set facts] ********************************************************************************************************************
skipping: [avi-se-1]
skipping: [avi-se-2]

TASK [ansible-role-avise : Avi SE | Deployment] *******************************************************************************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avise/tasks/docker/main.yml for avi-se-1, avi-se-2

TASK [ansible-role-avise : Avi SE | Check minimum requirements] ***************************************************************************************************
included: /home/berndonline/avi-lab-provision/roles/ansible-role-avise/tasks/docker/requirements.yml for avi-se-1, avi-se-2

TASK [ansible-role-avise : Avi SE | Requirements | Check for docker] **********************************************************************************************
ok: [avi-se-2]
ok: [avi-se-1]

TASK [ansible-role-avise : Avi SE | Requirements | Set facts] *****************************************************************************************************
ok: [avi-se-1]
ok: [avi-se-2]

TASK [ansible-role-avise : Avi SE | Requirements | Validate Parameters] *******************************************************************************************
ok: [avi-se-1] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [avi-se-2] => {
    "changed": false,
    "msg": "All assertions passed"
}

...

TASK [ansible-role-avise : Avi SE | Services | systemd | Start the service since it's not running] ****************************************************************
changed: [avi-se-1]
changed: [avi-se-2]

RUNNING HANDLER [ansible-role-avise : Avi SE | Services | systemd | Daemon reload] ********************************************************************************
ok: [avi-se-2]
ok: [avi-se-1]

RUNNING HANDLER [ansible-role-avise : Avi SE | Services | Restart the avise service] ******************************************************************************
changed: [avi-se-2]
changed: [avi-se-1]

PLAY RECAP ********************************************************************************************************************************************************
avi-se-1                   : ok=47   changed=7    unreachable=0    failed=0
avi-se-2                   : ok=47   changed=7    unreachable=0    failed=0

[email protected]:~/avi-lab-vagrant$

After a few minutes you see the AVI service engines automatically register on the controller cluster and you are ready start configuring the detailed load balancing configuration:

Please share your feedback and leave a comment.

Internet Edge and WAN Routing with Cumulus Linux

With this article I wanted to focus on something different than the usual spine and leaf topology and talk about datacenter edge routing.

I was using Cisco routers for many years for Internet Edge and WAN connectivity. The problem with using a vendor like Cisco is the price tag you have to pay and there still might a reason for it to spend the money. But nowadays you get leased-lines handed over as normal Ethernet connection and using a dedicated routers maybe not always necessary if you are not getting too crazy with BGP routing or quality of service.

I was experimenting over the last weeks if I could use a Cumulus Linux switch as an Internet Edge and Wide Area Network router with running different VRFs for internet and WAN connectivity. I came up with the following edge network layout you see below:

For this network, I build an Vagrant topology with Cumulus VX to simulate the edge routing and being able to test the connectivity. Below you see a more detailed view of the Vagrant topology:

Everything is running on Cumulus VX even the firewalls because I just wanted to simulate the traffic flow and see if the network communication is functioning. Also having separate WAN switches might be useful because 1Gbit/s switches are cheaper then 40Gbit/s switches and you need additional SFP for 1Gbit/s connections, another point is to separate your layer 2 WAN connectivity from your internal datacenter network.

Here the assigned IP addresses for this lab:

wan-1 VLAN801 PIP: 217.0.1.2/29 VIP: 217.0.1.1/29
wan-2 VLAN801 PIP: 217.0.1.3/29 VIP: 217.0.1.1/29
wan-1 VLAN802 PIP: 10.100.0.1/29 
wan-2 VLAN802 PIP: 10.100.0.2/29
wan-1 VLAN904 PIP: 217.0.0.2/28 VIP: 217.0.0.1/28
wan-2 VLAN904 PIP: 217.0.0.3/28 VIP: 217.0.0.1/28
fw-1 VLAN904 PIP: 217.0.0.14/28
wan-1 VLAN903 PIP: 10.0.255.34/28 VIP: 10.0.255.33/28
wan-2 VLAN903 PIP: 10.0.255.35/28 VIP: 10.0.255.33/28
fw-2 VLAN903 PIP: 10.0.255.46/28
edge-1 VLAN901 PIP: 10.0.255.2/28 VIP: 10.0.255.1/28
edge-2 VLAN901 PIP: 10.0.255.3/28 VIP: 10.0.255.1/28
fw-1 VLAN901 PIP: 10.0.255.14/28
fw-2 VLAN901 PIP: 10.0.255.12/28
edge-1 VLAN902 PIP: 10.0.255.18/28 VIP: 10.0.255.17/28
edge-2 VLAN902 PIP: 10.0.255.19/28 VIP: 10.0.255.17/28
fw-1 VLAN902 PIP: 10.0.255.30/28

You can find the Github repository for the Vagrant topology here: https://github.com/berndonline/cumulus-edge-vagrant

[email protected]:~/cumulus-edge-vagrant$ vagrant status
Current machine states:

fw-2                      running (libvirt)
fw-1                      running (libvirt)
mgmt-1                    running (libvirt)
edge-2                    running (libvirt)
edge-1                    running (libvirt)
wan-1                     running (libvirt)
wan-2                     running (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
[email protected]:~/cumulus-edge-vagrant$

I wrote as well an Ansible Playbook to deploy the initial configuration which you can find here: https://github.com/berndonline/cumulus-edge-provision

Let’s execute the playbook:

[email protected]:~/cumulus-edge-vagrant$ ansible-playbook ../cumulus-edge-provision/site.yml

PLAY [edge] ********************************************************************************************************************************************************

TASK [switchgroups : create switch groups based on clag_pairs] *****************************************************************************************************
skipping: [edge-2] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [edge-1] => (item=(u'wan', [u'wan-1', u'wan-2']))
ok: [edge-2] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [wan-1] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [wan-1] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [edge-1] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [wan-2] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [wan-2] => (item=(u'edge', [u'edge-1', u'edge-2']))

TASK [switchgroups : include switch group variables] ***************************************************************************************************************
skipping: [edge-2] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [edge-1] => (item=(u'wan', [u'wan-1', u'wan-2']))
ok: [wan-1] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [wan-1] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [wan-2] => (item=(u'wan', [u'wan-1', u'wan-2']))
skipping: [wan-2] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [edge-2] => (item=(u'edge', [u'edge-1', u'edge-2']))
ok: [edge-1] => (item=(u'edge', [u'edge-1', u'edge-2']))

...

RUNNING HANDLER [interfaces : reload networking] *******************************************************************************************************************
changed: [edge-2] => (item=ifreload -a)
changed: [edge-1] => (item=ifreload -a)
changed: [wan-1] => (item=ifreload -a)
changed: [wan-2] => (item=ifreload -a)
changed: [edge-2] => (item=sleep 10)
changed: [edge-1] => (item=sleep 10)
changed: [wan-2] => (item=sleep 10)
changed: [wan-1] => (item=sleep 10)

RUNNING HANDLER [routing : reload frr] *****************************************************************************************************************************
changed: [edge-2]
changed: [wan-1]
changed: [wan-2]
changed: [edge-1]

RUNNING HANDLER [ptm : restart ptmd] *******************************************************************************************************************************
changed: [edge-2]
changed: [edge-1]
changed: [wan-2]
changed: [wan-1]

RUNNING HANDLER [ntp : restart ntp] ********************************************************************************************************************************
changed: [wan-1]
changed: [edge-1]
changed: [wan-2]
changed: [edge-2]

RUNNING HANDLER [ifplugd : restart ifplugd] ************************************************************************************************************************
changed: [edge-1]
changed: [wan-1]
changed: [edge-2]
changed: [wan-2]

PLAY RECAP *********************************************************************************************************************************************************
edge-1                     : ok=21   changed=17   unreachable=0    failed=0
edge-2                     : ok=21   changed=17   unreachable=0    failed=0
wan-1                      : ok=21   changed=17   unreachable=0    failed=0
wan-2                      : ok=21   changed=17   unreachable=0    failed=0

[email protected]:~/cumulus-edge-vagrant$

At last but not least I wrote a simple Ansible Playbook for connectivity testing using ping what you can find here: https://github.com/berndonline/cumulus-edge-provision/blob/master/icmp_check.yml

[email protected]:~/cumulus-edge-vagrant$ ansible-playbook ../cumulus-edge-provision/check_icmp.yml

PLAY [exit edge] *********************************************************************************************************************************************************************************************************************

TASK [connectivity check from frontend firewall] *************************************************************************************************************************************************************************************
skipping: [fw-2] => (item=10.0.255.33)
skipping: [fw-2] => (item=10.0.255.17)
skipping: [fw-2] => (item=10.0.255.1)
skipping: [fw-2] => (item=217.0.0.1)
skipping: [edge-2] => (item=10.0.255.33)
skipping: [edge-2] => (item=10.0.255.17)
skipping: [edge-2] => (item=10.0.255.1)
skipping: [edge-1] => (item=10.0.255.33)
skipping: [edge-2] => (item=217.0.0.1)
skipping: [edge-1] => (item=10.0.255.17)
skipping: [edge-1] => (item=10.0.255.1)
skipping: [wan-1] => (item=10.0.255.33)
skipping: [edge-1] => (item=217.0.0.1)
skipping: [wan-1] => (item=10.0.255.17)
skipping: [wan-1] => (item=10.0.255.1)
skipping: [wan-1] => (item=217.0.0.1)
skipping: [wan-2] => (item=10.0.255.33)
skipping: [wan-2] => (item=10.0.255.17)
skipping: [wan-2] => (item=10.0.255.1)
skipping: [wan-2] => (item=217.0.0.1)
changed: [fw-1] => (item=10.0.255.33)
changed: [fw-1] => (item=10.0.255.17)
changed: [fw-1] => (item=10.0.255.1)
changed: [fw-1] => (item=217.0.0.1)
...
PLAY RECAP ***************************************************************************************************************************************************************************************************************************
edge-1                     : ok=2    changed=2    unreachable=0    failed=0
edge-2                     : ok=2    changed=2    unreachable=0    failed=0
fw-1                       : ok=1    changed=1    unreachable=0    failed=0
fw-2                       : ok=1    changed=1    unreachable=0    failed=0
wan-1                      : ok=2    changed=2    unreachable=0    failed=0
wan-2                      : ok=2    changed=2    unreachable=0    failed=0

[email protected]:~/cumulus-edge-vagrant$

The icmp check shows that in general the edge routing is working but I need to do some further testing with this if this can be used in a production environment.

If using switch hardware is not the right fit you can still install and use Free Range Routing (FRR) from Cumulus Networks on other Linux distributions and pick server hardware for your own custom edge router. I would only recommend checking Linux kernel support for VRF when choosing another Linux OS. Also have a look at my article about Open Source Routing GRE over IPSec with StrongSwan and Cisco IOS-XE where I build a Debian software router.

Please share your feedback and leave a comment.

Getting started with Jenkins for Network Automation

As I have mentioned my previous post about Getting started with Gitlab-CI for Network Automation, Jenkins is another continuous integration pipelining tool you can use for network automation. Have a look about how to install Jenkins: https://wiki.jenkins.io/display/JENKINS/Installing+Jenkins+on+Ubuntu

To use the Jenkins with Vagrant and KVM (libvirt) there are a few changes needed on the linux server similar with the Gitlab-Runner. The Jenkins user account needs to be able to control KVM and you need to install the vagrant-libvirt plugin:

usermod -aG libvirtd jenkins
sudo su jenkins
vagrant plugin install vagrant-libvirt

Optional: you may need to copy custom Vagrant boxes into the users vagrant folder ‘/var/lib/jenkins/.vagrant.d/boxes/*’. Note that the Jenkins home directory is not located under /home.

Now lets start configuring a Jenkins CI-pipeline, click on ‘New item’:

This creates an empty pipeline where you need to add the different stages  of what needs to be executed:

Below is an example Jenkins pipeline script which is very similar to the Gitlab-CI pipeline I have used with my Cumulus Linux Lab in the past.

pipeline {
    agent any
    stages {
        stage('Clean and prep workspace') {
            steps {
                sh 'rm -r *'
                git 'https://github.com/berndonline/cumulus-lab-provision'
                sh 'git clone --origin master https://github.com/berndonline/cumulus-lab-vagrant'
            }
        }
        stage('Validate Ansible') {
            steps {
                sh 'bash ./linter.sh'
            }
        }
        stage('Staging') {
            steps {
                sh 'cd ./cumulus-lab-vagrant/ && ./vagrant_create.sh'
                sh 'cd ./cumulus-lab-vagrant/ && bash ../staging.sh'
            }
        }
        stage('Deploy production approval') {
            steps {
                input 'Deploy to prod?'
            }
        }
        stage('Production') {
            steps {
                sh 'cd ./cumulus-lab-vagrant/ && ./vagrant_create.sh'
                sh 'cd ./cumulus-lab-vagrant/ && bash ../production.sh'
            }
        }
    }
}

Let’s run the build pipeline:

The stages get executed one by one and, as you can see below, the production stage has an manual approval build-in that nothing gets deployed to production without someone to approve before, for a controlled production deployment:

Finished pipeline:

This is just a simple example of a network automation pipeline, this can of course be more complex if needed. It should just help you a bit on how to start using Jenkins for network automation.

Please share your feedback and leave a comment.

Ansible Automation with Cisco ASA Multi-Context Mode

I thought I’d share my experience using Ansible and Cisco ASA firewalls in multi-context mode. Right from the beginning I had a few issues deploying the configuration and the switch between the different security context didn’t work well. I got the error you see below when I tried to run a playbook. Other times the changeto context didn’t work well and applied the wrong config:

[email protected]:~$ ansible-playbook -i inventory site.yml --ask-vault-pass
Vault password:

PLAY [all] ***************************************************************************************************************************************************************************

TASK [hostname : set dns and hostname] ***********************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: error: [Errno 61] Connection refused
fatal: [fwcontext01]: FAILED! => {"changed": false, "err": "[Errno 61] Connection refused", "msg": "unable to connect to socket"}
ok: [fwcontext02]

TASK [interfaces : write interfaces config] ******************************************************************************************************************************************
ok: [fwcontext02]

....

After a bit of troubleshooting I found a workaround to limit the amount of processes Ansible use and set this limit to one in the Ansible.cfg. The default is five processes if forks is not defined as far as I remember.

[defaults]
inventory = ./inventory
host_key_checking=False
jinja2_extensions=jinja2.ext.do
forks = 1

In the example inventory file, the “inventory_hostname” variable represents the security context and as you see the “ansible_ssh_host” is set to the IP address of the admin context:

fwcontext01 ansible_ssh_host=192.168.0.1 ansible_ssh_port=22 ansible_ssh_user='ansible' ansible_ssh_pass='cisco'
fwcontext02 ansible_ssh_host=192.168.0.1 ansible_ssh_port=22 ansible_ssh_user='ansible' ansible_ssh_pass='cisco'

When you run the playbook again you can see that the playbook runs successfully but deploys the changes one by one to each firewall security context, the disadvantage is that the playbook takes much longer to run:

[email protected]:~$ ansible-playbook site.yml

PLAY [all] ***************************************************************************************************************************************************************************

TASK [hostname : set dns and hostname] ***********************************************************************************************************************************************
ok: [fwcontext01]
ok: [fwcontext02]

TASK [interfaces : write interfaces config] ******************************************************************************************************************************************
ok: [fwcontext01]
ok: [fwcontext02]

Example site.yml

---

- hosts: all
  connection: local
  gather_facts: 'no'

  vars:
    cli:
      username: "{{ ansible_ssh_user }}"
      password: "{{ ansible_ssh_pass }}"
      host: "{{ ansible_ssh_host }}"

  roles:
    - interfaces

In the example Interface role you see that the context is set to “inventory_hostname” variable:

---

- name: write interfaces config
  asa_config:
    src: "templates/interfaces.j2"
    provider: "{{ cli }}"
    context: "{{ inventory_hostname }}"
  register: result

- name: enable interfaces
  asa_config:
    parents: "interface {{ item.0 }}"
    lines: "no shutdown"
    match: none
    provider: "{{ cli }}"
    context: "{{ inventory_hostname }}"
  when: result.changed
  with_items:
    - "{{ interfaces.items() }}"

After modifying the forks, the Ansible playbook runs well with Cisco ASA in multi-context mode, like mentioned before it is a bit slow to deploy the configuration if I compare this to Cumulus Linux or any other Linux system.

Please share your feedback.

Leave a comment