Kubernetes Cluster API – Provision workload clusters on AWS

The past few months I have been following the progress of the Kubernetes Cluster API which is part of the Kubernetes SIG (special interest group) Cluster-Lifecycle because they made good progress and wanted to try out the AWS provider version to deploy Kubeadm clusters. There are multiple infrastructure / cloud providers available which can be used, have a look at supported providers.

RedHat has based the Machine API Operator for the OpenShift 4 platform on the Kubernetes Cluster API and forked some of the cloud provider integrations but in OpenShift 4 this has a different use-case for the cluster to managed itself without the need of a central management cluster. I actually like RedHat’s concept and adaptation of the Cluster API and I hope we will see something similar in the upstream project.

Bootstrapping workload clusters are pretty straight forward but before we can start with deploying the workload cluster we need a central Kubernetes management cluster for running the Cluster API components for your selected cloud provider. In The Cluster API Book for example they use a KinD (Kubernetes in Docker) cluster to provision the workload clusters.

To deploy the Cluster API components you need the clusterctl (Cluster API) and clusterawsadm (Cluster API AWS Provider) command-line utilities.

curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.14/clusterctl-linux-amd64 -o clusterctl
chmod +x ./clusterctl
sudo mv ./clusterctl /usr/local/bin/clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.6.4/clusterawsadm-linux-amd64 -o clusterawsadm
chmod +x ./clusterawsadm
sudo mv ./clusterawsadm /usr/local/bin/clusterawsadm

Let’s start to prepare to initialise the management cluster. You need a AWS IAM service account and in my example I enabled the experimental features-gates for MachinePool and ClusterResourceSets before running clusterawsadm to apply the required AWS IAM configuration.

$ export AWS_ACCESS_KEY_ID='<-YOUR-ACCESS-KEY->'
$ export AWS_SECRET_ACCESS_KEY='<-YOUR-SECRET-ACCESS-KEY->'
$ export EXP_MACHINE_POOL=true
$ export EXP_CLUSTER_RESOURCE_SET=true
$ clusterawsadm bootstrap iam create-cloudformation-stack
Attempting to create AWS CloudFormation stack cluster-api-provider-aws-sigs-k8s-io
I1206 22:23:19.620891  357601 service.go:59] AWS Cloudformation stack "cluster-api-provider-aws-sigs-k8s-io" already exists, updating

Following resources are in the stack: 

Resource                  |Type                                                                                |Status
AWS::IAM::InstanceProfile |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::InstanceProfile |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::InstanceProfile |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/control-plane.cluster-api-provider-aws.sigs.k8s.io |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/nodes.cluster-api-provider-aws.sigs.k8s.io         |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/controllers.cluster-api-provider-aws.sigs.k8s.io   |CREATE_COMPLETE
AWS::IAM::Role            |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::Role            |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::Role            |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE

This might take a few minutes before you can continue and run clusterctl to initialise the Cluster API components on your Kubernetes management cluster with the option –watching-namespace where you can apply the cluster deployment manifests.

$ export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

WARNING: `encode-as-profile` should only be used for bootstrapping.

$ clusterctl init --infrastructure aws --watching-namespace k8s
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.6.3" TargetNamespace="capa-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl config cluster [name] --kubernetes-version [version] | kubectl apply -f -

Now we have finished deploying the needed Cluster API components and are ready to create your first Kubernetes workload cluster. I go through the different custom resources and configuration options for the cluster provisioning. This starts with the cloud infrastructure configuration as you see in the example below for the VPC setup. You don’t have to use all three Availability Zone and can start with a single AZ in a region.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: cluster-1
  namespace: k8s
spec:
  region: eu-west-1
  sshKeyName: default
  networkSpec:
    vpc:
      cidrBlock: "10.0.0.0/23"
    subnets:
    - availabilityZone: eu-west-1a
      cidrBlock: "10.0.0.0/27"
      isPublic: true
    - availabilityZone: eu-west-1b
      cidrBlock: "10.0.0.32/27"
      isPublic: true
    - availabilityZone: eu-west-1c
      cidrBlock: "10.0.0.64/27"
      isPublic: true
    - availabilityZone: eu-west-1a
      cidrBlock: "10.0.1.0/27"
    - availabilityZone: eu-west-1b
      cidrBlock: "10.0.1.32/27"
    - availabilityZone: eu-west-1c
      cidrBlock: "10.0.1.64/27"

Alternatively you can also provision the workload cluster into an existing VPC, in this case your cloud infrastructure configuration looks slightly different and you need to specify VPC and subnet IDs.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: cluster-1
  namespace: k8s
spec:
  region: eu-west-1
  sshKeyName: default
  networkSpec:
    vpc:
      id: vpc-0425c335226437144
    subnets:
    - id: subnet-0261219d564bb0dc5
    - id: subnet-0fdcccba78668e013
...

Next we define the Kubeadm control-plane configuration and start with the AWS Machine Template to define the instance type and custom node configuration. Then follows the Kubeadm control-plane config referencing the machine template and amounts of replicas and Kubernetes control-plane version:

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: cluster-1-control-plane
  namespace: k8s
spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachineTemplate
    name: cluster-1-control-plane
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
  replicas: 1
  version: v1.20.4

We continue with the data-plane (worker) nodes which also starts with the AWS machine template, additionally we need a Kubeadm Config Template and then the Machine Deployment for the worker nodes with a number of replicas and used Kubernetes version.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: aws
          name: '{{ ds.meta_data.local_hostname }}'
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  clusterName: cluster-1
  replicas: 1
  selector:
    matchLabels: null
  template:
    metadata:
      labels:
        "nodepool": "nodepool-0"
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: cluster-1-data-plane-0
      clusterName: cluster-1
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachineTemplate
        name: cluster-1-data-plane-0
      version: v1.20.4

A workload cluster can be very easily upgraded by changing the .spec.version in the MachineDeployment and KubeadmControlPlane configuration. You can’t jump over a Kubernetes versions and can only upgrade to the next available version example: v1.18.4 to v1.19.8 or v1.19.8 to v1.20.4. See the list of supported AMIs and Kubernetes versions for the AWS provider.

At the beginning we enabled the feature-gates when we were initialising the management cluster to allow us to use ClusterResourceSets. This is incredible useful because I can define a set of resources which gets applied during the provisioning of the cluster. This only get executed one time during the bootstrap and will be not reconciled afterwards. In the configuration you see the reference to two configmaps for adding the Calico CNI plugin and the Nginx Ingress controller.

---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: cluster-1-crs-0
  namespace: k8s
spec:
  clusterSelector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: cluster-1
  resources:
  - kind: ConfigMap
    name: calico-cni
  - kind: ConfigMap
    name: nginx-ingress

Example of the two configmaps which contain the YAML manifests:

apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: calico-cni
  namespace: k8s
data:
  calico.yaml: |+
    ---
    # Source: calico/templates/calico-config.yaml
    # This ConfigMap is used to configure a self-hosted Calico installation.
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: calico-config
      namespace: kube-system
...
---
apiVersion: v1
data:
  deploy.yaml: |+
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: ingress-nginx
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
...

Without ClusterResourceSet you would need to manually apply the CNI and ingress controller manifests which is not great because you need the CNI plugin for all nodes to go into Ready state.

$ kubectl --kubeconfig=./cluster-1.kubeconfig   apply -f https://docs.projectcalico.org/v3.15/manifests/calico.yaml
$ kubectl --kubeconfig=./cluster-1.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.41.2/deploy/static/provider/aws/deploy.yaml

Finally after we have created the configuration of the workload cluster we can apply cluster manifest with the option for setting custom clusterNetwork and specify with service and pod IP range.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: cluster-1
  namespace: k8s
  labels:
    cluster.x-k8s.io/cluster-name: cluster-1
spec:
  clusterNetwork:
    services:
      cidrBlocks:
      - 172.30.0.0/16
    pods:
      cidrBlocks:
      - 10.128.0.0/14
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: KubeadmControlPlane
    name: cluster-1-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSCluster
    name: cluster-1

The provisioning of the workload cluster will take around 10 to 15 mins and you can follow the progress by checking the status of different configurations we have applied previously.

You can scale both Kubeadm control-plane and MachineDeployment afterwards to change the size of your cluster. MachineDeployment can be scaled down to zero to save cost.

$ kubectl scale KubeadmControlPlane cluster-1-control-plane --replicas=1
$ kubectl scale MachineDeployment cluster-1-data-plane-0 --replicas=0

After the provisioning is completed you can get kubeconfig of the cluster from the secret which got created during the bootstrap:

$ kubectl --namespace=k8s get secret cluster-1-kubeconfig    -o jsonpath={.data.value} | base64 --decode    > cluster-1.kubeconfig

Example check the node state.

$ kubectl --kubeconfig=./cluster-1.kubeconfig get nodes

When your cluster is provisioned and nodes are in Ready state you can apply the MachineHealthCheck for the data-plane (worker) nodes. This automatically remediate unhealthy nodes and provisions new nodes to join them into the cluster.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: cluster-1-node-unhealthy-5m
  namespace: k8s
spec:
  # clusterName is required to associate this MachineHealthCheck with a particular cluster
  clusterName: cluster-1
  # (Optional) maxUnhealthy prevents further remediation if the cluster is already partially unhealthy
  maxUnhealthy: 40%
  # (Optional) nodeStartupTimeout determines how long a MachineHealthCheck should wait for
  # a Node to join the cluster, before considering a Machine unhealthy
  nodeStartupTimeout: 10m
  # selector is used to determine which Machines should be health checked
  selector:
    matchLabels:
      nodepool: nodepool-0 
  # Conditions to check on Nodes for matched Machines, if any condition is matched for the duration of its timeout, the Machine is considered unhealthy
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s

I hope this is a useful article for getting started with the Kubernetes Cluster API.

Deploy Amazon EC2 Autoscaling Group and AWS Load Balancers with Terraform

This is the next article about using Terraform to create EC2 autoscaling group and the different load balancing options for EC2 instances. This setup depends on my previous blog post about using Terraform to deploy a AWS VPC so please read this first. In my Github repository you will find all the needed Terraform files ec2.tf and vpc.tf to deploy the full environment.

EC2 resource overview:

Let’s start with the launch configuration and creating the autoscaling group. I am using eu-west-1 and a standard Ubuntu 16.04 AMI. The instances are created in the private subnet and don’t get a public IP address assigned but have internet access via the NAT gateway:

resource "aws_launch_configuration" "autoscale_launch" {
  image_id = "${lookup(var.aws_amis, var.aws_region)}"
  instance_type = "t2.micro"
  security_groups = ["${aws_security_group.sec_web.id}"]
  key_name = "${aws_key_pair.auth.id}"
  user_data = <<-EOF
              #!/bin/bash
              sudo apt-get -y update
              sudo apt-get -y install nginx
              EOF
  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "autoscale_group" {
  launch_configuration = "${aws_launch_configuration.autoscale_launch.id}"
  vpc_zone_identifier = ["${aws_subnet.PrivateSubnetA.id}","${aws_subnet.PrivateSubnetB.id}","${aws_subnet.PrivateSubnetC.id}"]
  load_balancers = ["${aws_elb.elb.name}"]
  min_size = 3
  max_size = 3
  tag {
    key = "Name"
    value = "autoscale"
    propagate_at_launch = true
  }
}

I also created a few security groups to allow the traffic,  please have look for more detail in the ec2.tf.

Autoscaling Group

Now the configuration for a AWS Elastic (Classic) Load Balancer:

resource "aws_elb" "elb" {
  name = "elb"
  security_groups = ["${aws_security_group.sec_lb.id}"]
  subnets            = ["${aws_subnet.PublicSubnetA.id}","${aws_subnet.PublicSubnetB.id}","${aws_subnet.PublicSubnetC.id}"]
  cross_zone_load_balancing   = true
  health_check {
    healthy_threshold = 2
    unhealthy_threshold = 2
    timeout = 3
    interval = 30
    target = "HTTP:80/"
  }
  listener {
    lb_port = 80
    lb_protocol = "http"
    instance_port = "80"
    instance_protocol = "http"
  }
}

Elastic Load Balancer (Classic LB)

Use the Application Load Balancing (ALB) for more advanced web load balancing which only support http and https protocols. You start with creating the ALB resource, afterwards creating the target group where you can define stickiness and health checks. The listener defines which protocol type the ALB uses and assigns the target group. In the end you attach the target- with the autoscaling group:

resource "aws_lb" "alb" {  
  name            = "alb"  
  subnets         = ["${aws_subnet.PublicSubnetA.id}","${aws_subnet.PublicSubnetB.id}","${aws_subnet.PublicSubnetC.id}"]
  security_groups = ["${aws_security_group.sec_lb.id}"]
  internal        = false 
  idle_timeout    = 60   
  tags {    
    Name    = "alb"    
  }   
}

resource "aws_lb_target_group" "alb_target_group" {  
  name     = "alb-target-group"  
  port     = "80"  
  protocol = "HTTP"  
  vpc_id   = "${aws_vpc.default.id}"   
  tags {    
    name = "alb_target_group"    
  }   
  stickiness {    
    type            = "lb_cookie"    
    cookie_duration = 1800    
    enabled         = true 
  }   
  health_check {    
    healthy_threshold   = 3    
    unhealthy_threshold = 10    
    timeout             = 5    
    interval            = 10    
    path                = "/"    
    port                = 80
  }
}

resource "aws_lb_listener" "alb_listener" {  
  load_balancer_arn = "${aws_lb.alb.arn}"  
  port              = 80  
  protocol          = "http"
  
  default_action {    
    target_group_arn = "${aws_lb_target_group.alb_target_group.arn}"
    type             = "forward"  
  }
}

resource "aws_autoscaling_attachment" "alb_autoscale" {
  alb_target_group_arn   = "${aws_lb_target_group.alb_target_group.arn}"
  autoscaling_group_name = "${aws_autoscaling_group.autoscale_group.id}"
}

Application Load Balancer (ALB)

ALB Target Group

The Network Load Balancing (NLB) is very similar to the configuration like the ALB only that it supports the TCP protocol which should be only used for performance because of the limited health check functionality:

resource "aws_lb" "nlb" {
  name               = "nlb"
  internal           = false
  load_balancer_type = "network"
  subnets            = ["${aws_subnet.PublicSubnetA.id}","${aws_subnet.PublicSubnetB.id}","${aws_subnet.PublicSubnetC.id}"]
  enable_cross_zone_load_balancing  = true
  tags {
    Name = "nlb"
  }
}

resource "aws_lb_target_group" "nlb_target_group" {  
  name     = "nlb-target-group"  
  port     = "80"  
  protocol = "TCP"  
  vpc_id   = "${aws_vpc.default.id}"   
  tags {    
    name = "nlb_target_group"    
  }     
}

resource "aws_lb_listener" "nlb_listener" {  
  load_balancer_arn = "${aws_lb.nlb.arn}"  
  port              = 80  
  protocol          = "TCP"
  
  default_action {    
    target_group_arn = "${aws_lb_target_group.nlb_target_group.arn}"
    type             = "forward"  
  }
}

resource "aws_autoscaling_attachment" "nlb_autoscale" {
  alb_target_group_arn   = "${aws_lb_target_group.nlb_target_group.arn}"
  autoscaling_group_name = "${aws_autoscaling_group.autoscale_group.id}"
}

Network Load Balancer (NLB)

NLB Target Group

Let’s run terraform apply:

[email protected]:~/aws-terraform$ terraform apply
data.aws_availability_zones.available: Refreshing state...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + aws_autoscaling_attachment.alb_autoscale
      id:                                          
      alb_target_group_arn:                        "${aws_lb_target_group.alb_target_group.arn}"
      autoscaling_group_name:                      "${aws_autoscaling_group.autoscale_group.id}"

  + aws_autoscaling_attachment.nlb_autoscale
      id:                                          
      alb_target_group_arn:                        "${aws_lb_target_group.nlb_target_group.arn}"
      autoscaling_group_name:                      "${aws_autoscaling_group.autoscale_group.id}"

...

Plan: 41 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

...

aws_lb.nlb: Creation complete after 2m53s (ID: arn:aws:elasticloadbalancing:eu-west-1:...:loadbalancer/net/nlb/235e69c61779b723)
aws_lb_listener.nlb_listener: Creating...
  arn:                               "" => ""
  default_action.#:                  "" => "1"
  default_action.0.target_group_arn: "" => "arn:aws:elasticloadbalancing:eu-west-1:552276840222:targetgroup/nlb-target-group/7b3c10cbdd411669"
  default_action.0.type:             "" => "forward"
  load_balancer_arn:                 "" => "arn:aws:elasticloadbalancing:eu-west-1:552276840222:loadbalancer/net/nlb/235e69c61779b723"
  port:                              "" => "80"
  protocol:                          "" => "TCP"
  ssl_policy:                        "" => ""
aws_lb_listener.nlb_listener: Creation complete after 0s (ID: arn:aws:elasticloadbalancing:eu-west-1:.../nlb/235e69c61779b723/dfde2530387b470f)

Apply complete! Resources: 41 added, 0 changed, 0 destroyed.

Outputs:

alb_dns_name = alb-1295224636.eu-west-1.elb.amazonaws.com
elb_dns_name = elb-611107604.eu-west-1.elb.amazonaws.com
nlb_dns_name = nlb-235e69c61779b723.elb.eu-west-1.amazonaws.com
[email protected]:~/aws-terraform$

Together with the VPC configuration from my previous article, this deploys the different load balancers and provides you the DNS names as an output and ready to use.

Over the coming weeks I will optimise the Terraform code and move some of the resource settings into the variables.tf file to make this more scaleable.

If you like this article, please share your feedback and leave a comment.

Using HashiCorp Terraform to deploy Amazon AWS VPC

Before I start deploying the AWS VPC with HashCorp’s Terraform I want to explain the design of the Virtual Private Cloud. The main focus here is primarily for redundancy to ensure that if one Availability Zone (AZ) becomes unavailable that it is not interrupting the traffic and causing outages in your network, the NAT Gateway for example run per AZ so you need to make sure that these services are spread over multiple AZs.

AWS VPC network overview:

Before you start using Terraform you need to install the binary and it is also very useful to install the AWS command line interface. Please don’t forget to register the AWS CLI and add access and secure key.

pip install awscli --upgrade --user
wget https://releases.hashicorp.com/terraform/0.11.7/terraform_0.11.7_linux_amd64.zip
unzip terraform_0.11.7_linux_amd64.zip
sudo mv terraform /usr/local/bin/

Terraform is a great product and creates infrastructure as code, and is independent from any cloud provider so there is no need to use AWS CloudFormation like in my example. My repository for the Terraform files can be found here: https://github.com/berndonline/aws-terraform

Let’s start with the variables file, which defines the needed settings for deploying the VPC. Basically you only need to change the variables to deploy the VPC to another AWS region:

...
variable "aws_region" {
  description = "AWS region to launch servers."
  default     = "eu-west-1"
}
...
variable "vpc_cidr" {
    default = "10.0.0.0/20"
  description = "the vpc cdir range"
}
variable "public_subnet_a" {
  default = "10.0.0.0/24"
  description = "Public subnet AZ A"
}
variable "public_subnet_b" {
  default = "10.0.4.0/24"
  description = "Public subnet AZ A"
}
variable "public_subnet_c" {
  default = "10.0.8.0/24"
  description = "Public subnet AZ A"
}
...

The vpc.tf file is the Terraform template which deploys the private and public subnets, the internet gateway, multiple NAT gateways and the different routing tables and adds the needed routes towards the internet:

# Create a VPC to launch our instances into
resource "aws_vpc" "default" {
    cidr_block = "${var.vpc_cidr}"
    enable_dns_support = true
    enable_dns_hostnames = true
    tags {
      Name = "VPC"
    }
}

resource "aws_subnet" "PublicSubnetA" {
  vpc_id = "${aws_vpc.default.id}"
  cidr_block = "${var.public_subnet_a}"
  tags {
        Name = "Public Subnet A"
  }
 availability_zone = "${data.aws_availability_zones.available.names[0]}"
}
...

In the main.tf you define which provider to use:

# Specify the provider and access details
provider "aws" {
  region = "${var.aws_region}"
}

# Declare the data source
data "aws_availability_zones" "available" {}

Now let’s start deploying the environment, first you need to initialise Terraform “terraform init“:

[email protected]:~/aws-terraform$ terraform init

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "aws" (1.25.0)...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.aws: version = "~> 1.25"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
[email protected]:~/aws-terraform$

Next, let’s do a dry run “terraform plan” to see all changes Terraform would apply:

[email protected]:~/aws-terraform$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.aws_availability_zones.available: Refreshing state...

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + aws_eip.natgw_a
      id:                                          
      allocation_id:                               
      association_id:                              
      domain:                                      
      instance:                                    
      network_interface:                           
      private_ip:                                  
      public_ip:                                   
      vpc:                                         "true"

...

  + aws_vpc.default
      id:                                          
      assign_generated_ipv6_cidr_block:            "false"
      cidr_block:                                  "10.0.0.0/20"
      default_network_acl_id:                      
      default_route_table_id:                      
      default_security_group_id:                   
      dhcp_options_id:                             
      enable_classiclink:                          
      enable_classiclink_dns_support:              
      enable_dns_hostnames:                        "true"
      enable_dns_support:                          "true"
      instance_tenancy:                            "default"
      ipv6_association_id:                         
      ipv6_cidr_block:                             
      main_route_table_id:                         
      tags.%:                                      "1"
      tags.Name:                                   "VPC"


Plan: 27 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

[email protected]:~/aws-terraform$

Because nothing is deployed, Terraform would apply 27 changes, so let’s do this by running “terraform apply“. Terraform will check the state and will ask you to confirm and then apply the changes:

[email protected]:~/aws-terraform$ terraform apply
data.aws_availability_zones.available: Refreshing state...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + aws_eip.natgw_a
      id:                                          
      allocation_id:                               
      association_id:                              
      domain:                                      
      instance:                                    
      network_interface:                           
      private_ip:                                  
      public_ip:                                   
      vpc:                                         "true"

...

  + aws_vpc.default
      id:                                          
      assign_generated_ipv6_cidr_block:            "false"
      cidr_block:                                  "10.0.0.0/20"
      default_network_acl_id:                      
      default_route_table_id:                      
      default_security_group_id:                   
      dhcp_options_id:                             
      enable_classiclink:                          
      enable_classiclink_dns_support:              
      enable_dns_hostnames:                        "true"
      enable_dns_support:                          "true"
      instance_tenancy:                            "default"
      ipv6_association_id:                         
      ipv6_cidr_block:                             
      main_route_table_id:                         
      tags.%:                                      "1"
      tags.Name:                                   "VPC"


Plan: 27 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_eip.natgw_c: Creating...
  allocation_id:     "" => ""
  association_id:    "" => ""
  domain:            "" => ""
  instance:          "" => ""
  network_interface: "" => ""
  private_ip:        "" => ""
  public_ip:         "" => ""
  vpc:               "" => "true"
aws_eip.natgw_a: Creating...
  allocation_id:     "" => ""
  association_id:    "" => ""
  domain:            "" => ""
  instance:          "" => ""
  network_interface: "" => ""
  private_ip:        "" => ""
  public_ip:         "" => ""
  vpc:               "" => "true"

...

aws_route_table_association.PrivateSubnetB: Creation complete after 0s (ID: rtbassoc-174ba16c)
aws_nat_gateway.public_nat_c: Still creating... (1m40s elapsed)
aws_nat_gateway.public_nat_c: Still creating... (1m50s elapsed)
aws_nat_gateway.public_nat_c: Creation complete after 1m56s (ID: nat-093319a1fa62c3eda)
aws_route_table.private_route_c: Creating...
  propagating_vgws.#:                         "" => ""
  route.#:                                    "" => "1"
  route.4170986711.cidr_block:                "" => "0.0.0.0/0"
  route.4170986711.egress_only_gateway_id:    "" => ""
  route.4170986711.gateway_id:                "" => ""
  route.4170986711.instance_id:               "" => ""
  route.4170986711.ipv6_cidr_block:           "" => ""
  route.4170986711.nat_gateway_id:            "" => "nat-093319a1fa62c3eda"
  route.4170986711.network_interface_id:      "" => ""
  route.4170986711.vpc_peering_connection_id: "" => ""
  tags.%:                                     "" => "1"
  tags.Name:                                  "" => "Private Route C"
  vpc_id:                                     "" => "vpc-fdffb19b"
aws_route_table.private_route_c: Creation complete after 1s (ID: rtb-d64632af)
aws_route_table_association.PrivateSubnetC: Creating...
  route_table_id: "" => "rtb-d64632af"
  subnet_id:      "" => "subnet-17da194d"
aws_route_table_association.PrivateSubnetC: Creation complete after 1s (ID: rtbassoc-35749e4e)

Apply complete! Resources: 27 added, 0 changed, 0 destroyed.
[email protected]:~/aws-terraform$

Terraform successfully applied all the changes so let’s have a quick look in the AWS web console:

You can change the environment and run “terraform apply” again and Terraform would deploy the changes you have made. In my example below I didn’t, so Terraform would do nothing because it tracks the state that is deployed and that I have defined in the vpc.tf:

[email protected]:~/aws-terraform$ terraform apply
aws_eip.natgw_c: Refreshing state... (ID: eipalloc-7fa0eb42)
aws_vpc.default: Refreshing state... (ID: vpc-fdffb19b)
aws_eip.natgw_a: Refreshing state... (ID: eipalloc-3ca7ec01)
aws_eip.natgw_b: Refreshing state... (ID: eipalloc-e6bbf0db)
data.aws_availability_zones.available: Refreshing state...
aws_subnet.PublicSubnetC: Refreshing state... (ID: subnet-d6e4278c)
aws_subnet.PrivateSubnetC: Refreshing state... (ID: subnet-17da194d)
aws_subnet.PrivateSubnetA: Refreshing state... (ID: subnet-6ea62708)
aws_subnet.PublicSubnetA: Refreshing state... (ID: subnet-1ab0317c)
aws_network_acl.all: Refreshing state... (ID: acl-c75f9ebe)
aws_internet_gateway.gw: Refreshing state... (ID: igw-27652940)
aws_subnet.PrivateSubnetB: Refreshing state... (ID: subnet-ab59c8e3)
aws_subnet.PublicSubnetB: Refreshing state... (ID: subnet-4a51c002)
aws_route_table.public_route_b: Refreshing state... (ID: rtb-a45d29dd)
aws_route_table.public_route_a: Refreshing state... (ID: rtb-5b423622)
aws_route_table.public_route_c: Refreshing state... (ID: rtb-0453277d)
aws_nat_gateway.public_nat_b: Refreshing state... (ID: nat-0376fc652d362a3b1)
aws_nat_gateway.public_nat_a: Refreshing state... (ID: nat-073ed904d4cf2d30e)
aws_route_table_association.PublicSubnetA: Refreshing state... (ID: rtbassoc-b14ba1ca)
aws_route_table_association.PublicSubnetB: Refreshing state... (ID: rtbassoc-277d975c)
aws_route_table.private_route_a: Refreshing state... (ID: rtb-0745317e)
aws_route_table.private_route_b: Refreshing state... (ID: rtb-a15a2ed8)
aws_route_table_association.PrivateSubnetB: Refreshing state... (ID: rtbassoc-174ba16c)
aws_route_table_association.PrivateSubnetA: Refreshing state... (ID: rtbassoc-60759f1b)
aws_nat_gateway.public_nat_c: Refreshing state... (ID: nat-093319a1fa62c3eda)
aws_route_table_association.PublicSubnetC: Refreshing state... (ID: rtbassoc-307e944b)
aws_route_table.private_route_c: Refreshing state... (ID: rtb-d64632af)
aws_route_table_association.PrivateSubnetC: Refreshing state... (ID: rtbassoc-35749e4e)

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
[email protected]:~/aws-terraform$

To remove the environment use run “terraform destroy“:

[email protected]:~/aws-terraform$ terraform destroy
aws_eip.natgw_c: Refreshing state... (ID: eipalloc-7fa0eb42)
data.aws_availability_zones.available: Refreshing state...
aws_eip.natgw_a: Refreshing state... (ID: eipalloc-3ca7ec01)
aws_vpc.default: Refreshing state... (ID: vpc-fdffb19b)
aws_eip.natgw_b: Refreshing state... (ID: eipalloc-e6bbf0db)

...

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  - aws_eip.natgw_a

  - aws_eip.natgw_b

  - aws_eip.natgw_c

...

Plan: 0 to add, 0 to change, 27 to destroy.

Do you really want to destroy?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

aws_network_acl.all: Destroying... (ID: acl-c75f9ebe)
aws_route_table_association.PrivateSubnetA: Destroying... (ID: rtbassoc-60759f1b)
aws_route_table_association.PublicSubnetC: Destroying... (ID: rtbassoc-307e944b)
aws_route_table_association.PublicSubnetA: Destroying... (ID: rtbassoc-b14ba1ca)
aws_route_table_association.PublicSubnetB: Destroying... (ID: rtbassoc-277d975c)
aws_route_table_association.PrivateSubnetC: Destroying... (ID: rtbassoc-35749e4e)
aws_route_table_association.PrivateSubnetB: Destroying... (ID: rtbassoc-174ba16c)
aws_route_table_association.PrivateSubnetB: Destruction complete after 0s

...

aws_internet_gateway.gw: Destroying... (ID: igw-27652940)
aws_eip.natgw_c: Destroying... (ID: eipalloc-7fa0eb42)
aws_subnet.PrivateSubnetC: Destroying... (ID: subnet-17da194d)
aws_subnet.PrivateSubnetC: Destruction complete after 1s
aws_eip.natgw_c: Destruction complete after 1s
aws_internet_gateway.gw: Still destroying... (ID: igw-27652940, 10s elapsed)
aws_internet_gateway.gw: Destruction complete after 11s
aws_vpc.default: Destroying... (ID: vpc-fdffb19b)
aws_vpc.default: Destruction complete after 0s

Destroy complete! Resources: 27 destroyed.
[email protected]:~/aws-terraform$

I hope this article was informative and explains how to deploy a VPC with Terraform. In the coming weeks I will add additional functions like deploying EC2 Instances and Load Balancing.

Please share your feedback and leave a comment.

Open Source Routing GRE over IPSec with StrongSwan and Cisco IOS-XE

In my previous post about the Ansible Playbook for VyOS and BGP Routing, I wrote that I was looking for some Open Source alternatives for software routers to use in AWS Transit VPCs.

Here is the example using a Debian Linux,  FRR (Free Range Routing) and StrongSwan connecting over a GRE over IPSec tunnel to a Cisco IOS-XE (CSRv) router:

You can find the Vagrantfile in my Github repo https://github.com/berndonline/debian-router-vagrant. During the boot Ansible runs and pre-configures both nodes but continue reading about the detailed configuration:

sudo apt-get update
sudo apt-get upgrade -y

Enable IP routing by adding the following line to /etc/sysctl.conf:

sudo vi /etc/sysctl.conf
net.ipv4.ip_forward = 1
sudo sysctl -p /etc/sysctl.conf

Download the latest FRR release for Debian 9 x86_64 from https://github.com/FRRouting/frr/releases

Install FRR and don’t worry about any dependency errors from the first command, the second command will install the missing dependencies. Next, enable the needed FRR daemons and start the service:

wget https://github.com/FRRouting/frr/releases/download/frr-3.0.3/frr_3.0.3-1_debian9.1_amd64.deb
wget https://github.com/FRRouting/frr/releases/download/frr-3.0.3/frr-pythontools_3.0.3-1_debian9.1_all.deb
wget https://github.com/FRRouting/frr/releases/download/frr-3.0.3/frr-doc_3.0.3-1_debian9.1_all.deb
sudo dpkg -i frr_3.0.3-1_debian9.1_amd64.deb frr-pythontools_3.0.3-1_debian9.1_all.deb frr-doc_3.0.3-1_debian9.1_all.deb
sudo apt-get install -f -y

sudo bash -c 'cat << EOF > /etc/frr/daemons
zebra=yes
bgpd=yes
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no
pimd=no
ldpd=no
nhrpd=no
EOF'

sudo bash -c 'cat << EOF > /etc/frr/frr.conf
!
frr version 3.0.3
frr defaults traditional
no ipv6 forwarding
!
router bgp 65001
 neighbor 192.168.0.2 remote-as 65002
 !
 address-family ipv4 unicast
  network 10.255.0.1/32
 exit-address-family
 vnc defaults
  response-lifetime 3600
  exit-vnc
!
line vty
!
EOF'

sudo systemctl enable frr
sudo systemctl start frr

Install StrongSwan and change a few settings before you can enable and start the service:

sudo apt-get install -y strongswan-swanctl charon-systemd

sudo bash -c 'cat << EOF > /etc/strongswan.d/charon/connmark.conf
connmark {
 
    # Disable connmark plugin
    # Needed for tunnels - see https://wiki.strongswan.org/projects/strongswan/wiki/RouteBasedVPN
    load = no
 
}
EOF'
sudo bash -c 'cat << EOF > /etc/strongswan.d/charon.conf
charon {
 
    # Cisco IKEv2 wants to use reauth - need to set make_before_break otherwise
    # strongSwan will have a very brief outage during IKEv2 reauth
    make_before_break = yes
 
    # Needed for tunnels - see https://wiki.strongswan.org/projects/strongswan/wiki/RouteBasedVPN
    install_routes = no
 
}
EOF'

sudo systemctl enable strongswan-swanctl
sudo systemctl start strongswan-swanctl

Setting TCP MSS to path MTU with iptables:

sudo DEBIAN_FRONTEND=noninteractive apt-get install -y -q iptables-persistent

sudo bash -c 'cat << EOF > /etc/iptables/rules.v4
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
-A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
COMMIT
EOF'

Let us continue with the Debian router interface configuration, here you also find the GRE tunnel settings:

sudo bash -c 'cat << EOF > /etc/network/interfaces
source-directory /etc/network/interfaces.d
auto lo
iface lo inet loopback

auto lo:1
iface lo:1 inet static
      address 10.255.0.1
      netmask 255.255.255.255

auto ens5
iface ens5 inet dhcp

auto ens6
iface ens6 inet static
      address 10.0.0.1
      netmask 255.255.255.0

auto gre1
iface gre1 inet tunnel
      address 192.168.0.1
      netmask 255.255.255.0
      mode gre
      endpoint 10.0.0.2
EOF'

sudo systemctl restart networking

In StrongSwan you configure the IPSec settings:

sudo bash -c 'cat << EOF > /etc/swanctl/swanctl.conf
connections {
    my-vpn {
        remote_addrs = 10.0.0.2
        version = 1
        proposals = aes256-sha1-modp1536
        reauth_time = 1440m
        local-1 {
            auth = psk
            id = debian-router.domain.com
        }
        remote-1 {
            # id field here is inferred from the remote address
            auth = psk
        }
        children {
            my-vpn-1 {
                local_ts = 10.0.0.1/32[gre]
                remote_ts = 10.0.0.2/32[gre]
                mode = transport
                esp_proposals = aes128-sha1-modp1536
                rekey_time = 60m
                start_action = trap
                dpd_action = restart
            }
        }
    }

}
secrets {
    ike-my-vpn-1 {
        id-1 = cisco-iosxe.domain.com
        id-2 = 10.0.0.2
        secret = "secret"
    }
}
EOF'

sudo systemctl restart strongswan-swanctl

We finished the Debian host configuration and continue with the Cisco  router configuration to connect the Debian router to the tunnel 0 interface on the Cisco router:

conf t
hostname cisco-iosxe

crypto keyring my-keyring  
  pre-shared-key address 10.0.0.1 key secret
  exit

crypto isakmp policy 20
 encr aes 256
 authentication pre-share
 group 5
crypto isakmp identity hostname
crypto isakmp profile my-isakmp-profile
   keyring my-keyring
   match identity host debian-router.domain.com
   exit
crypto ipsec transform-set my-transform-set esp-aes esp-sha-hmac
 mode transport
 exit
crypto ipsec profile my-ipsec-profile
 set transform-set my-transform-set
 set pfs group5
 set isakmp-profile my-isakmp-profile
 exit

interface Loopback1
 ip address 10.255.0.2 255.255.255.255
 exit

interface Tunnel0
 ip address 192.168.0.2 255.255.255.0
 tunnel source GigabitEthernet2
 tunnel destination 10.0.0.1
 tunnel protection ipsec profile my-ipsec-profile
 no shut
 exit

interface GigabitEthernet2
 ip address 10.0.0.2 255.255.255.0
 no shut
 exit

router bgp 65002
 bgp log-neighbor-changes
 neighbor 192.168.0.1 remote-as 65001
 address-family ipv4
  network 10.255.0.2 mask 255.255.255.255
  neighbor 192.168.0.1 activate
 exit-address-family
 exit

exit
wr mem

Clone my Github repo https://github.com/berndonline/debian-router-vagrant and boot the environment with “./vagrant_up.sh”. After the two VMs are booted wait a few seconds and run the validation playbook to check the connectivity between the nodes:

[email protected]:~/debian-router-vagrant$ ansible-playbook ./validate_connectivity.yml

PLAY [debian-router] ***********************************************************************************************************************************************

TASK [check connectivity from debian router] ***********************************************************************************************************************
changed: [debian-router]

PLAY [cisco-iosxe] *************************************************************************************************************************************************

TASK [check connectivity from cisco iosxe] *************************************************************************************************************************
ok: [cisco-iosxe]

PLAY RECAP *********************************************************************************************************************************************************
cisco-iosxe                : ok=1    changed=0    unreachable=0    failed=0
debian-router              : ok=1    changed=1    unreachable=0    failed=0

[email protected]:~/debian-router-vagrant$

I will continue improving the config, and do some more testing with AWS VPN gateways (VGW).

Please share your feedback.

Leave a comment