Deploy OpenShift 3.11 Container Platform on Google Cloud Platform using Terraform

Over the past few days I have converted the OpenShift 3.11 infrastructure on Amazon AWS to run on Google Cloud Platform. I have kept the similar VPC network layout and instances to run OpenShift.

Before you start you need to create a project on Google Cloud Platform, then continue to create the service account and generate the private key and download the credential as JSON file.

Create the new project:

Create the service account:

Give the service account compute admin and storage object creator permissions:

Then create a storage bucket for the Terraform backend state and assign the correct bucket permission to the terraform service account:

Bucket permissions:

To start, clone my openshift-terraform github repository and checkout the google-dev branch:

git clone https://github.com/berndonline/openshift-terraform.git
cd ./openshift-terraform/ && git checkout google-dev

Add your previously downloaded credentials json file:

cat << EOF > ./credentials.json
{
  "type": "service_account",
  "project_id": "<--your-project-->",
  "private_key_id": "<--your-key-id-->",
  "private_key": "-----BEGIN PRIVATE KEY-----

...

}
EOF

There are a few things you need to modify in the main.tf and variables.tf before you can start:

...
terraform {
  backend "gcs" {
    bucket    = "<--your-bucket-name-->"
    prefix    = "openshift-311"
    credentials = "credentials.json"
  }
}
...
...
variable "gcp_region" {
  description = "Google Compute Platform region to launch servers."
  default     = "europe-west3"
}
variable "gcp_project" {
  description = "Google Compute Platform project name."
  default     = "<--your-project-name-->"
}
variable "gcp_zone" {
  type = "string"
  default = "europe-west3-a"
  description = "The zone to provision into"
}
...

Add the needed environment variables to apply changes to CloudFlare DNS:

export TF_VAR_email='<-YOUR-CLOUDFLARE-EMAIL-ADDRESS->'
export TF_VAR_token='<-YOUR-CLOUDFLARE-TOKEN->'
export TF_VAR_domain='<-YOUR-CLOUDFLARE-DOMAIN->'
export TF_VAR_htpasswd='<-YOUR-OPENSHIFT-DEMO-USER-HTPASSWD->'

Let’s start creating the infrastructure and verify afterwards the created resources on GCP.

terraform init && terraform apply -auto-approve

VPC and public and private subnets in region europe-west3:

Created instances:

Created load balancers for master and infra nodes:

Copy the ssh key and ansible-hosts file to the bastion host from where you need to run the Ansible OpenShift playbooks.

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./helper_scripts/id_rsa centos@$(terraform output bastion):/home/centos/.ssh/
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./inventory/ansible-hosts  centos@$(terraform output bastion):/home/centos/ansible-hosts

I recommend waiting a few minutes as the cloud-init script prepares the bastion host. Afterwards continue with the pre and install playbooks. You can connect to the bastion host and run the playbooks directly.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-pre.yml -i ~/ansible-hosts"
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-install.yml -i ~/ansible-hosts"

After the installation is completed, continue to create your project and applications:

When you are finished with the testing, run terraform destroy.

terraform destroy -force 

Please share your feedback and leave a comment.

OpenShift Infra node “Not Ready” running Avi Service Engine

I had to troubleshoot an interesting issue with OpenShift Infra nodes suddenly going into “Not Ready” state during an OpenShift upgrade or not registering on Master nodes after a re-install of OpenShift cluster. On the Infra nodes Avi Service Engines were running for ingress traffic. The problem was not very obvious and RedHat and Avi Networks were not able to identify the issue.

Here the output of oc get nodes:

[root@master01 ~]# oc get nodes -o wide --show-labels | grep 'region=infra'
infra01   NotReady                   1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=1,kubernetes.io/hostname=infra01,logging-infra-fluentd=true,purpose=infra,region=infra,zone=1
infra02   NotReady                   1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=1,kubernetes.io/hostname=infra02,logging-infra-fluentd=true,purpose=infra,region=infra,zone=0
infra03   NotReady                   1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=0,kubernetes.io/hostname=infra03,logging-infra-fluentd=true,purpose=infra,region=infra,zone=2
[root@master01 ~]#

On the Infra node itself you could see that the atomic node service had successfully started but I saw a very strange error message from the kubelet not being able to synchronise the pod:

I1206 14:52:28.115735   21690 cloud_request_manager.go:89] Requesting node addresses from cloud provider for node "infra01"
I1206 14:52:28.170366   21690 cloud_request_manager.go:108] Node addresses from cloud provider for node "infra01" collected
E1206 14:52:28.533560   21690 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'infra01' not found
I1206 14:52:32.840769   21690 kubelet.go:1808] skipping pod synchronization - [Kubelet failed to get node info: failed to get zone from cloud provider: Get http://169.254.169.254/metadata/v1/InstanceInfo: dial tcp 169.254.169.254:80: getsockopt: no route to host]
I1206 14:52:37.841235   21690 kubelet.go:1808] skipping pod synchronization - [Kubelet failed to get node info: failed to get zone from cloud provider: Get http://169.254.169.254/metadata/v1/InstanceInfo: dial tcp 169.254.169.254:80: getsockopt: no route to host]
I1206 14:52:38.170604   21690 cloud_request_manager.go:89] Requesting node addresses from cloud provider for node "infra01"
I1206 14:52:38.222439   21690 cloud_request_manager.go:108] Node addresses from cloud provider for node "infra01" collected
E1206 14:52:38.545991   21690 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'infra01' not found
I1206 14:52:42.841547   21690 kubelet.go:1808] skipping pod synchronization - [Kubelet failed to get node info: failed to get zone from cloud provider: Get http://169.254.169.254/metadata/v1/InstanceInfo: dial tcp 169.254.169.254:80: getsockopt: no route to host]
I1206 14:52:47.841819   21690 kubelet.go:1808] skipping pod synchronization - [Kubelet failed to get node info: failed to get zone from cloud provider: Get http://169.254.169.254/metadata/v1/InstanceInfo: dial tcp 169.254.169.254:80: getsockopt: no route to host]

Even stranger is that the kubelet was not able to get the metadata information of the Azure Cloud provider with the fault domain in which the instance is running.

About the “no route to host” error I thought this must be a network issue and that I could reproduce this with a simple curl command:

[root@infra01 ~]# curl -v http://169.254.169.254/metadata/v1/InstanceInfo
* About to connect() to 169.254.169.254 port 80 (#0)
*   Trying 169.254.169.254...
* No route to host
* Failed connect to 169.254.169.254:80; No route to host
* Closing connection 0
curl: (7) Failed connect to 169.254.169.254:80; No route to host
[root@infra01 ~]#

The routing table on the node looked fine and theoretically forwarded the traffic to the default gateway in the subnet.

[root@infra01 ~]# ip route show
default via 10.1.1.1 dev eth0
10.1.1.0/27 dev eth0 proto kernel scope link src 10.1.1.10 
10.128.0.0/15 dev tun0 scope link
10.128.0.0/15 dev tun0
168.63.129.16 via 10.1.1.1 dev eth0 proto static
169.254.0.0/16 dev eth0 scope link metric 1002
169.254.169.254 via 10.1.1.1 dev eth0 proto static
172.17.0.0/16 via 172.17.0.1 dev docker0
172.18.0.0/16 dev bravi proto kernel scope link src 172.18.0.1
[root@infra01 ~]#

What was a bit strange when I used tracepath was that the packets weren’t forwarded to the default gateway but instead forwarded to the node itself:

[root@infra01 ~]# tracepath 169.254.169.254
1?: [LOCALHOST]                                         pmtu 1500
1:  infra01                                     3006.801ms !H
    Resume: pmtu 1500

Same with traceroute or ping output:

[root@infra01 ~]# traceroute 169.254.169.254
traceroute to 169.254.169.254 (169.254.169.254), 30 hops max, 60 byte packets
1  infra01 (172.18.0.1)  1178.146 ms !H  1178.104 ms !H  1178.057 ms !H
[root@infra01 ~]# ping 169.254.169.254
PING 169.254.169.254 (169.254.169.254) 56(84) bytes of data.
From 172.18.0.1 icmp_seq=1 Destination Host Unreachable
From 172.18.0.1 icmp_seq=2 Destination Host Unreachable
From 172.18.0.1 icmp_seq=3 Destination Host Unreachable
From 172.18.0.1 icmp_seq=4 Destination Host Unreachable
^C
--- 169.254.169.254 ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4000ms
pipe 4

What was very obvious was that the packets were forwarded to the bravi bridge 172.18.0.0/16 which is owned by the Avi Service Engine on the Infra node:

...
44: bravi: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
   link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
   inet 172.18.0.1/16 scope global bravi
      valid_lft forever preferred_lft forever
   inet6 fe80::200:ff:fe00:1/64 scope link
      valid_lft forever preferred_lft forever
45: bravi-tap: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master bravi state DOWN qlen 1000
   link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
...

Here is the article about how Avi SE are integrated into the OpenShift SDN. Avi uses PBR (policy based routing) to forward external ingress traffic to the Service Engine.

I have turned off the bravi bridge because PBR could be bypassing the routing table for traffic to the 169.254.169.254.

[root@infra01 ~]# ip link set bravi down

Traffic is now exiting the Infra node:

[root@infra01 ~]# traceroute 169.254.169.254
traceroute to 169.254.169.254 (169.254.169.254), 30 hops max, 60 byte packets
1  * * *
2  * * *
3  * * *
4  * * *
5  * * *
6  *^C
[root@infra01 ~]#

And the Infra node was able to collect metadata information:

[root@infra01 ~]# curl -v http://169.254.169.254/metadata/v1/InstanceInfo
* About to connect() to 169.254.169.254 port 80 (#0)
*   Trying 169.254.169.254...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
GET /metadata/v1/InstanceInfo HTTP/1.1
User-Agent: curl/7.29.0
Host: 169.254.169.254
Accept: */*

< HTTP/1.1 200 OK
< Content-Type: text/json; charset=utf-8
< Server: Microsoft-IIS/10.0
< Date: Thu, 06 Dec 2018 13:53:16 GMT
< Content-Length: 43
<
* Connection #0 to host 169.254.169.254 left intact
{"ID":"_infra01","UD":"4","FD":"0"}
[root@infra01 ~]#

Simple restart of the atomic node service to trigger the master registration:

[root@infra01 ~]# systemctl restart atomic-openshift-node

The logs showed that the kubelet successfully got zone information

Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.341611   36813 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/detach
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.414847   36813 kubelet_node_status.go:326] Adding node label from cloud provider: beta.kubernetes.io/instance-type=Standard_D8_
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.414881   36813 kubelet_node_status.go:337] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=0
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.414890   36813 kubelet_node_status.go:341] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=ukso
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.414966   36813 kubelet_node_status.go:488] Using Node Hostname from cloudprovider: "infra01"
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.420823   36813 kubelet_node_status.go:437] Recording NodeHasSufficientDisk event message for node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.420907   36813 kubelet_node_status.go:437] Recording NodeHasSufficientMemory event message for node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.423139   36813 kubelet_node_status.go:437] Recording NodeHasNoDiskPressure event message for node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.423235   36813 kubelet_node_status.go:82] Attempting to register node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.435412   36813 kubelet_node_status.go:85] Successfully registered node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.437308   36813 kubelet_node_status.go:488] Using Node Hostname from cloudprovider: "infra01"
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.441482   36813 manager.go:311] Recovery completed

The Infra node successfully registered again the OpenShift master and the node went into “Ready”:

[root@master01 ~]# oc get nodes -o wide --show-labels | grep 'region=infra'
infra01   Ready                      40s       v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=0,kubernetes.io/hostname=infra01,logging-infra-fluentd=true,purpose=infra,region=infra,zone=1
infra02   Ready                      1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=1,kubernetes.io/hostname=infra02,logging-infra-fluentd=true,purpose=infra,region=infra,zone=0
infra03   Ready                      1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=0,kubernetes.io/hostname=infra03,logging-infra-fluentd=true,purpose=infra,region=infra,zone=2
[root@master01 ~]#

In the end, the root cause was the Avi East West subnet range which was set to 169.254.0.0/16 on the Avi controller nodes. Even the East West communication was deactivated on Avi, because kube_proxy was used, which made the Avi controller configure PBR on the bravi bridge for the 169.254.0.0/16 subnet range. This subnet range was previously used on all the on-prem datacenters and never caused issues since moving to cloud because the 169.254.169.254 is commonly used on cloud provider for instances to collect metadata information.

Deploy OpenShift 3.11 Container Platform on AWS using Terraform

I have done a few changes on my Terraform configuration for OpenShift 3.11 on Amazon AWS. I have downsized the environment because I didn’t needed that many nodes for a quick test setup. I have added CloudFlare DNS to automatically create CNAME for the AWS load balancers on the DNS zone. I have also added an AWS S3 Bucket for storing the backend state. You can find the new Terraform configuration on my Github repository: https://github.com/berndonline/openshift-terraform/tree/aws-dev

From OpenShift 3.10 and later versions the environment variables changes and I modified the ansible-hosts template for the new configuration. You can see the changes in the hosts template: https://github.com/berndonline/openshift-terraform/blob/aws-dev/helper_scripts/ansible-hosts.template.txt

OpenShift 3.11 has changed a few things and put an focus on an Cluster Operator console which is pretty nice and runs on Kubernetes 1.11. I recommend reading the release notes for the 3.11 release for more details: https://docs.openshift.com/container-platform/3.11/release_notes/ocp_3_11_release_notes.html

I don’t wanted to get into too much detail, just follow the steps below and start with cloning my repository, and choose the dev branch:

git clone -b aws-dev https://github.com/berndonline/openshift-terraform.git
cd ./openshift-terraform/
ssh-keygen -b 2048 -t rsa -f ./helper_scripts/id_rsa -q -N ""
chmod 600 ./helper_scripts/id_rsa

You need to modify the cloudflare.tf and add your CloudFlare API credentials otherwise just delete the file. The same for the S3 backend provider, you find the configuration in the main.tf and it can be removed if not needed.

CloudFlare and Amazon AWS credentials can be added through environment variables:

export AWS_ACCESS_KEY_ID='<-YOUR-AWS-ACCESS-KEY->'
export AWS_SECRET_ACCESS_KEY='<-YOUR-AWS-SECRET-KEY->'
export TF_VAR_email='<-YOUR-CLOUDFLARE-EMAIL-ADDRESS->'
export TF_VAR_token='<-YOUR-CLOUDFLARE-TOKEN->'
export TF_VAR_domain='<-YOUR-CLOUDFLARE-DOMAIN->'
export TF_VAR_htpasswd='<-YOUR-OPENSHIFT-DEMO-USER-HTPASSWD->'

Run terraform init and apply to create the environment.

terraform init && terraform apply -auto-approve

Copy the ssh key and ansible-hosts file to the bastion host from where you need to run the Ansible OpenShift playbooks.

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./helper_scripts/id_rsa centos@$(terraform output bastion):/home/centos/.ssh/
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./inventory/ansible-hosts  centos@$(terraform output bastion):/home/centos/ansible-hosts

I recommend waiting a few minutes as the AWS cloud-init script prepares the bastion host. Afterwards continue with the pre and install playbooks. You can connect to the bastion host and run the playbooks directly.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-pre.yml -i ~/ansible-hosts"
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-install.yml -i ~/ansible-hosts"

If for whatever reason the cluster deployment fails, you can run the uninstall playbook to bring the nodes back into a clean state and start from the beginning and run deploy_cluster.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./openshift-ansible/playbooks/adhoc/uninstall.yml -i ~/ansible-hosts"

Here are some screenshots of the new cluster console:

Let’s create a project and import my hello-openshift.yml build configuration:

Successful completed the build and deployed the hello-openshift container:

My example hello openshift application:

When you are finished with the testing, run terraform destroy.

terraform destroy -force 

 

Terraform CloudFlare Provider Example

This is a short article on how to create DNS records on your CloudFlare DNS zone using Terraform. I have used this in new coming article about OpenShift 3.11 on AWS. You can check out the cloudflare.tf example on my Github repository: https://github.com/berndonline/openshift-terraform/blob/aws-dev/cloudflare.tf

In the cloudflare_record configuration, the variables of the AWS ALB dns names are under resource values. This means, Terraform will start with deploying  the AWS infrastructure and create’s afterwards the specified DNS records on the CloudFlare DNS zone.

provider "cloudflare" {
  email = "[email protected]"
  token = "***YOUR-API-TOKEN***"
}
variable "domain" {
  default = "domain.com"
}
resource "cloudflare_record" "console-paas" {
  domain  = "${var.domain}"
  name    = "console-paas"
  value   = "${aws_lb.master_alb.dns_name}"
  type    = "CNAME"
  proxied = false
}
resource "cloudflare_record" "wildcard-paas" {
  domain  = "${var.domain}"
  name    = "*.paas"
  value   = "${aws_lb.infra_alb.dns_name}"
  type    = "CNAME"
  proxied = false
}

If you verify this on the CloudFlare web console, you see that Terraform created two DNS record’s and pointing to the AWS ALB dns name:

When you run terraform destroy the two DNS records will be automatically removed.

I recommend having a look at the great articles on the CloudFlare blog:

https://blog.cloudflare.com/getting-started-with-terraform-and-cloudflare-part-1/

https://blog.cloudflare.com/getting-started-with-terraform-and-cloudflare-part-2/

Deploy OpenShift 3.9 Container Platform using Terraform and Ansible on Amazon AWS

After my previous articles on OpenShift and Terraform I wanted to show how to create the necessary infrastructure and to deploy an OpenShift Container Platform in a more real-world scenario. I highly recommend reading my other posts about using Terraform to deploy an Amazon AWS VPC and AWS EC2 Instances and Load Balancers. Once the infrastructure is created we will use the Bastion Host to connect to the environment and deploy OpenShift Origin using Ansible.

I think this might be an interesting topic to show what tools like Terraform and Ansible can do together:

I will not go into detail about the configuration and only show the output of deploying the infrastructure. Please checkout my Github repository to see the detailed configuration: https://github.com/berndonline/openshift-terraform

Before we start you need to clone the repository and generate the ssh key used from the bastion host to access the OpenShift nodes:

git clone https://github.com/berndonline/openshift-terraform.git
cd ./openshift-terraform/
ssh-keygen -b 2048 -t rsa -f ./helper_scripts/id_rsa -q -N ""
chmod 600 ./helper_scripts/id_rsa

We are ready to create the infrastructure and run terraform apply:

berndonline@lab:~/openshift-terraform$ terraform apply

...

Plan: 56 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

...

Apply complete! Resources: 19 added, 0 changed, 16 destroyed.

Outputs:

bastion = ec2-34-244-225-35.eu-west-1.compute.amazonaws.com
openshift master = master-35563dddc8b2ea9c.elb.eu-west-1.amazonaws.com
openshift subdomain = infra-1994425986.eu-west-1.elb.amazonaws.com
berndonline@lab:~/openshift-terraform$

Terraform successfully creates the VPC, load balancers and all needed instances. Before we continue wait 5 to 10 minutes because the cloud-init script takes a bit time and all the instance reboot at the end.

Instances:

Security groups:

Target groups for the Master and the Infra load balancers:

Master and the Infra load balancers:

Terraform also automatically creates the inventory file for the OpenShift installation and adds the hostnames for master, infra and worker nodes to the correct inventory groups. The next step is to copy the private ssh key and the inventory file to the bastion host. I am using the terraform output command to get the public hostname from the bastion host:

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -r ./helper_scripts/id_rsa centos@$(terraform output bastion):/home/centos/.ssh/
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -r ./inventory/ansible-hosts  centos@$(terraform output bastion):/home/centos/ansible-hosts
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -l centos $(terraform output bastion)

On the bastion node, change to the /openshift-ansible/ folder and start running the prerequisites and the deploy-cluster playbooks:

cd /openshift-ansible/
ansible-playbook ./playbooks/prerequisites.yml -i ~/ansible-hosts
ansible-playbook ./playbooks/deploy_cluster.yml -i ~/ansible-hosts

Here the output from running the prerequisites playbook:

[centos@ip-10-0-0-22 ~]$ cd /openshift-ansible/
[centos@ip-10-0-0-22 openshift-ansible]$ ansible-playbook ./playbooks/prerequisites.yml -i ~/ansible-hosts

PLAY [Initialization Checkpoint Start] ****************************************************************************************************************************

TASK [Set install initialization 'In Progress'] *******************************************************************************************************************
Saturday 15 September 2018  11:04:50 +0000 (0:00:00.407)       0:00:00.407 ****
ok: [ip-10-0-1-237.eu-west-1.compute.internal]

PLAY [Populate config host groups] ********************************************************************************************************************************

TASK [Load group name mapping variables] **************************************************************************************************************************
Saturday 15 September 2018  11:04:50 +0000 (0:00:00.110)       0:00:00.517 ****
ok: [localhost]

TASK [Evaluate groups - g_etcd_hosts or g_new_etcd_hosts required] ************************************************************************************************
Saturday 15 September 2018  11:04:51 +0000 (0:00:00.033)       0:00:00.551 ****
skipping: [localhost]

TASK [Evaluate groups - g_master_hosts or g_new_master_hosts required] ********************************************************************************************
Saturday 15 September 2018  11:04:51 +0000 (0:00:00.024)       0:00:00.575 ****
skipping: [localhost]

TASK [Evaluate groups - g_node_hosts or g_new_node_hosts required] ************************************************************************************************
Saturday 15 September 2018  11:04:51 +0000 (0:00:00.024)       0:00:00.599 ****
skipping: [localhost]

...

PLAY RECAP ********************************************************************************************************************************************************
ip-10-0-1-192.eu-west-1.compute.internal : ok=56   changed=14   unreachable=0    failed=0
ip-10-0-1-237.eu-west-1.compute.internal : ok=64   changed=15   unreachable=0    failed=0
ip-10-0-1-248.eu-west-1.compute.internal : ok=56   changed=14   unreachable=0    failed=0
ip-10-0-5-174.eu-west-1.compute.internal : ok=56   changed=14   unreachable=0    failed=0
ip-10-0-5-235.eu-west-1.compute.internal : ok=58   changed=14   unreachable=0    failed=0
ip-10-0-5-35.eu-west-1.compute.internal : ok=56   changed=14   unreachable=0    failed=0
ip-10-0-9-130.eu-west-1.compute.internal : ok=56   changed=14   unreachable=0    failed=0
ip-10-0-9-51.eu-west-1.compute.internal : ok=58   changed=14   unreachable=0    failed=0
ip-10-0-9-85.eu-west-1.compute.internal : ok=56   changed=14   unreachable=0    failed=0
localhost                  : ok=11   changed=0    unreachable=0    failed=0


INSTALLER STATUS **************************************************************************************************************************************************
Initialization             : Complete (0:00:41)

[centos@ip-10-0-0-22 openshift-ansible]$

Continue with the deploy cluster playbook:

[centos@ip-10-0-0-22 openshift-ansible]$ ansible-playbook ./playbooks/deploy_cluster.yml -i ~/ansible-hosts

PLAY [Initialization Checkpoint Start] ****************************************************************************************************************************

TASK [Set install initialization 'In Progress'] *******************************************************************************************************************
Saturday 15 September 2018  11:08:38 +0000 (0:00:00.102)       0:00:00.102 ****
ok: [ip-10-0-1-237.eu-west-1.compute.internal]

PLAY [Populate config host groups] ********************************************************************************************************************************

TASK [Load group name mapping variables] **************************************************************************************************************************
Saturday 15 September 2018  11:08:38 +0000 (0:00:00.064)       0:00:00.167 ****
ok: [localhost]

TASK [Evaluate groups - g_etcd_hosts or g_new_etcd_hosts required] ************************************************************************************************
Saturday 15 September 2018  11:08:38 +0000 (0:00:00.031)       0:00:00.198 ****
skipping: [localhost]

TASK [Evaluate groups - g_master_hosts or g_new_master_hosts required] ********************************************************************************************
Saturday 15 September 2018  11:08:38 +0000 (0:00:00.026)       0:00:00.225 ****
skipping: [localhost]

...

PLAY RECAP ********************************************************************************************************************************************************
ip-10-0-1-192.eu-west-1.compute.internal : ok=132  changed=57   unreachable=0    failed=0
ip-10-0-1-237.eu-west-1.compute.internal : ok=591  changed=256  unreachable=0    failed=0
ip-10-0-1-248.eu-west-1.compute.internal : ok=132  changed=57   unreachable=0    failed=0
ip-10-0-5-174.eu-west-1.compute.internal : ok=132  changed=57   unreachable=0    failed=0
ip-10-0-5-235.eu-west-1.compute.internal : ok=325  changed=145  unreachable=0    failed=0
ip-10-0-5-35.eu-west-1.compute.internal : ok=132  changed=57   unreachable=0    failed=0
ip-10-0-9-130.eu-west-1.compute.internal : ok=132  changed=57   unreachable=0    failed=0
ip-10-0-9-51.eu-west-1.compute.internal : ok=325  changed=145  unreachable=0    failed=0
ip-10-0-9-85.eu-west-1.compute.internal : ok=132  changed=57   unreachable=0    failed=0
localhost                  : ok=13   changed=0    unreachable=0    failed=0

INSTALLER STATUS **************************************************************************************************************************************************
Initialization             : Complete (0:00:55)
Health Check               : Complete (0:00:01)
etcd Install               : Complete (0:01:03)
Master Install             : Complete (0:05:17)
Master Additional Install  : Complete (0:00:26)
Node Install               : Complete (0:08:24)
Hosted Install             : Complete (0:00:57)
Web Console Install        : Complete (0:00:28)
Service Catalog Install    : Complete (0:01:19)

[centos@ip-10-0-0-22 openshift-ansible]$

Once the deploy playbook finishes we have a working Openshift cluster:

Login with username: demo, and password: demo

For the infra load balancers you cannot access OpenShift routes via the Amazon DNS, this is not allowed. You need to create a wildcard DNS CNAME record like *.paas.domain.com and point to the AWS load balancer DNS record.

Let’s continue to do some basic cluster checks to see the nodes are in ready state:

[centos@ip-10-0-1-237 ~]$ oc get nodes
NAME                                       STATUS    ROLES     AGE       VERSION
ip-10-0-1-192.eu-west-1.compute.internal   Ready     compute   11m       v1.9.1+a0ce1bc657
ip-10-0-1-237.eu-west-1.compute.internal   Ready     master    16m       v1.9.1+a0ce1bc657
ip-10-0-1-248.eu-west-1.compute.internal   Ready         11m       v1.9.1+a0ce1bc657
ip-10-0-5-174.eu-west-1.compute.internal   Ready     compute   11m       v1.9.1+a0ce1bc657
ip-10-0-5-235.eu-west-1.compute.internal   Ready     master    15m       v1.9.1+a0ce1bc657
ip-10-0-5-35.eu-west-1.compute.internal    Ready         11m       v1.9.1+a0ce1bc657
ip-10-0-9-130.eu-west-1.compute.internal   Ready     compute   11m       v1.9.1+a0ce1bc657
ip-10-0-9-51.eu-west-1.compute.internal    Ready     master    14m       v1.9.1+a0ce1bc657
ip-10-0-9-85.eu-west-1.compute.internal    Ready         11m       v1.9.1+a0ce1bc657
[centos@ip-10-0-1-237 ~]$
[centos@ip-10-0-1-237 ~]$ oc get projects
NAME                                DISPLAY NAME   STATUS
default                                            Active
kube-public                                        Active
kube-service-catalog                               Active
kube-system                                        Active
logging                                            Active
management-infra                                   Active
openshift                                          Active
openshift-ansible-service-broker                   Active
openshift-infra                                    Active
openshift-node                                     Active
openshift-template-service-broker                  Active
openshift-web-console                              Active
[centos@ip-10-0-1-237 ~]$
[centos@ip-10-0-1-237 ~]$ oc get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
docker-registry-1-8798r    1/1       Running   0          10m       10.128.2.2   ip-10-0-5-35.eu-west-1.compute.internal
registry-console-1-zh9m4   1/1       Running   0          10m       10.129.2.3   ip-10-0-9-85.eu-west-1.compute.internal
router-1-96zzf             1/1       Running   0          10m       10.0.9.85    ip-10-0-9-85.eu-west-1.compute.internal
router-1-nfh7h             1/1       Running   0          10m       10.0.1.248   ip-10-0-1-248.eu-west-1.compute.internal
router-1-pcs68             1/1       Running   0          10m       10.0.5.35    ip-10-0-5-35.eu-west-1.compute.internal
[centos@ip-10-0-1-237 ~]$

At the end just destroy the infrastructure with terraform destroy:

berndonline@lab:~/openshift-terraform$ terraform destroy

...

Destroy complete! Resources: 56 destroyed.
berndonline@lab:~/openshift-terraform$

I will continue improving the configuration and I plan to use Jenkins to deploy the AWS infrastructure and OpenShift fully automatically.

Please let me know if you like the article or have questions in the comments below.