Running Istio Service Mesh on Amazon EKS

I have not spend too much time with Istio in the last weeks but after my previous article about running Istio Service Mesh on OpenShift I wanted to do the same and deploy Istio Service Mesh on an Amazon EKS cluster. This time I did the recommended way of using a helm template to deploy Istio which is more flexible then the Ansible operator for the OpenShift deployment.

Once you have created your EKS cluster you can start, there are not many prerequisite for EKS so you can basically create the istio namespace and create a secret for Kiali, and start to deploy the helm template:

kubectl create namespace istio-system

USERNAME=$(echo -n 'admin' | base64)
PASSPHRASE=$(echo -n 'supersecretpassword!!' | base64)
NAMESPACE=istio-system

cat <<EOF | kubectl apply -n istio-system -f -
apiVersion: v1
kind: Secret
metadata:
  name: kiali
  namespace: $NAMESPACE
  labels:
    app: kiali
type: Opaque
data:
  username: $USERNAME
  passphrase: $PASSPHRASE
EOF

You then create the Custom Resource Definitions (CRDs) for Istio:

helm template istio-1.1.4/install/kubernetes/helm/istio-init --name istio-init --namespace istio-system | kubectl apply -f -  

# Check the created Istio CRDs 
kubectl get crds -n istio-system | grep 'istio.io\|certmanager.k8s.io' | wc -l

At this point you can deploy the main Istio Helm template. See the installation options for more detail about customizing the installation:

helm template istio-1.1.4/install/kubernetes/helm/istio --name istio --namespace istio-system  --set grafana.enabled=true --set tracing.enabled=true --set kiali.enabled=true --set kiali.dashboard.secretName=kiali --set kiali.dashboard.usernameKey=username --set kiali.dashboard.passphraseKey=passphrase | kubectl apply -f -
 
# Validate and see that all components start
kubectl get pods -n istio-system -w  

The Kiali service has the type clusterIP which we need to change to type LoadBalancer:

kubectl patch svc kiali -n istio-system --patch '{"spec": {"type": "LoadBalancer" }}'

# Get the create AWS ELB for the Kiali service
$ kubectl get svc kiali -n istio-system --no-headers | awk '{ print $4 }'
abbf8224773f111e99e8a066e034c3d4-78576474.eu-west-1.elb.amazonaws.com

Now we are able to access the Kiali dashboard and login with the credentials I have specified earlier in the Kiali secret.

We didn’t deploy anything else yet so the default namespace is empty:

I recommend having a look at the Istio-Sidecar injection. If your istio-sidecar containers are not getting deployed you might forgot to allow TCP port 443 from your control-plane to worker nodes. Have a look at the Github issue about this: Admission control webhooks (e.g. sidecar injector) don’t work on EKS.

We can continue and deploy the Google Hipster Shop example.

# Label default namespace to inject Envoy sidecar
kubectl label namespace default istio-injection=enabled

# Check istio sidecar injector label
kubectl get namespace -L istio-injection

# Deploy Google hipster shop manifests
kubectl create -f https://raw.githubusercontent.com/berndonline/aws-eks-terraform/master/example/istio-hipster-shop.yml
kubectl create -f https://raw.githubusercontent.com/berndonline/aws-eks-terraform/master/example/istio-manifest.yml

# Wait a few minutes before deploying the load generator
kubectl create -f https://raw.githubusercontent.com/berndonline/aws-eks-terraform/master/example/istio-loadgenerator.yml

We can check again the Kiali dashboard once the application is deployed and healthy. If there are issues with the Envoy sidecar you will see a warning “Missing Sidecar”:

We are also able to see the graph which shows detailed traffic flows within the microservice application.

Let’s get the hostname for the istio-ingressgateway service and connect via the web browser:

$ kubectl get svc istio-ingressgateway -n istio-system --no-headers | awk '{ print $4 }'
a16f7090c74ca11e9a1fb02cd763ca9e-362893117.eu-west-1.elb.amazonaws.com

Before you destroy your EKS cluster you should remove all installed components because Kubernetes service type LoadBalancer created AWS ELBs which will not get deleted and stay behind when you delete the EKS cluster:

kubectl label namespace default istio-injection-
kubectl delete -f https://raw.githubusercontent.com/berndonline/aws-eks-terraform/master/example/istio-loadgenerator.yml
kubectl delete -f https://raw.githubusercontent.com/berndonline/aws-eks-terraform/master/example/istio-hipster-shop.yml
kubectl delete -f https://raw.githubusercontent.com/berndonline/aws-eks-terraform/master/example/istio-manifest.yml

Finally to remove Istio from EKS you run the same Helm template command but do kubectl delete:

helm template istio-1.1.4/install/kubernetes/helm/istio --name istio --namespace istio-system  --set grafana.enabled=true --set tracing.enabled=true --set kiali.enabled=true --set kiali.dashboard.secretName=kiali --set kiali.dashboard.usernameKey=username --set kiali.dashboard.passphraseKey=passphrase | kubectl delete -f -

Very simple to get started with Istio Service Mesh on EKS and if I find some time I will give the Istio Multicluster a try and see how this works to span Istio service mesh across multiple Kubernetes clusters.

Deploy OpenShift 3.11 Container Platform on Google Cloud Platform using Terraform

Over the past few days I have converted the OpenShift 3.11 infrastructure on Amazon AWS to run on Google Cloud Platform. I have kept the similar VPC network layout and instances to run OpenShift.

Before you start you need to create a project on Google Cloud Platform, then continue to create the service account and generate the private key and download the credential as JSON file.

Create the new project:

Create the service account:

Give the service account compute admin and storage object creator permissions:

Then create a storage bucket for the Terraform backend state and assign the correct bucket permission to the terraform service account:

Bucket permissions:

To start, clone my openshift-terraform github repository and checkout the google-dev branch:

git clone https://github.com/berndonline/openshift-terraform.git
cd ./openshift-terraform/ && git checkout google-dev

Add your previously downloaded credentials json file:

cat << EOF > ./credentials.json
{
  "type": "service_account",
  "project_id": "<--your-project-->",
  "private_key_id": "<--your-key-id-->",
  "private_key": "-----BEGIN PRIVATE KEY-----

...

}
EOF

There are a few things you need to modify in the main.tf and variables.tf before you can start:

...
terraform {
  backend "gcs" {
    bucket    = "<--your-bucket-name-->"
    prefix    = "openshift-311"
    credentials = "credentials.json"
  }
}
...
...
variable "gcp_region" {
  description = "Google Compute Platform region to launch servers."
  default     = "europe-west3"
}
variable "gcp_project" {
  description = "Google Compute Platform project name."
  default     = "<--your-project-name-->"
}
variable "gcp_zone" {
  type = "string"
  default = "europe-west3-a"
  description = "The zone to provision into"
}
...

Add the needed environment variables to apply changes to CloudFlare DNS:

export TF_VAR_email='<-YOUR-CLOUDFLARE-EMAIL-ADDRESS->'
export TF_VAR_token='<-YOUR-CLOUDFLARE-TOKEN->'
export TF_VAR_domain='<-YOUR-CLOUDFLARE-DOMAIN->'
export TF_VAR_htpasswd='<-YOUR-OPENSHIFT-DEMO-USER-HTPASSWD->'

Let’s start creating the infrastructure and verify afterwards the created resources on GCP.

terraform init && terraform apply -auto-approve

VPC and public and private subnets in region europe-west3:

Created instances:

Created load balancers for master and infra nodes:

Copy the ssh key and ansible-hosts file to the bastion host from where you need to run the Ansible OpenShift playbooks.

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./helper_scripts/id_rsa [email protected]$(terraform output bastion):/home/centos/.ssh/
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./inventory/ansible-hosts  [email protected]$(terraform output bastion):/home/centos/ansible-hosts

I recommend waiting a few minutes as the cloud-init script prepares the bastion host. Afterwards continue with the pre and install playbooks. You can connect to the bastion host and run the playbooks directly.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-pre.yml -i ~/ansible-hosts"
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-install.yml -i ~/ansible-hosts"

After the installation is completed, continue to create your project and applications:

When you are finished with the testing, run terraform destroy.

terraform destroy -force 

Please share your feedback and leave a comment.

OpenShift Infra node “Not Ready” running Avi Service Engine

I had to troubleshoot an interesting issue with OpenShift Infra nodes suddenly going into “Not Ready” state during an OpenShift upgrade or not registering on Master nodes after a re-install of OpenShift cluster. On the Infra nodes Avi Service Engines were running for ingress traffic. The problem was not very obvious and RedHat and Avi Networks were not able to identify the issue.

Here the output of oc get nodes:

[[email protected] ~]# oc get nodes -o wide --show-labels | grep 'region=infra'
infra01   NotReady                   1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=1,kubernetes.io/hostname=infra01,logging-infra-fluentd=true,purpose=infra,region=infra,zone=1
infra02   NotReady                   1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=1,kubernetes.io/hostname=infra02,logging-infra-fluentd=true,purpose=infra,region=infra,zone=0
infra03   NotReady                   1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=0,kubernetes.io/hostname=infra03,logging-infra-fluentd=true,purpose=infra,region=infra,zone=2
[[email protected] ~]#

On the Infra node itself you could see that the atomic node service had successfully started but I saw a very strange error message from the kubelet not being able to synchronise the pod:

I1206 14:52:28.115735   21690 cloud_request_manager.go:89] Requesting node addresses from cloud provider for node "infra01"
I1206 14:52:28.170366   21690 cloud_request_manager.go:108] Node addresses from cloud provider for node "infra01" collected
E1206 14:52:28.533560   21690 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'infra01' not found
I1206 14:52:32.840769   21690 kubelet.go:1808] skipping pod synchronization - [Kubelet failed to get node info: failed to get zone from cloud provider: Get http://169.254.169.254/metadata/v1/InstanceInfo: dial tcp 169.254.169.254:80: getsockopt: no route to host]
I1206 14:52:37.841235   21690 kubelet.go:1808] skipping pod synchronization - [Kubelet failed to get node info: failed to get zone from cloud provider: Get http://169.254.169.254/metadata/v1/InstanceInfo: dial tcp 169.254.169.254:80: getsockopt: no route to host]
I1206 14:52:38.170604   21690 cloud_request_manager.go:89] Requesting node addresses from cloud provider for node "infra01"
I1206 14:52:38.222439   21690 cloud_request_manager.go:108] Node addresses from cloud provider for node "infra01" collected
E1206 14:52:38.545991   21690 eviction_manager.go:238] eviction manager: unexpected err: failed GetNode: node 'infra01' not found
I1206 14:52:42.841547   21690 kubelet.go:1808] skipping pod synchronization - [Kubelet failed to get node info: failed to get zone from cloud provider: Get http://169.254.169.254/metadata/v1/InstanceInfo: dial tcp 169.254.169.254:80: getsockopt: no route to host]
I1206 14:52:47.841819   21690 kubelet.go:1808] skipping pod synchronization - [Kubelet failed to get node info: failed to get zone from cloud provider: Get http://169.254.169.254/metadata/v1/InstanceInfo: dial tcp 169.254.169.254:80: getsockopt: no route to host]

Even stranger is that the kubelet was not able to get the metadata information of the Azure Cloud provider with the fault domain in which the instance is running.

About the “no route to host” error I thought this must be a network issue and that I could reproduce this with a simple curl command:

[[email protected] ~]# curl -v http://169.254.169.254/metadata/v1/InstanceInfo
* About to connect() to 169.254.169.254 port 80 (#0)
*   Trying 169.254.169.254...
* No route to host
* Failed connect to 169.254.169.254:80; No route to host
* Closing connection 0
curl: (7) Failed connect to 169.254.169.254:80; No route to host
[[email protected] ~]#

The routing table on the node looked fine and theoretically forwarded the traffic to the default gateway in the subnet.

[[email protected] ~]# ip route show
default via 10.1.1.1 dev eth0
10.1.1.0/27 dev eth0 proto kernel scope link src 10.1.1.10 
10.128.0.0/15 dev tun0 scope link
10.128.0.0/15 dev tun0
168.63.129.16 via 10.1.1.1 dev eth0 proto static
169.254.0.0/16 dev eth0 scope link metric 1002
169.254.169.254 via 10.1.1.1 dev eth0 proto static
172.17.0.0/16 via 172.17.0.1 dev docker0
172.18.0.0/16 dev bravi proto kernel scope link src 172.18.0.1
[[email protected] ~]#

What was a bit strange when I used tracepath was that the packets weren’t forwarded to the default gateway but instead forwarded to the node itself:

[[email protected] ~]# tracepath 169.254.169.254
1?: [LOCALHOST]                                         pmtu 1500
1:  infra01                                     3006.801ms !H
    Resume: pmtu 1500

Same with traceroute or ping output:

[[email protected] ~]# traceroute 169.254.169.254
traceroute to 169.254.169.254 (169.254.169.254), 30 hops max, 60 byte packets
1  infra01 (172.18.0.1)  1178.146 ms !H  1178.104 ms !H  1178.057 ms !H
[[email protected] ~]# ping 169.254.169.254
PING 169.254.169.254 (169.254.169.254) 56(84) bytes of data.
From 172.18.0.1 icmp_seq=1 Destination Host Unreachable
From 172.18.0.1 icmp_seq=2 Destination Host Unreachable
From 172.18.0.1 icmp_seq=3 Destination Host Unreachable
From 172.18.0.1 icmp_seq=4 Destination Host Unreachable
^C
--- 169.254.169.254 ping statistics ---
5 packets transmitted, 0 received, +4 errors, 100% packet loss, time 4000ms
pipe 4

What was very obvious was that the packets were forwarded to the bravi bridge 172.18.0.0/16 which is owned by the Avi Service Engine on the Infra node:

...
44: bravi: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
   link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
   inet 172.18.0.1/16 scope global bravi
      valid_lft forever preferred_lft forever
   inet6 fe80::200:ff:fe00:1/64 scope link
      valid_lft forever preferred_lft forever
45: bravi-tap: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master bravi state DOWN qlen 1000
   link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
...

Here is the article about how Avi SE are integrated into the OpenShift SDN. Avi uses PBR (policy based routing) to forward external ingress traffic to the Service Engine.

I have turned off the bravi bridge because PBR could be bypassing the routing table for traffic to the 169.254.169.254.

[[email protected] ~]# ip link set bravi down

Traffic is now exiting the Infra node:

[[email protected] ~]# traceroute 169.254.169.254
traceroute to 169.254.169.254 (169.254.169.254), 30 hops max, 60 byte packets
1  * * *
2  * * *
3  * * *
4  * * *
5  * * *
6  *^C
[[email protected] ~]#

And the Infra node was able to collect metadata information:

[[email protected] ~]# curl -v http://169.254.169.254/metadata/v1/InstanceInfo
* About to connect() to 169.254.169.254 port 80 (#0)
*   Trying 169.254.169.254...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
GET /metadata/v1/InstanceInfo HTTP/1.1
User-Agent: curl/7.29.0
Host: 169.254.169.254
Accept: */*

< HTTP/1.1 200 OK
< Content-Type: text/json; charset=utf-8
< Server: Microsoft-IIS/10.0
< Date: Thu, 06 Dec 2018 13:53:16 GMT
< Content-Length: 43
<
* Connection #0 to host 169.254.169.254 left intact
{"ID":"_infra01","UD":"4","FD":"0"}
[[email protected] ~]#

Simple restart of the atomic node service to trigger the master registration:

[[email protected] ~]# systemctl restart atomic-openshift-node

The logs showed that the kubelet successfully got zone information

Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.341611   36813 kubelet_node_status.go:270] Setting node annotation to enable volume controller attach/detach
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.414847   36813 kubelet_node_status.go:326] Adding node label from cloud provider: beta.kubernetes.io/instance-type=Standard_D8_
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.414881   36813 kubelet_node_status.go:337] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=0
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.414890   36813 kubelet_node_status.go:341] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=ukso
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.414966   36813 kubelet_node_status.go:488] Using Node Hostname from cloudprovider: "infra01"
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.420823   36813 kubelet_node_status.go:437] Recording NodeHasSufficientDisk event message for node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.420907   36813 kubelet_node_status.go:437] Recording NodeHasSufficientMemory event message for node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.423139   36813 kubelet_node_status.go:437] Recording NodeHasNoDiskPressure event message for node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.423235   36813 kubelet_node_status.go:82] Attempting to register node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.435412   36813 kubelet_node_status.go:85] Successfully registered node infra01
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.437308   36813 kubelet_node_status.go:488] Using Node Hostname from cloudprovider: "infra01"
Dec 07 16:03:21 infra01 atomic-openshift-node[36736]: I1207 16:03:21.441482   36813 manager.go:311] Recovery completed

The Infra node successfully registered again the OpenShift master and the node went into “Ready”:

[[email protected] ~]# oc get nodes -o wide --show-labels | grep 'region=infra'
infra01   Ready                      40s       v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=0,kubernetes.io/hostname=infra01,logging-infra-fluentd=true,purpose=infra,region=infra,zone=1
infra02   Ready                      1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=1,kubernetes.io/hostname=infra02,logging-infra-fluentd=true,purpose=infra,region=infra,zone=0
infra03   Ready                      1d        v1.7.6+a08f5eeb62           Red Hat Enterprise Linux Server 7.6 (Maipo)   3.10.0-693.11.6.el7.x86_64   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D8_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=uksouth,failure-domain.beta.kubernetes.io/zone=0,kubernetes.io/hostname=infra03,logging-infra-fluentd=true,purpose=infra,region=infra,zone=2
[[email protected] ~]#

In the end, the root cause was the Avi East West subnet range which was set to 169.254.0.0/16 on the Avi controller nodes. Even the East West communication was deactivated on Avi, because kube_proxy was used, which made the Avi controller configure PBR on the bravi bridge for the 169.254.0.0/16 subnet range. This subnet range was previously used on all the on-prem datacenters and never caused issues since moving to cloud because the 169.254.169.254 is commonly used on cloud provider for instances to collect metadata information.

Deploy OpenShift 3.11 Container Platform on AWS using Terraform

I have done a few changes on my Terraform configuration for OpenShift 3.11 on Amazon AWS. I have downsized the environment because I didn’t needed that many nodes for a quick test setup. I have added CloudFlare DNS to automatically create CNAME for the AWS load balancers on the DNS zone. I have also added an AWS S3 Bucket for storing the backend state. You can find the new Terraform configuration on my Github repository: https://github.com/berndonline/openshift-terraform/tree/aws-dev

From OpenShift 3.10 and later versions the environment variables changes and I modified the ansible-hosts template for the new configuration. You can see the changes in the hosts template: https://github.com/berndonline/openshift-terraform/blob/aws-dev/helper_scripts/ansible-hosts.template.txt

OpenShift 3.11 has changed a few things and put an focus on an Cluster Operator console which is pretty nice and runs on Kubernetes 1.11. I recommend reading the release notes for the 3.11 release for more details: https://docs.openshift.com/container-platform/3.11/release_notes/ocp_3_11_release_notes.html

I don’t wanted to get into too much detail, just follow the steps below and start with cloning my repository, and choose the dev branch:

git clone -b aws-dev https://github.com/berndonline/openshift-terraform.git
cd ./openshift-terraform/
ssh-keygen -b 2048 -t rsa -f ./helper_scripts/id_rsa -q -N ""
chmod 600 ./helper_scripts/id_rsa

You need to modify the cloudflare.tf and add your CloudFlare API credentials otherwise just delete the file. The same for the S3 backend provider, you find the configuration in the main.tf and it can be removed if not needed.

CloudFlare and Amazon AWS credentials can be added through environment variables:

export AWS_ACCESS_KEY_ID='<-YOUR-AWS-ACCESS-KEY->'
export AWS_SECRET_ACCESS_KEY='<-YOUR-AWS-SECRET-KEY->'
export TF_VAR_email='<-YOUR-CLOUDFLARE-EMAIL-ADDRESS->'
export TF_VAR_token='<-YOUR-CLOUDFLARE-TOKEN->'
export TF_VAR_domain='<-YOUR-CLOUDFLARE-DOMAIN->'
export TF_VAR_htpasswd='<-YOUR-OPENSHIFT-DEMO-USER-HTPASSWD->'

Run terraform init and apply to create the environment.

terraform init && terraform apply -auto-approve

Copy the ssh key and ansible-hosts file to the bastion host from where you need to run the Ansible OpenShift playbooks.

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./helper_scripts/id_rsa [email protected]$(terraform output bastion):/home/centos/.ssh/
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./inventory/ansible-hosts  [email protected]$(terraform output bastion):/home/centos/ansible-hosts

I recommend waiting a few minutes as the AWS cloud-init script prepares the bastion host. Afterwards continue with the pre and install playbooks. You can connect to the bastion host and run the playbooks directly.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-pre.yml -i ~/ansible-hosts"
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-install.yml -i ~/ansible-hosts"

If for whatever reason the cluster deployment fails, you can run the uninstall playbook to bring the nodes back into a clean state and start from the beginning and run deploy_cluster.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./openshift-ansible/playbooks/adhoc/uninstall.yml -i ~/ansible-hosts"

Here are some screenshots of the new cluster console:

Let’s create a project and import my hello-openshift.yml build configuration:

Successful completed the build and deployed the hello-openshift container:

My example hello openshift application:

When you are finished with the testing, run terraform destroy.

terraform destroy -force 

 

Terraform AWS S3 Bucket backend state and create IAM credentials

I am currently working on refactoring my Terraform configuration for deploying OpenShift 3.11 on AWS. I wanted to share some of the improvements I have made on the configuration by adding AWS S3 as a backend provider and using a custom IAM user for Terraform.

Let’s start with creating an AWS S3 Bucket for the Terraform backend state. You can find information about the Terraform S3 backend provider here: https://www.terraform.io/docs/backends/types/s3.html

First you need to create the S3 bucket on the AWS console:

 

It’s a pretty simple setup and below we see the successfully created S3 bucket:

To use the S3 bucket for the backend state, modify your my main.tf:

terraform {
  backend "s3" {
    bucket = "techbloc-terraform-data"
    key    = "openshift-311"
    region = "eu-west-1"
  }
}

When you run terraform apply it uses the specified S3 bucket to store the backend state and can be used from multiple users.

Instead of using your AWS Root account, it’s  better to create a custom AWS IAM user for Terraform and apply a few limitations for what the user is able to do on AWS.

Go to the AWS IAM service and create a new user with name Terraform. I would strongly recommend using only programmatic access which generates Access Key ID and Secret Access Key.

Create a Terraform IAM user:

In my simple example I created three additional policies to control the access to my AWS subscription:

See the json config for each policy below.

terraform-eu-west-1

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "ec2:Region": "eu-west-1"
                }
            }
        }
    ]
}

terraform-elb

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "elasticloadbalancing:*",
            "Resource": "*"
        }
    ]
}

terraform-s3

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::techbloc-terraform-data"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::techbloc-terraform-data/openshift-311"
        }
    ]
}

The IAM policies are not very complex, I just wanted to limit the access to a specific region.

The S3 backend provider is very important because I am planning to use Jenkins to deploy the AWS infrastructure with Terraform and storing the backend state locally on the Jenkins server is not very ideal.