Kubernetes GitOps at Scale with Cluster API and Flux CD

What does GitOps mean and how you run this at scale with Kubernetes? GitOps is basically a framework that takes traditional DevOps practices which where used for application development and apply them to platform automation.

This is nothing new and some maybe have done similar type of automation in the past but this wasn’t called GitOps back then. Kubernetes is great because of it’s declarative configuration management which makes it very easy to configure. This can become a challenge when you suddenly have to run 5, 10, 20 or 40 of these clusters across various cloud providers and multiple environments. We need a cluster management system feeding configuration from a code repository to run all our Kubernetes “cattle” workload clusters.

What I am trying to achieve with this design; that you can easily horizontally scale not only your workload clusters but also your cluster management system which is versioned across multiple cloud providers like you see in the diagram above.

There is of course a technical problem to all of this, finding the right tools to solve the problem and which work well together. In my example I will use the Cluster API for provisioning and managing the lifecycle of these Kubernetes workload clusters. Then we need Flux CD for the configuration management both the cluster management which runs the Cluster API components but also the configuration for the workload clusters. The Cluster API you can also replace with OpenShift Hive to run instead OKD or RedHat OpenShift clusters.

Another problem we need to think about is version control and the branching model for the platform configuration. The structure of the configuration is important but also how you implement changes or the versioning of your configuration through releases. I highly recommend reading about Trunk Based Development which is a modern branching model and specifically solves the versioning problem for us.

Git repository and folder structure

We need a git repository for storing the platform configuration both for the management- and workload-clusters, and the tenant namespace configuration (this also can be stored in a separate repositories). Let’s go through the folder structure of the repository and I will explain this in more detail. Checkout my example repository for more detail: github.com/berndonline/k8s-gitops-at-scale.

  • The features folder on the top-level will store configuration for specific features we want to enable and apply to our clusters (both management and worker). Under each <feature name> you find two subfolders for namespace(d)- and cluster-wide (non-namespaced) configuration. Features are part of platform configuration which will be promoted between environments. You will see namespaced and non-namespaced subfolders throughout the folder structure which is basically to group your configuration files.
    ├── features
    │   ├── access-control
    │   │   └── non-namespaced
    │   ├── helloworld-operator
    │   │   ├── namespaced
    │   │   └── non-namespaced
    │   └── ingress-nginx
    │       ├── namespaced
    │       └── non-namespaced
    
  • The providers folder will store the configuration based on cloud provider <name> and the <version> of your cluster management. The version below the cloud provider folder is needed to be able to spin up new management clusters in the future. You can be creative with the folder structure and have management cluster per environment and/or instead of the version if required. The mgmt folder will store the configuration for the management cluster which includes manifests for Flux CD controllers, the Cluster API to spin-up workload clusters which are separated by cluster name and anything else you want to configure on your management cluster. The clusters folder will store configuration for all workload clusters separated based on <environment> and common (applies across multiple clusters in the same environment) and by <cluster name> (applies to a dedicated cluster).
    ├── providers
    │   └── aws
    │       └── v1
    │           ├── clusters
    │           │   ├── non-prod
    │           │   │   ├── common
    │           │   │   │   ├── namespaced
    │           │   │   │   │   └── non-prod-common
    │           │   │   │   └── non-namespaced
    │           │   │   │       └── non-prod-common
    │           │   │   └── non-prod-eu-west-1
    │           │   │       ├── namespaced
    │           │   │       │   └── non-prod-eu-west-1
    │           │   │       └── non-namespaced
    │           │   │           └── non-prod-eu-west-1
    │           │   └── prod
    │           │       ├── common
    │           │       │   ├── namespaced
    │           │       │   │   └── prod-common
    │           │       │   └── non-namespaced
    │           │       │       └── prod-common
    │           │       └── prod-eu-west-1
    │           │           ├── namespaced
    │           │           │   └── prod-eu-west-1
    │           │           └── non-namespaced
    │           │               └── prod-eu-west-1
    │           └── mgmt
    │               ├── namespaced
    │               │   ├── flux-system
    │               │   ├── non-prod-eu-west-1
    │               │   └── prod-eu-west-1
    │               └── non-namespaced
    │                   ├── non-prod-eu-west-1
    │                   └── prod-eu-west-1
    
  • The tenants folder will store the namespace configuration of the onboarded teams and is applied to our workload clusters. Similar to the providers folder tenants has subfolders based on the cloud provider <name> and below subfolders for common (applies across environments) and <environments> (applied to a dedicated environment) configuration. There you find the tenant namespace <name> and all the needed manifests to create and configure the namespace/s.
    └── tenants
        └── aws
            ├── common
            │   └── dummy
            ├── non-prod
            │   └── dummy
            └── prod
                └── dummy
    

Why do we need a common folder for tenants? The common folder will contain namespace configuration which will be promoted between the environments from non-prod to prod using a release but more about release and promotion you find more down below.

Configuration changes

Applying changes to your platform configuration has to follow the Trunk Based Development model of doing small incremental changes through feature branches.

Let’s look into an example change the our dummy tenant onboarding pull-request. You see that I checked-out a branch called “tenant-dummy” to apply my changes, then push and publish the branch in the repository to raised the pull-request.

Important is that your commit messages and pull-request name are following a strict naming convention.

I would also strongly recommend to squash your commit messages into the name of your pull-request. This will keep your git history clean.

This naming convention makes it easier later for auto-generating your release notes when you publish your release. Having the clean well formatted git history combined with your release notes nicely cross references your changes for to a particular release tag.

More about creating a release a bit later in this article.

GitOps configuration

The configuration from the platform repository gets pulled on the management cluster using different gitrepository resources following the main branch or a version tag.

$ kubectl get gitrepositories.source.toolkit.fluxcd.io -A
NAMESPACE     NAME      URL                                                    AGE   READY   STATUS
flux-system   main      ssh://[email protected]/berndonline/k8s-gitops-at-scale   2d    True    stored artifact for revision 'main/ee3e71efb06628775fa19e9664b9194848c6450e'
flux-system   release   ssh://[email protected]/berndonline/k8s-gitops-at-scale   2d    True    stored artifact for revision 'v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff'

The kustomization resources will then render and apply the configuration locally to the management cluster (diagram left-side) or remote clusters to our non-prod and prod workload clusters (diagram right-side) using the kubeconfig of the cluster created by the Cluster API stored during the bootstrap.

There are multiple kustomization resources to apply configuration based off the folder structure which I explained above. See the output below and checkout the repository for more details.

$ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A
NAMESPACE            NAME                          AGE   READY   STATUS
flux-system          feature-access-control        13h   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
flux-system          mgmt                          2d    True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   common                        21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-access-control        21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-helloworld-operator   21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-ingress-nginx         21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   non-prod-eu-west-1            21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   tenants-common                21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   tenants-non-prod              21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
prod-eu-west-1       common                        15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-access-control        15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-helloworld-operator   15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-ingress-nginx         15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       prod-eu-west-1                15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       tenants-common                15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       tenants-prod                  15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff

Release and promotion

The GitOps framework doesn’t explain about how to do promotion to higher environments and this is where the Trunk Based Development model comes in helpful together with the gitrepository resource to be able to pull a tagged version instead of a branch.

This allows us applying configuration first to lower environments to non-prod following the main branch, means pull-requests which are merged will be applied instantly. Configuration for higher environments to production requires to create a version tag and publish a release in the repository.

Why using a tag and not a release branch? A tag in your repository is a point in time snapshot of your configuration and can’t be easily modified which is required for creating the release. A branch on the other hand can be modified using pull-requests and you end up with lots of release branches which is less ideal.

To create a new version tag in the git repository I use the following commands:

$ git tag v0.0.3
$ git push origin --tags
Total 0 (delta 0), reused 0 (delta 0)
To github.com:berndonline/k8s-gitops-at-scale.git
* [new tag] v0.0.3 -> v0.0.3

This doesn’t do much after we pushed the new tag because the gitrepository release is set to v0.0.2 but I can see the new tag is available in the repository.

In the repository I can go to releases and click on “Draft a new release” and choose the new tag v0.0.3 I pushed previously.

The release notes you see below can be auto-generate from the pull-requests you merged between v0.0.2 and v0.0.3 by clicking “Generate release notes”. To finish this off save and publish the release.


The release is publish and release notes are visible to everyone which is great for product teams on your platform because they will get visibility about upcoming changes including their own modifications to namespace configuration.

Until now all the changes are applied to our lower non-prod environment following the main branch and for doing the promotion we need to raise a pull-request and update the gitrepository release the new version v0.0.3.

If you follow ITIL change procedures then this is the point where you would normally raise a change for merging your pull-request because this triggers the rollout of your configuration to production.

When the pull-request is merged the release gitrepository is updated by the kustomization resources through the main branch.

$ kubectl get gitrepositories.source.toolkit.fluxcd.io -A
NAMESPACE     NAME      URL                                           AGE   READY   STATUS
flux-system   main      ssh://[email protected]/berndonline/k8s-gitops   2d    True    stored artifact for revision 'main/83133756708d2526cca565880d069445f9619b70'
flux-system   release   ssh://[email protected]/berndonline/k8s-gitops   2d    True    stored artifact for revision 'v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8'

Shortly after the kustomization resources referencing the release will reconcile and automatically push down the new rendered configuration to the production clusters.

$ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A
NAMESPACE            NAME                          AGE   READY   STATUS
flux-system          feature-access-control        13h   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
flux-system          mgmt                          2d    True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   common                        31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-access-control        31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-helloworld-operator   31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-ingress-nginx         31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   non-prod-eu-west-1            31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   tenants-common                31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   tenants-non-prod              31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
prod-eu-west-1       common                        26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-access-control        26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-helloworld-operator   26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-ingress-nginx         26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       prod-eu-west-1                26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       tenants-common                26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       tenants-prod                  26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8

Why using Kustomize for managing the configuration and not Helm? I know the difficulties of managing these raw YAML manifests. Kustomize gets you going quick where with Helm there is a higher initial effort writing your Charts. In my next article I will focus specifically on Helm.

I showed a very simplistic example having a single cloud provider (aws) and a single management cluster but as you have seen you can easily add Azure or Google cloud providers in your configuration and scale horizontally. I think this is what makes Kubernetes and controllers like Flux CD great together that you don’t need to have complex pipelines or workflows to rollout and promote your changes completely pipeline-less.

 

OpenShift / OKD 4.x Cluster Deployment using OpenShift Hive

Before you continue to deploy an OpenShift or OKD cluster please check out my other posts about OpenShift Hive – API driven OpenShift cluster provisioning and management operator and Getting started with OpenShift Hive  because you need a running OpenShift Hive operator.

To install the OKD (OpenShift Origin Community Distribution) version we need a few things beforehand: a cluster namespace, AWS credentials, SSH keys, image pull secret, install-config, cluster image version and cluster deployment.

Let’s start to create the cluster namespace:

cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: okd

Create a secret with your ssh key:

$ kubectl create secret generic ssh-key -n okd --from-file=ssh-privatekey=/home/ubuntu/.ssh/id_rsa --from-file=ssh-publickey=/home/ubuntu/.ssh/id_rsa.pub

Create the AWS credential secret:

$ kubectl create secret generic aws-creds -n okd --from-literal=aws_secret_access_key=$AWS_SECRET_ACCESS_KEY --from-literal=aws_access_key_id=$AWS_ACCESS_KEY_ID

Create an image pull secret, this is not important for installing a OKD 4.x cluster but needs to be present otherwise Hive will not start the cluster deployment. If you have an RedHat Enterprise subscription for OpenShift then you need to add here your RedHat image pull secret:

$ kubectl create secret generic pull-secret -n okd --from-file=.dockerconfigjson=/home/ubuntu/.docker/config.json --type=kubernetes.io/dockerconfigjson 

Create a install-config.yaml for the cluster deployment and modify to your needs:

---
apiVersion: v1
baseDomain: kube.domain.com
compute:
- name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m4.xlarge
  replicas: 3
controlPlane:
  name: master
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m4.xlarge
replicas: 3
metadata:
  creationTimestamp: null
  name: okd
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
pullSecret: ""
sshKey: ""

Create the install-config secret for the cluster deployment:

$ kubectl create secret generic install-config -n okd --from-file=install-config.yaml=./install-config.yaml

Create the ClusterImageSet for OKD. In my example I am using the latest OKD 4.4.0 release. More information about the available OKD release versions you find here: https://origin-release.svc.ci.openshift.org/

cat <<EOF | kubectl apply -f -
---
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: okd-4-4-0-imageset
spec:
  releaseImage: registry.svc.ci.openshift.org/origin/release:4.4.0-0.okd-2020-02-18-212654
EOF 

Below is an example of a RedHat Enterprise OpenShift 4 ClusterImageSet:

---
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-4-3-0-imageset
spec:
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.3.0-x86_64

For Hive to start with the cluster deployment, we need to modify the manifest below and add the references to the previous created secrets, install-config and cluster imageset version:

cat <<EOF | kubectl apply -f -
---
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  creationTimestamp: null
  name: okd
  namespace: okd
spec:
  baseDomain: kube.domain.com
  clusterName: okd
  controlPlaneConfig:
    servingCertificates: {}
  installed: false
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  provisioning:
    imageSetRef:
      name: okd-4-4-0-imageset
    installConfigSecretRef:
      name: install-config 
  pullSecretRef:
    name: pull-secret
  sshKey:
    name: ssh-key
status:
  clusterVersionStatus:
    availableUpdates: null
    desired:
      force: false
      image: ""
      version: ""
    observedGeneration: 0
    versionHash: ""
EOF

Once you submitted the ClusterDeployment manifest, the Hive operator will start to deploy the cluster straightaway:

$ kubectl get clusterdeployments.hive.openshift.io -n okd
NAME   CLUSTERNAME   CLUSTERTYPE   BASEDOMAIN          INSTALLED   INFRAID     AGE
okd    okd                         kube.domain.com     false       okd-jcdkd   107s

Hive will create the provision (install) pod for the cluster deployment and inject the installer configuration:

$ kubectl get pods -n okd
NAME                          READY   STATUS    RESTARTS   AGE
okd-0-tbm9t-provision-c5hpf   1/3     Running   0          57s

You can view the logs to check the progress of the cluster deployment. You will see the terraform output for creating the infrastructure resources and feedback from the installer about the installation progress. At the end you will see when the installation completed successfully:

$ kubectl logs okd-0-tbm9t-provision-c5hpf -n okd -c hive -f
...
time="2020-02-23T13:31:41Z" level=debug msg="module.dns.aws_route53_zone.int: Creating..."
time="2020-02-23T13:31:42Z" level=debug msg="aws_ami_copy.main: Still creating... [3m40s elapsed]"
time="2020-02-23T13:31:51Z" level=debug msg="module.dns.aws_route53_zone.int: Still creating... [10s elapsed]"
time="2020-02-23T13:31:52Z" level=debug msg="aws_ami_copy.main: Still creating... [3m50s elapsed]"
time="2020-02-23T13:32:01Z" level=debug msg="module.dns.aws_route53_zone.int: Still creating... [20s elapsed]"
time="2020-02-23T13:32:02Z" level=debug msg="aws_ami_copy.main: Still creating... [4m0s elapsed]"
time="2020-02-23T13:32:11Z" level=debug msg="module.dns.aws_route53_zone.int: Still creating... [30s elapsed]"
time="2020-02-23T13:32:12Z" level=debug msg="aws_ami_copy.main: Still creating... [4m10s elapsed]"
time="2020-02-23T13:32:21Z" level=debug msg="module.dns.aws_route53_zone.int: Still creating... [40s elapsed]"
time="2020-02-23T13:32:22Z" level=debug msg="aws_ami_copy.main: Still creating... [4m20s elapsed]"
time="2020-02-23T13:32:31Z" level=debug msg="module.dns.aws_route53_zone.int: Still creating... [50s elapsed]"
time="2020-02-23T13:32:32Z" level=debug msg="aws_ami_copy.main: Still creating... [4m30s elapsed]"
time="2020-02-23T13:32:41Z" level=debug msg="module.dns.aws_route53_zone.int: Still creating... [1m0s elapsed]"
time="2020-02-23T13:32:41Z" level=debug msg="module.dns.aws_route53_zone.int: Creation complete after 1m0s [id=Z10411051RAEUMMAUH39E]"
time="2020-02-23T13:32:41Z" level=debug msg="module.dns.aws_route53_record.etcd_a_nodes[0]: Creating..."
time="2020-02-23T13:32:41Z" level=debug msg="module.dns.aws_route53_record.api_internal: Creating..."
time="2020-02-23T13:32:41Z" level=debug msg="module.dns.aws_route53_record.api_external_internal_zone: Creating..."
time="2020-02-23T13:32:41Z" level=debug msg="module.dns.aws_route53_record.etcd_a_nodes[2]: Creating..."
time="2020-02-23T13:32:41Z" level=debug msg="module.dns.aws_route53_record.etcd_a_nodes[1]: Creating..."
time="2020-02-23T13:32:42Z" level=debug msg="aws_ami_copy.main: Still creating... [4m40s elapsed]"
time="2020-02-23T13:32:51Z" level=debug msg="module.dns.aws_route53_record.etcd_a_nodes[0]: Still creating... [10s elapsed]"
time="2020-02-23T13:32:51Z" level=debug msg="module.dns.aws_route53_record.api_internal: Still creating... [10s elapsed]"
time="2020-02-23T13:32:51Z" level=debug msg="module.dns.aws_route53_record.api_external_internal_zone: Still creating... [10s elapsed]"
time="2020-02-23T13:32:51Z" level=debug msg="module.dns.aws_route53_record.etcd_a_nodes[2]: Still creating... [10s elapsed]"
time="2020-02-23T13:32:51Z" level=debug msg="module.dns.aws_route53_record.etcd_a_nodes[1]: Still creating... [10s elapsed]"
time="2020-02-23T13:32:52Z" level=debug msg="aws_ami_copy.main: Still creating... [4m50s elapsed]"
...
time="2020-02-23T13:34:43Z" level=debug msg="Apply complete! Resources: 123 added, 0 changed, 0 destroyed."
time="2020-02-23T13:34:43Z" level=debug msg="OpenShift Installer unreleased-master-2446-gc108297de972e1a6a5fb502a7668079d16e501f9-dirty"
time="2020-02-23T13:34:43Z" level=debug msg="Built from commit c108297de972e1a6a5fb502a7668079d16e501f9"
time="2020-02-23T13:34:43Z" level=info msg="Waiting up to 20m0s for the Kubernetes API at https://api.okd.kube.domain.com:6443..."
time="2020-02-23T13:35:13Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.okd.kube.domain.com:6443/version?timeout=32s: dial tcp 52.17.210.160:6443: connect: connection refused"
time="2020-02-23T13:35:50Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.okd.kube.domain.com:6443/version?timeout=32s: dial tcp 52.211.227.216:6443: connect: connection refused"
time="2020-02-23T13:36:20Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.okd.kube.domain.com:6443/version?timeout=32s: dial tcp 52.17.210.160:6443: connect: connection refused"
time="2020-02-23T13:36:51Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.okd.kube.domain.com:6443/version?timeout=32s: dial tcp 52.211.227.216:6443: connect: connection refused"
time="2020-02-23T13:37:58Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.okd.kube.domain.com:6443/version?timeout=32s: dial tcp 52.211.227.216:6443: connect: connection refused"
time="2020-02-23T13:38:00Z" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"
time="2020-02-23T13:38:30Z" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"
time="2020-02-23T13:38:58Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.okd.kube.domain.com:6443/version?timeout=32s: dial tcp 52.211.227.216:6443: connect: connection refused"
time="2020-02-23T13:39:28Z" level=debug msg="Still waiting for the Kubernetes API: Get https://api.okd.kube.domain.com:6443/version?timeout=32s: dial tcp 63.35.50.149:6443: connect: connection refused"
time="2020-02-23T13:39:36Z" level=info msg="API v1.17.1 up"
time="2020-02-23T13:39:36Z" level=info msg="Waiting up to 40m0s for bootstrapping to complete..."
...
time="2020-02-23T13:55:14Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.okd-2020-02-18-212654: 97% complete"
time="2020-02-23T13:55:24Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.okd-2020-02-18-212654: 99% complete"
time="2020-02-23T13:57:39Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.okd-2020-02-18-212654: 99% complete, waiting on authentication, console, monitoring"
time="2020-02-23T13:57:39Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.okd-2020-02-18-212654: 99% complete, waiting on authentication, console, monitoring"
time="2020-02-23T13:58:54Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.okd-2020-02-18-212654: 99% complete"
time="2020-02-23T14:01:40Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.okd-2020-02-18-212654: 100% complete, waiting on authentication"
time="2020-02-23T14:03:24Z" level=debug msg="Cluster is initialized"
time="2020-02-23T14:03:24Z" level=info msg="Waiting up to 10m0s for the openshift-console route to be created..."
time="2020-02-23T14:03:24Z" level=debug msg="Route found in openshift-console namespace: console"
time="2020-02-23T14:03:24Z" level=debug msg="Route found in openshift-console namespace: downloads"
time="2020-02-23T14:03:24Z" level=debug msg="OpenShift console route is created"
time="2020-02-23T14:03:24Z" level=info msg="Install complete!"
time="2020-02-23T14:03:24Z" level=info msg="To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/output/auth/kubeconfig'"
time="2020-02-23T14:03:24Z" level=info msg="Access the OpenShift web-console here: https://console-openshift-console.apps.okd.kube.domain.com"
REDACTED LINE OF OUTPUT
time="2020-02-23T14:03:25Z" level=info msg="command completed successfully" installID=jcdkd
time="2020-02-23T14:03:25Z" level=info msg="saving installer output" installID=jcdkd
time="2020-02-23T14:03:25Z" level=debug msg="installer console log: level=info msg=\"Credentials loaded from default AWS environment variables\"\nlevel=info msg=\"Consuming Install Config from target directory\"\nlevel=warning msg=\"Found override for release image. Please be warned, this is not advised\"\nlevel=info msg=\"Consuming Master Machines from target directory\"\nlevel=info msg=\"Consuming Common Manifests from target directory\"\nlevel=info msg=\"Consuming OpenShift Install from target directory\"\nlevel=info msg=\"Consuming Worker Machines from target directory\"\nlevel=info msg=\"Consuming Openshift Manifests from target directory\"\nlevel=info msg=\"Consuming Master Ignition Config from target directory\"\nlevel=info msg=\"Consuming Worker Ignition Config from target directory\"\nlevel=info msg=\"Consuming Bootstrap Ignition Config from target directory\"\nlevel=info msg=\"Creating infrastructure resources...\"\nlevel=info msg=\"Waiting up to 20m0s for the Kubernetes API at https://api.okd.kube.domain.com:6443...\"\nlevel=info msg=\"API v1.17.1 up\"\nlevel=info msg=\"Waiting up to 40m0s for bootstrapping to complete...\"\nlevel=info msg=\"Destroying the bootstrap resources...\"\nlevel=error\nlevel=error msg=\"Warning: Resource targeting is in effect\"\nlevel=error\nlevel=error msg=\"You are creating a plan with the -target option, which means that the result\"\nlevel=error msg=\"of this plan may not represent all of the changes requested by the current\"\nlevel=error msg=configuration.\nlevel=error msg=\"\\t\\t\"\nlevel=error msg=\"The -target option is not for routine use, and is provided only for\"\nlevel=error msg=\"exceptional situations such as recovering from errors or mistakes, or when\"\nlevel=error msg=\"Terraform specifically suggests to use it as part of an error message.\"\nlevel=error\nlevel=error\nlevel=error msg=\"Warning: Applied changes may be incomplete\"\nlevel=error\nlevel=error msg=\"The plan was created with the -target option in effect, so some changes\"\nlevel=error msg=\"requested in the configuration may have been ignored and the output values may\"\nlevel=error msg=\"not be fully updated. Run the following command to verify that no other\"\nlevel=error msg=\"changes are pending:\"\nlevel=error msg=\"    terraform plan\"\nlevel=error msg=\"\\t\"\nlevel=error msg=\"Note that the -target option is not suitable for routine use, and is provided\"\nlevel=error msg=\"only for exceptional situations such as recovering from errors or mistakes, or\"\nlevel=error msg=\"when Terraform specifically suggests to use it as part of an error message.\"\nlevel=error\nlevel=info msg=\"Waiting up to 30m0s for the cluster at https://api.okd.kube.domain.com:6443 to initialize...\"\nlevel=info msg=\"Waiting up to 10m0s for the openshift-console route to be created...\"\nlevel=info msg=\"Install complete!\"\nlevel=info msg=\"To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/output/auth/kubeconfig'\"\nlevel=info msg=\"Access the OpenShift web-console here: https://console-openshift-console.apps.okd.kube.domain.com\"\nREDACTED LINE OF OUTPUT\n" installID=vxghr9br
time="2020-02-23T14:03:25Z" level=info msg="install completed successfully" installID=jcdkd

After the installation of the cluster deployment has finished, the Installed value is set to True:

$ kubectl get clusterdeployments.hive.openshift.io  -n okd
NAME   CLUSTERNAME   CLUSTERTYPE   BASEDOMAIN          INSTALLED   INFRAID      AGE
okd    okd                         kube.domain.com     true        okd-jcdkd    54m

At this point you can start using the platform by getting the login credentials from the cluster credential secret Hive created during the installation:

$ kubectl get secrets -n okd okd-0-tbm9t-admin-password -o jsonpath='{.data.username}' | base64 -d
kubeadmin
$ kubectl get secrets -n okd okd-0-tbm9t-admin-password -o jsonpath='{.data.password}' | base64 -d
2T38d-aETpX-dj2YU-UBN4a

Log in via the command-line or the web console:

To delete the cluster simply delete the ClusterDeployment resources which initiates a cluster deprovision and will delete all related AWS resources. If the deprovision gets stuck, manually delete the uninstall finalizer allowing the cluster deployment to be deleted, but note that this may leave artifacts in your AWS account:

$ kubectl delete clusterdeployments.hive.openshift.io okd -n okd --wait=false
clusterdeployment.hive.openshift.io "okd" deleted

Please visit the OpenShift Hive documentation for more information about using Hive.

In the next article I will explain how you can use OpenShift Hive to create, update, delete, patch cluster resources using SyncSets.

Running Istio Service Mesh on OpenShift

In the Kubernetes/OpenShift community everyone is talking about Istio service mesh, so I wanted to share my experience about the installation and running a sample microservice application with Istio on OpenShift 3.11 and 4.0. Service mesh on OpenShift is still at least a few month away from being available generally to run in production but this gives you the possibility to start testing and exploring Istio. I have found good documentation about installing Istio on OCP and OKD have a look for more information.

To install Istio on OpenShift 3.11 you need to apply the node and master prerequisites you see below; for OpenShift 4.0 and above you can skip these steps and go directly to the istio-operator installation:

sudo bash -c 'cat << EOF > /etc/origin/master/master-config.patch
admissionConfig:
  pluginConfig:
    MutatingAdmissionWebhook:
      configuration:
        apiVersion: apiserver.config.k8s.io/v1alpha1
        kubeConfigFile: /dev/null
        kind: WebhookAdmission
    ValidatingAdmissionWebhook:
      configuration:
        apiVersion: apiserver.config.k8s.io/v1alpha1
        kubeConfigFile: /dev/null
        kind: WebhookAdmission
EOF'
        
sudo cp -p /etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml.prepatch
sudo bash -c 'oc ex config patch /etc/origin/master/master-config.yaml.prepatch -p "$(cat /etc/origin/master/master-config.patch)" > /etc/origin/master/master-config.yaml'
sudo su -
master-restart api
master-restart controllers
exit       

sudo bash -c 'cat << EOF > /etc/sysctl.d/99-elasticsearch.conf 
vm.max_map_count = 262144
EOF'

sudo sysctl vm.max_map_count=262144

The Istio installation is straight forward by starting first to install the istio-operator:

oc new-project istio-operator
oc new-app -f https://raw.githubusercontent.com/Maistra/openshift-ansible/maistra-0.9/istio/istio_community_operator_template.yaml --param=OPENSHIFT_ISTIO_MASTER_PUBLIC_URL=<-master-public-hostname->

Verify the operator deployment:

oc logs -n istio-operator $(oc -n istio-operator get pods -l name=istio-operator --output=jsonpath={.items..metadata.name})

Once the operator is running we can start deploying Istio components by creating a custom resource:

cat << EOF >  ./istio-installation.yaml
apiVersion: "istio.openshift.com/v1alpha1"
kind: "Installation"
metadata:
  name: "istio-installation"
  namespace: istio-operator
EOF

oc create -n istio-operator -f ./istio-installation.yaml

Check and watch the Istio installation progress which might take a while to complete:

oc get pods -n istio-system -w

# The installation of the core components is finished when you see:
...
openshift-ansible-istio-installer-job-cnw72   0/1       Completed   0         4m

Afterwards, to finish off the Istio installation, we need to install the Kiali web console:

bash <(curl -L https://git.io/getLatestKialiOperator)
oc get route -n istio-system -l app=kiali

Verifying that all Istio components are running:

$ oc get pods -n istio-system
NAME                                          READY     STATUS      RESTARTS   AGE
elasticsearch-0                               1/1       Running     0          9m
grafana-74b5796d94-4ll5d                      1/1       Running     0          9m
istio-citadel-db879c7f8-kfxfk                 1/1       Running     0          11m
istio-egressgateway-6d78858d89-58lsd          1/1       Running     0          11m
istio-galley-6ff54d9586-8r7cl                 1/1       Running     0          11m
istio-ingressgateway-5dcf9fdf4b-4fjj5         1/1       Running     0          11m
istio-pilot-7ccf64f659-ghh7d                  2/2       Running     0          11m
istio-policy-6c86656499-v45zr                 2/2       Running     3          11m
istio-sidecar-injector-6f696b8495-8qqjt       1/1       Running     0          11m
istio-telemetry-686f78b66b-v7ljf              2/2       Running     3          11m
jaeger-agent-k4tpz                            1/1       Running     0          9m
jaeger-collector-64bc5678dd-wlknc             1/1       Running     0          9m
jaeger-query-776d4d754b-8z47d                 1/1       Running     0          9m
kiali-5fd946b855-7lw2h                        1/1       Running     0          2m
openshift-ansible-istio-installer-job-cnw72   0/1       Completed   0          13m
prometheus-75b849445c-l7rlr                   1/1       Running     0          11m

Let’s start to deploy the microservice application example by using the Google Hipster Shop, it contains multiple microservices which is great to test with Istio:

# Create new project
oc new-project hipster-shop

# Set permissions to allow Istio to deploy the Envoy-Proxy side-car container
oc adm policy add-scc-to-user anyuid -z default -n hipster-shop
oc adm policy add-scc-to-user privileged -z default -n hipster-shop

# Create Hipster Shop deployments and Istio services
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/istio-hipster-shop.yml
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/istio-manifest.yml

# Wait and check that all pods are running before creating the load generator
oc get pods -n hipster-shop -w

# Create load generator deployment
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/istio-loadgenerator.yml

As you see below each pod has a sidecar container with the Istio Envoy proxy which handles pod traffic:

[centos@ip-172-26-1-167 ~]$ oc get pods
NAME                                     READY     STATUS    RESTARTS   AGE
adservice-7894dbfd8c-g4m9v               2/2       Running   0          49m
cartservice-758d66c648-79fj4             2/2       Running   4          49m
checkoutservice-7b9dc8b755-h2b2v         2/2       Running   0          49m
currencyservice-7b5c5f48fc-gtm9x         2/2       Running   0          49m
emailservice-79578566bb-jvwbw            2/2       Running   0          49m
frontend-6497c5f748-5fc4f                2/2       Running   0          49m
loadgenerator-764c5547fc-sw6mg           2/2       Running   0          40m
paymentservice-6b989d657c-klp4d          2/2       Running   0          49m
productcatalogservice-5bfbf4c77c-cw676   2/2       Running   0          49m
recommendationservice-c947d84b5-svbk8    2/2       Running   0          49m
redis-cart-79d84748cf-cvg86              2/2       Running   0          49m
shippingservice-6ccb7d8ff7-66v8m         2/2       Running   0          49m
[centos@ip-172-26-1-167 ~]$

The Kiali web console answers the question about what microservices are part of the service mesh and how are they connected which gives you a great level of detail about the traffic flows:

Detailed traffic flow view:

The Isito installation comes with Jaeger which is an open source tracing tool to monitor and troubleshoot transactions:

Enough about this, lets connect to our cool Hipster Shop and happy shopping:

Additionally there is another example, the Istio Bookinfo if you want to try something smaller and less complex:

oc new-project myproject

oc adm policy add-scc-to-user anyuid -z default -n myproject
oc adm policy add-scc-to-user privileged -z default -n myproject

oc apply -n myproject -f https://raw.githubusercontent.com/Maistra/bookinfo/master/bookinfo.yaml
oc apply -n myproject -f https://raw.githubusercontent.com/Maistra/bookinfo/master/bookinfo-gateway.yaml
export GATEWAY_URL=$(oc get route -n istio-system istio-ingressgateway -o jsonpath='{.spec.host}')
curl -o /dev/null -s -w "%{http_code}\n" http://$GATEWAY_URL/productpage

curl -o destination-rule-all.yaml https://raw.githubusercontent.com/istio/istio/release-1.0/samples/bookinfo/networking/destination-rule-all.yaml
oc apply -f destination-rule-all.yaml

curl -o destination-rule-all-mtls.yaml https://raw.githubusercontent.com/istio/istio/release-1.0/samples/bookinfo/networking/destination-rule-all-mtls.yaml
oc apply -f destination-rule-all-mtls.yaml

oc get destinationrules -o yaml

I hope this is a useful article for getting started with Istio service mesh on OpenShift.

Getting started with OpenShift 4.0 Container Platform

I had a first look at OpenShift 4.0 and I wanted to share some information from what I have seen so far. The installation of the cluster is super easy and RedHat did a lot to improve the overall experience of the installation process to the previous OpenShift v3.x Ansible based installation and moving towards ephemeral cluster deployments.

There are a many changes under the hood and it’s not as obvious as Bootkube for the self-hosted/healing control-plane, MachineSets and the many internal operators to install and manage the OpenShift components ( api serverscheduler, controller manager, cluster-autoscalercluster-monitoringweb-consolednsingressnetworkingnode-tuning, and authentication ).

For the OpenShift 4.0 developer preview you need an RedHat account because you require a pull-secret for the cluster installation. For more information please visit: https://cloud.openshift.com/clusters/install

First we need to download the openshift-installer binary:

wget https://github.com/openshift/installer/releases/download/v0.16.1/openshift-install-linux-amd64
mv openshift-install-linux-amd64 openshift-install
chmod +x openshift-install

Then we create the install-configuration, it is required that you already have AWS account credentials and an Route53 DNS domain set-up:

$ ./openshift-install create install-config
INFO Platform aws
INFO AWS Access Key ID *********
INFO AWS Secret Access Key [? for help] *********
INFO Writing AWS credentials to "/home/centos/.aws/credentials" (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
INFO Region eu-west-1
INFO Base Domain paas.domain.com
INFO Cluster Name cluster1
INFO Pull Secret [? for help] *********

Let’s look at the install-config.yaml

apiVersion: v1beta4
baseDomain: paas.domain.com
compute:
- name: worker
  platform: {}
  replicas: 3
controlPlane:
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: ew1
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
pullSecret: '{"auths":{...}'

Now we can continue to create the OpenShift v4 cluster which takes around 30mins to complete. At the end of the openshift-installer you see the auto-generate credentials to connect to the cluster:

$ ./openshift-install create cluster
INFO Consuming "Install Config" from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 30m0s for the Kubernetes API at https://api.cluster1.paas.domain.com:6443...
INFO API v1.12.4+0ba401e up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster at https://api.cluster1.paas.domain.com:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO Run 'export KUBECONFIG=/home/centos/auth/kubeconfig' to manage the cluster with 'oc', the OpenShift CLI.
INFO The cluster is ready when 'oc login -u kubeadmin -p jMTSJ-F6KYy-mVVZ4-QVNPP' succeeds (wait a few minutes).
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.cluster1.paas.domain.com
INFO Login to the console with user: kubeadmin, password: jMTSJ-F6KYy-mVVZ4-QVNPP

The web-console has a very clean new design which I really like in addition to all the great improvements.

Under administration -> cluster settings you can explore the new auto-upgrade functionality of OpenShift 4.0:

You choose the new version to upgrade and everything else happens in the background which is a massive improvement to OpenShift v3.x where you had to run the ansible installer for this.

In the background the cluster operator upgrades the different platform components one by one.

Slowly you will see that the components move to the new build version.

Finished cluster upgrade:

You can only upgrade from one version 4.0.0-0.9 to the next version 4.0.0-0.10. It is not possible to upgrade and go straight from x-0.9 to x-0.11.

But let’s deploy the Google Hipster Shop example and expose the frontend-external service for some more testing:

oc login -u kubeadmin -p jMTSJ-F6KYy-mVVZ4-QVNPP https://api.cluster1.paas.domain.com:6443 --insecure-skip-tls-verify=true
oc new-project myproject
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/hipster-shop.yml
oc expose svc frontend-external

Getting the hostname for the exposed service:

$ oc get route
NAME                HOST/PORT                                                   PATH      SERVICES            PORT      TERMINATION   WILDCARD
frontend-external   frontend-external-myproject.apps.cluster1.paas.domain.com             frontend-external   http                    None

Use the browser to connect to our Hipster Shop:

It’s also very easy to destroy the cluster as it is to create it, as you seen previously:

$ ./openshift-install destroy cluster
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-01d27db162fa45402
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-057f593640067efc0
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-05e821b451bead18f
INFO Disassociated                                 IAM instance profile="arn:aws:iam::552276840222:instance-profile/ocp4-bgx4c-worker-profile" arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff name=ocp4-bgx4c-worker-profile role=ocp4-bgx4c-worker-role
INFO Deleted                                       IAM instance profile="arn:aws:iam::552276840222:instance-profile/ocp4-bgx4c-worker-profile" arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff name=0xc00090f9a8
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-00b5eedc186ba26a7" id=i-00b5eedc186ba26a7
...
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:security-group/sg-016d4c7d435a1c97f" id=sg-016d4c7d435a1c97f
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:subnet/subnet-076348368858e9a82" id=subnet-076348368858e9a82
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:vpc/vpc-00c611ae1b9b8e10a" id=vpc-00c611ae1b9b8e10a
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:dhcp-options/dopt-0ce8b6a1c31e0ceac" id=dopt-0ce8b6a1c31e0ceac

The install experience is great for OpenShift 4.0 which makes it very easy for everyone to create and get started quickly with an enterprise container platform. From the operational perspective I still need to see how to run the new platform because all the operators are great and makes it an easy to use cluster but what happens when one of the operators goes rogue and debugging this I am most interested in.

Over the coming weeks I will look into more detail around OpenShift 4.0 and the different new features, I am especially interested in Service Mesh.

Typhoon Kubernetes Distribution

I stumbled across a very interesting Kubernetes distribution called Typhoon which runs a self-hosted control-plane using Bootkube running on CoreOS. Typhoon uses Terraform to deploy the required instances on various cloud providers or on bare-metal servers. I really like the concept of a minimal Kubernetes distribution and a simple bootstrap to deploy a full featured cluster in a few minutes. Check out the official Typhoon website or their Github repository for more information.

To install Typhoon I followed the documentation, everything is pretty simple with a bit of Terraform knowledge. Here’s my Github repository with my cluster configuration: https://github.com/berndonline/typhoon-kubernetes/tree/aws

Before you start you need to install Terraform v0.11.x and terraform-provider-ct, and setup a AWS Route53 domain for the Kubernetes cluster.

I created a new subdomain on Route53 and configured delegation on CloudFlare for the domain.

Let’s checkout the configuration, first the cluster.tf which I have modified slightly because I use Jenkins to deploy the Kubernetes cluster.

module "aws-cluster" {
  source = "git::https://github.com/poseidon/typhoon//aws/container-linux/kubernetes?ref=v1.13.3"

  providers = {
    aws = "aws.default"
    local = "local.default"
    null = "null.default"
    template = "template.default"
    tls = "tls.default"
  }

  # AWS
  cluster_name = "typhoon"
  dns_zone     = "${var.dns}"
  dns_zone_id  = "${var.dns_id}"

  # configuration
  ssh_authorized_key = "${var.ssh_key}"
  asset_dir          = "./.secrets/clusters/typhoon"

  # optional
  worker_count = 2
  worker_type  = "t3.small"
}

In the provider.tf I have only added S3 to be used for the Terraform backend state but otherwise I’ve left the defaults.

provider "aws" {
  version = "~> 1.13.0"
  alias   = "default"

  region  = "eu-west-1"
}

terraform {
  backend "s3" {
    bucket = "techbloc-terraform-data"
    key    = "openshift-311"
    region = "eu-west-1"
  }
}

...

I added a variables.tf file for the DNS and SSH variables.

variable "dns" {
}
variable "dns_id" {
}
variable "ssh_key" {
}

Let’s have a quick look at my simple Jenkins pipeline to deploy Typhoon Kubernetes. Apart from installing Kubernetes I am deploying the Nginx Ingress controller and  Heapster addons for the cluster. I’ve also added an example application I have used previously after deploying the cluster.

pipeline {
    agent any
    environment {
        AWS_ACCESS_KEY_ID = credentials('AWS_ACCESS_KEY_ID')
        AWS_SECRET_ACCESS_KEY = credentials('AWS_SECRET_ACCESS_KEY')
        TF_VAR_dns = credentials('TF_VAR_dns')
        TF_VAR_dns_id = credentials('TF_VAR_dns_id')
        TF_VAR_ssh_key = credentials('TF_VAR_ssh_key')
    }
    stages {
        stage('Prepare workspace') {
            steps {
                sh 'rm -rf *'
                git branch: 'aws', url: 'https://github.com/berndonline/typhoon-kubernetes.git'
                sh 'terraform init'
            }
        }
        stage('terraform apply') {
            steps {
                sshagent (credentials: ['fcdca8fa-aab9-3846-832f-4756392b7e2c']) {
                    sh 'terraform apply -auto-approve'
                    sh 'sleep 30'
                }
            }
        }
        stage('deploy nginx-ingress and heapster') {
            steps {    
                sh 'kubectl apply -R -f ./nginx-ingress/ --kubeconfig=./.secrets/clusters/typhoon/auth/kubeconfig'
                sh 'kubectl apply -R -f ./heapster/ --kubeconfig=./.secrets/clusters/typhoon/auth/kubeconfig'
                sh 'sleep 30'
            }
        }
        stage('deploy example application') {
            steps {    
                sh 'kubectl apply -f ./example/hello-kubernetes.yml --kubeconfig=./.secrets/clusters/typhoon/auth/kubeconfig'
            }
        }
        stage('Run terraform destroy') {
            steps {
                input 'Run terraform destroy?'
            }
        }
        stage('terraform destroy') {
            steps {
                sshagent (credentials: ['fcdca8fa-aab9-3846-832f-4756392b7e2c']) {
                    sh 'terraform destroy -force'
                }
            }
        }
    }
}

Let’s start the Jenkins pipeline:

Let’s check if I can access the hello-kubernetes application. For everyone who is interested, this is the link to the Github repository for the hello-kubernetes example application I have used.

I really like the Typhoon Kubernetes distribution and the work that went into it to create a easy way for everyone to install a Kubernetes cluster and start using it in a few minutes. I also find the way they’ve used Terraform and Bootkube to deploy the platform on CoreOS very inspiring and it gave me some ideas how I can make use of it for production clusters.

I actually like CoreOS and the easy bootstrapping with Terraform and Bootkube which I have not used before, I’ve always deployed OpenShift/Kubernetes on either RedHat or CentOS with Ansible, and find it a very interesting way to deploy a Kubernetes platform.