Kubernetes GitOps at Scale with Cluster API and Flux CD

What does GitOps mean and how do you run it at scale with Kubernetes? GitOps is basically a framework that takes traditional DevOps practices, which were used for application development, and applies them to platform automation.

This is nothing new, and some may have done a similar type of automation in the past, but it wasn’t called GitOps back then. Kubernetes is great because of its declarative configuration management, which makes it very easy to configure. This can become a challenge when you suddenly have to run 5, 10, 20 or 40 of these clusters across various cloud providers and multiple environments. We need a cluster management system feeding configuration from a code repository to run all our Kubernetes “cattle” workload clusters.

What I am trying to achieve with this design is that you can easily scale horizontally not only your workload clusters but also your cluster management system, which is versioned across multiple cloud providers as you see in the diagram above.

There is of course a technical problem to all of this: finding the right tools which solve the problem and work well together. In my example I will use the Cluster API for provisioning and managing the lifecycle of the Kubernetes workload clusters, and Flux CD for the configuration management of both the management cluster, which runs the Cluster API components, and the workload clusters. You can also replace the Cluster API with OpenShift Hive to run OKD or RedHat OpenShift clusters instead.

Another problem we need to think about is version control and the branching model for the platform configuration. The structure of the configuration is important, but so is how you implement changes and version your configuration through releases. I highly recommend reading about Trunk Based Development, which is a modern branching model and specifically solves the versioning problem for us.

Git repository and folder structure

We need a git repository for storing the platform configuration for both the management and workload clusters, as well as the tenant namespace configuration (this can also be stored in a separate repository). Let’s go through the folder structure of the repository and I will explain it in more detail. Check out my example repository for more detail: github.com/berndonline/k8s-gitops-at-scale.

  • The features folder on the top level stores configuration for specific features we want to enable and apply to our clusters (both management and workload). Under each <feature name> you find two subfolders for namespaced and cluster-wide (non-namespaced) configuration. Features are part of the platform configuration and will be promoted between environments. You will see namespaced and non-namespaced subfolders throughout the folder structure, which is basically there to group your configuration files.
    ├── features
    │   ├── access-control
    │   │   └── non-namespaced
    │   ├── helloworld-operator
    │   │   ├── namespaced
    │   │   └── non-namespaced
    │   └── ingress-nginx
    │       ├── namespaced
    │       └── non-namespaced
    
  • The providers folder stores the configuration based on the cloud provider <name> and the <version> of your cluster management. The version below the cloud provider folder is needed to be able to spin up new management clusters in the future. You can be creative with the folder structure and have a management cluster per environment, and/or instead of the version, if required. The mgmt folder stores the configuration for the management cluster, which includes the manifests for the Flux CD controllers, the Cluster API resources to spin up workload clusters (separated by cluster name) and anything else you want to configure on your management cluster. The clusters folder stores the configuration for all workload clusters, separated by <environment> into common (applies across multiple clusters in the same environment) and <cluster name> (applies to a dedicated cluster).
    ├── providers
    │   └── aws
    │       └── v1
    │           ├── clusters
    │           │   ├── non-prod
    │           │   │   ├── common
    │           │   │   │   ├── namespaced
    │           │   │   │   │   └── non-prod-common
    │           │   │   │   └── non-namespaced
    │           │   │   │       └── non-prod-common
    │           │   │   └── non-prod-eu-west-1
    │           │   │       ├── namespaced
    │           │   │       │   └── non-prod-eu-west-1
    │           │   │       └── non-namespaced
    │           │   │           └── non-prod-eu-west-1
    │           │   └── prod
    │           │       ├── common
    │           │       │   ├── namespaced
    │           │       │   │   └── prod-common
    │           │       │   └── non-namespaced
    │           │       │       └── prod-common
    │           │       └── prod-eu-west-1
    │           │           ├── namespaced
    │           │           │   └── prod-eu-west-1
    │           │           └── non-namespaced
    │           │               └── prod-eu-west-1
    │           └── mgmt
    │               ├── namespaced
    │               │   ├── flux-system
    │               │   ├── non-prod-eu-west-1
    │               │   └── prod-eu-west-1
    │               └── non-namespaced
    │                   ├── non-prod-eu-west-1
    │                   └── prod-eu-west-1
    
  • The tenants folder stores the namespace configuration of the onboarded teams and is applied to our workload clusters. Similar to the providers folder, tenants has subfolders based on the cloud provider <name> and, below that, subfolders for common (applies across environments) and <environment> (applies to a dedicated environment) configuration. There you find the tenant namespace <name> and all the manifests needed to create and configure the namespace(s).
    └── tenants
        └── aws
            ├── common
            │   └── dummy
            ├── non-prod
            │   └── dummy
            └── prod
                └── dummy
    

Why do we need a common folder for tenants? The common folder contains namespace configuration which will be promoted between the environments from non-prod to prod using a release; more about releases and promotion further down.

Configuration changes

Applying changes to your platform configuration has to follow the Trunk Based Development model of making small incremental changes through feature branches.

Let’s look at an example change, the dummy tenant onboarding pull-request. You see that I checked out a branch called “tenant-dummy” to apply my changes, then pushed and published the branch in the repository to raise the pull-request.
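
In plain git commands the workflow looks roughly like this (the commit message is only illustrative; the branch name and folder are taken from the example above):

$ git checkout -b tenant-dummy
$ git add tenants/aws/
$ git commit -m "tenant-dummy: onboard dummy tenant namespace"
$ git push -u origin tenant-dummy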

It is important that your commit messages and pull-request names follow a strict naming convention.

I would also strongly recommend squashing your commits when merging and using the pull-request name as the commit message. This keeps your git history clean.

This naming convention makes it easier to auto-generate your release notes later when you publish a release. A clean, well-formatted git history combined with your release notes nicely cross-references your changes to a particular release tag.

More about creating a release a bit later in this article.

GitOps configuration

The configuration from the platform repository gets pulled onto the management cluster using different gitrepository resources, following either the main branch or a version tag.

$ kubectl get gitrepositories.source.toolkit.fluxcd.io -A
NAMESPACE     NAME      URL                                                    AGE   READY   STATUS
flux-system   main      ssh://git@github.com/berndonline/k8s-gitops-at-scale    2d    True    stored artifact for revision 'main/ee3e71efb06628775fa19e9664b9194848c6450e'
flux-system   release   ssh://git@github.com/berndonline/k8s-gitops-at-scale    2d    True    stored artifact for revision 'v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff'
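
As a sketch, the manifests behind these two sources could look like the following; the API version, interval and secret name are assumptions depending on your Flux version and bootstrap, only the URL and refs correspond to the output above:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: main
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@github.com/berndonline/k8s-gitops-at-scale
  secretRef:
    name: flux-system
  ref:
    branch: main
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: release
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@github.com/berndonline/k8s-gitops-at-scale
  secretRef:
    name: flux-system
  ref:
    tag: v0.0.2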

The kustomization resources then render and apply the configuration either locally to the management cluster (diagram left side) or remotely to our non-prod and prod workload clusters (diagram right side), using the kubeconfig of the cluster which the Cluster API created and stored during the bootstrap.
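
A minimal sketch of such a kustomization resource targeting a workload cluster could look like this; the path, interval and kubeconfig secret name are assumptions based on the folder structure and the Cluster API naming convention (<cluster name>-kubeconfig):

---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: tenants-non-prod
  namespace: non-prod-eu-west-1
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: main
    namespace: flux-system
  path: ./tenants/aws/non-prod
  prune: true
  kubeConfig:
    secretRef:
      name: non-prod-eu-west-1-kubeconfig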

There are multiple kustomization resources applying configuration based on the folder structure which I explained above. See the output below and check out the repository for more details.

$ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A
NAMESPACE            NAME                          AGE   READY   STATUS
flux-system          feature-access-control        13h   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
flux-system          mgmt                          2d    True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   common                        21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-access-control        21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-helloworld-operator   21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-ingress-nginx         21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   non-prod-eu-west-1            21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   tenants-common                21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   tenants-non-prod              21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
prod-eu-west-1       common                        15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-access-control        15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-helloworld-operator   15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-ingress-nginx         15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       prod-eu-west-1                15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       tenants-common                15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       tenants-prod                  15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff

Release and promotion

The GitOps framework doesn’t define how to promote configuration to higher environments, and this is where the Trunk Based Development model becomes helpful together with the gitrepository resource, which can pull a tagged version instead of a branch.

This allows us to apply configuration first to the lower non-prod environments following the main branch, meaning merged pull-requests are applied instantly. Configuration for the higher production environments requires creating a version tag and publishing a release in the repository.

Why use a tag and not a release branch? A tag in your repository is a point-in-time snapshot of your configuration and can’t be easily modified, which is exactly what we want for a release. A branch on the other hand can be modified using pull-requests, and you end up with lots of release branches, which is less ideal.

To create a new version tag in the git repository I use the following commands:

$ git tag v0.0.3
$ git push origin --tags
Total 0 (delta 0), reused 0 (delta 0)
To github.com:berndonline/k8s-gitops-at-scale.git
* [new tag] v0.0.3 -> v0.0.3

Pushing the new tag doesn’t do much by itself because the gitrepository release is still set to v0.0.2, but I can see the new tag is available in the repository.

In the repository I can go to releases and click on “Draft a new release” and choose the new tag v0.0.3 I pushed previously.

The release notes you see below can be auto-generated from the pull-requests merged between v0.0.2 and v0.0.3 by clicking “Generate release notes”. To finish this off, save and publish the release.


The release is published and the release notes are visible to everyone, which is great for the product teams on your platform because they get visibility of upcoming changes, including their own modifications to namespace configuration.

Until now all the changes are only applied to our lower non-prod environment following the main branch; for the promotion we need to raise a pull-request and update the gitrepository release to the new version v0.0.3.
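
The change itself is tiny; a sketch of the updated release gitrepository manifest, where only the tag reference moves forward:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: release
  namespace: flux-system
spec:
  ...
  ref:
    tag: v0.0.3   # previously v0.0.2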

If you follow ITIL change procedures then this is the point where you would normally raise a change for merging your pull-request, because merging triggers the rollout of your configuration to production.

When the pull-request is merged, the release gitrepository resource is updated through the kustomization following the main branch.

$ kubectl get gitrepositories.source.toolkit.fluxcd.io -A
NAMESPACE     NAME      URL                                           AGE   READY   STATUS
flux-system   main      ssh://git@github.com/berndonline/k8s-gitops    2d    True    stored artifact for revision 'main/83133756708d2526cca565880d069445f9619b70'
flux-system   release   ssh://git@github.com/berndonline/k8s-gitops    2d    True    stored artifact for revision 'v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8'

Shortly after, the kustomization resources referencing the release reconcile and automatically push the newly rendered configuration down to the production clusters.

$ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A
NAMESPACE            NAME                          AGE   READY   STATUS
flux-system          feature-access-control        13h   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
flux-system          mgmt                          2d    True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   common                        31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-access-control        31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-helloworld-operator   31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-ingress-nginx         31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   non-prod-eu-west-1            31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   tenants-common                31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   tenants-non-prod              31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
prod-eu-west-1       common                        26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-access-control        26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-helloworld-operator   26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-ingress-nginx         26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       prod-eu-west-1                26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       tenants-common                26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       tenants-prod                  26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8

Why use Kustomize for managing the configuration and not Helm? I know the difficulties of managing raw YAML manifests. Kustomize gets you going quickly, whereas with Helm there is a higher initial effort writing your charts. In my next article I will focus specifically on Helm.

I showed a very simplistic example with a single cloud provider (aws) and a single management cluster, but as you have seen you can easily add Azure or Google Cloud providers to your configuration and scale horizontally. I think this is what makes Kubernetes and controllers like Flux CD great together: you don’t need complex pipelines or workflows to roll out and promote your changes, it works completely pipeline-less.

 

Install OpenShift/OKD 4.9.x Single Node Cluster (SNO) using OpenShift Hive/ACM

I haven’t written much since the summer of 2021 and I thought I’d start the New Year with a little update regarding the OpenShift/OKD 4.9 Single Node cluster (SNO) installation. The single-node type is not new; I have been using these All-in-One or Single Node clusters since OpenShift 3.x and it worked great until OpenShift 4.7. When RedHat released OpenShift 4.8 the single-node installation stopped working because of an issue with the control-plane, which expected three nodes for high availability. This installation method had been possible until then but was never officially supported by RedHat.

When the OpenShift 4.9 release was announced, the single-node installation method called SNO became a supported way to deploy OpenShift Edge clusters on bare-metal or virtual machines using the RedHat Cloud Assisted Installer.

This opened up the possibility again to install OpenShift/OKD 4.9 as a single node (SNO) on any cloud provider like AWS, GCP or Azure, either through the openshift-install command-line utility or through the OpenShift Hive / Advanced Cluster Management operator.

The install-config.yaml for a single-node cluster is pretty much the same as for a normal cluster, except that you change the worker node replicas to zero and the control-plane (master) replicas to one. Make sure your instance size has a minimum of 8 vCPUs and 32 GB of memory.

---
apiVersion: v1
baseDomain: k8s.domain.com
compute:
- name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m5.2xlarge
  replicas: 0
controlPlane:
  name: master
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m5.2xlarge
  replicas: 1
metadata:
  creationTimestamp: null
  name: okd-eu-west-1
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
pullSecret: ""
sshKey: ""

I am using OpenShift Hive for installing the OKD 4.9 single-node cluster, which requires an existing Kubernetes cluster to run the Hive operator.
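
If you don’t have the Hive operator running yet, one way to deploy it (an assumption on my side, the Hive repository also documents other options such as installing via OperatorHub) is straight from the source repository:

$ git clone https://github.com/openshift/hive.git
$ cd hive
$ make deploy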

Create an install-config secret:

$ kubectl create secret generic install-config -n okd --from-file=install-config.yaml=./okd-sno-install-config.yaml 

In the ClusterDeployment you specify the AWS credentials, reference the install-config and set the release image for OKD 4.9. You can find the latest OKD release image tags here: https://quay.io/repository/openshift/okd

---
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  creationTimestamp: null
  name: okd-eu-west-1
  namespace: okd
spec:
  baseDomain: k8s.domain.com
  clusterName: okd-eu-west-1
  controlPlaneConfig:
    servingCertificates: {}
  installed: false
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  provisioning:
    releaseImage: quay.io/openshift/okd:4.9.0-0.okd-2022-01-14-230113
    installConfigSecretRef:
      name: install-config
  pullSecretRef:
    name: pull-secret

Apply the cluster deployment and wait for Hive to install the OpenShift/OKD cluster.

$ kubectl apply -f ./okd-clusterdeployment.yaml 

The provision pod outputs the messages from the openshift-install binary and the cluster will finish the installation in around 35 minutes.

$ kubectl logs okd-eu-west-1-0-8vhnf-provision-qrjrg -c hive -f
time="2022-01-15T15:51:32Z" level=debug msg="Couldn't find install logs provider environment variable. Skipping."
time="2022-01-15T15:51:32Z" level=debug msg="checking for SSH private key" installID=m2zcxsds
time="2022-01-15T15:51:32Z" level=info msg="unable to initialize host ssh key" error="cannot configure SSH agent as SSH_PRIV_KEY_PATH is unset or empty" installID=m2zcxsds
time="2022-01-15T15:51:32Z" level=info msg="waiting for files to be available: [/output/openshift-install /output/oc]" installID=m2zcxsds
time="2022-01-15T15:51:32Z" level=info msg="found file" installID=m2zcxsds path=/output/openshift-install
time="2022-01-15T15:51:32Z" level=info msg="found file" installID=m2zcxsds path=/output/oc
time="2022-01-15T15:51:32Z" level=info msg="all files found, ready to proceed" installID=m2zcxsds
time="2022-01-15T15:51:35Z" level=info msg="copied /output/openshift-install to /home/hive/openshift-install" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="copied /output/oc to /home/hive/oc" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="copying install-config.yaml" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="copied /installconfig/install-config.yaml to /output/install-config.yaml" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="waiting for files to be available: [/output/.openshift_install.log]" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="cleaning up from past install attempts" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=warning msg="skipping cleanup as no infra ID set" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=debug msg="object does not exist" installID=m2zcxsds object=okd/okd-eu-west-1-0-8vhnf-admin-kubeconfig
time="2022-01-15T15:51:36Z" level=debug msg="object does not exist" installID=m2zcxsds object=okd/okd-eu-west-1-0-8vhnf-admin-password
time="2022-01-15T15:51:36Z" level=info msg="generating assets" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="running openshift-install create manifests" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="running openshift-install binary" args="[create manifests]" installID=m2zcxsds
time="2022-01-15T15:51:37Z" level=info msg="found file" installID=m2zcxsds path=/output/.openshift_install.log
time="2022-01-15T15:51:37Z" level=info msg="all files found, ready to proceed" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=debug msg="OpenShift Installer unreleased-master-5011-geb132dae953888e736c382f1176c799c0e1aa49e-dirty"
time="2022-01-15T15:51:36Z" level=debug msg="Built from commit eb132dae953888e736c382f1176c799c0e1aa49e"
time="2022-01-15T15:51:36Z" level=debug msg="Fetching Master Machines..."
time="2022-01-15T15:51:36Z" level=debug msg="Loading Master Machines..."
time="2022-01-15T15:51:36Z" level=debug msg="  Loading Cluster ID..."
time="2022-01-15T15:51:36Z" level=debug msg="    Loading Install Config..."
time="2022-01-15T15:51:36Z" level=debug msg="      Loading SSH Key..."
time="2022-01-15T15:51:36Z" level=debug msg="      Loading Base Domain..."

....

time="2022-01-15T16:14:17Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113"
time="2022-01-15T16:14:31Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 529 of 744 done (71% complete)"
time="2022-01-15T16:14:32Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 585 of 744 done (78% complete)"
time="2022-01-15T16:14:47Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 702 of 744 done (94% complete)"
time="2022-01-15T16:15:02Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 703 of 744 done (94% complete)"
time="2022-01-15T16:15:32Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 708 of 744 done (95% complete)"
time="2022-01-15T16:15:47Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 720 of 744 done (96% complete)"
time="2022-01-15T16:16:02Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 722 of 744 done (97% complete)"
time="2022-01-15T16:17:17Z" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, monitoring"
time="2022-01-15T16:18:02Z" level=debug msg="Cluster is initialized"
time="2022-01-15T16:18:02Z" level=info msg="Waiting up to 10m0s for the openshift-console route to be created..."
time="2022-01-15T16:18:02Z" level=debug msg="Route found in openshift-console namespace: console"
time="2022-01-15T16:18:02Z" level=debug msg="OpenShift console route is admitted"
time="2022-01-15T16:18:02Z" level=info msg="Install complete!"
time="2022-01-15T16:18:02Z" level=info msg="To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/output/auth/kubeconfig'"
time="2022-01-15T16:18:02Z" level=info msg="Access the OpenShift web-console here: https://console-openshift-console.apps.okd-eu-west-1.k8s.domain.com"
time="2022-01-15T16:18:02Z" level=debug msg="Time elapsed per stage:"
time="2022-01-15T16:18:02Z" level=debug msg="           cluster: 6m35s"
time="2022-01-15T16:18:02Z" level=debug msg="         bootstrap: 34s"
time="2022-01-15T16:18:02Z" level=debug msg="Bootstrap Complete: 12m46s"
time="2022-01-15T16:18:02Z" level=debug msg="               API: 4m2s"
time="2022-01-15T16:18:02Z" level=debug msg=" Bootstrap Destroy: 1m15s"
time="2022-01-15T16:18:02Z" level=debug msg=" Cluster Operators: 4m59s"
time="2022-01-15T16:18:02Z" level=info msg="Time elapsed: 26m13s"
time="2022-01-15T16:18:03Z" level=info msg="command completed successfully" installID=m2zcxsds
time="2022-01-15T16:18:03Z" level=info msg="saving installer output" installID=m2zcxsds
time="2022-01-15T16:18:03Z" level=info msg="install completed successfully" installID=m2zcxsds

Check the cluster deployment, get the kubeadmin password from the secret the Hive operator created during the installation and log in to the web console:

$ kubectl get clusterdeployments
NAME            PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID               VERSION   POWERSTATE   AGE
okd-eu-west-1   aws        eu-west-1                 true        okd-eu-west-1-l4g4n   4.9.0     Running      39m
$ kubectl get secrets okd-eu-west-1-0-8vhnf-admin-password -o jsonpath={.data.password} | base64 -d
EP5Fs-TZrKj-Vtst6-5GWZ9
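
Instead of the web console you can also pull the admin kubeconfig, which Hive stores in a secret as well; the secret name follows the pattern visible in the provision logs above, the data key kubeconfig is an assumption:

$ kubectl get secrets okd-eu-west-1-0-8vhnf-admin-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > okd-eu-west-1.kubeconfig
$ kubectl --kubeconfig=./okd-eu-west-1.kubeconfig get nodes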

The cluster details show that the control plane runs as a single master node:

Your cluster has a single combined master/worker node:

These single-node clusters can be used in combination with OpenShift Hive ClusterPools to keep a number of pre-installed OpenShift/OKD clusters available for automated tests or as temporary development environments.

apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: okd-eu-west-1-pool
  namespace: okd
spec:
  baseDomain: k8s.domain.com
  imageSetRef:
    name: 4.9.0-0.okd-2022-01-14
  installConfigSecretTemplateRef:
    name: install-config
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  pullSecretRef:
    name: pull-secret
  size: 3

The clusters in the pool are hibernating (shut down) and will be powered on when you apply a ClusterClaim to allocate a cluster, here with a lifetime set to 8 hours. After 8 hours the cluster gets automatically deleted by the Hive operator.

apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: test-1
  namespace: okd
spec:
  clusterPoolName: okd-eu-west-1-pool
  lifetime: 8h

This sums up how to deploy OpenShift/OKD 4.9 as a single-node cluster. I hope this article is helpful; leave a comment if you have questions.

Kubernetes Cluster API – Machine Health Check and AWS Spot instances

In my first article about the Kubernetes Cluster API and the provisioning of AWS workload clusters I briefly mentioned configuring Machine Health Check for the data-plane/worker nodes. The Cluster API also supports Machine Health Check for control-plane/master nodes and can automatically remediate node issues by replacing them with newly provisioned instances. The configuration is the same, only the label selector differs per node type.

Let’s take another look at the Machine Health Check for the data-plane/worker nodes; the selector label is set to nodepool: nodepool-0 to match the label configured in the MachineDeployment.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: cluster-1-node-unhealthy-5m
  namespace: k8s
spec:
  clusterName: cluster-1
  maxUnhealthy: 40%
  nodeStartupTimeout: 10m
  selector:
    matchLabels:
      nodepool: nodepool-0
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s

To configure a Machine Health Check for your control-plane/master nodes, add the label cluster.x-k8s.io/control-plane: "" as the selector.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: cluster-1-master-unhealthy-5m
spec:
  clusterName: cluster-1
  maxUnhealthy: 30%
  selector:
    matchLabels:
      cluster.x-k8s.io/control-plane: ""
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s

When both are applied you see the two node groups with the number of currently healthy nodes and the expected/desired state.

$ kubectl get machinehealthcheck
NAME                                       MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
cluster-1-node-unhealthy-5m                40%            3                  3
cluster-1-master-unhealthy-5m              30%            3                  3

If you terminate one control-plane and one data-plane node, the Machine Health Check identifies these after a couple of minutes and starts the remediation by provisioning new instances to replace the faulty ones. This takes around 5 to 10 minutes and your cluster is back in the desired state. The management cluster automatically repairs the workload cluster without manual intervention.

$ kubectl get machinehealthcheck
NAME                            MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
cluster-1-node-unhealthy-5m     40%            3                  2
cluster-1-master-unhealthy-5m   30%            3                  2

You can find more information about Machine Health Check in the Cluster API documentation.

However, back then I didn’t test running the data-plane/worker nodes on AWS EC2 spot instances, which is also a supported option in the AWSMachineTemplate. Spot instances for control-plane nodes are not supported and don’t make sense because you need the master nodes available at all times.

Using spot instances can reduce the cost of running your workload cluster, with savings of up to 60-70% compared to the on-demand price. Although AWS can reclaim these instances by terminating your spot instance at any point in time, they are reliable enough in combination with the Cluster API Machine Health Check that you could run production on spot instances with huge cost savings.

To use spot instances, simply add spotMarketOptions to the AWS Machine Template of the data-plane nodes and the Cluster API will automatically issue spot instance requests for these. If you leave the maxPrice value blank, the on-demand price is automatically used as the maximum price for the requested instance type. It makes sense to leave this empty because then you cannot be outbid if the spot instance marketplace suddenly changes due to increasing compute demand.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
      spotMarketOptions:
        maxPrice: ""

In the AWS console you see the spot instance requests.

This is great in combination with the Machine Health Check that I explained earlier: if AWS suddenly reclaims one or more of your spot instances, the Machine Health Check automatically starts to remediate the missing nodes by requesting new spot instances.

Kubernetes Cluster API – Provision workload clusters on AWS

For the past few months I have been following the progress of the Kubernetes Cluster API, which is part of the Kubernetes SIG (special interest group) Cluster Lifecycle, because they have made good progress and I wanted to try out the AWS provider to deploy Kubeadm clusters. There are multiple infrastructure / cloud providers available which can be used; have a look at the supported providers.

RedHat has based the Machine API Operator for the OpenShift 4 platform on the Kubernetes Cluster API and forked some of the cloud provider integrations, but in OpenShift 4 this serves a different use-case: the cluster manages itself without the need for a central management cluster. I actually like RedHat’s concept and adaptation of the Cluster API and I hope we will see something similar in the upstream project.

Bootstrapping workload clusters is pretty straightforward, but before we can start deploying a workload cluster we need a central Kubernetes management cluster running the Cluster API components for the selected cloud provider. In The Cluster API Book, for example, they use a KinD (Kubernetes in Docker) cluster to provision the workload clusters.

To deploy the Cluster API components you need the clusterctl (Cluster API) and clusterawsadm (Cluster API AWS Provider) command-line utilities.

curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.14/clusterctl-linux-amd64 -o clusterctl
chmod +x ./clusterctl
sudo mv ./clusterctl /usr/local/bin/clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.6.4/clusterawsadm-linux-amd64 -o clusterawsadm
chmod +x ./clusterawsadm
sudo mv ./clusterawsadm /usr/local/bin/clusterawsadm

Let’s start preparing to initialise the management cluster. You need an AWS IAM service account, and in my example I enabled the experimental feature gates for MachinePool and ClusterResourceSets before running clusterawsadm to apply the required AWS IAM configuration.

$ export AWS_ACCESS_KEY_ID='<-YOUR-ACCESS-KEY->'
$ export AWS_SECRET_ACCESS_KEY='<-YOUR-SECRET-ACCESS-KEY->'
$ export EXP_MACHINE_POOL=true
$ export EXP_CLUSTER_RESOURCE_SET=true
$ clusterawsadm bootstrap iam create-cloudformation-stack
Attempting to create AWS CloudFormation stack cluster-api-provider-aws-sigs-k8s-io
I1206 22:23:19.620891  357601 service.go:59] AWS Cloudformation stack "cluster-api-provider-aws-sigs-k8s-io" already exists, updating

Following resources are in the stack: 

Resource                  |Type                                                                                |Status
AWS::IAM::InstanceProfile |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::InstanceProfile |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::InstanceProfile |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/control-plane.cluster-api-provider-aws.sigs.k8s.io |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/nodes.cluster-api-provider-aws.sigs.k8s.io         |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/controllers.cluster-api-provider-aws.sigs.k8s.io   |CREATE_COMPLETE
AWS::IAM::Role            |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::Role            |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::Role            |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE

This might take a few minutes before you can continue and run clusterctl to initialise the Cluster API components on your Kubernetes management cluster, with the option --watching-namespace set to the namespace where you will apply the cluster deployment manifests.

$ export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

WARNING: `encode-as-profile` should only be used for bootstrapping.

$ clusterctl init --infrastructure aws --watching-namespace k8s
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.6.3" TargetNamespace="capa-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl config cluster [name] --kubernetes-version [version] | kubectl apply -f -

Now we have finished deploying the needed Cluster API components and are ready to create the first Kubernetes workload cluster. I will go through the different custom resources and configuration options for the cluster provisioning. This starts with the cloud infrastructure configuration, as you see in the example below for the VPC setup. You don’t have to use all three Availability Zones and can start with a single AZ in a region.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: cluster-1
  namespace: k8s
spec:
  region: eu-west-1
  sshKeyName: default
  networkSpec:
    vpc:
      cidrBlock: "10.0.0.0/23"
    subnets:
    - availabilityZone: eu-west-1a
      cidrBlock: "10.0.0.0/27"
      isPublic: true
    - availabilityZone: eu-west-1b
      cidrBlock: "10.0.0.32/27"
      isPublic: true
    - availabilityZone: eu-west-1c
      cidrBlock: "10.0.0.64/27"
      isPublic: true
    - availabilityZone: eu-west-1a
      cidrBlock: "10.0.1.0/27"
    - availabilityZone: eu-west-1b
      cidrBlock: "10.0.1.32/27"
    - availabilityZone: eu-west-1c
      cidrBlock: "10.0.1.64/27"

Alternatively you can provision the workload cluster into an existing VPC; in this case your cloud infrastructure configuration looks slightly different and you need to specify the VPC and subnet IDs.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: cluster-1
  namespace: k8s
spec:
  region: eu-west-1
  sshKeyName: default
  networkSpec:
    vpc:
      id: vpc-0425c335226437144
    subnets:
    - id: subnet-0261219d564bb0dc5
    - id: subnet-0fdcccba78668e013
...

Next we define the Kubeadm control-plane configuration, starting with the AWS Machine Template which defines the instance type and custom node configuration. Then follows the Kubeadm control-plane config, referencing the machine template and defining the number of replicas and the Kubernetes control-plane version:

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: cluster-1-control-plane
  namespace: k8s
spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachineTemplate
    name: cluster-1-control-plane
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
  replicas: 1
  version: v1.20.4

We continue with the data-plane (worker) nodes, which also start with an AWS machine template. Additionally we need a Kubeadm Config Template, followed by the Machine Deployment for the worker nodes with the number of replicas and the Kubernetes version to use.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: aws
          name: '{{ ds.meta_data.local_hostname }}'
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  clusterName: cluster-1
  replicas: 1
  selector:
    matchLabels: null
  template:
    metadata:
      labels:
        "nodepool": "nodepool-0"
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: cluster-1-data-plane-0
      clusterName: cluster-1
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachineTemplate
        name: cluster-1-data-plane-0
      version: v1.20.4

A workload cluster can be upgraded very easily by changing the Kubernetes version in the KubeadmControlPlane (.spec.version) and MachineDeployment (.spec.template.spec.version) configuration. You can’t skip over a Kubernetes minor version and can only upgrade to the next available one, for example v1.18.4 to v1.19.8 or v1.19.8 to v1.20.4. See the list of supported AMIs and Kubernetes versions for the AWS provider.
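
A sketch of such an upgrade using kubectl patch; the target version v1.21.2 is only an example, check the supported AMI list first:

$ kubectl -n k8s patch kubeadmcontrolplane cluster-1-control-plane --type merge -p '{"spec":{"version":"v1.21.2"}}'
$ kubectl -n k8s patch machinedeployment cluster-1-data-plane-0 --type merge -p '{"spec":{"template":{"spec":{"version":"v1.21.2"}}}}'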

At the beginning we enabled the feature gates when initialising the management cluster to allow us to use ClusterResourceSets. This is incredibly useful because I can define a set of resources which get applied during the provisioning of the cluster. They only get applied once during the bootstrap and are not reconciled afterwards. In the configuration you see the reference to two configmaps for adding the Calico CNI plugin and the Nginx ingress controller.

---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: cluster-1-crs-0
  namespace: k8s
spec:
  clusterSelector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: cluster-1
  resources:
  - kind: ConfigMap
    name: calico-cni
  - kind: ConfigMap
    name: nginx-ingress

Example of the two configmaps which contain the YAML manifests:

apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: calico-cni
  namespace: k8s
data:
  calico.yaml: |+
    ---
    # Source: calico/templates/calico-config.yaml
    # This ConfigMap is used to configure a self-hosted Calico installation.
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: calico-config
      namespace: kube-system
...
---
apiVersion: v1
data:
  deploy.yaml: |+
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: ingress-nginx
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
...
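
These configmaps don’t need to be written by hand; a sketch of generating the Calico one from the upstream manifest (the same URL as in the manual commands further down):

$ curl -L https://docs.projectcalico.org/v3.15/manifests/calico.yaml -o calico.yaml
$ kubectl -n k8s create configmap calico-cni --from-file=calico.yaml --dry-run=client -o yaml > calico-cni-configmap.yaml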

Without a ClusterResourceSet you would need to apply the CNI and ingress controller manifests manually, which is not great because you need the CNI plugin for the nodes to go into Ready state.

$ kubectl --kubeconfig=./cluster-1.kubeconfig   apply -f https://docs.projectcalico.org/v3.15/manifests/calico.yaml
$ kubectl --kubeconfig=./cluster-1.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.41.2/deploy/static/provider/aws/deploy.yaml

Finally, after we have created the configuration of the workload cluster, we can apply the Cluster manifest, which has the option to set a custom clusterNetwork and specify the service and pod IP ranges.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: cluster-1
  namespace: k8s
  labels:
    cluster.x-k8s.io/cluster-name: cluster-1
spec:
  clusterNetwork:
    services:
      cidrBlocks:
      - 172.30.0.0/16
    pods:
      cidrBlocks:
      - 10.128.0.0/14
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: KubeadmControlPlane
    name: cluster-1-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSCluster
    name: cluster-1

The provisioning of the workload cluster will take around 10 to 15 minutes, and you can follow the progress by checking the status of the different resources we have applied previously.
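
For example by listing the Cluster API resources in the namespace (a sketch, output omitted):

$ kubectl -n k8s get cluster,awscluster,kubeadmcontrolplane,machinedeployment,machines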

You can scale both the Kubeadm control-plane and the MachineDeployment afterwards to change the size of your cluster. The MachineDeployment can be scaled down to zero to save cost.

$ kubectl scale KubeadmControlPlane cluster-1-control-plane --replicas=1
$ kubectl scale MachineDeployment cluster-1-data-plane-0 --replicas=0

After the provisioning is completed you can get the kubeconfig of the cluster from the secret which was created during the bootstrap:

$ kubectl --namespace=k8s get secret cluster-1-kubeconfig    -o jsonpath={.data.value} | base64 --decode    > cluster-1.kubeconfig

For example, check the node state:

$ kubectl --kubeconfig=./cluster-1.kubeconfig get nodes

When your cluster is provisioned and the nodes are in Ready state, you can apply the MachineHealthCheck for the data-plane (worker) nodes. This automatically remediates unhealthy nodes and provisions new nodes which join the cluster.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: cluster-1-node-unhealthy-5m
  namespace: k8s
spec:
  # clusterName is required to associate this MachineHealthCheck with a particular cluster
  clusterName: cluster-1
  # (Optional) maxUnhealthy prevents further remediation if the cluster is already partially unhealthy
  maxUnhealthy: 40%
  # (Optional) nodeStartupTimeout determines how long a MachineHealthCheck should wait for
  # a Node to join the cluster, before considering a Machine unhealthy
  nodeStartupTimeout: 10m
  # selector is used to determine which Machines should be health checked
  selector:
    matchLabels:
      nodepool: nodepool-0 
  # Conditions to check on Nodes for matched Machines, if any condition is matched for the duration of its timeout, the Machine is considered unhealthy
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s

I hope this is a useful article for getting started with the Kubernetes Cluster API.

OpenShift Hive – Deploy Single Node (All-in-One) OKD Cluster on AWS

The concept of a single-node or All-in-One OpenShift / Kubernetes cluster isn’t something new. Years ago, when I was working with OpenShift 3 and before that with native Kubernetes, we were using single-node clusters as ephemeral development environments, for integration testing of pull-requests or for platform releases. It was only annoying because this required complex Jenkins pipelines: provision the node first, then install the prerequisites and run the openshift-ansible installer playbook. Not always reliable and not a great experience, but it did the job.

This is possible as well with the new OpenShift/OKD 4 version with the help of OpenShift Hive. The experience is more reliable and quicker than before, and I don’t need to worry about de-provisioning; I let Hive delete the cluster automatically after a few hours.

It requires a few simple modifications in the install-config. You need to add the Availability Zone where the instance will be created. When doing this the VPC will only have two subnets, one public and one private, in eu-west-1. You can also install the single-node cluster into an existing VPC; you just have to specify the subnet ids. Change the compute worker node replicas to zero and the control-plane replicas to one. Make sure to pick an instance size with enough CPU and memory for all OpenShift components, because they need to fit onto the single node. The rest of the install-config is pretty much standard.

---
apiVersion: v1
baseDomain: k8s.domain.com
compute:
- name: worker
  platform:
    aws:
      zones:
      - eu-west-1a
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: r4.xlarge
  replicas: 0
controlPlane:
  name: master
  platform:
    aws:
      zones:
      - eu-west-1a
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: r5.2xlarge
  replicas: 1
metadata:
  creationTimestamp: null
  name: okd-aio
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
pullSecret: ""
sshKey: ""

Create a new install-config secret for the cluster.

kubectl create secret generic install-config-aio -n okd --from-file=install-config.yaml=./install-config-aio.yaml

We will be using OpenShift Hive for the cluster deployment because the provisioning is simpler, and Hive can also apply any configuration needed using SyncSets or SelectorSyncSets. Add the annotation hive.openshift.io/delete-after: "2h" and Hive will automatically delete the cluster after two hours.
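
A minimal SyncSet sketch to give an idea of the format; the namespace resource it creates is just an example:

---
apiVersion: hive.openshift.io/v1
kind: SyncSet
metadata:
  name: okd-aio-example
  namespace: okd
spec:
  clusterDeploymentRefs:
  - name: okd-aio
  resourceApplyMode: Sync
  resources:
  - apiVersion: v1
    kind: Namespace
    metadata:
      name: example-namespace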

---
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  creationTimestamp: null
  annotations:
    hive.openshift.io/delete-after: "2h"
  name: okd-aio 
  namespace: okd
spec:
  baseDomain: k8s.domain.com
  clusterName: okd-aio
  controlPlaneConfig:
    servingCertificates: {}
  installed: false
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  provisioning:
    releaseImage: quay.io/openshift/okd:4.5.0-0.okd-2020-07-14-153706-ga
    installConfigSecretRef:
      name: install-config-aio
  pullSecretRef:
    name: pull-secret
  sshKey:
    name: ssh-key
status:
  clusterVersionStatus:
    availableUpdates: null
    desired:
      force: false
      image: ""
      version: ""
    observedGeneration: 0
    versionHash: ""

Apply the cluster deployment to your clusters namespace.

kubectl apply -f  ./clusterdeployment-aio.yaml

This is slightly faster than provisioning a 6-node cluster and will take around 30 minutes until your ephemeral test cluster is ready to use.