Kubernetes GitOps at Scale with Cluster API and Flux CD

What does GitOps mean, and how do you run it at scale with Kubernetes? GitOps is essentially a framework that takes the DevOps practices traditionally used for application development and applies them to platform automation.

This is nothing new, and some of you may have built similar automation in the past; it just wasn’t called GitOps back then. Kubernetes is great because of its declarative configuration management, which makes it very easy to configure. This becomes a challenge when you suddenly have to run 5, 10, 20 or 40 of these clusters across various cloud providers and multiple environments. We need a cluster management system that feeds configuration from a code repository to run all our Kubernetes “cattle” workload clusters.

What I am trying to achieve with this design is that you can easily scale horizontally, not only your workload clusters but also your cluster management system, which is versioned across multiple cloud providers as you see in the diagram above.

There is of course a technical problem in all of this: finding the right tools that solve the problem and work well together. In my example I will use the Cluster API for provisioning and managing the lifecycle of the Kubernetes workload clusters, and Flux CD for the configuration management of both the management cluster, which runs the Cluster API components, and the workload clusters. You can also replace the Cluster API with OpenShift Hive to run OKD or Red Hat OpenShift clusters instead.

Another problem we need to think about is version control and the branching model for the platform configuration. The structure of the configuration is important, but so is how you implement changes and version your configuration through releases. I highly recommend reading about Trunk Based Development, a modern branching model which specifically solves the versioning problem for us.

Git repository and folder structure

We need a git repository for storing the platform configuration for both the management and workload clusters, and the tenant namespace configuration (this can also be stored in a separate repository). Let’s go through the folder structure of the repository and I will explain it in more detail. Check out my example repository for more detail: github.com/berndonline/k8s-gitops-at-scale.

  • The features folder on the top level stores configuration for specific features we want to enable and apply to our clusters (both management and workload). Under each <feature name> you find two subfolders for namespaced and cluster-wide (non-namespaced) configuration. Features are part of the platform configuration which gets promoted between environments. You will see namespaced and non-namespaced subfolders throughout the folder structure; they are simply there to group your configuration files.
    ├── features
    │   ├── access-control
    │   │   └── non-namespaced
    │   ├── helloworld-operator
    │   │   ├── namespaced
    │   │   └── non-namespaced
    │   └── ingress-nginx
    │       ├── namespaced
    │       └── non-namespaced
    
  • The providers folder stores the configuration per cloud provider <name> and the <version> of your cluster management. The version below the cloud provider folder is needed to be able to spin up new management clusters in the future. You can be creative with the folder structure and, if required, have a management cluster per environment and/or instead of the version. The mgmt folder stores the configuration for the management cluster, which includes the manifests for the Flux CD controllers, the Cluster API resources to spin up workload clusters (separated by cluster name), and anything else you want to configure on your management cluster. The clusters folder stores the configuration for all workload clusters, separated by <environment> into common (applies across multiple clusters in the same environment) and <cluster name> (applies to a dedicated cluster).
    ├── providers
    │   └── aws
    │       └── v1
    │           ├── clusters
    │           │   ├── non-prod
    │           │   │   ├── common
    │           │   │   │   ├── namespaced
    │           │   │   │   │   └── non-prod-common
    │           │   │   │   └── non-namespaced
    │           │   │   │       └── non-prod-common
    │           │   │   └── non-prod-eu-west-1
    │           │   │       ├── namespaced
    │           │   │       │   └── non-prod-eu-west-1
    │           │   │       └── non-namespaced
    │           │   │           └── non-prod-eu-west-1
    │           │   └── prod
    │           │       ├── common
    │           │       │   ├── namespaced
    │           │       │   │   └── prod-common
    │           │       │   └── non-namespaced
    │           │       │       └── prod-common
    │           │       └── prod-eu-west-1
    │           │           ├── namespaced
    │           │           │   └── prod-eu-west-1
    │           │           └── non-namespaced
    │           │               └── prod-eu-west-1
    │           └── mgmt
    │               ├── namespaced
    │               │   ├── flux-system
    │               │   ├── non-prod-eu-west-1
    │               │   └── prod-eu-west-1
    │               └── non-namespaced
    │                   ├── non-prod-eu-west-1
    │                   └── prod-eu-west-1
    
  • The tenants folder stores the namespace configuration of the onboarded teams and is applied to our workload clusters. Similar to the providers folder, tenants has subfolders based on the cloud provider <name> and, below that, subfolders for common (applies across environments) and <environment> (applies to a dedicated environment) configuration. There you find the tenant namespace <name> and all the manifests needed to create and configure the namespace(s).
    └── tenants
        └── aws
            ├── common
            │   └── dummy
            ├── non-prod
            │   └── dummy
            └── prod
                └── dummy
    

Why do we need a common folder for tenants? The common folder contains namespace configuration which gets promoted between the environments from non-prod to prod using a release; more about releases and promotion further down below.

Configuration changes

Applying changes to your platform configuration has to follow the Trunk Based Development model of making small incremental changes through feature branches.

Let’s look at an example change, the dummy tenant onboarding pull-request. You can see that I checked out a branch called “tenant-dummy” to apply my changes, then pushed and published the branch to the repository to raise the pull-request.

It is important that your commit messages and pull-request names follow a strict naming convention.
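
As an example, a simple convention (my own suggestion, nothing that is enforced) is to prefix the pull-request name with the area it touches:

tenant-dummy: onboard dummy tenant namespace configuration
feature-ingress-nginx: update ingress-nginx controller configuration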

I would also strongly recommend squashing your commits into the name of your pull-request. This keeps your git history clean.

This naming convention makes it easier later to auto-generate your release notes when you publish a release. A clean, well-formatted git history combined with your release notes nicely cross-references your changes to a particular release tag.

More about creating a release a bit later in this article.

GitOps configuration

The configuration from the platform repository is pulled onto the management cluster using different gitrepository resources, following either the main branch or a version tag.

$ kubectl get gitrepositories.source.toolkit.fluxcd.io -A
NAMESPACE     NAME      URL                                                    AGE   READY   STATUS
flux-system   main      ssh://git@github.com/berndonline/k8s-gitops-at-scale      2d    True    stored artifact for revision 'main/ee3e71efb06628775fa19e9664b9194848c6450e'
flux-system   release   ssh://git@github.com/berndonline/k8s-gitops-at-scale      2d    True    stored artifact for revision 'v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff'
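
For reference, a minimal sketch of what these two gitrepository objects could look like (the API version, interval and secret name depend on your Flux installation and are illustrative here; the real manifests are in the example repository):

---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: main
  namespace: flux-system
spec:
  interval: 1m0s
  url: ssh://git@github.com/berndonline/k8s-gitops-at-scale
  secretRef:
    name: flux-system
  ref:
    branch: main
---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: release
  namespace: flux-system
spec:
  interval: 1m0s
  url: ssh://git@github.com/berndonline/k8s-gitops-at-scale
  secretRef:
    name: flux-system
  ref:
    tag: v0.0.2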

The kustomization resources then render and apply the configuration either locally to the management cluster (left side of the diagram) or remotely to our non-prod and prod workload clusters (right side of the diagram), using the kubeconfig of the cluster which the Cluster API created and stored during the bootstrap.

There are multiple kustomization resources which apply configuration based on the folder structure I explained above. See the output below and check out the repository for more details.

$ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A
NAMESPACE            NAME                          AGE   READY   STATUS
flux-system          feature-access-control        13h   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
flux-system          mgmt                          2d    True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   common                        21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-access-control        21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-helloworld-operator   21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   feature-ingress-nginx         21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   non-prod-eu-west-1            21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   tenants-common                21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
non-prod-eu-west-1   tenants-non-prod              21m   True    Applied revision: main/ee3e71efb06628775fa19e9664b9194848c6450e
prod-eu-west-1       common                        15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-access-control        15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-helloworld-operator   15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       feature-ingress-nginx         15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       prod-eu-west-1                15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       tenants-common                15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
prod-eu-west-1       tenants-prod                  15m   True    Applied revision: v0.0.2/a5a5edd1194b629f6b41977483dca49aaad957ff
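
To apply configuration to a remote workload cluster, the kustomization references the kubeconfig secret which the Cluster API stored on the management cluster during the bootstrap. A minimal sketch of such a kustomization; the names, path and secret name are illustrative and the real manifests are in the example repository:

---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: prod-eu-west-1
  namespace: prod-eu-west-1
spec:
  interval: 5m0s
  path: ./providers/aws/v1/clusters/prod/prod-eu-west-1/namespaced
  prune: true
  sourceRef:
    kind: GitRepository
    name: release
    namespace: flux-system
  kubeConfig:
    secretRef:
      name: prod-eu-west-1-kubeconfig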

Release and promotion

The GitOps framework doesn’t say anything about how to promote configuration to higher environments, and this is where the Trunk Based Development model comes in helpful, together with the ability of the gitrepository resource to pull a tagged version instead of a branch.

This allows us to apply configuration first to the lower non-prod environment following the main branch, meaning pull-requests which are merged get applied instantly. Configuration for the higher production environment requires creating a version tag and publishing a release in the repository.

Why use a tag and not a release branch? A tag in your repository is a point-in-time snapshot of your configuration and can’t easily be modified, which is exactly what we need for creating a release. A branch on the other hand can be modified through pull-requests, and you end up with lots of release branches, which is less ideal.

To create a new version tag in the git repository I use the following commands:

$ git tag v0.0.3
$ git push origin --tags
Total 0 (delta 0), reused 0 (delta 0)
To github.com:berndonline/k8s-gitops-at-scale.git
* [new tag] v0.0.3 -> v0.0.3

Pushing the new tag doesn’t do much on its own because the gitrepository release is still set to v0.0.2, but I can see the new tag is now available in the repository.

In the repository I can go to releases and click on “Draft a new release” and choose the new tag v0.0.3 I pushed previously.

The release notes you see below can be auto-generated from the pull-requests merged between v0.0.2 and v0.0.3 by clicking “Generate release notes”. To finish off, save and publish the release.


The release is published and the release notes are visible to everyone, which is great for the product teams on your platform because they get visibility of upcoming changes, including their own modifications to the namespace configuration.

Until now all changes have only been applied to our lower non-prod environment following the main branch. To promote them we need to raise a pull-request and update the gitrepository release to the new version v0.0.3.
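
The pull-request is essentially a one-line change to the release gitrepository manifest, along these lines:

 spec:
   ref:
-    tag: v0.0.2
+    tag: v0.0.3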

If you follow ITIL change procedures, this is the point where you would normally raise a change for merging your pull-request, because this triggers the rollout of your configuration to production.

When the pull-request is merged, the release gitrepository object gets updated by the kustomization resources following the main branch.

$ kubectl get gitrepositories.source.toolkit.fluxcd.io -A
NAMESPACE     NAME      URL                                           AGE   READY   STATUS
flux-system   main      ssh://git@github.com/berndonline/k8s-gitops      2d    True    stored artifact for revision 'main/83133756708d2526cca565880d069445f9619b70'
flux-system   release   ssh://git@github.com/berndonline/k8s-gitops      2d    True    stored artifact for revision 'v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8'

Shortly after, the kustomization resources referencing the release reconcile and automatically push the newly rendered configuration down to the production clusters.

$ kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A
NAMESPACE            NAME                          AGE   READY   STATUS
flux-system          feature-access-control        13h   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
flux-system          mgmt                          2d    True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   common                        31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-access-control        31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-helloworld-operator   31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   feature-ingress-nginx         31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   non-prod-eu-west-1            31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   tenants-common                31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
non-prod-eu-west-1   tenants-non-prod              31m   True    Applied revision: main/83133756708d2526cca565880d069445f9619b70
prod-eu-west-1       common                        26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-access-control        26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-helloworld-operator   26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       feature-ingress-nginx         26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       prod-eu-west-1                26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       tenants-common                26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8
prod-eu-west-1       tenants-prod                  26m   True    Applied revision: v0.0.3/ee3e71efb06628885fa19e9664b9198a8c6450e8

Why use Kustomize for managing the configuration and not Helm? I know the difficulties of managing these raw YAML manifests. Kustomize gets you going quickly, whereas with Helm there is a higher initial effort in writing your charts. In my next article I will focus specifically on Helm.

I showed a very simplistic example with a single cloud provider (aws) and a single management cluster, but as you have seen you can easily add the Azure or Google cloud providers to your configuration and scale horizontally. I think this is what makes Kubernetes and controllers like Flux CD great together: you can roll out and promote your changes completely pipeline-less, without complex pipelines or workflows.

 

OpenShift Hive v1.1.x – Latest updates & new features

Over a year has gone by since my first article about Getting started with OpenShift Hive and my talk at the Red Hat OpenShift Gathering, when the first stable OpenShift Hive v1 version was released. A lot has happened in between, and OpenShift Hive v1.1.1 was released a few weeks ago, so I wanted to look into the new functionality of OpenShift Hive.

  • Operator Lifecycle Manager (OLM) installation

Hive is now available through the OperatorHub community catalog and can be installed on both OpenShift and native Kubernetes clusters through the OLM. The install is straightforward: add the operator-group and subscription manifests:

---
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: operatorgroup
  namespace: hive
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hive
  namespace: hive
spec:
  channel: alpha
  name: hive-operator
  source: operatorhubio-catalog
  sourceNamespace: olm

Alternatively the Hive subscription can be configured with a manual install plan. In this case the OLM will not automatically upgrade the Hive operator when a new version is released; I highly recommend this for production deployments!

---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hive
  namespace: hive
spec:
  channel: alpha
  name: hive-operator
  installPlanApproval: Manual
  source: operatorhubio-catalog
  sourceNamespace: olm

After a few seconds you see an install plan being added.

$ k get installplan
NAME            CSV                    APPROVAL   APPROVED
install-9drmh   hive-operator.v1.1.0   Manual     false

Edit the install plan and set the approved value to true; the OLM will then install or upgrade the Hive operator automatically.

...
spec:
  approval: Manual
  approved: true
  clusterServiceVersionNames:
  - hive-operator.v1.1.0
  generation: 1
...

After the Hive operator is installed you need to apply the HiveConfig object for the operator to install all of the needed Hive components. On non-OpenShift installs (native Kubernetes) you still need to generate the hiveadmission certificates for the admission controller pods to start, otherwise they are missing the hiveadmission-serving-cert secret.

  • Hiveconfig – Velero backup and delete protection

There are a few small but very useful changes in the HiveConfig object. You can now enable the deleteProtection option, which prevents administrators from accidentally deleting ClusterDeployments or SyncSets. Another great addition is that you can enable the automatic configuration of Velero to back up your cluster namespaces, meaning you’re not required to configure backups separately.

---
apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  logLevel: info
  targetNamespace: hive
  deleteProtection: enabled
  backup:
    velero:
      enabled: true
      namespace: velero

Backups are configured in the Velero namespace as specified in the Hiveconfig.

$ k get backups -n velero
NAME                              AGE
backup-okd-2021-03-26t11-57-32z   3h12m
backup-okd-2021-03-26t12-00-32z   3h9m
backup-okd-2021-03-26t12-35-44z   154m
backup-okd-2021-03-26t12-38-44z   151m
...

With the delete protection enabled in the HiveConfig, the controller automatically adds the annotation hive.openshift.io/protected-delete: “true” to all resources and prevents them from being deleted accidentally:

$ k delete cd okd --wait=0
The ClusterDeployment "okd" is invalid: metadata.annotations.hive.openshift.io/protected-delete: Invalid value: "true": cannot delete while annotation is present
  • ClusterSync and Scaling Hive controller

To check the resources applied through SyncSets and SelectorSyncSets, Hive previously used SyncSetInstance objects, but these no longer exist. This has now moved to ClusterSync, which collects status information about the applied resources:

$ k get clustersync okd -o yaml
apiVersion: hiveinternal.openshift.io/v1alpha1
kind: ClusterSync
metadata:
  name: okd
  namespace: okd
spec: {}
status:
  conditions:
  - lastProbeTime: "2021-03-26T16:13:57Z"
    lastTransitionTime: "2021-03-26T16:13:57Z"
    message: All SyncSets and SelectorSyncSets have been applied to the cluster
    reason: Success
    status: "False"
    type: Failed
  firstSuccessTime: "2021-03-26T16:13:57Z"
...

It is also possible to horizontally scale the Hive controllers and change the synchronisation frequency for running larger OpenShift deployments.

---
apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  logLevel: info
  targetNamespace: hive
  deleteProtection: enabled
  backup:
    velero:
      enabled: true
      namespace: velero
  controllersConfig:
    controllers:
    - config:
        concurrentReconciles: 10
        replicas: 3
      name: clustersync

Please check out the scaling test script I found in the GitHub repo; you can simulate fake clusters by adding the annotation “hive.openshift.io/fake-cluster=true” to your ClusterDeployment.
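
As a sketch, adding the annotation to the okd cluster deployment from my earlier examples would look like this:

$ kubectl annotate clusterdeployment okd hive.openshift.io/fake-cluster=true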

  • Hibernating clusters

Red Hat introduced the ability to hibernate (shut down) clusters in OpenShift 4.5 when they are not needed and to easily switch them back on when you need them. This is now possible with OpenShift Hive: you can hibernate a cluster by changing the power state of the cluster deployment.

$ kubectl patch cd okd --type='merge' -p $'spec:\n powerState: Hibernating'

Checking the cluster deployment, the power state changes to Stopping.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Stopping     44m

After a couple of minutes the power state of the cluster deployment changes to Hibernating.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE    AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Hibernating   47m

In the AWS console you see the cluster instances as stopped.

To turn the cluster back online, change the power state in the cluster deployment to Running.

$ kubectl patch cd okd --type='merge' -p $'spec:\n powerState: Running'

Again the power state changes to resuming.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Resuming     49m

A few minutes later the cluster changes to running and is ready to use again.

$ k get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Running      61m
  • Cluster pools

Cluster pools came together with the hibernation feature and allow you to pre-provision OpenShift clusters without actually allocating them; after provisioning they hibernate until you claim a cluster. Again a nice feature and an ideal use case for ephemeral development or integration test environments, because you have clusters ready to claim when needed and can dispose of them afterwards.

Create a ClusterPool custom resource which is similar to a cluster deployment.

apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: okd-eu-west-1
  namespace: hive
spec:
  baseDomain: okd.domain.com
  imageSetRef:
    name: okd-4.7-imageset
  installConfigSecretTemplateRef: 
    name: install-config
  skipMachinePools: true
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  pullSecretRef:
    name: pull-secret
  size: 3

To claim a cluster from a pool, apply the ClusterClaim resource.

apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: okd-claim
  namespace: hive
spec:
  clusterPoolName: okd-eu-west-1
  lifetime: 8h

I haven’t tested this yet but will definitely start using it in the coming weeks. Have a look at the Hive documentation on using ClusterPool and ClusterClaim.

  • Cluster relocation

For me, having used OpenShift Hive for over one and a half years to run OpenShift 4 clusters, this is very useful functionality, because at some point you might need to rebuild or move your management services to a new Hive cluster. The ClusterRelocate object gives you the option to do this. First create a secret with the kubeconfig of the new Hive cluster:

$ kubectl create secret generic new-hive-cluster-kubeconfig -n hive --from-file=kubeconfig=./new-hive-cluster.kubeconfig

Then create the ClusterRelocate object, specify the kubeconfig secret of the remote Hive cluster, and add a clusterDeploymentSelector:

apiVersion: hive.openshift.io/v1
kind: ClusterRelocate
metadata:
  name: migrate
spec:
  kubeconfigSecretRef:
    namespace: hive
    name: new-hive-cluster-kubeconfig
  clusterDeploymentSelector:
    matchLabels:
      migrate: cluster

To move cluster deployments, add the label migrate=cluster to your OpenShift clusters you want to move.

$ kubectl label clusterdeployment okd migrate=cluster

The cluster deployment will move to the new Hive cluster and will be removed from the source Hive cluster without being de-provisioned. It’s important to keep in mind that you need to copy any other resources you need, such as secrets, syncsets, selectorsyncsets and syncidentityproviders, before moving the clusters. Take a look at the Hive documentation for the exact steps.

  • Useful annotation

Pause SyncSets by adding the annotation “hive.openshift.io/syncset-pause=true” to the ClusterDeployment; this stops the reconciliation of the defined resources and is great for troubleshooting.
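
As a sketch, pausing and later resuming the SyncSets for the okd cluster deployment could look like this (the second command removes the annotation again):

$ kubectl annotate clusterdeployment okd hive.openshift.io/syncset-pause="true"
$ kubectl annotate clusterdeployment okd hive.openshift.io/syncset-pause-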

In the cluster deployment you can also set the preserveOnDelete option, which allows you to disconnect a cluster from Hive without de-provisioning it.

$ kubectl patch cd okd --type='merge' -p $'spec:\n preserveOnDelete: true'

This sums up the new features and functionalities you can use with the latest OpenShift Hive version.

Kubernetes Cluster API – Provision workload clusters on AWS

For the past few months I have been following the progress of the Kubernetes Cluster API, which is part of the Kubernetes Cluster Lifecycle SIG (special interest group), because good progress has been made and I wanted to try out the AWS provider to deploy Kubeadm clusters. There are multiple infrastructure / cloud providers available which can be used; have a look at the supported providers.

Red Hat has based the Machine API Operator for the OpenShift 4 platform on the Kubernetes Cluster API and forked some of the cloud provider integrations, but in OpenShift 4 this has a different use case: the cluster manages itself without the need for a central management cluster. I actually like Red Hat’s concept and adaptation of the Cluster API and I hope we will see something similar in the upstream project.

Bootstrapping workload clusters is pretty straightforward, but before we can start deploying a workload cluster we need a central Kubernetes management cluster to run the Cluster API components for your selected cloud provider. In The Cluster API Book, for example, they use a KinD (Kubernetes in Docker) cluster to provision the workload clusters.
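
If you don’t have a management cluster yet, a local KinD cluster is enough to follow along; a minimal sketch (the cluster name is my own choice):

$ kind create cluster --name capi-mgmt
$ kubectl cluster-info --context kind-capi-mgmt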

To deploy the Cluster API components you need the clusterctl (Cluster API) and clusterawsadm (Cluster API AWS Provider) command-line utilities.

curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.14/clusterctl-linux-amd64 -o clusterctl
chmod +x ./clusterctl
sudo mv ./clusterctl /usr/local/bin/clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.6.4/clusterawsadm-linux-amd64 -o clusterawsadm
chmod +x ./clusterawsadm
sudo mv ./clusterawsadm /usr/local/bin/clusterawsadm

Let’s start preparing to initialise the management cluster. You need an AWS IAM service account, and in my example I enabled the experimental feature gates for MachinePool and ClusterResourceSets before running clusterawsadm to apply the required AWS IAM configuration.

$ export AWS_ACCESS_KEY_ID='<-YOUR-ACCESS-KEY->'
$ export AWS_SECRET_ACCESS_KEY='<-YOUR-SECRET-ACCESS-KEY->'
$ export EXP_MACHINE_POOL=true
$ export EXP_CLUSTER_RESOURCE_SET=true
$ clusterawsadm bootstrap iam create-cloudformation-stack
Attempting to create AWS CloudFormation stack cluster-api-provider-aws-sigs-k8s-io
I1206 22:23:19.620891  357601 service.go:59] AWS Cloudformation stack "cluster-api-provider-aws-sigs-k8s-io" already exists, updating

Following resources are in the stack: 

Resource                  |Type                                                                                |Status
AWS::IAM::InstanceProfile |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::InstanceProfile |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::InstanceProfile |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/control-plane.cluster-api-provider-aws.sigs.k8s.io |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/nodes.cluster-api-provider-aws.sigs.k8s.io         |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/controllers.cluster-api-provider-aws.sigs.k8s.io   |CREATE_COMPLETE
AWS::IAM::Role            |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::Role            |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::Role            |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE

This might take a few minutes before you can continue and run clusterctl to initialise the Cluster API components on your Kubernetes management cluster, with the option --watching-namespace pointing at the namespace where you will apply the cluster deployment manifests.

$ export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

WARNING: `encode-as-profile` should only be used for bootstrapping.

$ clusterctl init --infrastructure aws --watching-namespace k8s
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.6.3" TargetNamespace="capa-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl config cluster [name] --kubernetes-version [version] | kubectl apply -f -

Now we have finished deploying the needed Cluster API components and are ready to create the first Kubernetes workload cluster. I will go through the different custom resources and configuration options for the cluster provisioning. This starts with the cloud infrastructure configuration, as you see in the example below for the VPC setup. You don’t have to use all three Availability Zones and can start with a single AZ in a region.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: cluster-1
  namespace: k8s
spec:
  region: eu-west-1
  sshKeyName: default
  networkSpec:
    vpc:
      cidrBlock: "10.0.0.0/23"
    subnets:
    - availabilityZone: eu-west-1a
      cidrBlock: "10.0.0.0/27"
      isPublic: true
    - availabilityZone: eu-west-1b
      cidrBlock: "10.0.0.32/27"
      isPublic: true
    - availabilityZone: eu-west-1c
      cidrBlock: "10.0.0.64/27"
      isPublic: true
    - availabilityZone: eu-west-1a
      cidrBlock: "10.0.1.0/27"
    - availabilityZone: eu-west-1b
      cidrBlock: "10.0.1.32/27"
    - availabilityZone: eu-west-1c
      cidrBlock: "10.0.1.64/27"

Alternatively you can provision the workload cluster into an existing VPC; in this case your cloud infrastructure configuration looks slightly different and you need to specify the VPC and subnet IDs.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: cluster-1
  namespace: k8s
spec:
  region: eu-west-1
  sshKeyName: default
  networkSpec:
    vpc:
      id: vpc-0425c335226437144
    subnets:
    - id: subnet-0261219d564bb0dc5
    - id: subnet-0fdcccba78668e013
...

Next we define the Kubeadm control-plane configuration, starting with the AWS Machine Template which defines the instance type and custom node configuration. Then follows the Kubeadm control-plane config referencing the machine template, the number of replicas and the Kubernetes control-plane version:

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: cluster-1-control-plane
  namespace: k8s
spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachineTemplate
    name: cluster-1-control-plane
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
  replicas: 1
  version: v1.20.4

We continue with the data-plane (worker) nodes, which also start with an AWS Machine Template. Additionally we need a Kubeadm Config Template, and then the Machine Deployment for the worker nodes with the number of replicas and the Kubernetes version to use.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: aws
          name: '{{ ds.meta_data.local_hostname }}'
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  clusterName: cluster-1
  replicas: 1
  selector:
    matchLabels: null
  template:
    metadata:
      labels:
        "nodepool": "nodepool-0"
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: cluster-1-data-plane-0
      clusterName: cluster-1
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachineTemplate
        name: cluster-1-data-plane-0
      version: v1.20.4

A workload cluster can be upgraded very easily by changing the version in the KubeadmControlPlane (.spec.version) and MachineDeployment (.spec.template.spec.version) configuration. You can’t skip Kubernetes minor versions and can only upgrade to the next available version, for example v1.18.4 to v1.19.8 or v1.19.8 to v1.20.4. See the list of supported AMIs and Kubernetes versions for the AWS provider.
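
As a sketch (not taken from the Cluster API documentation), the upgrade could be triggered with merge patches like these:

$ kubectl patch kubeadmcontrolplane cluster-1-control-plane -n k8s --type='merge' -p $'spec:\n version: v1.20.4'
$ kubectl patch machinedeployment cluster-1-data-plane-0 -n k8s --type='merge' -p $'spec:\n template:\n  spec:\n   version: v1.20.4'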

At the beginning we enabled the feature gates when initialising the management cluster to allow us to use ClusterResourceSets. This is incredibly useful because I can define a set of resources which get applied during the provisioning of the cluster. This only gets executed once during the bootstrap and is not reconciled afterwards. In the configuration you see the reference to two configmaps for adding the Calico CNI plugin and the Nginx ingress controller.

---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: cluster-1-crs-0
  namespace: k8s
spec:
  clusterSelector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: cluster-1
  resources:
  - kind: ConfigMap
    name: calico-cni
  - kind: ConfigMap
    name: nginx-ingress

Example of the two configmaps which contain the YAML manifests:

apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: calico-cni
  namespace: k8s
data:
  calico.yaml: |+
    ---
    # Source: calico/templates/calico-config.yaml
    # This ConfigMap is used to configure a self-hosted Calico installation.
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: calico-config
      namespace: kube-system
...
---
apiVersion: v1
data:
  deploy.yaml: |+
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: ingress-nginx
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
...

Without ClusterResourceSets you would need to apply the CNI and ingress controller manifests manually, which is not great because you need the CNI plugin for all nodes to go into Ready state.

$ kubectl --kubeconfig=./cluster-1.kubeconfig   apply -f https://docs.projectcalico.org/v3.15/manifests/calico.yaml
$ kubectl --kubeconfig=./cluster-1.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.41.2/deploy/static/provider/aws/deploy.yaml

Finally, after we have created the configuration of the workload cluster, we can apply the cluster manifest with the option to set a custom clusterNetwork and specify the service and pod IP ranges.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: cluster-1
  namespace: k8s
  labels:
    cluster.x-k8s.io/cluster-name: cluster-1
spec:
  clusterNetwork:
    services:
      cidrBlocks:
      - 172.30.0.0/16
    pods:
      cidrBlocks:
      - 10.128.0.0/14
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: KubeadmControlPlane
    name: cluster-1-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSCluster
    name: cluster-1

The provisioning of the workload cluster takes around 10 to 15 minutes and you can follow the progress by checking the status of the different resources we applied previously.
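
For example, you can watch the Cluster API objects on the management cluster until everything reports ready; a sketch (clusterctl also offers commands to describe the cluster state):

$ kubectl get clusters,awsclusters,kubeadmcontrolplanes,machinedeployments,machines -n k8s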

You can scale both the Kubeadm control-plane and the MachineDeployment afterwards to change the size of your cluster. The MachineDeployment can be scaled down to zero to save cost.

$ kubectl scale KubeadmControlPlane cluster-1-control-plane --replicas=1
$ kubectl scale MachineDeployment cluster-1-data-plane-0 --replicas=0

After the provisioning is completed you can get the kubeconfig of the cluster from the secret which was created during the bootstrap:

$ kubectl --namespace=k8s get secret cluster-1-kubeconfig -o jsonpath={.data.value} | base64 --decode > cluster-1.kubeconfig

As an example, check the node state:

$ kubectl --kubeconfig=./cluster-1.kubeconfig get nodes

When your cluster is provisioned and the nodes are in Ready state, you can apply a MachineHealthCheck for the data-plane (worker) nodes. This automatically remediates unhealthy nodes and provisions new nodes to join the cluster.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: cluster-1-node-unhealthy-5m
  namespace: k8s
spec:
  # clusterName is required to associate this MachineHealthCheck with a particular cluster
  clusterName: cluster-1
  # (Optional) maxUnhealthy prevents further remediation if the cluster is already partially unhealthy
  maxUnhealthy: 40%
  # (Optional) nodeStartupTimeout determines how long a MachineHealthCheck should wait for
  # a Node to join the cluster, before considering a Machine unhealthy
  nodeStartupTimeout: 10m
  # selector is used to determine which Machines should be health checked
  selector:
    matchLabels:
      nodepool: nodepool-0 
  # Conditions to check on Nodes for matched Machines, if any condition is matched for the duration of its timeout, the Machine is considered unhealthy
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s

I hope this is a useful article for getting started with the Kubernetes Cluster API.

Mozilla SOPS and GitOps Toolkit (Flux CD v2) to decrypt and apply Kubernetes secrets

Using the GitOps way of working with tools like the GitOps Toolkit (Flux CD v2) is great for applying configuration to your Kubernetes clusters, but what about secrets, and how can you store them securely in your repository? The perfect tool for this is Mozilla’s SOPS, which uses a cloud KMS, HashiCorp Vault or a PGP key to encrypt and decrypt your secrets and store them in encrypted form with the rest of your configuration in a code repository. There is a guide in the Flux documentation about how to use SOPS, but I did this slightly differently with Google Cloud KMS.

Start by downloading the latest version of the Mozilla SOPS command-line binary. This is what makes SOPS so easy to use: apart from a KMS or a simple PGP key there is not much you need to encrypt or decrypt secrets.

sudo wget -O /usr/local/bin/sops https://github.com/mozilla/sops/releases/download/v3.7.1/sops-v3.7.1.linux
sudo chmod 755 /usr/local/bin/sops

Next create the Google Cloud KMS keyring and key which I am using in my example.

$ gcloud auth application-default login
Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?code_challenge=xxxxxxx&prompt=select_account&code_challenge_method=S256&access_type=offline&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&client_id=xxxxxxxxx-xxxxxxxxxxxxxxxxxx.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth


Enter verification code: xxxxxxxxxxxxxxx

Credentials saved to file: [/home/ubuntu/.config/gcloud/application_default_credentials.json]

These credentials will be used by any library that requests Application Default Credentials (ADC).
$ gcloud kms keyrings create sops --location global
$ gcloud kms keys create sops-key --location global --keyring sops --purpose encryption
$ gcloud kms keys list --location global --keyring sops
NAME                                                                           PURPOSE          ALGORITHM                    PROTECTION_LEVEL  LABELS  PRIMARY_ID  PRIMARY_STATE
projects/kubernetes-xxxxxx/locations/global/keyRings/sops/cryptoKeys/sops-key  ENCRYPT_DECRYPT  GOOGLE_SYMMETRIC_ENCRYPTION  SOFTWARE                  1           ENABLED

To encrypt secrets you need to create a .sops.yaml file in the root of your code repository.

creation_rules:
  - path_regex: \.yaml$
    gcp_kms: projects/kubernetes-xxxxxx/locations/global/keyRings/sops/cryptoKeys/sops-key
    encrypted_regex: ^(data|stringData)$

Let’s create a simple Kubernetes secret for testing.

$ cat secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
  namespace: default
type: Opaque
data:
  username: YWRtaW4=
  password: MWYyZDFlMmU2N2Rm

Encrypt your secret.yaml using SOPS with the following example.

$ sops -e secret.yaml
apiVersion: v1
kind: Secret
metadata:
    name: mysecret
    namespace: default
type: Opaque
data:
    username: ENC[AES256_GCM,data:<-HASH->,type:str]
    password: ENC[AES256_GCM,data:<-HASH->,type:str]
sops:
    kms: []
    gcp_kms:
    -   resource_id: projects/kubernetes-xxxxxx/locations/global/keyRings/sops/cryptoKeys/sops-key
        created_at: '2021-03-01T17:25:29Z'
        enc: <-HASH->
    azure_kv: []
    lastmodified: '2021-03-01T17:25:29Z'
    mac: ENC[AES256_GCM,data:<-HASH->,type:str]
    pgp: []
    encrypted_regex: ^(data|stringData)$
    version: 3.5.0

Alternatively you can encrypt and replace the file in-place.

$ sops -i -e secret.yaml

To decrypt the yaml file use sops -d or replace in-place using sops -i -d.

$ sops -d secret.yaml
apiVersion: v1
kind: Secret
metadata:
    name: mysecret
    namespace: default
type: Opaque
data:
    username: YWRtaW4=
    password: MWYyZDFlMmU2N2Rm

You can also edit an encrypted file with the default terminal editor by directly using the sops command without any options.

$ sops secret.yaml
File has not changed, exiting.

Let’s use the Flux CD kustomize-controller to decrypt the Kubernetes secrets and apply them to the specified namespace. First you need to create a GCP service account for Flux and grant it permission to decrypt with the KMS key.
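
A sketch of how this could look with gcloud; the service account name sops-gcp is my own choice and the project ID is a placeholder:

$ gcloud iam service-accounts create sops-gcp
$ gcloud kms keys add-iam-policy-binding sops-key \
    --location global --keyring sops \
    --member serviceAccount:sops-gcp@kubernetes-xxxxxx.iam.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyDecrypter
$ gcloud iam service-accounts keys create ./sops-gcp \
    --iam-account sops-gcp@kubernetes-xxxxxx.iam.gserviceaccount.com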

Download the GCP json authentication file for the service account and create a new secret in the Flux namespace.

$ kubectl create secret generic gcp-auth -n gotk-system --from-file=./sops-gcp
$ kubectl get secrets -n gotk-system gcp-auth -o yaml
apiVersion: v1
data:
  sops-gcp: <-BASE64-ENCODED-GCP-AUTH-JSON->
kind: Secret
metadata:
  creationTimestamp: "2021-03-01T17:34:11Z"
  name: gcp-auth
  namespace: gotk-system
  resourceVersion: "1879000"
  selfLink: /api/v1/namespaces/gotk-system/secrets/gcp-auth
  uid: 10a14c1f-19a6-41a2-8610-694b12efefee
type: Opaque

You need to update the kustomize-controller deployment and add a volume mount for the SOPS GCP secret, plus an environment variable pointing to the Google application credentials file. This is where my example differs from what is documented, because I am not using integrated cloud authentication; my cluster is running locally.

...
    spec:
      containers:
...
      - env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /tmp/.gcp/credentials/sops-gcp
        name: manager
        volumeMounts:
        - mountPath: /tmp/.gcp/credentials
          name: sops-gcp
          readOnly: true
      volumes:
      - name: sops-gcp
        secret:
          defaultMode: 420
          secretName: gcp-auth
...
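
One way to apply this change, assuming you installed the toolkit into the gotk-system namespace like in my setup, is to edit the deployment directly and let the controller pod restart:

$ kubectl edit deployment kustomize-controller -n gotk-system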

In the Kustomization object you enable the SOPS decryption provider, and the controller automatically decrypts and applies the secrets in the next reconcile loop.

apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: cluster
  namespace: gotk-system
spec:
  decryption:
    provider: sops
  interval: 5m0s
  path: ./clusters/cluster-dev
  prune: true
  sourceRef:
    kind: GitRepository
    name: github-source

It takes a few minutes until the sync is completed, and then we can check whether the example secret got created correctly.

$ kubectl get secrets mysecret -n default -o yaml
apiVersion: v1
data:
  password: MWYyZDFlMmU2N2Rm
  username: YWRtaW4=
kind: Secret
metadata:
  name: mysecret
  namespace: default
  resourceVersion: "3439293"
  selfLink: /api/v1/namespaces/default/secrets/mysecret
  uid: 4a009675-3c89-448b-bb86-6211cec3d4ea
type: Opaque

This is how to use SOPS and Flux CD v2 to decrypt and apply Kubernetes secrets using GitOps.

New Kubernetes GitOps Toolkit – Flux CD v2

I have been using the Flux CD operator for a few months to manage Kubernetes clusters in dev and prod, and it is a great tool. When I initially reviewed Flux back then, I liked it because of its simplicity, but it was missing some important features, such as the possibility to synchronise based on tags instead of a single branch, and configuring the Flux operator through its deployment wasn’t very intuitive and caused some headaches.

A few days ago I stumbled across the new Flux CD GitOps Toolkit, and it got my attention when I saw the new Flux v2 operator architecture. They’ve split the operator functions into three controllers and use CRDs to configure the Source, Kustomize and Helm configuration:

The feature I was really waiting for is the support for Semantic Versioning (semver) in the GitRepository source. With this I am able to create platform releases and can separate non-prod and prod clusters better, which makes the deployment of configuration more controlled and flexible than previously with Flux v1.

You can see below the different release versions I’ve created in my cluster management repository:

The following are two GitRepository examples: the first one syncs based on a static release tag 0.0.1 and the second syncs within a semantic version range >=0.0.1 <0.1.0:

---
apiVersion: source.toolkit.fluxcd.io/v1alpha1
kind: GitRepository
metadata:
  creationTimestamp: null
  name: gitops-system
  namespace: gitops-system
spec:
  interval: 1m0s
  ref:
    tag: 0.0.1
  secretRef:
    name: gitops-system
  url: ssh://github.com/berndonline/gitops-toolkit
status: {}
---
apiVersion: source.toolkit.fluxcd.io/v1alpha1
kind: GitRepository
metadata:
  creationTimestamp: null
  name: gitops-system
  namespace: gitops-system
spec:
  interval: 1m0s
  ref:
    semver: '>=0.0.1 <0.1.0'
  secretRef:
    name: gitops-system
  url: ssh://github.com/berndonline/gitops-toolkit
status: {}

There are improvements to the Kustomize configuration to add additional overlays depending on your repository folder structure, or to combine this with another GitRepository source. In my example repository I have a cluster folder cluster-dev and a folder for common configuration:

.
|____cluster-dev
| |____kustomization.yaml
| |____hello-world_base
| | |____kustomization.yaml
| | |____deploy.yaml
|____common
  |____kustomization.yaml
  |____nginx-service.yaml
  |____nginx_base
    |____kustomization.yaml
    |____service.yaml
    |____nginx.yaml

You can add multiple Kustomization custom resources, as you can see in my examples: one for the cluster-specific config and a second one for the common configuration which can be applied to multiple clusters:

---
apiVersion: kustomize.toolkit.fluxcd.io/v1alpha1
kind: Kustomization
metadata:
  creationTimestamp: null
  name: cluster-conf
  namespace: gitops-system
spec:
  interval: 5m0s
  path: ./cluster-dev
  prune: true
  sourceRef:
    kind: GitRepository
    name: gitops-system
status: {}
---
apiVersion: kustomize.toolkit.fluxcd.io/v1alpha1
kind: Kustomization
metadata:
  creationTimestamp: null
  name: common-con
  namespace: gitops-system
spec:
  interval: 5m0s
  path: ./common
  prune: true
  sourceRef:
    kind: GitRepository
    name: gitops-system
status: {}

Let’s install the Flux CD GitOps Toolkit. The toolkit comes with its own command-line utility tk which you use to install and configure the operator. You find the available CLI versions on the GitHub release page.

Set up a new repository to store your k8s configuration:

$ git clone ssh://github.com/berndonline/gitops-toolkit
$ cd gitops-toolkit
$ mkdir -p ./cluster-dev/gitops-system

Generate the GitOps Toolkit manifests and store them under the gitops-system folder, then apply the configuration to your k8s cluster:

$ tk install --version=latest \
    --export > ./cluster-dev/gitops-system/toolkit-components.yaml
$ kubectl apply -f ./cluster-dev/gitops-system/toolkit-components.yaml 
namespace/gitops-system created
customresourcedefinition.apiextensions.k8s.io/alerts.notification.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/gitrepositories.source.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/helmcharts.source.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/helmreleases.helm.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/helmrepositories.source.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/kustomizations.kustomize.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/providers.notification.toolkit.fluxcd.io created
customresourcedefinition.apiextensions.k8s.io/receivers.notification.toolkit.fluxcd.io created
role.rbac.authorization.k8s.io/crd-controller-gitops-system created
rolebinding.rbac.authorization.k8s.io/crd-controller-gitops-system created
clusterrolebinding.rbac.authorization.k8s.io/cluster-reconciler-gitops-system created
service/notification-controller created
service/source-controller created
service/webhook-receiver created
deployment.apps/helm-controller created
deployment.apps/kustomize-controller created
deployment.apps/notification-controller created
deployment.apps/source-controller created
networkpolicy.networking.k8s.io/deny-ingress created

Check that all the pods are running and use the command tk check to see if the toolkit is working correctly:

$ kubectl get pod -n gitops-system
NAME                                       READY   STATUS    RESTARTS   AGE
helm-controller-64f846df8c-g4mhv           1/1     Running   0          19s
kustomize-controller-6d9745c8cd-n8tth      1/1     Running   0          19s
notification-controller-587c49f7fc-ldcg2   1/1     Running   0          18s
source-controller-689dcd8bd7-rzp55         1/1     Running   0          18s
$ tk check
► checking prerequisites
✔ kubectl 1.18.3 >=1.18.0
✔ Kubernetes 1.18.6 >=1.16.0
► checking controllers
✔ source-controller is healthy
✔ kustomize-controller is healthy
✔ helm-controller is healthy
✔ notification-controller is healthy
✔ all checks passed

Now you can create a GitRepository custom resource; it will generate an ssh key pair locally and display the public key which you need to add to your repository’s deploy keys:

$ tk create source git gitops-system \
  --url=ssh://github.com/berndonline/gitops-toolkit \ 
  --ssh-key-algorithm=ecdsa \
  --ssh-ecdsa-curve=p521 \
  --branch=master \
  --interval=1m
► generating deploy key pair
ecdsa-sha2-nistp521 xxxxxxxxxxx
Have you added the deploy key to your repository: y
► collecting preferred public key from SSH server
✔ collected public key from SSH server:
github.com ssh-rsa xxxxxxxxxxx
► applying secret with keys
✔ authentication configured
✚ generating source
► applying source
✔ source created
◎ waiting for git sync
✗ git clone error: remote repository is empty

Continue with adding the Kustomize configuration:

$ tk create kustomization gitops-system \
  --source=gitops-system \
  --path="./cluster-dev" \
  --prune=true \
  --interval=5m
✚ generating kustomization
► applying kustomization
✔ kustomization created
◎ waiting for kustomization sync
✗ Source is not ready

Afterwards you can add your Kubernetes manifests to the repository, and the operator will start synchronising the repository and applying the configuration you’ve defined.

You can export the Source and Kustomize configuration:

$ tk export source git gitops-system \
 > ./cluster-dev/gitops-system/toolkit-source.yaml
$ tk export kustomization gitops-system \
 > ./cluster-dev/gitops-system/toolkit-kustomization.yaml

That basically completes the installation of the GitOps Toolkit; below are some useful commands to manually reconcile the configured custom resources:

$ tk reconcile source git gitops-system
$ tk reconcile kustomization gitops-system

I was thinking of explaining how to set up a Kubernetes platform repository and do release versioning with the Flux GitOps Toolkit in one of my next articles. Please let me know if you have questions.