Kubernetes Cluster API – Machine Health Check and AWS Spot instances

In my first article about the Kubernetes Cluster API and provisioning of AWS workload clusters I briefly mentioned configuring a Machine Health Check for the data-plane/worker nodes. The Cluster API also supports Machine Health Checks for control-plane/master nodes and can automatically remediate node issues by replacing faulty machines and provisioning new instances. The configuration is the same; only the label selector differs between the two node types.

Let’s take another look at the Machine Health Check for the data-plane/worker nodes. The selector label is set to nodepool: nodepool-0 to match the label configured in the MachineDeployment.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: cluster-1-node-unhealthy-5m
  namespace: k8s
spec:
  clusterName: cluster-1
  maxUnhealthy: 40%
  nodeStartupTimeout: 10m
  selector:
    matchLabels:
      nodepool: nodepool-0
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s

To configure a Machine Health Check for your control-plane/master nodes, use the label cluster.x-k8s.io/control-plane: "" as selector.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: cluster-1-master-unhealthy-5m
spec:
  clusterName: cluster-1
  maxUnhealthy: 30%
  selector:
    matchLabels:
      cluster.x-k8s.io/control-plane: ""
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s

When both are applied you see the two node groups with the number of currently healthy machines and the expected/desired number of machines.

$ kubectl get machinehealthcheck
NAME                                       MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
cluster-1-node-unhealthy-5m                40%            3                  3
cluster-1-master-unhealthy-5m              30%            3                  3

If you terminate one control-plane and one data-plane node, the Machine Health Checks identify them as unhealthy after a couple of minutes and start the remediation by provisioning new instances to replace the faulty ones. This takes around 5 to 10 minutes and your cluster is back in the desired state. The management cluster repairs the workload cluster without any manual intervention.

$ kubectl get machinehealthcheck
NAME                            MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
cluster-1-node-unhealthy-5m     40%            3                  2
cluster-1-master-unhealthy-5m   30%            3                  2

You can find more information about Machine Health Checks in the Cluster API documentation.

However, in that first article I didn’t test running the data-plane/worker nodes on AWS EC2 spot instances, which is also a supported option in the AWSMachineTemplate. Spot instances for control-plane nodes are not supported and don’t make sense anyway, because you need the master nodes available at all times.

Using spot instances can reduce the cost of running your workload cluster, with savings of up to 60% – 70% compared to the on-demand price. Although AWS can reclaim a spot instance by terminating it at any point in time, in combination with the Cluster API Machine Health Check they are reliable enough that you could run production workloads on spot instances with huge cost savings.

To use spot instances, simply add spotMarketOptions to the AWS Machine Template of the data-plane nodes and the Cluster API will automatically issue spot instance requests for them. If you don’t specify maxPrice and leave the value blank, the on-demand price is automatically used as the maximum price for the requested instance type. It makes sense to leave this empty because then you cannot be outbid if the spot instance market suddenly changes due to increasing compute demand.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
      spotMarketOptions:
        maxPrice: ""

In the AWS console you see the spot instance requests.

This is great in combination with the Machine Health Check I explained earlier: if AWS suddenly reclaims one or more of your spot instances, the Machine Health Check will automatically start to remediate the missing nodes by requesting new spot instances.
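
If you want to watch the remediation happen, you can follow the machines being replaced from the management cluster. A minimal sketch, assuming the workload cluster manifests live in the k8s namespace as in the earlier examples:

$ kubectl get machines -n k8s -w
$ kubectl get machinehealthcheck -n k8s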

OpenShift Hive v1.1.x – Latest updates & new features

Over a year has gone by since my first article about getting started with OpenShift Hive and my talk at the Red Hat OpenShift Gathering, when the first stable OpenShift Hive v1 version was released. A lot has happened since then and OpenShift Hive v1.1.1 was released a few weeks ago, so I wanted to look into the new functionality of OpenShift Hive.

  • Operator Lifecycle Manager (OLM) installation

Hive is now available through the Operator Hub community catalog and can be installed on both OpenShift and native Kubernetes clusters through the OLM. The install is straightforward: add the operator-group and subscription manifests:

---
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: operatorgroup
  namespace: hive
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hive
  namespace: hive
spec:
  channel: alpha
  name: hive-operator
  source: operatorhubio-catalog
  sourceNamespace: olm

Alternatively the Hive subscription can be configured with a manual install plan. In this case the OLM will not automatically upgrade the Hive operator when a new version is released – I highly recommend this for production deployments!

---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hive
  namespace: hive
spec:
  channel: alpha
  name: hive-operator
  installPlanApproval: Manual
  source: operatorhubio-catalog
  sourceNamespace: olm

After a few seconds you see an install plan being added.

$ k get installplan
NAME            CSV                    APPROVAL   APPROVED
install-9drmh   hive-operator.v1.1.0   Manual     false

Edit the install plan and set the approved value to true – the OLM will then install or upgrade the Hive operator automatically. Alternatively you can approve the install plan from the command line, as shown after the snippet below.

...
spec:
  approval: Manual
  approved: true
  clusterServiceVersionNames:
  - hive-operator.v1.1.0
  generation: 1
...
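
A small sketch of the non-interactive approval, assuming the install plan name from the output above:

$ kubectl -n hive patch installplan install-9drmh \
    --type merge -p '{"spec":{"approved":true}}'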

After the Hive operator is installed you need to apply the HiveConfig object so the operator installs all of the needed Hive components. On non-OpenShift installs (native Kubernetes) you still need to generate Hiveadmission certificates for the admission controller pods to start; otherwise they are missing the hiveadmission-serving-cert secret.

  • Hiveconfig – Velero backup and delete protection

There are a few small but very useful changes in the HiveConfig object. You can now enable the deleteProtection option, which prevents administrators from accidentally deleting ClusterDeployments or SyncSets. Another great addition is that you can enable automatic configuration of Velero to back up your cluster namespaces, meaning you’re not required to configure backups separately.

---
apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  logLevel: info
  targetNamespace: hive
  deleteProtection: enabled
  backup:
    velero:
      enabled: true
      namespace: velero

Backups are created in the Velero namespace specified in the HiveConfig.

$ k get backups -n velero
NAME                              AGE
backup-okd-2021-03-26t11-57-32z   3h12m
backup-okd-2021-03-26t12-00-32z   3h9m
backup-okd-2021-03-26t12-35-44z   154m
backup-okd-2021-03-26t12-38-44z   151m
...
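
Should you ever need to restore a cluster namespace, the Velero CLI can restore from one of these backups. A hedged sketch, assuming the velero command-line client is installed and pointed at the same cluster:

$ velero restore create --from-backup backup-okd-2021-03-26t12-38-44z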

With deletion protection enabled in the HiveConfig, the controller automatically adds the annotation hive.openshift.io/protected-delete: "true" to all resources and prevents them from being accidentally deleted:

$ k delete cd okd --wait=0
The ClusterDeployment "okd" is invalid: metadata.annotations.hive.openshift.io/protected-delete: Invalid value: "true": cannot delete while annotation is present
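
If you really do want to delete a protected cluster deployment, remove the annotation first and then delete it. A sketch using the example cluster from above:

$ kubectl annotate clusterdeployment okd hive.openshift.io/protected-delete-
$ kubectl delete clusterdeployment okd
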
  • ClusterSync and Scaling Hive controller

To check the status of resources applied through SyncSets and SelectorSyncSets, Hive previously used SyncSetInstance objects, but these no longer exist. This has moved to ClusterSync, which collects status information about the applied resources:

$ k get clustersync okd -o yaml
apiVersion: hiveinternal.openshift.io/v1alpha1
kind: ClusterSync
metadata:
  name: okd
  namespace: okd
spec: {}
status:
  conditions:
  - lastProbeTime: "2021-03-26T16:13:57Z"
    lastTransitionTime: "2021-03-26T16:13:57Z"
    message: All SyncSets and SelectorSyncSets have been applied to the cluster
    reason: Success
    status: "False"
    type: Failed
  firstSuccessTime: "2021-03-26T16:13:57Z"
...

It is also possible to horizontally scale the Hive clustersync controller and increase the number of concurrent reconciles when running larger OpenShift deployments.

---
apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  logLevel: info
  targetNamespace: hive
  deleteProtection: enabled
  backup:
    velero:
      enabled: true
      namespace: velero
  controllersConfig:
    controllers:
    - config:
        concurrentReconciles: 10
        replicas: 3
      name: clustersync

Please check out the scaling test script in the GitHub repo; you can simulate fake clusters by adding the annotation hive.openshift.io/fake-cluster=true to your ClusterDeployment.
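
For example, marking an existing cluster deployment as a fake cluster for a scaling test could look like this (a sketch, using the okd cluster deployment from my other examples):

$ kubectl annotate clusterdeployment okd hive.openshift.io/fake-cluster=true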

  • Hibernating clusters

Red Hat introduced the ability to hibernate (shut down) clusters in OpenShift 4.5 when they are not needed and easily switch them back on when you need them. This is now possible with OpenShift Hive: you can hibernate a cluster deployment by changing its power state.

$ kubectl patch cd okd --type='merge' -p $'spec:\n powerState: Hibernating'

Check the cluster deployment and you see the power state change to Stopping.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Stopping     44m

After a couple of minutes the power state of the cluster changes to Hibernating.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE    AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Hibernating   47m

In the AWS console you see the cluster instances as stopped.

To bring the cluster back online, change the power state in the cluster deployment to Running.

$ kubectl patch cd okd --type='merge' -p $'spec:\n powerState: Running'

Again the power state first changes to Resuming.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Resuming     49m

A few minutes later the cluster changes to running and is ready to use again.

$ k get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Running      61m

  • Cluster pools

Cluster pools came together with the hibernating feature and allow you to pre-provision OpenShift clusters without actually allocating them; after provisioning they hibernate until you claim a cluster. Again a nice feature and an ideal use-case for ephemeral development or integration test environments, as it allows you to have clusters ready to claim when needed and to dispose of them afterwards.

Create a ClusterPool custom resource which is similar to a cluster deployment.

apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: okd-eu-west-1
  namespace: hive
spec:
  baseDomain: okd.domain.com
  imageSetRef:
    name: okd-4.7-imageset
  installConfigSecretTemplateRef: 
    name: install-config
  skipMachinePools: true
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  pullSecretRef:
    name: pull-secret
  size: 3

To claim a cluster from a pool, apply the ClusterClaim resource.

apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: okd-claim
  namespace: hive
spec:
  clusterPoolName: okd-eu-west-1
  lifetime: 8h

I haven’t tested this yet but will definitely start using this in the coming weeks. Have a look at the Hive documentation on using ClusterPool and ClusterClaim.
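
Once the claim is fulfilled you should be able to see it being assigned, for example (a sketch, not yet tested on my side):

$ kubectl get clusterclaim okd-claim -n hive
$ kubectl get clusterdeployments --all-namespaces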

  • Cluster relocation

For me, having used OpenShift Hive for over one and a half years to run OpenShift 4 clusters, this is a very useful functionality because at some point you might need to rebuild or move your management services to a new Hive cluster. The ClusterRelocate object gives you the option to do this. First create a secret with the kubeconfig of the new Hive cluster:

$ kubectl create secret generic new-hive-cluster-kubeconfig -n hive --from-file=kubeconfig=./new-hive-cluster.kubeconfig

Then create the ClusterRelocate object, specify the kubeconfig secret of the remote Hive cluster and add a clusterDeploymentSelector:

apiVersion: hive.openshift.io/v1
kind: ClusterRelocate
metadata:
  name: migrate
spec:
  kubeconfigSecretRef:
    namespace: hive
    name: new-hive-cluster-kubeconfig
  clusterDeploymentSelector:
    matchLabels:
      migrate: cluster

To move cluster deployments, add the label migrate=cluster to the OpenShift clusters you want to move.

$ kubectl label clusterdeployment okd migrate=cluster

The cluster deployment will move to the new Hive cluster and will be removed from the source Hive cluster without being de-provisioned. It’s important to keep in mind that you need to copy any other resources you need, such as secrets, syncsets, selectorsyncsets and syncidentityproviders, before moving the clusters. Take a look at the Hive documentation for the exact steps.

  • Useful annotations

You can pause SyncSets by adding the annotation hive.openshift.io/syncset-pause=true to the ClusterDeployment, which stops the reconciliation of the defined resources – great for troubleshooting.
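
For example, to pause the SyncSets of the okd cluster deployment and later resume them by removing the annotation again (a sketch using kubectl annotate):

$ kubectl annotate clusterdeployment okd hive.openshift.io/syncset-pause=true
$ kubectl annotate clusterdeployment okd hive.openshift.io/syncset-pause-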

In a cluster deployment you can also set the preserveOnDelete option, which allows you to disconnect a cluster from Hive without de-provisioning it.

$ kubectl patch cd okd --type='merge' -p $'spec:\n preserveOnDelete: true'

This sums up the new features and functionalities you can use with the latest OpenShift Hive version.

Kubernetes Cluster API – Provision workload clusters on AWS

Over the past few months I have been following the progress of the Kubernetes Cluster API, which is part of the Kubernetes SIG (special interest group) Cluster Lifecycle. The project has made good progress and I wanted to try out the AWS provider to deploy Kubeadm clusters. There are multiple infrastructure / cloud providers available; have a look at the list of supported providers.

Red Hat has based the Machine API Operator for the OpenShift 4 platform on the Kubernetes Cluster API and forked some of the cloud provider integrations, but in OpenShift 4 it serves a different use-case: the cluster manages itself without the need for a central management cluster. I actually like Red Hat’s concept and adaptation of the Cluster API and I hope we will see something similar in the upstream project.

Bootstrapping workload clusters is pretty straightforward, but before we can start deploying a workload cluster we need a central Kubernetes management cluster running the Cluster API components for the selected cloud provider. The Cluster API Book, for example, uses a KinD (Kubernetes in Docker) cluster to provision the workload clusters.

To deploy the Cluster API components you need the clusterctl (Cluster API) and clusterawsadm (Cluster API AWS Provider) command-line utilities.

curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.14/clusterctl-linux-amd64 -o clusterctl
chmod +x ./clusterctl
sudo mv ./clusterctl /usr/local/bin/clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.6.4/clusterawsadm-linux-amd64 -o clusterawsadm
chmod +x ./clusterawsadm
sudo mv ./clusterawsadm /usr/local/bin/clusterawsadm

Let’s start preparing to initialise the management cluster. You need an AWS IAM service account, and in my example I enabled the experimental feature gates for MachinePool and ClusterResourceSets before running clusterawsadm to apply the required AWS IAM configuration.

$ export AWS_ACCESS_KEY_ID='<-YOUR-ACCESS-KEY->'
$ export AWS_SECRET_ACCESS_KEY='<-YOUR-SECRET-ACCESS-KEY->'
$ export EXP_MACHINE_POOL=true
$ export EXP_CLUSTER_RESOURCE_SET=true
$ clusterawsadm bootstrap iam create-cloudformation-stack
Attempting to create AWS CloudFormation stack cluster-api-provider-aws-sigs-k8s-io
I1206 22:23:19.620891  357601 service.go:59] AWS Cloudformation stack "cluster-api-provider-aws-sigs-k8s-io" already exists, updating

Following resources are in the stack: 

Resource                  |Type                                                                                |Status
AWS::IAM::InstanceProfile |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::InstanceProfile |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::InstanceProfile |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/control-plane.cluster-api-provider-aws.sigs.k8s.io |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/nodes.cluster-api-provider-aws.sigs.k8s.io         |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::552276840222:policy/controllers.cluster-api-provider-aws.sigs.k8s.io   |CREATE_COMPLETE
AWS::IAM::Role            |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::Role            |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::Role            |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE

This might take a few minutes. Afterwards you can continue and run clusterctl to initialise the Cluster API components on your Kubernetes management cluster, with the option --watching-namespace pointing to the namespace where you will apply the cluster deployment manifests.

$ export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

WARNING: `encode-as-profile` should only be used for bootstrapping.

$ clusterctl init --infrastructure aws --watching-namespace k8s
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.14" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.14" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.6.3" TargetNamespace="capa-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl config cluster [name] --kubernetes-version [version] | kubectl apply -f -

Now that the needed Cluster API components are deployed, we are ready to create the first Kubernetes workload cluster. I will go through the different custom resources and configuration options for the cluster provisioning. This starts with the cloud infrastructure configuration, as you see in the example below for the VPC setup. You don’t have to use all three Availability Zones and can start with a single AZ in a region.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: cluster-1
  namespace: k8s
spec:
  region: eu-west-1
  sshKeyName: default
  networkSpec:
    vpc:
      cidrBlock: "10.0.0.0/23"
    subnets:
    - availabilityZone: eu-west-1a
      cidrBlock: "10.0.0.0/27"
      isPublic: true
    - availabilityZone: eu-west-1b
      cidrBlock: "10.0.0.32/27"
      isPublic: true
    - availabilityZone: eu-west-1c
      cidrBlock: "10.0.0.64/27"
      isPublic: true
    - availabilityZone: eu-west-1a
      cidrBlock: "10.0.1.0/27"
    - availabilityZone: eu-west-1b
      cidrBlock: "10.0.1.32/27"
    - availabilityZone: eu-west-1c
      cidrBlock: "10.0.1.64/27"

Alternatively you can also provision the workload cluster into an existing VPC; in this case your cloud infrastructure configuration looks slightly different and you need to specify the VPC and subnet IDs.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: cluster-1
  namespace: k8s
spec:
  region: eu-west-1
  sshKeyName: default
  networkSpec:
    vpc:
      id: vpc-0425c335226437144
    subnets:
    - id: subnet-0261219d564bb0dc5
    - id: subnet-0fdcccba78668e013
...

Next we define the Kubeadm control-plane configuration, starting with the AWS Machine Template to define the instance type and custom node configuration. Then follows the Kubeadm control-plane config referencing the machine template, the number of replicas and the Kubernetes control-plane version:

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1-control-plane
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: cluster-1-control-plane
  namespace: k8s
spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSMachineTemplate
    name: cluster-1-control-plane
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-provider: aws
      controllerManager:
        extraArgs:
          cloud-provider: aws
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: aws
        name: '{{ ds.meta_data.local_hostname }}'
  replicas: 1
  version: v1.20.4

We continue with the data-plane (worker) nodes, which also start with an AWS Machine Template. Additionally we need a Kubeadm Config Template and then the Machine Deployment for the worker nodes with the number of replicas and the Kubernetes version to use.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.small
      sshKeyName: default
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: aws
          name: '{{ ds.meta_data.local_hostname }}'
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: cluster-1-data-plane-0
  namespace: k8s
spec:
  clusterName: cluster-1
  replicas: 1
  selector:
    matchLabels: null
  template:
    metadata:
      labels:
        "nodepool": "nodepool-0"
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: cluster-1-data-plane-0
      clusterName: cluster-1
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachineTemplate
        name: cluster-1-data-plane-0
      version: v1.20.4

A workload cluster can easily be upgraded by changing .spec.version in the MachineDeployment and KubeadmControlPlane configuration. You can’t skip Kubernetes minor versions and can only upgrade to the next available minor version, for example v1.18.4 to v1.19.8 or v1.19.8 to v1.20.4. See the list of supported AMIs and Kubernetes versions for the AWS provider.
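
As a minimal sketch of such an upgrade, assuming the resource names from the manifests above and that the target version has a published AMI, you could patch both objects like this:

$ kubectl -n k8s patch kubeadmcontrolplane cluster-1-control-plane \
    --type merge -p '{"spec":{"version":"v1.20.4"}}'
$ kubectl -n k8s patch machinedeployment cluster-1-data-plane-0 \
    --type merge -p '{"spec":{"template":{"spec":{"version":"v1.20.4"}}}}'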

At the beginning we enabled the feature gates while initialising the management cluster to allow us to use ClusterResourceSets. This is incredibly useful because I can define a set of resources which gets applied during the provisioning of the cluster. It is only executed once during the bootstrap and is not reconciled afterwards. In the configuration you see the reference to two ConfigMaps adding the Calico CNI plugin and the Nginx ingress controller.

---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: cluster-1-crs-0
  namespace: k8s
spec:
  clusterSelector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: cluster-1
  resources:
  - kind: ConfigMap
    name: calico-cni
  - kind: ConfigMap
    name: nginx-ingress

Example of the two configmaps which contain the YAML manifests:

apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: calico-cni
  namespace: k8s
data:
  calico.yaml: |+
    ---
    # Source: calico/templates/calico-config.yaml
    # This ConfigMap is used to configure a self-hosted Calico installation.
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: calico-config
      namespace: kube-system
...
---
apiVersion: v1
data:
  deploy.yaml: |+
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: ingress-nginx
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
...
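
The ConfigMaps themselves can be generated from the upstream manifests instead of being written by hand. A sketch, assuming the calico.yaml and deploy.yaml files have been downloaded locally first:

$ kubectl create configmap calico-cni -n k8s --from-file=calico.yaml \
    --dry-run=client -o yaml > calico-cni-configmap.yaml
$ kubectl create configmap nginx-ingress -n k8s --from-file=deploy.yaml \
    --dry-run=client -o yaml > nginx-ingress-configmap.yaml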

Without a ClusterResourceSet you would need to apply the CNI and ingress controller manifests manually, which is not great because the CNI plugin is required for the nodes to go into Ready state.

$ kubectl --kubeconfig=./cluster-1.kubeconfig apply -f https://docs.projectcalico.org/v3.15/manifests/calico.yaml
$ kubectl --kubeconfig=./cluster-1.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.41.2/deploy/static/provider/aws/deploy.yaml

Finally, after we have created the configuration of the workload cluster, we can apply the Cluster manifest with the option to set a custom clusterNetwork and specify the service and pod IP ranges.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: cluster-1
  namespace: k8s
  labels:
    cluster.x-k8s.io/cluster-name: cluster-1
spec:
  clusterNetwork:
    services:
      cidrBlocks:
      - 172.30.0.0/16
    pods:
      cidrBlocks:
      - 10.128.0.0/14
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: KubeadmControlPlane
    name: cluster-1-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: AWSCluster
    name: cluster-1

The provisioning of the workload cluster takes around 10 to 15 minutes and you can follow the progress by checking the status of the different resources we applied previously.
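
A few example commands to follow the provisioning from the management cluster (assuming the manifests were applied to the k8s namespace):

$ kubectl get cluster,awscluster -n k8s
$ kubectl get kubeadmcontrolplane,machinedeployment -n k8s
$ kubectl get machines -n k8s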

You can scale both the Kubeadm control-plane and the MachineDeployment afterwards to change the size of your cluster. The MachineDeployment can be scaled down to zero to save cost.

$ kubectl scale KubeadmControlPlane cluster-1-control-plane --replicas=1
$ kubectl scale MachineDeployment cluster-1-data-plane-0 --replicas=0

After the provisioning is completed you can get the kubeconfig of the cluster from the secret which got created during the bootstrap:

$ kubectl --namespace=k8s get secret cluster-1-kubeconfig -o jsonpath={.data.value} | base64 --decode > cluster-1.kubeconfig

For example, check the node state:

$ kubectl --kubeconfig=./cluster-1.kubeconfig get nodes

When your cluster is provisioned and the nodes are in Ready state you can apply the MachineHealthCheck for the data-plane (worker) nodes. This automatically remediates unhealthy nodes by provisioning new nodes that join the cluster.

---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: cluster-1-node-unhealthy-5m
  namespace: k8s
spec:
  # clusterName is required to associate this MachineHealthCheck with a particular cluster
  clusterName: cluster-1
  # (Optional) maxUnhealthy prevents further remediation if the cluster is already partially unhealthy
  maxUnhealthy: 40%
  # (Optional) nodeStartupTimeout determines how long a MachineHealthCheck should wait for
  # a Node to join the cluster, before considering a Machine unhealthy
  nodeStartupTimeout: 10m
  # selector is used to determine which Machines should be health checked
  selector:
    matchLabels:
      nodepool: nodepool-0 
  # Conditions to check on Nodes for matched Machines, if any condition is matched for the duration of its timeout, the Machine is considered unhealthy
  unhealthyConditions:
  - type: Ready
    status: Unknown
    timeout: 300s
  - type: Ready
    status: "False"
    timeout: 300s

I hope this is a useful article for getting started with the Kubernetes Cluster API.

OpenShift Hive – Deploy Single Node (All-in-One) OKD Cluster on AWS

The concept of a single-node or All-in-One OpenShift / Kubernetes cluster isn’t something new. Years ago, when I was working with OpenShift 3 and before that with native Kubernetes, we were using single-node clusters as ephemeral development environments and for integration testing of pull requests or platform releases. It was annoying because this required complex Jenkins pipelines: provision the node first, then install prerequisites and run the openshift-ansible installer playbook. Not always reliable and not a great experience, but it did the job.

This is possible as well with the new OpenShift/OKD 4 version, with the help of OpenShift Hive. The experience is more reliable and quicker than before, and I don’t need to worry about de-provisioning: I let Hive delete the cluster automatically after a few hours.

It requires a few simple modifications in the install-config. You need to add the Availability Zone where the instance will be created; when doing this the VPC will only have two subnets, one public and one private, in eu-west-1. You can also install the single-node cluster into an existing VPC, you just have to specify the subnet IDs. Change the compute worker node replicas to zero and the control-plane replicas to one. Make sure to pick an instance size with enough CPU and memory for all OpenShift components, because they need to fit onto the single node. The rest of the install-config is pretty much standard.

---
apiVersion: v1
baseDomain: k8s.domain.com
compute:
- name: worker
  platform:
    aws:
      zones:
      - eu-west-1a
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: r4.xlarge
  replicas: 0
controlPlane:
  name: master
  platform:
    aws:
      zones:
      - eu-west-1a
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: r5.2xlarge
  replicas: 1
metadata:
  creationTimestamp: null
  name: okd-aio
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
pullSecret: ""
sshKey: ""

Create a new install-config secret for the cluster.

kubectl create secret generic install-config-aio -n okd --from-file=install-config.yaml=./install-config-aio.yaml

We will be using OpenShift Hive for the cluster deployment because the provisioning is more streamlined and Hive can also apply any needed configuration using SyncSets or SelectorSyncSets. Add the annotation hive.openshift.io/delete-after: "2h" and Hive will automatically delete the cluster after 2 hours.

---
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  creationTimestamp: null
  annotations:
    hive.openshift.io/delete-after: "2h"
  name: okd-aio 
  namespace: okd
spec:
  baseDomain: k8s.domain.com
  clusterName: okd-aio
  controlPlaneConfig:
    servingCertificates: {}
  installed: false
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  provisioning:
    releaseImage: quay.io/openshift/okd:4.5.0-0.okd-2020-07-14-153706-ga
    installConfigSecretRef:
      name: install-config-aio
  pullSecretRef:
    name: pull-secret
  sshKey:
    name: ssh-key
status:
  clusterVersionStatus:
    availableUpdates: null
    desired:
      force: false
      image: ""
      version: ""
    observedGeneration: 0
    versionHash: ""

Apply the cluster deployment to your clusters namespace.

kubectl apply -f  ./clusterdeployment-aio.yaml
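
While Hive provisions the cluster you can follow the progress from the management cluster, for example (a rough sketch of what I usually check):

$ kubectl get clusterdeployment -n okd -w
$ kubectl get pods -n okd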

This is slightly faster than provisioning a six-node cluster and takes around 30 minutes until your ephemeral test cluster is ready to use.

Mozilla SOPS and GitOps Toolkit (Flux CD v2) to decrypt and apply Kubernetes secrets

Using a GitOps way of working and tools like the GitOps Toolkit (Flux CD v2) is great for applying configuration to your Kubernetes clusters, but what about secrets, and how can you store them securely in your repository? The perfect tool for this is Mozilla’s SOPS, which uses a cloud-based KMS, HashiCorp Vault or a PGP key to encrypt and decrypt your secrets and store them in encrypted form with the rest of your configuration in a code repository. There is a guide in the Flux documentation about how to use SOPS, but I did this slightly differently with a Google Cloud KMS.

Start by downloading the latest version of the Mozilla SOPS command-line binary. This is what makes SOPS so easy to use: you don’t need much to encrypt or decrypt secrets apart from a KMS or a simple PGP key.

sudo wget -O /usr/local/bin/sops https://github.com/mozilla/sops/releases/download/v3.7.1/sops-v3.7.1.linux
sudo chmod 755 /usr/local/bin/sops

Next create the Google Cloud KMS key ring and key, which I am using in my example.

$ gcloud auth application-default login
Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?code_challenge=xxxxxxx&prompt=select_account&code_challenge_method=S256&access_type=offline&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&client_id=xxxxxxxxx-xxxxxxxxxxxxxxxxxx.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth


Enter verification code: xxxxxxxxxxxxxxx

Credentials saved to file: [/home/ubuntu/.config/gcloud/application_default_credentials.json]

These credentials will be used by any library that requests Application Default Credentials (ADC).
$ gcloud kms keyrings create sops --location global
$ gcloud kms keys create sops-key --location global --keyring sops --purpose encryption
$ gcloud kms keys list --location global --keyring sops
NAME                                                                           PURPOSE          ALGORITHM                    PROTECTION_LEVEL  LABELS  PRIMARY_ID  PRIMARY_STATE
projects/kubernetes-xxxxxx/locations/global/keyRings/sops/cryptoKeys/sops-key  ENCRYPT_DECRYPT  GOOGLE_SYMMETRIC_ENCRYPTION  SOFTWARE                  1           ENABLED

To encrypt secrets you need to create a .sops.yaml file in the root of your code repository.

creation_rules:
  - path_regex: \.yaml$
    gcp_kms: projects/kubernetes-xxxxxx/locations/global/keyRings/sops/cryptoKeys/sops-key
    encrypted_regex: ^(data|stringData)$

Let’s create a simple Kubernetes secret for testing.

$ cat secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
  namespace: default
type: Opaque
data:
  username: YWRtaW4=
  password: MWYyZDFlMmU2N2Rm

Encrypt your secret.yaml using SOPS with the following example.

$ sops -e secret.yaml
apiVersion: v1
kind: Secret
metadata:
    name: mysecret
    namespace: default
type: Opaque
data:
    username: ENC[AES256_GCM,data:<-HASH->,type:str]
    password: ENC[AES256_GCM,data:<-HASH->,type:str]
sops:
    kms: []
    gcp_kms:
    -   resource_id: projects/kubernetes-xxxxxx/locations/global/keyRings/sops/cryptoKeys/sops-key
        created_at: '2021-03-01T17:25:29Z'
        enc: <-HASH->
    azure_kv: []
    lastmodified: '2021-03-01T17:25:29Z'
    mac: ENC[AES256_GCM,data:<-HASH->,type:str]
    pgp: []
    encrypted_regex: ^(data|stringData)$
    version: 3.5.0

Alternatively you can encrypt and replace the file in-place.

$ sops -i -e secret.yaml

To decrypt the yaml file use sops -d or replace in-place using sops -i -d.

$ sops -d secret.yaml
apiVersion: v1
kind: Secret
metadata:
    name: mysecret
    namespace: default
type: Opaque
data:
    username: YWRtaW4=
    password: MWYyZDFlMmU2N2Rm

You can also edit an encrypted file with the default terminal editor by directly using the sops command without any options.

$ sops secret.yaml
File has not changed, exiting.

Let’s use the Flux CD kustomize-controller to decrypt Kubernetes secrets and apply them to the specified namespace. First you need to create a GCP service account for Flux and grant it permission to decrypt with the KMS key.
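
A sketch of how that could look with gcloud, assuming the key ring, key and project from earlier; the service account name sops-flux is arbitrary:

$ gcloud iam service-accounts create sops-flux
$ gcloud kms keys add-iam-policy-binding sops-key \
    --location global --keyring sops \
    --member serviceAccount:sops-flux@kubernetes-xxxxxx.iam.gserviceaccount.com \
    --role roles/cloudkms.cryptoKeyDecrypter
$ gcloud iam service-accounts keys create ./sops-gcp \
    --iam-account sops-flux@kubernetes-xxxxxx.iam.gserviceaccount.com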

Download the GCP json authentication file for the service account and create a new secret in the Flux namespace.

$ kubectl create secret generic gcp-auth -n gotk-system --from-file=./sops-gcp
$ kubectl get secrets -n gotk-system gcp-auth -o yaml
apiVersion: v1
data:
  sops-gcp: <-BASE64-ENCODED-GCP-AUTH-JSON->
kind: Secret
metadata:
  creationTimestamp: "2021-03-01T17:34:11Z"
  name: gcp-auth
  namespace: gotk-system
  resourceVersion: "1879000"
  selfLink: /api/v1/namespaces/gotk-system/secrets/gcp-auth
  uid: 10a14c1f-19a6-41a2-8610-694b12efefee
type: Opaque

You need to update the kustomize-controller deployment, add a volume mount for the SOPS GCP secret and set the environment variable pointing to the Google application credentials file. This is where my example differs from what is documented, because I am not using integrated cloud authentication – my cluster is running locally.

...
    spec:
      containers:
...
      - env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /tmp/.gcp/credentials/sops-gcp
        name: manager
        volumeMounts:
        - mountPath: /tmp/.gcp/credentials
          name: sops-gcp
          readOnly: true
      volumes:
      - name: sops-gcp
        secret:
          defaultMode: 420
          secretName: gcp-auth
...

In the Kustomization object you enable the SOPS decryption provider, and the controller automatically decrypts and applies the secrets in the next reconcile loop.

apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: cluster
  namespace: gotk-system
spec:
  decryption:
    provider: sops
  interval: 5m0s
  path: ./clusters/cluster-dev
  prune: true
  sourceRef:
    kind: GitRepository
    name: github-source

It takes a few minutes until the sync is completed; then we can check whether the example secret got created correctly.

$ kubectl get secrets mysecret -n default -o yaml
apiVersion: v1
data:
  password: MWYyZDFlMmU2N2Rm
  username: YWRtaW4=
kind: Secret
metadata:
  name: mysecret
  namespace: default
  resourceVersion: "3439293"
  selfLink: /api/v1/namespaces/default/secrets/mysecret
  uid: 4a009675-3c89-448b-bb86-6211cec3d4ea
type: Opaque

This is how to use SOPS and Flux CD v2 to decrypt and apply Kubernetes secrets using GitOps.