Install OpenShift/OKD 4.9.x Single Node Cluster (SNO) using OpenShift Hive/ACM

I haven’t written much since the summer 2021 and I thought I start the New Year with a little update regarding OpenShift/OKD 4.9 Single Node cluster (SNO) installation. The single node type is not new because I have been using these All-in-One or Single Node clusters since OpenShift 3.x and it worked great until OpenShift 4.7. When RedHat released OpenShift 4.8 the single node installation stopped working because of issue with the control-plane because it expected three nodes for high availability and this installation method was possible till then but not officially supported by RedHat.

When the OpenShift 4.9 release was announced the single node installation method called SNO became a supported way for deploying OpenShift Edge clusters on  bare-metal or virtual machine using the RedHat Cloud Assisted Installer.

This opened the possibility again to install OpenShift/OKD 4.9 as a single node (SNO) on any cloud provider like AWS, GCP or Azure through the openshift-install command line utility or through OpenShift Hive / Advanced Cluster Management operator.

The install-config.yaml for a single node cluster is pretty much the same like for a normal cluster only that you change the worker node replicas to zero and control-plane (master) nodes to one. Make sure your instance size has minimum 8x vCPUs and 32 GB of memory.

---
apiVersion: v1
baseDomain: k8s.domain.com
compute:
- name: worker
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m5.2xlarge
  replicas: 0
controlPlane:
  name: master
  platform:
    aws:
      rootVolume:
        iops: 100
        size: 22
        type: gp2
      type: m5.2xlarge
  replicas: 1
metadata:
  creationTimestamp: null
  name: okd-eu-west-1
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
pullSecret: ""
sshKey: ""

I am using OpenShift Hive for installing the OKD 4.9 single node cluster which requires Kubernetes to run the Hive operator.

Create a install-config secret:

$ kubectl create secret generic install-config -n okd --from-file=install-config.yaml=./okd-sno-install-config.yaml 

In the ClusterDeployment you specify AWS credentials, reference the install-config and the release image for OKD 4.9. Here you can find the latest OKD release image tags: https://quay.io/repository/openshift/okd

---
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  creationTimestamp: null
  name: okd-eu-west-1
  namespace: okd
spec:
  baseDomain: k8s.domain.com
  clusterName: okd-eu-west-1
  controlPlaneConfig:
    servingCertificates: {}
  installed: false
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  provisioning:
    releaseImage: quay.io/openshift/okd:4.9.0-0.okd-2022-01-14-230113
    installConfigSecretRef:
      name: install-config
  pullSecretRef:
    name: pull-secret

Apply the cluster deployment and wait for Hive to install the OpenShift/OKD cluster.

$ kubectl apply -f ./okd-clusterdeployment.yaml 

The provision pod will output the messages from the openshift-install binary and the cluster will be finish the installation in around 35mins.

$ kubectl logs okd-eu-west-1-0-8vhnf-provision-qrjrg -c hive -f
time="2022-01-15T15:51:32Z" level=debug msg="Couldn't find install logs provider environment variable. Skipping."
time="2022-01-15T15:51:32Z" level=debug msg="checking for SSH private key" installID=m2zcxsds
time="2022-01-15T15:51:32Z" level=info msg="unable to initialize host ssh key" error="cannot configure SSH agent as SSH_PRIV_KEY_PATH is unset or empty" installID=m2zcxsds
time="2022-01-15T15:51:32Z" level=info msg="waiting for files to be available: [/output/openshift-install /output/oc]" installID=m2zcxsds
time="2022-01-15T15:51:32Z" level=info msg="found file" installID=m2zcxsds path=/output/openshift-install
time="2022-01-15T15:51:32Z" level=info msg="found file" installID=m2zcxsds path=/output/oc
time="2022-01-15T15:51:32Z" level=info msg="all files found, ready to proceed" installID=m2zcxsds
time="2022-01-15T15:51:35Z" level=info msg="copied /output/openshift-install to /home/hive/openshift-install" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="copied /output/oc to /home/hive/oc" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="copying install-config.yaml" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="copied /installconfig/install-config.yaml to /output/install-config.yaml" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="waiting for files to be available: [/output/.openshift_install.log]" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="cleaning up from past install attempts" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=warning msg="skipping cleanup as no infra ID set" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=debug msg="object does not exist" installID=m2zcxsds object=okd/okd-eu-west-1-0-8vhnf-admin-kubeconfig
time="2022-01-15T15:51:36Z" level=debug msg="object does not exist" installID=m2zcxsds object=okd/okd-eu-west-1-0-8vhnf-admin-password
time="2022-01-15T15:51:36Z" level=info msg="generating assets" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="running openshift-install create manifests" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=info msg="running openshift-install binary" args="[create manifests]" installID=m2zcxsds
time="2022-01-15T15:51:37Z" level=info msg="found file" installID=m2zcxsds path=/output/.openshift_install.log
time="2022-01-15T15:51:37Z" level=info msg="all files found, ready to proceed" installID=m2zcxsds
time="2022-01-15T15:51:36Z" level=debug msg="OpenShift Installer unreleased-master-5011-geb132dae953888e736c382f1176c799c0e1aa49e-dirty"
time="2022-01-15T15:51:36Z" level=debug msg="Built from commit eb132dae953888e736c382f1176c799c0e1aa49e"
time="2022-01-15T15:51:36Z" level=debug msg="Fetching Master Machines..."
time="2022-01-15T15:51:36Z" level=debug msg="Loading Master Machines..."
time="2022-01-15T15:51:36Z" level=debug msg="  Loading Cluster ID..."
time="2022-01-15T15:51:36Z" level=debug msg="    Loading Install Config..."
time="2022-01-15T15:51:36Z" level=debug msg="      Loading SSH Key..."
time="2022-01-15T15:51:36Z" level=debug msg="      Loading Base Domain..."

....

time="2022-01-15T16:14:17Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113"
time="2022-01-15T16:14:31Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 529 of 744 done (71% complete)"
time="2022-01-15T16:14:32Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 585 of 744 done (78% complete)"
time="2022-01-15T16:14:47Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 702 of 744 done (94% complete)"
time="2022-01-15T16:15:02Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 703 of 744 done (94% complete)"
time="2022-01-15T16:15:32Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 708 of 744 done (95% complete)"
time="2022-01-15T16:15:47Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 720 of 744 done (96% complete)"
time="2022-01-15T16:16:02Z" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.9.0-0.okd-2022-01-14-230113: 722 of 744 done (97% complete)"
time="2022-01-15T16:17:17Z" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, monitoring"
time="2022-01-15T16:18:02Z" level=debug msg="Cluster is initialized"
time="2022-01-15T16:18:02Z" level=info msg="Waiting up to 10m0s for the openshift-console route to be created..."
time="2022-01-15T16:18:02Z" level=debug msg="Route found in openshift-console namespace: console"
time="2022-01-15T16:18:02Z" level=debug msg="OpenShift console route is admitted"
time="2022-01-15T16:18:02Z" level=info msg="Install complete!"
time="2022-01-15T16:18:02Z" level=info msg="To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/output/auth/kubeconfig'"
time="2022-01-15T16:18:02Z" level=info msg="Access the OpenShift web-console here: https://console-openshift-console.apps.okd-eu-west-1.k8s.domain.com"
time="2022-01-15T16:18:02Z" level=debug msg="Time elapsed per stage:"
time="2022-01-15T16:18:02Z" level=debug msg="           cluster: 6m35s"
time="2022-01-15T16:18:02Z" level=debug msg="         bootstrap: 34s"
time="2022-01-15T16:18:02Z" level=debug msg="Bootstrap Complete: 12m46s"
time="2022-01-15T16:18:02Z" level=debug msg="               API: 4m2s"
time="2022-01-15T16:18:02Z" level=debug msg=" Bootstrap Destroy: 1m15s"
time="2022-01-15T16:18:02Z" level=debug msg=" Cluster Operators: 4m59s"
time="2022-01-15T16:18:02Z" level=info msg="Time elapsed: 26m13s"
time="2022-01-15T16:18:03Z" level=info msg="command completed successfully" installID=m2zcxsds
time="2022-01-15T16:18:03Z" level=info msg="saving installer output" installID=m2zcxsds
time="2022-01-15T16:18:03Z" level=info msg="install completed successfully" installID=m2zcxsds

Check the cluster deployment and get the kubeadmin password from the secret the Hive operator created during the installation and login to the web console:

$ kubectl get clusterdeployments
NAME            PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID               VERSION   POWERSTATE   AGE
okd-eu-west-1   aws        eu-west-1                 true        okd-eu-west-1-l4g4n   4.9.0     Running      39m
$ kubectl get secrets okd-eu-west-1-0-8vhnf-admin-password -o jsonpath={.data.password} | base64 -d
EP5Fs-TZrKj-Vtst6-5GWZ9

The cluster details show that the control plane runs as single master node:

Your cluster has a single combined master/worker node:

These single node type clusters can be used in combination with OpenShift Hive ClusterPools to have an amount of pre-installed OpenShift/OKD clusters available for automated tests or as temporary development environment.

apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: okd-eu-west-1-pool
  namespace: okd
spec:
  baseDomain: k8s.domain.com
  imageSetRef:
    name: 4.9.0-0.okd-2022-01-14
  installConfigSecretTemplateRef:
    name: install-config
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  pullSecretRef:
    name: pull-secret
  size: 3

The clusters are hibernating (shutdown) in the pool and will be powered on when you apply the ClusterClaim to allocate a cluster with a lifetime set to 8 hours. After 8 hours the cluster gets automatically deleted by the Hive operator.

apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: test-1
  namespace: okd
spec:
  clusterPoolName: okd-eu-west-1-pool
  lifetime: 8h

This sums up how to deploy a OpenShift/OKD 4.9 as single node cluster. I hope this article is helpful and leave a comment if you have questions.

OpenShift Hive v1.1.x – Latest updates & new features

Over a year has gone by since my first article about Getting started with OpenShift Hive and my talk at the RedHat OpenShift Gathering when the first stable OpenShift Hive v1 version got released. In between a lot has happened and OpenShift Hive v1.1.1 was released a few weeks ago. So I wanted to look into the new functionalities of OpenShift Hive.

  • Operator Lifecycle Manager (OLM) installation

Hive is now available through the Operator Hub community catalog and can be installed on both OpenShift or native Kubernetes cluster through the OLM. The install is straightforward by adding the operator-group and subscription manifests:

---
apiVersion: operators.coreos.com/v1alpha2
kind: OperatorGroup
metadata:
  name: operatorgroup
  namespace: hive
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hive
  namespace: hive
spec:
  channel: alpha
  name: hive-operator
  source: operatorhubio-catalog
  sourceNamespace: olm

Alternatively the Hive subscription can be configured with a manual install plan. In this case the OLM will not automatically upgrade the Hive operator when a new version is released – I highly recommend this for production deployments!

---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hive
  namespace: hive
spec:
  channel: alpha
  name: hive-operator
  installPlanApproval: Manual
  source: operatorhubio-catalog
  sourceNamespace: olm

After a few seconds you see an install plan being added.

$ k get installplan
NAME            CSV                    APPROVAL   APPROVED
install-9drmh   hive-operator.v1.1.0   Manual     false

Edit the install plan and set approved value to true – the OLM will start and install or upgrade the Hive operator automatically.

...
spec:
  approval: Manual
  approved: true
  clusterServiceVersionNames:
  - hive-operator.v1.1.0
  generation: 1
...

After the Hive operator is installed you need to apply the Hiveconfig object for the operator to install all of the needed Hive components. On non-OpenShift installs (native Kubernetes) you still need to generate Hiveadmission certificates for the admission controller pods to start otherwise they are missing the hiveadmission-serving-cert secret.

  • Hiveconfig – Velero backup and delete protection

There are a few small but also very useful changes in the Hiveconfig object. You can now enable the deleteProtection option which prevents administrators from accidental deletions of ClusterDeployments or SyncSets. Another great addition is that you can enable automatic configuration of Velero to backup your cluster namespaces, meaning you’re not required to configure backups separately.

---
apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  logLevel: info
  targetNamespace: hive
  deleteProtection: enabled
  backup:
    velero:
      enabled: true
      namespace: velero

Backups are configured in the Velero namespace as specified in the Hiveconfig.

$ k get backups -n velero
NAME                              AGE
backup-okd-2021-03-26t11-57-32z   3h12m
backup-okd-2021-03-26t12-00-32z   3h9m
backup-okd-2021-03-26t12-35-44z   154m
backup-okd-2021-03-26t12-38-44z   151m
...

With the deletion protection enabled in the hiveconfig, the controller automatically adds the annotation hive.openshift.io/protected-delete: “true” to all resources and prevents these from accidental deletions:

$ k delete cd okd --wait=0
The ClusterDeployment "okd" is invalid: metadata.annotations.hive.openshift.io/protected-delete: Invalid value: "true": cannot delete while annotation is present
  • ClusterSync and Scaling Hive controller

To check applied resources through SyncSets and SelectorSyncSets, where Hive has previously used Syncsetnstance but these no longer exists. This now has move to ClusterSync to collect status information about applied resources:

$ k get clustersync okd -o yaml
apiVersion: hiveinternal.openshift.io/v1alpha1
kind: ClusterSync
metadata:
  name: okd
  namespace: okd
spec: {}
status:
  conditions:
  - lastProbeTime: "2021-03-26T16:13:57Z"
    lastTransitionTime: "2021-03-26T16:13:57Z"
    message: All SyncSets and SelectorSyncSets have been applied to the cluster
    reason: Success
    status: "False"
    type: Failed
  firstSuccessTime: "2021-03-26T16:13:57Z"
...

It is also possible to horizontally scale the Hive controller to change the synchronisation frequency for running larger OpenShift deployments.

---
apiVersion: hive.openshift.io/v1
kind: HiveConfig
metadata:
  name: hive
spec:
  logLevel: info
  targetNamespace: hive
  deleteProtection: enabled
  backup:
    velero:
      enabled: true
      namespace: velero
  controllersConfig:
    controllers:
    - config:
        concurrentReconciles: 10
        replicas: 3
      name: clustersync

Please checkout the scaling test script which I found in the Github repo, you can simulate fake clusters by adding the annotation “hive.openshift.io/fake-cluster=true” to your ClusterDeployment.

  • Hibernating clusters

RedHat introduced that you can hibernate (shutdown) clusters in OpenShift 4.5 when they are not needed and switch them easily back on when you need them. This is now possible with OpenShift Hive: you can hibernate and change the power state of a cluster deployment.

$ kubectl patch cd okd --type='merge' -p $'spec:\n powerState: Hibernating'

Checking the cluster deployment and power state change to stopping.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Stopping     44m

After a couple of minutes the power state of the cluster nodes will change to hibernating.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE    AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Hibernating   47m

In the AWS console you see the cluster instances as stopped.

When turning the cluster back online, change the power state in the cluster deployment to running.

$ kubectl patch cd okd --type='merge' -p $'spec:\n powerState: Running'

Again the power state changes to resuming.

$ kubectl get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Resuming     49m

A few minutes later the cluster changes to running and is ready to use again.

$ k get cd
NAME   PLATFORM   REGION      CLUSTERTYPE   INSTALLED   INFRAID     VERSION   POWERSTATE   AGE
okd    aws        eu-west-1                 true        okd-jpqgb   4.7.0     Running      61m
  • Cluster pools

Cluster pools is something which came together with the hibernating feature which allows you to pre-provision OpenShift clusters without actually allocating them and after the provisioning they will hibernate until you claim a cluster. Again a nice feature and ideal use-case for ephemeral type development or integration test environments which allows you to have clusters ready to go to claim when needed and dispose them afterwards.

Create a ClusterPool custom resource which is similar to a cluster deployment.

apiVersion: hive.openshift.io/v1
kind: ClusterPool
metadata:
  name: okd-eu-west-1
  namespace: hive
spec:
  baseDomain: okd.domain.com
  imageSetRef:
    name: okd-4.7-imageset
  installConfigSecretTemplateRef: 
    name: install-config
  skipMachinePools: true
  platform:
    aws:
      credentialsSecretRef:
        name: aws-creds
      region: eu-west-1
  pullSecretRef:
    name: pull-secret
  size: 3

To claim a cluster from a pool, apply the ClusterClaim resource.

apiVersion: hive.openshift.io/v1
kind: ClusterClaim
metadata:
  name: okd-claim
  namespace: hive
spec:
  clusterPoolName: okd-eu-west-1
  lifetime: 8h

I haven’t tested this yet but will definitely start using this in the coming weeks. Have a look at the Hive documentation on using ClusterPool and ClusterClaim.

  • Cluster relocation

For me, having used OpenShift Hive for over one and half years to run OpenShift 4 cluster, this is a very useful functionality because at some point you might need to rebuild or move your management services to a new Hive cluster. The ClusterRelocator object gives you the option to do this.

$ kubectl create secret generic new-hive-cluster-kubeconfig -n hive --from-file=kubeconfig=./new-hive-cluster.kubeconfig

Create the ClusterRelocator object and specify the kubeconfig of the remote Hive cluster, and also add a clusterDeploymentSelector:

apiVersion: hive.openshift.io/v1
kind: ClusterRelocate
metadata:
  name: migrate
spec:
  kubeconfigSecretRef:
    namespace: hive
    name: new-hive-cluster-kubeconfig
  clusterDeploymentSelector:
    matchLabels:
      migrate: cluster

To move cluster deployments, add the label migrate=cluster to your OpenShift clusters you want to move.

$ kubectl label clusterdeployment okd migrate=cluster

The cluster deployment will move to the new Hive cluster and will be removed from the source Hive cluster without the de-provision. It’s important to keep in mind that you need to copy any other resources you need, such as secrets, syncsets, selectorsyncsets and syncidentiyproviders, before moving the clusters. Take a look at the Hive documentation for the exact steps.

  • Useful annotation

Pause SyncSets by adding the annotation “hive.openshift.io/syncset-pause=true” to the clusterdeployment which stops the reconcile of defined resources and great for troubleshooting.

In a cluster deployment you can set the option to preserve cluster on delete which allows the user to disconnect a cluster from Hive without de-provisioning it.

$ kubectl patch cd okd --type='merge' -p $'spec:\n preserveOnDelete: true'

This sums up the new features and functionalities you can use with the latest OpenShift Hive version.