Getting started with OpenShift Hive

If you don’t know OpenShift Hive yet, I recommend having a look at the video of my talk at Red Hat OpenShift Commons about OpenShift Hive, where I also talk about how you can provision and manage the lifecycle of OpenShift 4 clusters using the Kubernetes API and the OpenShift Hive operator.

The Hive operator has three main components: the admission controller, the Hive controller and the Hive operator itself. For more information about the Hive architecture, visit the Hive docs.

You can use an OpenShift or a native Kubernetes cluster to run the operator; in my case I use an EKS cluster. Let’s go through the prerequisites required to generate the manifests and the hiveutil binary:

$ curl -s "https://raw.githubusercontent.com/\
> kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash
$ sudo mv ./kustomize /usr/bin/
$ wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz
$ tar -xvf go1.13.3.linux-amd64.tar.gz
$ sudo mv go /usr/local

To set up the Go environment, copy the content below and add it to your .profile:

export GOPATH="${HOME}/.go"
export PATH="$PATH:/usr/local/go/bin"
export PATH="$PATH:${GOPATH}/bin:${GOROOT}/bin"

Continue by installing the Go dependencies and cloning the OpenShift Hive GitHub repository:

$ mkdir -p ~/.go/src/github.com/openshift/
$ go get github.com/golang/mock/mockgen
$ go get github.com/golang/mock/gomock
$ go get github.com/cloudflare/cfssl/cmd/cfssl
$ go get github.com/cloudflare/cfssl/cmd/cfssljson
$ cd ~/.go/src/github.com/openshift/
$ git clone https://github.com/openshift/hive.git
$ cd hive/
$ git checkout remotes/origin/master

Before we run make deploy, I recommend modifying the Makefile so that we only generate the Hive manifests without deploying them to Kubernetes:

$ sed -i -e 's#oc apply -f config/crds# #' -e 's#kustomize build overlays/deploy | oc apply -f -#kustomize build overlays/deploy > hive.yaml#' Makefile
$ make deploy
# The apis-path is explicitly specified so that CRDs are not created for v1alpha1
go run tools/vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go crd --apis-path=pkg/apis/hive/v1
CRD files generated, files can be found under path /home/ubuntu/.go/src/github.com/openshift/hive/config/crds.
go generate ./pkg/... ./cmd/...
hack/update-bindata.sh
# Deploy the operator manifests:
mkdir -p overlays/deploy
cp overlays/template/kustomization.yaml overlays/deploy
cd overlays/deploy && kustomize edit set image registry.svc.ci.openshift.org/openshift/hive-v4.0:hive=registry.svc.ci.openshift.org/openshift/hivev1:hive
kustomize build overlays/deploy > hive.yaml
rm -rf overlays/deploy

A quick look at the content of the hive.yaml manifest:

$ cat hive.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: hive
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hive-operator
  namespace: hive

...

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    control-plane: hive-operator
    controller-tools.k8s.io: "1.0"
  name: hive-operator
  namespace: hive
spec:
  replicas: 1
  revisionHistoryLimit: 4
  selector:
    matchLabels:
      control-plane: hive-operator
      controller-tools.k8s.io: "1.0"
  template:
    metadata:
      labels:
        control-plane: hive-operator
        controller-tools.k8s.io: "1.0"
    spec:
      containers:
      - command:
        - /opt/services/hive-operator
        - --log-level
        - info
        env:
        - name: CLI_CACHE_DIR
          value: /var/cache/kubectl
        image: registry.svc.ci.openshift.org/openshift/hive-v4.0:hive
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 1
          httpGet:
            path: /debug/health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        name: hive-operator
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - mountPath: /var/cache/kubectl
          name: kubectl-cache
      serviceAccountName: hive-operator
      terminationGracePeriodSeconds: 10
      volumes:
      - emptyDir: {}
        name: kubectl-cache

Now we can apply the Hive custom resource definitions (CRDs):

$ kubectl apply -f ./config/crds/
customresourcedefinition.apiextensions.k8s.io/checkpoints.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/clusterdeployments.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/clusterdeprovisions.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/clusterimagesets.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/clusterprovisions.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/clusterstates.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/dnszones.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/hiveconfigs.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/machinepools.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/selectorsyncidentityproviders.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/selectorsyncsets.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/syncidentityproviders.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/syncsets.hive.openshift.io created
customresourcedefinition.apiextensions.k8s.io/syncsetinstances.hive.openshift.io created
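
Before continuing, you can quickly verify that the CRDs are registered:

$ kubectl get crd | grep hive.openshift.io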

Continue by applying the hive.yaml manifest to deploy the OpenShift Hive operator and its components:

$ kubectl apply -f hive.yaml
namespace/hive created
serviceaccount/hive-operator created
clusterrole.rbac.authorization.k8s.io/hive-frontend created
clusterrole.rbac.authorization.k8s.io/hive-operator-role created
clusterrole.rbac.authorization.k8s.io/manager-role created
clusterrole.rbac.authorization.k8s.io/system:openshift:hive:hiveadmission created
rolebinding.rbac.authorization.k8s.io/extension-server-authentication-reader-hiveadmission created
clusterrolebinding.rbac.authorization.k8s.io/auth-delegator-hiveadmission created
clusterrolebinding.rbac.authorization.k8s.io/hive-frontend created
clusterrolebinding.rbac.authorization.k8s.io/hive-operator-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/hiveadmission-hive-hiveadmission created
clusterrolebinding.rbac.authorization.k8s.io/hiveapi-cluster-admin created
clusterrolebinding.rbac.authorization.k8s.io/manager-rolebinding created
deployment.apps/hive-operator created

For the Hive admission controller you need to generate an SSL certificate:

$ ./hack/hiveadmission-dev-cert.sh
~/Dropbox/hive/hiveadmission-certs ~/Dropbox/hive
2020/02/03 22:17:30 [INFO] generate received request
2020/02/03 22:17:30 [INFO] received CSR
2020/02/03 22:17:30 [INFO] generating key: ecdsa-256
2020/02/03 22:17:30 [INFO] encoded CSR
certificatesigningrequest.certificates.k8s.io/hiveadmission.hive configured
certificatesigningrequest.certificates.k8s.io/hiveadmission.hive approved
-----BEGIN CERTIFICATE-----
MIICaDCCAVCgAwIBAgIQHvvDPncIWHRcnDzzoWGjQDANBgkqhkiG9w0BAQsFADAv
MS0wKwYDVQQDEyRiOTk2MzhhNS04OWQyLTRhZTAtYjI4Ny1iMWIwOGNmOGYyYjAw
HhcNMjAwMjAzMjIxNTA3WhcNMjUwMjAxMjIxNTA3WjAhMR8wHQYDVQQDExZoaXZl
YWRtaXNzaW9uLmhpdmUuc3ZjMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEea4N
UPbvzM3VdtOkdJ7lBytekRTvwGMqs9HgG14CtqCVCOFq8f+BeqqyrRbJsX83iBfn
gMc54moElb5kIQNjraNZMFcwDAYDVR0TAQH/BAIwADBHBgNVHREEQDA+ghZoaXZl
YWRtaXNzaW9uLmhpdmUuc3ZjgiRoaXZlYWRtaXNzaW9uLmhpdmUuc3ZjLmNsdXN0
ZXIubG9jYWwwDQYJKoZIhvcNAQELBQADggEBADhgT3tNnFs6hBIZFfWmoESe6nnZ
fy9GmlmF9qEBo8FZSk/LYvV0peOdgNZCHqsT2zaJjxULqzQ4zfSb/koYpxeS4+Bf
xwgHzIB/ylzf54wVkILWUFK3GnYepG5dzTXS7VHc4uiNJe0Hwc5JI4HBj7XdL3C7
cbPm7T2cBJi2jscoCWELWo/0hDxkcqZR7rdeltQQ+Uhz87LhTTqlknAMFzL7tM/+
pJePZMQgH97vANsbk97bCFzRZ4eABYSiN0iAB8GQM5M+vK33ZGSVQDJPKQQYH6th
Kzi9wrWEeyEtaWozD5poo9s/dxaLxFAdPDICkPB2yr5QZB+NuDgA+8IYffo=
-----END CERTIFICATE-----
secret/hiveadmission-serving-cert created
~/Dropbox/hive

Afterwards we can check that all the pods are running; this might take a few seconds:

$ kubectl get pods -n hive
NAME                                READY   STATUS    RESTARTS   AGE
hive-controllers-7c6ccc84b9-q7k7m   1/1     Running   0          31s
hive-operator-f9f4447fd-jbmkh       1/1     Running   0          55s
hiveadmission-6766c5bc6f-9667g      1/1     Running   0          27s
hiveadmission-6766c5bc6f-gvvlq      1/1     Running   0          27s

The Hive operator is now successfully installed on your Kubernetes cluster, but we are not finished yet. To create the required ClusterDeployment manifests we need to build the hiveutil binary:

$ make hiveutil
go generate ./pkg/... ./cmd/...
hack/update-bindata.sh
go build -o bin/hiveutil github.com/openshift/hive/contrib/cmd/hiveutil

To generate the Hive ClusterDeployment manifests, run the following hiveutil command; I output the definition as YAML using -o yaml:

$ bin/hiveutil create-cluster --base-domain=mydomain.example.com --cloud=aws mycluster -o yaml
apiVersion: v1
items:
- apiVersion: hive.openshift.io/v1
  kind: ClusterImageSet
  metadata:
    creationTimestamp: null
    name: mycluster-imageset
  spec:
    releaseImage: quay.io/openshift-release-dev/ocp-release:4.3.2-x86_64
  status: {}
- apiVersion: v1
  kind: Secret
  metadata:
    creationTimestamp: null
    name: mycluster-aws-creds
  stringData:
    aws_access_key_id: <-YOUR-AWS-ACCESS-KEY->
    aws_secret_access_key: <-YOUR-AWS-SECRET-KEY->
  type: Opaque
- apiVersion: v1
  data:
    install-config.yaml: <-BASE64-ENCODED-OPENSHIFT4-INSTALL-CONFIG->
  kind: Secret
  metadata:
    creationTimestamp: null
    name: mycluster-install-config
  type: Opaque
- apiVersion: hive.openshift.io/v1
  kind: ClusterDeployment
  metadata:
    creationTimestamp: null
    name: mycluster
  spec:
    baseDomain: mydomain.example.com
    clusterName: mycluster
    controlPlaneConfig:
      servingCertificates: {}
    installed: false
    platform:
      aws:
        credentialsSecretRef:
          name: mycluster-aws-creds
        region: us-east-1
    provisioning:
      imageSetRef:
        name: mycluster-imageset
      installConfigSecretRef:
        name: mycluster-install-config
  status:
    clusterVersionStatus:
      availableUpdates: null
      desired:
        force: false
        image: ""
        version: ""
      observedGeneration: 0
      versionHash: ""
- apiVersion: hive.openshift.io/v1
  kind: MachinePool
  metadata:
    creationTimestamp: null
    name: mycluster-worker
  spec:
    clusterDeploymentRef:
      name: mycluster
    name: worker
    platform:
      aws:
        rootVolume:
          iops: 100
          size: 22
          type: gp2
        type: m4.xlarge
    replicas: 3
  status:
    replicas: 0
kind: List
metadata: {}
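
Since -o yaml only prints the definitions, nothing has been created yet. Once the credential placeholders are filled in, you could save the output and apply it like any other manifest (a sketch; the actual cluster deployment is covered in the next article):

$ bin/hiveutil create-cluster --base-domain=mydomain.example.com --cloud=aws mycluster -o yaml > mycluster.yaml
$ kubectl apply -f mycluster.yaml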

I hope this post is useful in getting you started with OpenShift Hive. In my next article I will go through the details of the OpenShift 4 cluster deployment with Hive.

Read my new article about OpenShift / OKD 4.x Cluster Deployment using OpenShift Hive

OpenShift Hive – API driven OpenShift cluster provisioning and management operator

Red Hat invited me and my colleague Matt to speak at Red Hat OpenShift Commons in London about the API-driven OpenShift cluster provisioning and management operator called OpenShift Hive. We have been using OpenShift Hive for the past few months to provision and manage the OpenShift 4 estate across multiple environments. Below is the video recording of our talk at OpenShift Commons London:

The Hive operator needs to run on a separate Kubernetes cluster to centrally provision and manage the OpenShift 4 clusters. With Hive you can manage hundreds of cluster deployments and configurations with a single operator. Nothing is required on the OpenShift 4 clusters themselves; Hive only needs access to the cluster API.

The ClusterDeployment custom resource is the definition of the cluster specs, similar to the openshift-installer install-config, where you define cluster specifications, cloud credentials and image pull secrets. Below is an example of the ClusterDeployment manifest:

---
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: mycluster
  namespace: mynamespace
spec:
  baseDomain: hive.example.com
  clusterName: mycluster
  platform:
    aws:
      credentialsSecretRef:
        name: mycluster-aws-creds
      region: eu-west-1
  provisioning:
    imageSetRef:
      name: openshift-v4.3.0
    installConfigSecretRef:
      name: mycluster-install-config
    sshPrivateKeySecretRef:
      name: mycluster-ssh-key
  pullSecretRef:
    name: mycluster-pull-secret
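
The manifest references several secrets which must exist in the same namespace before Hive can start provisioning. As a rough sketch (the secret key names follow the Hive conventions, the file paths are illustrative), they could be created like this:

kubectl create secret generic mycluster-aws-creds -n mynamespace \
  --from-literal=aws_access_key_id=<-YOUR-AWS-ACCESS-KEY-> \
  --from-literal=aws_secret_access_key=<-YOUR-AWS-SECRET-KEY->
kubectl create secret generic mycluster-install-config -n mynamespace \
  --from-file=install-config.yaml=./install-config.yaml
kubectl create secret generic mycluster-ssh-key -n mynamespace \
  --from-file=ssh-privatekey=./id_rsa
kubectl create secret generic mycluster-pull-secret -n mynamespace \
  --type=kubernetes.io/dockerconfigjson \
  --from-file=.dockerconfigjson=./pull-secret.json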

The SyncSet custom resource defines the configuration and regularly reconciles the manifests to keep all clusters synchronised. With SyncSets you can apply resources and patches, as you can see in the example below:

---
apiVersion: hive.openshift.io/v1
kind: SyncSet
metadata:
  name: mygroup
spec:
  clusterDeploymentRefs:
  - name: ClusterName
  resourceApplyMode: Upsert
  resources:
  - apiVersion: user.openshift.io/v1
    kind: Group
    metadata:
      name: mygroup
    users:
    - myuser
  patches:
  - kind: ConfigMap
    apiVersion: v1
    name: foo
    namespace: default
    patch: |-
      { "data": { "foo": "new-bar" } }
    patchType: merge
  secretReferences:
  - source:
      name: ad-bind-password
      namespace: default
    target:
      name: ad-bind-password
      namespace: openshift-config
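
To roll the SyncSet out, apply it in the namespace of the ClusterDeployment it references and check that it has been picked up (a minimal sketch; the file name is illustrative):

kubectl apply -f mygroup-syncset.yaml -n mynamespace
kubectl get syncsets -n mynamespace
kubectl get syncsetinstances -n mynamespace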

Depending on the number of resources and patches you want to apply, a SyncSet can get pretty large and is not very easy to manage. My colleague Matt wrote a SyncSet Generator; please check out this GitHub repository.

In one of my next articles I will go into more detail on how to deploy OpenShift Hive and provide more examples of how to use ClusterDeployments and SyncSets. In the meantime please check out the OpenShift Hive repository for more details; additionally, here are links to the Hive documentation on using Hive and SyncSets.

Read my new article about installing OpenShift Hive.

How to manage Kubernetes clusters the GitOps way with Flux CD

Kubernetes is becoming more and more popular, and so is managing clusters at scale. This article is about how to manage Kubernetes clusters the GitOps way using the Flux CD operator.

Flux can monitor the container image and code repositories that you specify and trigger deployments to automatically change the configuration state of your Kubernetes cluster. The cluster configuration is centrally managed and stored in declarative form in Git; there is no need for an administrator to manually apply manifests, as the Flux operator synchronises and applies or deletes the cluster configuration.

Before we start deploying the operator we need to install the fluxctl command-line utility and create the namespace:

sudo wget -O /usr/local/bin/fluxctl https://github.com/fluxcd/flux/releases/download/1.18.0/fluxctl_linux_amd64
sudo chmod 755 /usr/local/bin/fluxctl
kubectl create ns flux

Deploying the Flux operator is straightforward and requires a few options like the git repository and git path. The path is important for my example because it tells the operator in which folders to look for manifests:

$ fluxctl install --git-email=<-YOUR-GIT-EMAIL-> --git-url=git@github.com:berndonline/flux-cd.git --git-path=clusters/gke,common/stage --manifest-generation=true --git-branch=master --namespace=flux --registry-disable-scanning | kubectl apply -f -
deployment.apps/memcached created
service/memcached created
serviceaccount/flux created
clusterrole.rbac.authorization.k8s.io/flux created
clusterrolebinding.rbac.authorization.k8s.io/flux created
deployment.apps/flux created
secret/flux-git-deploy created

After you have applied the configuration, wait until the Flux pods are up and running:

$ kubectl get pods -n flux
NAME                       READY   STATUS    RESTARTS   AGE
flux-85cd9cd746-hnb4f      1/1     Running   0          74m
memcached-5dcd7579-d6vwh   1/1     Running   0          20h

The last step is to get the Flux operator deploy key and copy the output to add it to your Git repository:

fluxctl identity --k8s-fwd-ns flux

Now you are ready to synchronise the Flux operator with the repository. By default Flux automatically synchronises every 5 minutes to apply configuration changes:

$ fluxctl sync --k8s-fwd-ns flux
Synchronizing with git@github.com:berndonline/flux-cd.git
Revision of master to apply is 726944d
Waiting for 726944d to be applied ...
Done.

You are able to list workloads which are managed by the Flux operator:

$ fluxctl list-workloads --k8s-fwd-ns=flux -a
WORKLOAD                             CONTAINER         IMAGE                            RELEASE  POLICY
default:deployment/hello-kubernetes  hello-kubernetes  paulbouwer/hello-kubernetes:1.5  ready    automated
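
The POLICY column shows which workloads Flux updates automatically. If a workload is not automated yet, you can enable it with fluxctl automate (shown here for the deployment above):

$ fluxctl automate --workload=default:deployment/hello-kubernetes --k8s-fwd-ns=flux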

How do we manage the configuration for multiple Kubernetes clusters?

I want to show you a simple example using Kustomize to manage multiple clusters across two environments (staging and production) with Flux. Basically you have a single repository, and multiple clusters synchronise the configuration depending on how you configure the --git-path variable of the Flux operator. The option --manifest-generation enables Kustomize support in the operator; it requires adding a .flux.yaml which runs kustomize build on the cluster directories and applies the generated manifests.
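
A minimal .flux.yaml which simply runs kustomize build in each configured git path could look like this (a sketch based on Flux’s manifest-generation format; check the example repository for the real file):

version: 1
commandUpdated:
  generators:
    - command: kustomize build .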

Let’s look at the repository file and folder structure. We have the base folder containing the common deployment configuration, the common folder with the environment separation for stage and prod overlays, and the clusters folder which contains more cluster-specific configuration:

├── .flux.yaml
├── base
│   └── common
│       ├── deployment.yaml
│       ├── kustomization.yaml
│       ├── namespace.yaml
│       └── service.yaml
├── clusters
│   ├── eks
│   │   ├── eks-app1
│   │   │   ├── deployment.yaml
│   │   │   ├── kustomization.yaml
│   │   │   └── service.yaml
│   │   └── kustomization.yaml
│   └── gke
│       ├── gke-app1
│       │   ├── deployment.yaml
│       │   ├── kustomization.yaml
│       │   └── service.yaml
│       ├── gke-app2
│       │   ├── deployment.yaml
│       │   ├── kustomization.yaml
│       │   └── service.yaml
│       └── kustomization.yaml
└── common
    ├── prod
    │   ├── prod.yaml
    │   └── kustomization.yaml
    └── stage
        ├── team1
        │   ├── deployment.yaml
        │   ├── kustomization.yaml
        │   ├── namespace.yaml
        │   └── service.yaml
        ├── stage.yaml
        └── kustomization.yaml
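
To illustrate how the folders tie together: each cluster folder has a kustomization.yaml which pulls in the app overlays next to it. A sketch of what clusters/gke/kustomization.yaml might contain (illustrative; see the example repository for the real files):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gke-app1
  - gke-app2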

If you are new to Kustomize, I would recommend reading the article Kustomize – The right way to do templating in Kubernetes.

The last thing we need to do is deploy the Flux operator to the two Kubernetes clusters. The only difference between the two is the --git-path variable, which points the operator to the cluster and common directories where Kustomize applies the overlays based on what is specified in the kustomization.yaml. More details about the configuration can be found in my example repository: https://github.com/berndonline/flux-cd

Flux config for Google GKE staging cluster:

fluxctl install --git-email=<-YOUR-GIT-EMAIL-> --git-url=git@github.com:berndonline/flux-cd.git --git-path=clusters/gke,common/stage --manifest-generation=true --git-branch=master --namespace=flux | kubectl apply -f -

Flux config for Amazon EKS production cluster:

fluxctl install --git-email=<-YOUR-GIT-EMAIL-> --git-url=git@github.com:berndonline/flux-cd.git --git-path=clusters/eks,common/prod --manifest-generation=true --git-branch=master --namespace=flux | kubectl apply -f -

After a few minutes the configuration is applied to the two clusters and you can validate it.

Google GKE stage workloads:

$ fluxctl list-workloads --k8s-fwd-ns=flux -a
WORKLOAD                   CONTAINER         IMAGE                            RELEASE  POLICY
common:deployment/common   hello-kubernetes  paulbouwer/hello-kubernetes:1.5  ready    automated
default:deployment/gke1    hello-kubernetes  paulbouwer/hello-kubernetes:1.5  ready    
default:deployment/gke2    hello-kubernetes  paulbouwer/hello-kubernetes:1.5  ready    
team1:deployment/team1     hello-kubernetes  paulbouwer/hello-kubernetes:1.5  ready
$ kubectl get svc --all-namespaces | grep LoadBalancer
common        common                 LoadBalancer   10.91.14.186   35.240.53.46     80:31537/TCP    16d
default       gke1                   LoadBalancer   10.91.7.169    35.195.241.46    80:30218/TCP    16d
default       gke2                   LoadBalancer   10.91.10.239   35.195.144.68    80:32589/TCP    16d
team1         team1                  LoadBalancer   10.91.1.178    104.199.107.56   80:31049/TCP    16d

GKE common stage application:

Amazon EKS prod workloads:

$ fluxctl list-workloads --k8s-fwd-ns=flux -a
WORKLOAD                          CONTAINER         IMAGE                                                                RELEASE  POLICY
common:deployment/common          hello-kubernetes  paulbouwer/hello-kubernetes:1.5                                      ready    automated
default:deployment/eks1           hello-kubernetes  paulbouwer/hello-kubernetes:1.5                                      ready
$ kubectl get svc --all-namespaces | grep LoadBalancer
common        common       LoadBalancer   10.100.254.171   a4caafcbf2b2911ea87370a71555111a-958093179.eu-west-1.elb.amazonaws.com    80:32318/TCP    3m8s
default       eks1         LoadBalancer   10.100.170.10    a4caeada52b2911ea87370a71555111a-1261318311.eu-west-1.elb.amazonaws.com   80:32618/TCP    3m8s

EKS common prod application:

I hope this article is useful for getting started with GitOps and the Flux operator. In the future I would like to see Flux being able to watch git tags, which would make it easier to promote changes and manage clusters with version tags.

For more technical information have a look at the Flux CD documentation.

How to validate OpenShift using Ansible

You might have seen my previous article about the OpenShift troubleshooting guide. With this blog post I want to show how to validate that an OpenShift container platform is fully functional. This is less of an issue when you do a fresh install, but it becomes more important when you apply changes or do in-place upgrades of your cluster. And because we all like automation, I will show how to do this with Ansible in an automated way.

Let’s jump right into it and look at the different steps. Here are the links to the Ansible role and playbook for more details:

  • Prepare workspace – create a temp directory and copy admin.kubeconfig there.
  • Check node state – run a command to check for NotReady nodes (see the Ansible sketch after this list):

    oc get nodes --no-headers=true | grep -v ' Ready' || true

  • Check node scheduling – run a command to check for nodes where scheduling is disabled:

    oc get nodes --no-headers=true | grep 'SchedulingDisabled' | awk '{ print $1 }'

  • Check master certificates – check the validity of the master API, controller and etcd certificates:

    cat /etc/origin/master/ca.crt | openssl x509 -text | grep -i Validity -A2
    cat /etc/origin/master/master.server.crt | openssl x509 -text | grep -i Validity -A2
    cat /etc/origin/master/admin.crt | openssl x509 -text | grep -i Validity -A2
    cat /etc/etcd/ca.crt | openssl x509 -text | grep -i Validity -A2

  • Check node certificates – check the validity of the worker node certificate. This needs to run on all compute and infra nodes, not masters:

    cat /etc/origin/node/server.crt | openssl x509 -text | grep -i Validity -A2

  • Check etcd health – run the cluster health check:

    /usr/bin/etcdctl --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --ca-file /etc/etcd/ca.crt -C https://{{ hostname.stdout }}:2379 cluster-health

  • Check important default projects for failed pods – look out for failed pods:

    oc get pods -o wide --no-headers=true -n {{ item }} | grep -v " Running\|Completed" || true

  • Check registry health – run a GET against the docker-registry healthz path and expect HTTP 200:

    curl -kv https://{{ registry_ip.stdout }}/healthz

  • Check SkyDNS resolution – try to resolve internal hostnames:

    nslookup docker-registry.default.svc.cluster.local
    nslookup docker-registry.default.svc

  • Check upstream DNS resolution – try to resolve cluster-external DNS names.
  • Create test project.
  • Run persistent volume test – create a busybox container and claim a volume:
    1. apply imagestream, deploymentconfig and persistent volume claim configuration
    2. synchronise testfile to container pv
    3. check content of testfile
    4. delete testfile
  • Run test build – run the following steps to create multiple application pods on the OpenShift cluster:
    1. apply buildconfig, imagestream, deploymentconfig, service and route configuration
    2. check pods are running
    3. get route hostnames
    4. connect to routes and show output
    5. trigger new-build and check new build is created
    6. check if pods are running
    7. connect to routes and show output
  • Delete test project.
  • Delete workspace folder – at the end of the validation the role deletes the temporary folder and all its contents.
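
To give an idea of how these checks are implemented, here is a minimal Ansible sketch of the node-state check (simplified; the task names and failure handling are illustrative, see the linked role for the real implementation):

- name: check for not ready nodes
  shell: oc get nodes --no-headers=true | grep -v ' Ready' || true
  register: notready
  changed_when: false

- name: not ready nodes
  debug:
    var: notready.stdout_lines

- name: fail when nodes are not ready
  fail:
    msg: "One or more nodes are not in Ready state"
  when: notready.stdout_lines | length > 0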

Next we need to run the playbook; see the output below:

PLAY [Check OpenShift cluster installation] ***********************************************************************************************************************

TASK [check : make temp directory] ********************************************************************************************************************************
changed: [master1]

TASK [check : create temp directory] ******************************************************************************************************************************
ok: [master1]

TASK [check : create template folder] *****************************************************************************************************************************
changed: [master1]

TASK [check : create test file folder] ****************************************************************************************************************************
changed: [master1]

TASK [check : get hostname] ***************************************************************************************************************************************
changed: [master1]

TASK [check : copy admin config] **********************************************************************************************************************************
ok: [master1]

TASK [check : check for not ready nodes] **************************************************************************************************************************
changed: [master1]

TASK [check : not ready nodes] ************************************************************************************************************************************
ok: [master1] => {
    "notready.stdout_lines": []
}

TASK [check : check for scheduling disabled nodes] ****************************************************************************************************************
changed: [master1]

TASK [check : scheduling disabled nodes] **************************************************************************************************************************
ok: [master1] => {
    "schedulingdisabled.stdout_lines": []
}

TASK [check : validate master certificates] ***********************************************************************************************************************
changed: [master1] => (item=cat /etc/origin/master/ca.crt | openssl x509 -text | grep -i Validity -A2)
changed: [master1] => (item=cat /etc/origin/master/master.server.crt | openssl x509 -text | grep -i Validity -A2)
changed: [master1] => (item=cat /etc/origin/master/admin.crt | openssl x509 -text | grep -i Validity -A2)
changed: [master1] => (item=cat /etc/etcd/ca.crt | openssl x509 -text | grep -i Validity -A2)

TASK [check : ca certificate] *************************************************************************************************************************************
ok: [master1] => {
    "msg": [
        [
            "        Validity",
            "            Not Before: Jan 31 17:13:52 2019 GMT",
            "            Not After : Jan 30 17:13:53 2024 GMT"
        ],
        [
            "        Validity",
            "            Not Before: Jan 31 17:13:53 2019 GMT",
            "            Not After : Jan 30 17:13:54 2021 GMT"
        ],
        [
            "        Validity",
            "            Not Before: Jan 31 17:13:53 2019 GMT",
            "            Not After : Jan 30 17:13:54 2021 GMT"
        ],
        [
            "        Validity",
            "            Not Before: Jan 31 17:11:09 2019 GMT",
            "            Not After : Jan 30 17:11:09 2024 GMT"
        ]
    ]
}

TASK [check : check etcd state] ***********************************************************************************************************************************
changed: [master1]

TASK [check : show etcd state] ************************************************************************************************************************************
ok: [master1] => {
    "etcdstate.stdout_lines": [
        "member 335450512aab5650 is healthy: got healthy result from https://172.26.7.132:2379",
        "cluster is healthy"
    ]
}

TASK [check : check default openshift-infra and logging projects for failed pods] *********************************************************************************
ok: [master1] => (item=default)
ok: [master1] => (item=kube-system)
ok: [master1] => (item=kube-service-catalog)
ok: [master1] => (item=openshift-logging)
ok: [master1] => (item=openshift-infra)
ok: [master1] => (item=openshift-console)
ok: [master1] => (item=openshift-web-console)
ok: [master1] => (item=openshift-monitoring)
ok: [master1] => (item=openshift-node)
ok: [master1] => (item=openshift-sdn)

TASK [check : failed failedpods] **********************************************************************************************************************************
ok: [master1] => {
    "msg": [
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        []
    ]
}

TASK [check : get container registry ip] **************************************************************************************************************************
changed: [master1]

TASK [check : check container registry health] ********************************************************************************************************************
ok: [master1]

TASK [check : check internal SysDNS resolution for cluster.local] *************************************************************************************************
changed: [master1] => (item=docker-registry.default.svc.cluster.local)
changed: [master1] => (item=docker-registry.default.svc)

TASK [check : check external DNS upstream resolution] *************************************************************************************************************
changed: [master1] => (item=www.google.com)
changed: [master1] => (item=www.google.co.uk)
changed: [master1] => (item=www.google.de)

TASK [check : create test project] ********************************************************************************************************************************
changed: [master1]

TASK [check : run test persistent volume] *************************************************************************************************************************
included: /var/jenkins_home/workspace/openshift/ansible/roles/check/tasks/pv.yml for master1

TASK [check : create sequence number list for pv] *****************************************************************************************************************
ok: [master1] => (item=1)
ok: [master1] => (item=2)
ok: [master1] => (item=3)
ok: [master1] => (item=4)
ok: [master1] => (item=5)

TASK [check : copy build templates] *******************************************************************************************************************************
changed: [master1] => (item={u'dest': u'busybox.yml', u'src': u'busybox.j2'})
changed: [master1] => (item={u'dest': u'pv.yml', u'src': u'pv.j2'})

TASK [check : copy testfile] **************************************************************************************************************************************
changed: [master1]

TASK [check : create pvs] *****************************************************************************************************************************************
ok: [master1]

TASK [check : deploy busybox pod] *********************************************************************************************************************************
changed: [master1]

TASK [check : check if all pods are running] **********************************************************************************************************************
FAILED - RETRYING: check if all pods are running (10 retries left).
changed: [master1] => (item=busybox)

TASK [check : get busybox pod name] *******************************************************************************************************************************
changed: [master1]

TASK [check : sync testfile to pod] *******************************************************************************************************************************
changed: [master1]

TASK [check : check testfile in pv] *******************************************************************************************************************************
changed: [master1]

TASK [check : delete testfile in pv] ******************************************************************************************************************************
changed: [master1]

TASK [check : run test build] *************************************************************************************************************************************
included: /var/jenkins_home/workspace/openshift/ansible/roles/check/tasks/build.yml for master1

TASK [check : create sequence number list for hello openshift] ****************************************************************************************************
ok: [master1] => (item=0)
ok: [master1] => (item=1)
ok: [master1] => (item=2)
ok: [master1] => (item=3)
ok: [master1] => (item=4)
ok: [master1] => (item=5)
ok: [master1] => (item=6)
ok: [master1] => (item=7)
ok: [master1] => (item=8)
ok: [master1] => (item=9)

TASK [check : create pod list] ************************************************************************************************************************************
ok: [master1] => (item=[u'0', {u'svc': u'http'}])
ok: [master1] => (item=[u'1', {u'svc': u'http'}])
ok: [master1] => (item=[u'2', {u'svc': u'http'}])
ok: [master1] => (item=[u'3', {u'svc': u'http'}])
ok: [master1] => (item=[u'4', {u'svc': u'http'}])
ok: [master1] => (item=[u'5', {u'svc': u'http'}])
ok: [master1] => (item=[u'6', {u'svc': u'http'}])
ok: [master1] => (item=[u'7', {u'svc': u'http'}])
ok: [master1] => (item=[u'8', {u'svc': u'http'}])
ok: [master1] => (item=[u'9', {u'svc': u'http'}])

TASK [check : copy build templates] *******************************************************************************************************************************
changed: [master1] => (item={u'dest': u'hello-openshift.yml', u'src': u'hello-openshift.j2'})

TASK [check : deploy hello openshift pods] ************************************************************************************************************************
changed: [master1]

TASK [check : check if all pods are running] **********************************************************************************************************************
FAILED - RETRYING: check if all pods are running (10 retries left).
changed: [master1] => (item={u'name': u'hello-http-0'})
FAILED - RETRYING: check if all pods are running (10 retries left).
changed: [master1] => (item={u'name': u'hello-http-1'})
changed: [master1] => (item={u'name': u'hello-http-2'})
changed: [master1] => (item={u'name': u'hello-http-3'})
changed: [master1] => (item={u'name': u'hello-http-4'})
changed: [master1] => (item={u'name': u'hello-http-5'})
changed: [master1] => (item={u'name': u'hello-http-6'})
changed: [master1] => (item={u'name': u'hello-http-7'})
changed: [master1] => (item={u'name': u'hello-http-8'})
changed: [master1] => (item={u'name': u'hello-http-9'})

TASK [check : get hello openshift pod hostnames] ******************************************************************************************************************
changed: [master1]

TASK [check : convert check_route string to json] *****************************************************************************************************************
ok: [master1]

TASK [check : set query to get pod hostname] **********************************************************************************************************************
ok: [master1]

TASK [check : get hostname list] **********************************************************************************************************************************
ok: [master1]

TASK [check : connect to route via curl] **************************************************************************************************************************
changed: [master1] => (item=hello-http-0-test.paas.domain.com)
changed: [master1] => (item=hello-http-1-test.paas.domain.com)
changed: [master1] => (item=hello-http-2-test.paas.domain.com)
changed: [master1] => (item=hello-http-3-test.paas.domain.com)
changed: [master1] => (item=hello-http-4-test.paas.domain.com)
changed: [master1] => (item=hello-http-5-test.paas.domain.com)
changed: [master1] => (item=hello-http-6-test.paas.domain.com)
changed: [master1] => (item=hello-http-7-test.paas.domain.com)
changed: [master1] => (item=hello-http-8-test.paas.domain.com)
changed: [master1] => (item=hello-http-9-test.paas.domain.com)

TASK [check : set json query] *************************************************************************************************************************************
ok: [master1]

TASK [check : show route http response] ***************************************************************************************************************************
ok: [master1] => {
    "msg": [
        "hello-http-0",
        "hello-http-1",
        "hello-http-2",
        "hello-http-3",
        "hello-http-4",
        "hello-http-5",
        "hello-http-6",
        "hello-http-7",
        "hello-http-8",
        "hello-http-9"
    ]
}

TASK [check : trigger new build] **********************************************************************************************************************************
changed: [master1]

TASK [check : check new build is created] *************************************************************************************************************************
changed: [master1]

TASK [check : check if all pods with new build are running] *******************************************************************************************************
FAILED - RETRYING: check if all pods with new build are running (10 retries left).
FAILED - RETRYING: check if all pods with new build are running (9 retries left).
changed: [master1] => (item={u'name': u'hello-http-0'})
changed: [master1] => (item={u'name': u'hello-http-1'})
changed: [master1] => (item={u'name': u'hello-http-2'})
changed: [master1] => (item={u'name': u'hello-http-3'})
changed: [master1] => (item={u'name': u'hello-http-4'})
changed: [master1] => (item={u'name': u'hello-http-5'})
changed: [master1] => (item={u'name': u'hello-http-6'})
changed: [master1] => (item={u'name': u'hello-http-7'})
changed: [master1] => (item={u'name': u'hello-http-8'})
changed: [master1] => (item={u'name': u'hello-http-9'})

TASK [check : connect to route via curl] **************************************************************************************************************************
changed: [master1] => (item=hello-http-0-test.paas.domain.com)
changed: [master1] => (item=hello-http-1-test.paas.domain.com)
changed: [master1] => (item=hello-http-2-test.paas.domain.com)
changed: [master1] => (item=hello-http-3-test.paas.domain.com)
changed: [master1] => (item=hello-http-4-test.paas.domain.com)
changed: [master1] => (item=hello-http-5-test.paas.domain.com)
changed: [master1] => (item=hello-http-6-test.paas.domain.com)
changed: [master1] => (item=hello-http-7-test.paas.domain.com)
changed: [master1] => (item=hello-http-8-test.paas.domain.com)
changed: [master1] => (item=hello-http-9-test.paas.domain.com)

TASK [check : show route http response] ***************************************************************************************************************************
ok: [master1] => {
    "msg": [
        "hello-http-0",
        "hello-http-1",
        "hello-http-2",
        "hello-http-3",
        "hello-http-4",
        "hello-http-5",
        "hello-http-6",
        "hello-http-7",
        "hello-http-8",
        "hello-http-9"
    ]
}

TASK [check : delete test project] ********************************************************************************************************************************
changed: [master1]

TASK [check : delete temp directory] ******************************************************************************************************************************
ok: [master1]

PLAY RECAP ********************************************************************************************************************************************************
master1                : ok=52   changed=30   unreachable=0    failed=0

The cluster validation playbook finished successfully without errors; this is just a simple way to do a basic check of your OpenShift platform. Check out the OpenShift documentation about environment health checks.

Deploy OpenShift 3.11 Container Platform on Google Cloud Platform using Terraform

Over the past few days I have converted the OpenShift 3.11 infrastructure on Amazon AWS to run on Google Cloud Platform. I have kept a similar VPC network layout and instances to run OpenShift.

Before you start, you need to create a project on Google Cloud Platform, then continue to create a service account, generate a private key and download the credentials as a JSON file.

Create the new project:

Create the service account:

Give the service account compute admin and storage object creator permissions:
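
If you prefer the command line over the console, the same steps could look roughly like this with gcloud (the service account name and project are placeholders):

gcloud iam service-accounts create terraform --project <-YOUR-PROJECT-NAME->
gcloud projects add-iam-policy-binding <-YOUR-PROJECT-NAME-> \
  --member serviceAccount:terraform@<-YOUR-PROJECT-NAME->.iam.gserviceaccount.com \
  --role roles/compute.admin
gcloud projects add-iam-policy-binding <-YOUR-PROJECT-NAME-> \
  --member serviceAccount:terraform@<-YOUR-PROJECT-NAME->.iam.gserviceaccount.com \
  --role roles/storage.objectCreator
gcloud iam service-accounts keys create ./credentials.json \
  --iam-account terraform@<-YOUR-PROJECT-NAME->.iam.gserviceaccount.com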

Then create a storage bucket for the Terraform backend state and assign the correct bucket permission to the terraform service account:

Bucket permissions:
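
The bucket can also be created and permissioned from the command line (the bucket name is a placeholder; the Terraform backend needs to read and write state objects):

gsutil mb -l europe-west3 gs://<-YOUR-BUCKET-NAME->/
gsutil iam ch serviceAccount:terraform@<-YOUR-PROJECT-NAME->.iam.gserviceaccount.com:objectAdmin gs://<-YOUR-BUCKET-NAME->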

To start, clone my openshift-terraform GitHub repository and check out the google-dev branch:

git clone https://github.com/berndonline/openshift-terraform.git
cd ./openshift-terraform/ && git checkout google-dev

Add your previously downloaded credentials JSON file:

cat << EOF > ./credentials.json
{
  "type": "service_account",
  "project_id": "<--your-project-->",
  "private_key_id": "<--your-key-id-->",
  "private_key": "-----BEGIN PRIVATE KEY-----

...

}
EOF

There are a few things you need to modify in the main.tf and variables.tf before you can start:

...
terraform {
  backend "gcs" {
    bucket    = "<--your-bucket-name-->"
    prefix    = "openshift-311"
    credentials = "credentials.json"
  }
}
...
...
variable "gcp_region" {
  description = "Google Compute Platform region to launch servers."
  default     = "europe-west3"
}
variable "gcp_project" {
  description = "Google Compute Platform project name."
  default     = "<--your-project-name-->"
}
variable "gcp_zone" {
  type = "string"
  default = "europe-west3-a"
  description = "The zone to provision into"
}
...

Add the needed environment variables to apply changes to CloudFlare DNS:

export TF_VAR_email='<-YOUR-CLOUDFLARE-EMAIL-ADDRESS->'
export TF_VAR_token='<-YOUR-CLOUDFLARE-TOKEN->'
export TF_VAR_domain='<-YOUR-CLOUDFLARE-DOMAIN->'
export TF_VAR_htpasswd='<-YOUR-OPENSHIFT-DEMO-USER-HTPASSWD->'

Let’s start creating the infrastructure, and afterwards verify the created resources on GCP.

terraform init && terraform apply -auto-approve

VPC and public and private subnets in region europe-west3:

Created instances:

Created load balancers for master and infra nodes:

Copy the SSH key and the ansible-hosts file to the bastion host, from where you need to run the Ansible OpenShift playbooks.

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./helper_scripts/id_rsa centos@$(terraform output bastion):/home/centos/.ssh/
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./inventory/ansible-hosts centos@$(terraform output bastion):/home/centos/ansible-hosts

I recommend waiting a few minutes as the cloud-init script prepares the bastion host. Afterwards continue with the pre and install playbooks. You can connect to the bastion host and run the playbooks directly.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-pre.yml -i ~/ansible-hosts"
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-install.yml -i ~/ansible-hosts"

After the installation is completed, continue to create your project and applications:

When you are finished with the testing, run terraform destroy.

terraform destroy -force 

Please share your feedback and leave a comment.