How to backup OpenShift with Heptio Velero(Ark)

I have found an interesting open source tool called Heptio Velero previously known as Heptio Ark which is able to backup Kubernetes and OpenShift container platforms. The tool mainly does this via the API and backup namespace objects and additionally is able to create snapshots for PVs on Azure, AWS and GCP.

The user uses the ark command line utility to create and restore backups.

The installation on Velero is super simple, just follow the steps below:

# Download and extract the latest Velero release from github
wget https://github.com/heptio/velero/releases/download/v0.10.1/ark-v0.10.1-linux-amd64.tar.gz
tar -xzf ark-v0.10.1-linux-amd64.tar.gz -c ./velero/

# Move the ark binary to somewhere in your PATH
mv ./velero/ark /usr/sbin/

# The last two commands create namespace and applies configuration
oc create -f ./velero/config/common/00-prereqs.yaml
oc create -f ./velero/config/minio/

You can expose Minio to access the web console from the outside.

# Create route
oc expose service minio

# View access and secret key to login via the web console
oc describe deployment.apps/minio | grep -i Environment -A2
    Environment:
      MINIO_ACCESS_KEY:  minio
      MINIO_SECRET_KEY:  minio123

Here a few command options on how to backup objects:

# Create a backup for any object that matches the app=pod label selector:
ark backup create <backup-name> --selector <key>=<value> 

# Alternatively if you want to backup all objects except those matching the label backup=ignore:
ark backup create <backup-name> --selector 'backup notin (ignore)'

# Create regularly scheduled backups based on a cron expression using the app=pod label selector:
ark schedule create <backup-name> --schedule="0 1 * * *" --selector <key>=<value>

# Create a backup for a namespace:
ark backup create <backup-name> --include-namespaces <namespace-name>

Let’s do a backup and restore tests; I have created a new OpenShift project with a simple hello-openshift build- and deployment-config:

[[email protected] ~]# ark backup create mybackup --include-namespaces myapplication
Backup request "mybackup" submitted successfully.
Run `ark backup describe mybackup` or `ark backup logs mybackup` for more details.
[[email protected] ~]# ark backup get
NAME          STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
mybackup      Completed   2019-02-08 17:14:09 +0000 UTC   29d       default            

Once the backup has completed we can delete the project.

[[email protected] ~]# oc delete project myapplication
project.project.openshift.io "myapplication" deleted

Now let’s restore the project namespace from the previous created backup:

[[email protected] ~]# ark restore create --from-backup mybackup
Restore request "mybackup-20190208171745" submitted successfully.
Run `ark restore describe mybackup-20190208171745` or `ark restore logs mybackup-20190208171745` for more details.
[[email protected] ~]# ark restore get
NAME                         BACKUP        STATUS       WARNINGS   ERRORS    CREATED                         SELECTOR
mybackup-20190208171745      mybackup      InProgress   0          0         2019-02-08 17:17:45 +0000 UTC   
[[email protected] ~]# ark restore get
NAME                         BACKUP        STATUS      WARNINGS   ERRORS    CREATED                         SELECTOR
mybackup-20190208171745      mybackup      Completed   1          0         2019-02-08 17:17:45 +0000 UTC   

The project is back in the state it was when we created the backup.

[[email protected] ~]# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
hello-app-http-1-qn8jj   1/1       Running   0          2m
[[email protected] ~]# curl -k --insecure https://hello-app-http-myapplication.aio.hostgate.net/
Hello OpenShift!

There are a few issues around the restore which I have seen and I want to explain, I’m not sure if these are related to OpenShift in general or just the latest 3.11 version. The secrets for the builder account are missing or didn’t restore correctly and cannot be used.

[[email protected] ~]# oc get build
NAME                 TYPE      FROM         STATUS                               STARTED   DURATION
hello-build-http-1   Docker    Dockerfile   New (CannotRetrieveServiceAccount)
hello-build-http-2   Docker    Dockerfile   New
[[email protected] ~]# oc get events | grep Failed
1m          1m           2         hello-build-http.15816e39eefb637d         BuildConfig                                     Warning   BuildConfigInstantiateFailed   buildconfig-controller                                error instantiating Build from BuildConfig myapplication/hello-build-http (0): Error resolving ImageStreamTag hello-openshift-source:latest in namespace myapplication: imagestreams.image.openshift.io "hello-openshift-source" not found
1m          1m           6         hello-build-http.15816e39f446207f         BuildConfig                                     Warning   BuildConfigInstantiateFailed   buildconfig-controller                                error instantiating Build from BuildConfig myapplication/hello-build-http (0): Error resolving ImageStreamTag hello-openshift-source:latest in namespace myapplication: unable to find latest tagged image
1m          1m           1         hello-build-http.15816e3a49f21411         BuildConfig                                     Warning   BuildConfigInstantiateFailed   buildconfig-controller                                error instantiating Build from BuildConfig myapplication/hello-build-http (0): builds.build.openshift.io "hello-build-http-1" already exists
[[email protected] ~]# oc get secrets | grep builder
builder-token-5q646        kubernetes.io/service-account-token   4         5m

# OR
[[email protected] ~]# oc get build
NAME                 TYPE      FROM         STATUS                        STARTED   DURATION
hello-build-http-1   Docker    Dockerfile   Pending (MissingPushSecret)
hello-build-http-2   Docker    Dockerfile   New
[[email protected] ~]# oc get events | grep FailedMount
15m         19m          10        hello-build-http-1-build.15816cc22f35795c   Pod                                             Warning   FailedMount                    kubelet, ip-172-26-12-32.eu-west-1.compute.internal   MountVolume.SetUp failed for volume "builder-dockercfg-k55f6-push" : secrets "builder-dockercfg-k55f6" not found
15m         17m          2         hello-build-http-1-build.15816cdec9dc561a   Pod                                             Warning   FailedMount                    kubelet, ip-172-26-12-32.eu-west-1.compute.internal   Unable to mount volumes for pod "hello-build-http-1-build_myapplication(4c2f1113-2bb5-11e9-8a6b-0a007934f01e)": timeout expired waiting for volumes to attach or mount for pod "myapplication"/"hello-build-http-1-build". list of unmounted volumes=[builder-dockercfg-k55f6-push]. list of unattached volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5]
13m         13m          1         hello-build-http-1-build.15816d1e3e65ad2a   Pod                                             Warning   FailedMount                    kubelet, ip-172-26-12-32.eu-west-1.compute.internal   Unable to mount volumes for pod "hello-build-http-1-build_myapplication(4c2f1113-2bb5-11e9-8a6b-0a007934f01e)": timeout expired waiting for volumes to attach or mount for pod "myapplication"/"hello-build-http-1-build". list of unmounted volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5]. list of unattached volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5]
[[email protected] ~]# oc get secrets | grep builder
NAME                       TYPE                                  DATA      AGE
builder-dockercfg-m6d2v    kubernetes.io/dockercfg               1         5m
builder-token-4chx4        kubernetes.io/service-account-token   4         5m
builder-token-sjvw5        kubernetes.io/service-account-token   4         5m

The deployment config seems to be disconnected and doesn’t know the state of the running pod:

[[email protected] ~]# oc get dc
NAME             REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-app-http   0          1         0         config,image(hello-openshift:latest)
[[email protected] ~]#

Here are the steps to recover out of this situation:

# First cancel all builds - the restore seems to have triggered a new build:
[[email protected] ~]# oc cancel-build $(oc get build --no-headers | awk '{ print $1 }')
build.build.openshift.io/hello-build-http-1 marked for cancellation, waiting to be cancelled
build.build.openshift.io/hello-build-http-2 marked for cancellation, waiting to be cancelled
build.build.openshift.io/hello-build-http-1 cancelled
build.build.openshift.io/hello-build-http-2 cancelled

# Delete all builds otherwise you will get later a problem because of duplicate name:
[[email protected] ~]# oc delete build $(oc get build --no-headers | awk '{ print $1 }')
build.build.openshift.io "hello-build-http-1" deleted
build.build.openshift.io "hello-build-http-2" deleted

# Delete the project builder account - this triggers openshift to re-create the builder
[[email protected] ~]# oc delete sa builder
serviceaccount "builder" deleted
[[email protected] ~]# oc get secrets | grep builder
builder-dockercfg-vwckw    kubernetes.io/dockercfg               1         24s
builder-token-dpgj9        kubernetes.io/service-account-token   4         24s
builder-token-lt7z2        kubernetes.io/service-account-token   4         24s

# Start the build and afterwards do a rollout for the deployment config:
[[email protected] ~]# oc start-build hello-build-http
build.build.openshift.io/hello-build-http-3 started
[[email protected] ~]# oc rollout latest dc/hello-app-http
deploymentconfig.apps.openshift.io/hello-app-http rolled out

After doing all this your build- and deployment-config is back synchronised.

[[email protected] ~]# oc get dc
NAME             REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-app-http   3          1         1         config,image(hello-openshift:latest)

My feedback about Heptio Velero(Ark); apart from the restore issues with the build- and deployment-config, I find the tool great especially in scenarios where I accidently deleted a namespace or for DR where I need to recover a whole cluster. What makes the tool worth it, is actually the possibility to create snapshots from PV disks on your cloud provider.

Check out the official documentation from Heptio for more information and if you like this article please leave a comment.

Deploy OpenShift 3.11 Container Platform on Google Cloud Platform using Terraform

Over the past few days I have converted the OpenShift 3.11 infrastructure on Amazon AWS to run on Google Cloud Platform. I have kept the similar VPC network layout and instances to run OpenShift.

Before you start you need to create a project on Google Cloud Platform, then continue to create the service account and generate the private key and download the credential as JSON file.

Create the new project:

Create the service account:

Give the service account compute admin and storage object creator permissions:

Then create a storage bucket for the Terraform backend state and assign the correct bucket permission to the terraform service account:

Bucket permissions:

To start, clone my openshift-terraform github repository and checkout the google-dev branch:

git clone https://github.com/berndonline/openshift-terraform.git
cd ./openshift-terraform/ && git checkout google-dev

Add your previously downloaded credentials json file:

cat << EOF > ./credentials.json
{
  "type": "service_account",
  "project_id": "<--your-project-->",
  "private_key_id": "<--your-key-id-->",
  "private_key": "-----BEGIN PRIVATE KEY-----

...

}
EOF

There are a few things you need to modify in the main.tf and variables.tf before you can start:

...
terraform {
  backend "gcs" {
    bucket    = "<--your-bucket-name-->"
    prefix    = "openshift-311"
    credentials = "credentials.json"
  }
}
...
...
variable "gcp_region" {
  description = "Google Compute Platform region to launch servers."
  default     = "europe-west3"
}
variable "gcp_project" {
  description = "Google Compute Platform project name."
  default     = "<--your-project-name-->"
}
variable "gcp_zone" {
  type = "string"
  default = "europe-west3-a"
  description = "The zone to provision into"
}
...

Add the needed environment variables to apply changes to CloudFlare DNS:

export TF_VAR_email='<-YOUR-CLOUDFLARE-EMAIL-ADDRESS->'
export TF_VAR_token='<-YOUR-CLOUDFLARE-TOKEN->'
export TF_VAR_domain='<-YOUR-CLOUDFLARE-DOMAIN->'
export TF_VAR_htpasswd='<-YOUR-OPENSHIFT-DEMO-USER-HTPASSWD->'

Let’s start creating the infrastructure and verify afterwards the created resources on GCP.

terraform init && terraform apply -auto-approve

VPC and public and private subnets in region europe-west3:

Created instances:

Created load balancers for master and infra nodes:

Copy the ssh key and ansible-hosts file to the bastion host from where you need to run the Ansible OpenShift playbooks.

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./helper_scripts/id_rsa [email protected]$(terraform output bastion):/home/centos/.ssh/
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./inventory/ansible-hosts  [email protected]$(terraform output bastion):/home/centos/ansible-hosts

I recommend waiting a few minutes as the cloud-init script prepares the bastion host. Afterwards continue with the pre and install playbooks. You can connect to the bastion host and run the playbooks directly.

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-pre.yml -i ~/ansible-hosts"
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-install.yml -i ~/ansible-hosts"

After the installation is completed, continue to create your project and applications:

When you are finished with the testing, run terraform destroy.

terraform destroy -force 

Please share your feedback and leave a comment.