I have found an interesting open source tool called Heptio Velero previously known as Heptio Ark which is able to backup Kubernetes and OpenShift container platforms. The tool mainly does this via the API and backup namespace objects and additionally is able to create snapshots for PVs on Azure, AWS and GCP.
The user uses the ark command line utility to create and restore backups.
The installation on Velero is super simple, just follow the steps below:
# Download and extract the latest Velero release from github wget https://github.com/heptio/velero/releases/download/v0.10.1/ark-v0.10.1-linux-amd64.tar.gz tar -xzf ark-v0.10.1-linux-amd64.tar.gz -c ./velero/ # Move the ark binary to somewhere in your PATH mv ./velero/ark /usr/sbin/ # The last two commands create namespace and applies configuration oc create -f ./velero/config/common/00-prereqs.yaml oc create -f ./velero/config/minio/
You can expose Minio to access the web console from the outside.
# Create route oc expose service minio # View access and secret key to login via the web console oc describe deployment.apps/minio | grep -i Environment -A2 Environment: MINIO_ACCESS_KEY: minio MINIO_SECRET_KEY: minio123
Here a few command options on how to backup objects:
# Create a backup for any object that matches the app=pod label selector: ark backup create <backup-name> --selector <key>=<value> # Alternatively if you want to backup all objects except those matching the label backup=ignore: ark backup create <backup-name> --selector 'backup notin (ignore)' # Create regularly scheduled backups based on a cron expression using the app=pod label selector: ark schedule create <backup-name> --schedule="0 1 * * *" --selector <key>=<value> # Create a backup for a namespace: ark backup create <backup-name> --include-namespaces <namespace-name>
Let’s do a backup and restore tests; I have created a new OpenShift project with a simple hello-openshift build- and deployment-config:
[root@master1 ~]# ark backup create mybackup --include-namespaces myapplication Backup request "mybackup" submitted successfully. Run `ark backup describe mybackup` or `ark backup logs mybackup` for more details. [root@master1 ~]# ark backup get NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR mybackup Completed 2019-02-08 17:14:09 +0000 UTC 29d default
Once the backup has completed we can delete the project.
[root@master1 ~]# oc delete project myapplication project.project.openshift.io "myapplication" deleted
Now let’s restore the project namespace from the previous created backup:
[root@master1 ~]# ark restore create --from-backup mybackup Restore request "mybackup-20190208171745" submitted successfully. Run `ark restore describe mybackup-20190208171745` or `ark restore logs mybackup-20190208171745` for more details. [root@master1 ~]# ark restore get NAME BACKUP STATUS WARNINGS ERRORS CREATED SELECTOR mybackup-20190208171745 mybackup InProgress 0 0 2019-02-08 17:17:45 +0000 UTC [root@master1 ~]# ark restore get NAME BACKUP STATUS WARNINGS ERRORS CREATED SELECTOR mybackup-20190208171745 mybackup Completed 1 0 2019-02-08 17:17:45 +0000 UTC
The project is back in the state it was when we created the backup.
[root@master1 ~]# oc get pods NAME READY STATUS RESTARTS AGE hello-app-http-1-qn8jj 1/1 Running 0 2m [root@master1 ~]# curl -k --insecure https://hello-app-http-myapplication.aio.hostgate.net/ Hello OpenShift!
There are a few issues around the restore which I have seen and I want to explain, I’m not sure if these are related to OpenShift in general or just the latest 3.11 version. The secrets for the builder account are missing or didn’t restore correctly and cannot be used.
[root@master1 ~]# oc get build NAME TYPE FROM STATUS STARTED DURATION hello-build-http-1 Docker Dockerfile New (CannotRetrieveServiceAccount) hello-build-http-2 Docker Dockerfile New [root@master1 ~]# oc get events | grep Failed 1m 1m 2 hello-build-http.15816e39eefb637d BuildConfig Warning BuildConfigInstantiateFailed buildconfig-controller error instantiating Build from BuildConfig myapplication/hello-build-http (0): Error resolving ImageStreamTag hello-openshift-source:latest in namespace myapplication: imagestreams.image.openshift.io "hello-openshift-source" not found 1m 1m 6 hello-build-http.15816e39f446207f BuildConfig Warning BuildConfigInstantiateFailed buildconfig-controller error instantiating Build from BuildConfig myapplication/hello-build-http (0): Error resolving ImageStreamTag hello-openshift-source:latest in namespace myapplication: unable to find latest tagged image 1m 1m 1 hello-build-http.15816e3a49f21411 BuildConfig Warning BuildConfigInstantiateFailed buildconfig-controller error instantiating Build from BuildConfig myapplication/hello-build-http (0): builds.build.openshift.io "hello-build-http-1" already exists [root@master1 ~]# oc get secrets | grep builder builder-token-5q646 kubernetes.io/service-account-token 4 5m # OR [root@master1 ~]# oc get build NAME TYPE FROM STATUS STARTED DURATION hello-build-http-1 Docker Dockerfile Pending (MissingPushSecret) hello-build-http-2 Docker Dockerfile New [root@master1 ~]# oc get events | grep FailedMount 15m 19m 10 hello-build-http-1-build.15816cc22f35795c Pod Warning FailedMount kubelet, ip-172-26-12-32.eu-west-1.compute.internal MountVolume.SetUp failed for volume "builder-dockercfg-k55f6-push" : secrets "builder-dockercfg-k55f6" not found 15m 17m 2 hello-build-http-1-build.15816cdec9dc561a Pod Warning FailedMount kubelet, ip-172-26-12-32.eu-west-1.compute.internal Unable to mount volumes for pod "hello-build-http-1-build_myapplication(4c2f1113-2bb5-11e9-8a6b-0a007934f01e)": timeout expired waiting for volumes to attach or mount for pod "myapplication"/"hello-build-http-1-build". list of unmounted volumes=[builder-dockercfg-k55f6-push]. list of unattached volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5] 13m 13m 1 hello-build-http-1-build.15816d1e3e65ad2a Pod Warning FailedMount kubelet, ip-172-26-12-32.eu-west-1.compute.internal Unable to mount volumes for pod "hello-build-http-1-build_myapplication(4c2f1113-2bb5-11e9-8a6b-0a007934f01e)": timeout expired waiting for volumes to attach or mount for pod "myapplication"/"hello-build-http-1-build". list of unmounted volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5]. list of unattached volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5] [root@master1 ~]# oc get secrets | grep builder NAME TYPE DATA AGE builder-dockercfg-m6d2v kubernetes.io/dockercfg 1 5m builder-token-4chx4 kubernetes.io/service-account-token 4 5m builder-token-sjvw5 kubernetes.io/service-account-token 4 5m
The deployment config seems to be disconnected and doesn’t know the state of the running pod:
[root@ip-172-26-12-32 ~]# oc get dc NAME REVISION DESIRED CURRENT TRIGGERED BY hello-app-http 0 1 0 config,image(hello-openshift:latest) [root@ip-172-26-12-32 ~]#
Here are the steps to recover out of this situation:
# First cancel all builds - the restore seems to have triggered a new build: [root@master1 ~]# oc cancel-build $(oc get build --no-headers | awk '{ print $1 }') build.build.openshift.io/hello-build-http-1 marked for cancellation, waiting to be cancelled build.build.openshift.io/hello-build-http-2 marked for cancellation, waiting to be cancelled build.build.openshift.io/hello-build-http-1 cancelled build.build.openshift.io/hello-build-http-2 cancelled # Delete all builds otherwise you will get later a problem because of duplicate name: [root@master1 ~]# oc delete build $(oc get build --no-headers | awk '{ print $1 }') build.build.openshift.io "hello-build-http-1" deleted build.build.openshift.io "hello-build-http-2" deleted # Delete the project builder account - this triggers openshift to re-create the builder [root@master1 ~]# oc delete sa builder serviceaccount "builder" deleted [root@master1 ~]# oc get secrets | grep builder builder-dockercfg-vwckw kubernetes.io/dockercfg 1 24s builder-token-dpgj9 kubernetes.io/service-account-token 4 24s builder-token-lt7z2 kubernetes.io/service-account-token 4 24s # Start the build and afterwards do a rollout for the deployment config: [root@master1 ~]# oc start-build hello-build-http build.build.openshift.io/hello-build-http-3 started [root@master1 ~]# oc rollout latest dc/hello-app-http deploymentconfig.apps.openshift.io/hello-app-http rolled out
After doing all this your build- and deployment-config is back synchronised.
[root@master1 ~]# oc get dc NAME REVISION DESIRED CURRENT TRIGGERED BY hello-app-http 3 1 1 config,image(hello-openshift:latest)
My feedback about Heptio Velero(Ark); apart from the restore issues with the build- and deployment-config, I find the tool great especially in scenarios where I accidently deleted a namespace or for DR where I need to recover a whole cluster. What makes the tool worth it, is actually the possibility to create snapshots from PV disks on your cloud provider.
Check out the official documentation from Heptio for more information and if you like this article please leave a comment.
thanks for sharing, can you give me more detailed information about recover a whole cluster? can I use a single backup for this propose?
Hi Jacky,
Velero is meant for backup and restore namespaces less for full cluster recovery.
Or you use Velero for migrations between different cluster versions, between OpenShift 3.11 and 4.1 for example.
-Bernd
is there an newer version of this install for Velero?
Hi Robert,
Yes, in August they release v1.1.0.
https://github.com/vmware-tanzu/velero/releases/download/v1.1.0/velero-v1.1.0-linux-amd64.tar.gz
-Bernd