OpenShift – Page 3 – techbloc.net

April 7, 2019

Getting started with OpenShift 4.0 Container Platform

I had a first look at OpenShift 4.0 and I wanted to share some information from what I have seen so far. The installation of the cluster is super easy and RedHat did a lot to improve the overall experience of the installation process to the previous OpenShift v3.x Ansible based installation and moving towards ephemeral cluster deployments.

There are a many changes under the hood and it’s not as obvious as Bootkube for the self-hosted/healing control-plane, MachineSets and the many internal operators to install and manage the OpenShift components ( api server, scheduler, controller manager, cluster-autoscaler, cluster-monitoring, web-console, dns, ingress, networking, node-tuning, and authentication ).

For the OpenShift 4.0 developer preview you need an RedHat account because you require a pull-secret for the cluster installation. For more information please visit: https://cloud.openshift.com/clusters/install

First we need to download the openshift-installer binary:

wget https://github.com/openshift/installer/releases/download/v0.16.1/openshift-install-linux-amd64
mv openshift-install-linux-amd64 openshift-install
chmod +x openshift-install

Then we create the install-configuration, it is required that you already have AWS account credentials and an Route53 DNS domain set-up:

$ ./openshift-install create install-config
INFO Platform aws
INFO AWS Access Key ID *********
INFO AWS Secret Access Key [? for help] *********
INFO Writing AWS credentials to "/home/centos/.aws/credentials" (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
INFO Region eu-west-1
INFO Base Domain paas.domain.com
INFO Cluster Name cluster1
INFO Pull Secret [? for help] *********

Let’s look at the install-config.yaml

apiVersion: v1beta4
baseDomain: paas.domain.com
compute:
- name: worker
  platform: {}
  replicas: 3
controlPlane:
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: ew1
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
pullSecret: '{"auths":{...}'

Now we can continue to create the OpenShift v4 cluster which takes around 30mins to complete. At the end of the openshift-installer you see the auto-generate credentials to connect to the cluster:

$ ./openshift-install create cluster
INFO Consuming "Install Config" from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 30m0s for the Kubernetes API at https://api.cluster1.paas.domain.com:6443...
INFO API v1.12.4+0ba401e up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster at https://api.cluster1.paas.domain.com:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO Run 'export KUBECONFIG=/home/centos/auth/kubeconfig' to manage the cluster with 'oc', the OpenShift CLI.
INFO The cluster is ready when 'oc login -u kubeadmin -p jMTSJ-F6KYy-mVVZ4-QVNPP' succeeds (wait a few minutes).
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.cluster1.paas.domain.com
INFO Login to the console with user: kubeadmin, password: jMTSJ-F6KYy-mVVZ4-QVNPP

The web-console has a very clean new design which I really like in addition to all the great improvements.

Under administration -> cluster settings you can explore the new auto-upgrade functionality of OpenShift 4.0:

You choose the new version to upgrade and everything else happens in the background which is a massive improvement to OpenShift v3.x where you had to run the ansible installer for this.

In the background the cluster operator upgrades the different platform components one by one.

Slowly you will see that the components move to the new build version.

Finished cluster upgrade:

You can only upgrade from one version 4.0.0-0.9 to the next version 4.0.0-0.10. It is not possible to upgrade and go straight from x-0.9 to x-0.11.

But let’s deploy the Google Hipster Shop example and expose the frontend-external service for some more testing:

oc login -u kubeadmin -p jMTSJ-F6KYy-mVVZ4-QVNPP https://api.cluster1.paas.domain.com:6443 --insecure-skip-tls-verify=true
oc new-project myproject
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/hipster-shop.yml
oc expose svc frontend-external

Getting the hostname for the exposed service:

$ oc get route
NAME                HOST/PORT                                                   PATH      SERVICES            PORT      TERMINATION   WILDCARD
frontend-external   frontend-external-myproject.apps.cluster1.paas.domain.com             frontend-external   http                    None

Use the browser to connect to our Hipster Shop:

It’s also very easy to destroy the cluster as it is to create it, as you seen previously:

$ ./openshift-install destroy cluster
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-01d27db162fa45402
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-057f593640067efc0
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-05e821b451bead18f
INFO Disassociated                                 IAM instance profile="arn:aws:iam::552276840222:instance-profile/ocp4-bgx4c-worker-profile" arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff name=ocp4-bgx4c-worker-profile role=ocp4-bgx4c-worker-role
INFO Deleted                                       IAM instance profile="arn:aws:iam::552276840222:instance-profile/ocp4-bgx4c-worker-profile" arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff name=0xc00090f9a8
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-00b5eedc186ba26a7" id=i-00b5eedc186ba26a7
...
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:security-group/sg-016d4c7d435a1c97f" id=sg-016d4c7d435a1c97f
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:subnet/subnet-076348368858e9a82" id=subnet-076348368858e9a82
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:vpc/vpc-00c611ae1b9b8e10a" id=vpc-00c611ae1b9b8e10a
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:dhcp-options/dopt-0ce8b6a1c31e0ceac" id=dopt-0ce8b6a1c31e0ceac

The install experience is great for OpenShift 4.0 which makes it very easy for everyone to create and get started quickly with an enterprise container platform. From the operational perspective I still need to see how to run the new platform because all the operators are great and makes it an easy to use cluster but what happens when one of the operators goes rogue and debugging this I am most interested in.

Over the coming weeks I will look into more detail around OpenShift 4.0 and the different new features, I am especially interested in Service Mesh.

March 24, 2019March 24, 2019

How to backup OpenShift with Heptio Velero(Ark)

I have found an interesting open source tool called Heptio Velero previously known as Heptio Ark which is able to backup Kubernetes and OpenShift container platforms. The tool mainly does this via the API and backup namespace objects and additionally is able to create snapshots for PVs on Azure, AWS and GCP.

The user uses the ark command line utility to create and restore backups.

The installation on Velero is super simple, just follow the steps below:

# Download and extract the latest Velero release from github
wget https://github.com/heptio/velero/releases/download/v0.10.1/ark-v0.10.1-linux-amd64.tar.gz
tar -xzf ark-v0.10.1-linux-amd64.tar.gz -c ./velero/

# Move the ark binary to somewhere in your PATH
mv ./velero/ark /usr/sbin/

# The last two commands create namespace and applies configuration
oc create -f ./velero/config/common/00-prereqs.yaml
oc create -f ./velero/config/minio/

You can expose Minio to access the web console from the outside.

# Create route
oc expose service minio

# View access and secret key to login via the web console
oc describe deployment.apps/minio | grep -i Environment -A2
    Environment:
      MINIO_ACCESS_KEY:  minio
      MINIO_SECRET_KEY:  minio123

Here a few command options on how to backup objects:

# Create a backup for any object that matches the app=pod label selector:
ark backup create <backup-name> --selector <key>=<value> 

# Alternatively if you want to backup all objects except those matching the label backup=ignore:
ark backup create <backup-name> --selector 'backup notin (ignore)'

# Create regularly scheduled backups based on a cron expression using the app=pod label selector:
ark schedule create <backup-name> --schedule="0 1 * * *" --selector <key>=<value>

# Create a backup for a namespace:
ark backup create <backup-name> --include-namespaces <namespace-name>

Let’s do a backup and restore tests; I have created a new OpenShift project with a simple hello-openshift build- and deployment-config:

[root@master1 ~]# ark backup create mybackup --include-namespaces myapplication
Backup request "mybackup" submitted successfully.
Run `ark backup describe mybackup` or `ark backup logs mybackup` for more details.
[root@master1 ~]# ark backup get
NAME          STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
mybackup      Completed   2019-02-08 17:14:09 +0000 UTC   29d       default

Once the backup has completed we can delete the project.

[root@master1 ~]# oc delete project myapplication
project.project.openshift.io "myapplication" deleted

Now let’s restore the project namespace from the previous created backup:

[root@master1 ~]# ark restore create --from-backup mybackup
Restore request "mybackup-20190208171745" submitted successfully.
Run `ark restore describe mybackup-20190208171745` or `ark restore logs mybackup-20190208171745` for more details.
[root@master1 ~]# ark restore get
NAME                         BACKUP        STATUS       WARNINGS   ERRORS    CREATED                         SELECTOR
mybackup-20190208171745      mybackup      InProgress   0          0         2019-02-08 17:17:45 +0000 UTC   
[root@master1 ~]# ark restore get
NAME                         BACKUP        STATUS      WARNINGS   ERRORS    CREATED                         SELECTOR
mybackup-20190208171745      mybackup      Completed   1          0         2019-02-08 17:17:45 +0000 UTC

The project is back in the state it was when we created the backup.

[root@master1 ~]# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
hello-app-http-1-qn8jj   1/1       Running   0          2m
[root@master1 ~]# curl -k --insecure https://hello-app-http-myapplication.aio.hostgate.net/
Hello OpenShift!

There are a few issues around the restore which I have seen and I want to explain, I’m not sure if these are related to OpenShift in general or just the latest 3.11 version. The secrets for the builder account are missing or didn’t restore correctly and cannot be used.

[root@master1 ~]# oc get build
NAME                 TYPE      FROM         STATUS                               STARTED   DURATION
hello-build-http-1   Docker    Dockerfile   New (CannotRetrieveServiceAccount)
hello-build-http-2   Docker    Dockerfile   New
[root@master1 ~]# oc get events | grep Failed
1m          1m           2         hello-build-http.15816e39eefb637d         BuildConfig                                     Warning   BuildConfigInstantiateFailed   buildconfig-controller                                error instantiating Build from BuildConfig myapplication/hello-build-http (0): Error resolving ImageStreamTag hello-openshift-source:latest in namespace myapplication: imagestreams.image.openshift.io "hello-openshift-source" not found
1m          1m           6         hello-build-http.15816e39f446207f         BuildConfig                                     Warning   BuildConfigInstantiateFailed   buildconfig-controller                                error instantiating Build from BuildConfig myapplication/hello-build-http (0): Error resolving ImageStreamTag hello-openshift-source:latest in namespace myapplication: unable to find latest tagged image
1m          1m           1         hello-build-http.15816e3a49f21411         BuildConfig                                     Warning   BuildConfigInstantiateFailed   buildconfig-controller                                error instantiating Build from BuildConfig myapplication/hello-build-http (0): builds.build.openshift.io "hello-build-http-1" already exists
[root@master1 ~]# oc get secrets | grep builder
builder-token-5q646        kubernetes.io/service-account-token   4         5m

# OR
[root@master1 ~]# oc get build
NAME                 TYPE      FROM         STATUS                        STARTED   DURATION
hello-build-http-1   Docker    Dockerfile   Pending (MissingPushSecret)
hello-build-http-2   Docker    Dockerfile   New
[root@master1 ~]# oc get events | grep FailedMount
15m         19m          10        hello-build-http-1-build.15816cc22f35795c   Pod                                             Warning   FailedMount                    kubelet, ip-172-26-12-32.eu-west-1.compute.internal   MountVolume.SetUp failed for volume "builder-dockercfg-k55f6-push" : secrets "builder-dockercfg-k55f6" not found
15m         17m          2         hello-build-http-1-build.15816cdec9dc561a   Pod                                             Warning   FailedMount                    kubelet, ip-172-26-12-32.eu-west-1.compute.internal   Unable to mount volumes for pod "hello-build-http-1-build_myapplication(4c2f1113-2bb5-11e9-8a6b-0a007934f01e)": timeout expired waiting for volumes to attach or mount for pod "myapplication"/"hello-build-http-1-build". list of unmounted volumes=[builder-dockercfg-k55f6-push]. list of unattached volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5]
13m         13m          1         hello-build-http-1-build.15816d1e3e65ad2a   Pod                                             Warning   FailedMount                    kubelet, ip-172-26-12-32.eu-west-1.compute.internal   Unable to mount volumes for pod "hello-build-http-1-build_myapplication(4c2f1113-2bb5-11e9-8a6b-0a007934f01e)": timeout expired waiting for volumes to attach or mount for pod "myapplication"/"hello-build-http-1-build". list of unmounted volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5]. list of unattached volumes=[buildworkdir docker-socket crio-socket builder-dockercfg-k55f6-push builder-dockercfg-m6d2v-pull builder-token-sjvw5]
[root@master1 ~]# oc get secrets | grep builder
NAME                       TYPE                                  DATA      AGE
builder-dockercfg-m6d2v    kubernetes.io/dockercfg               1         5m
builder-token-4chx4        kubernetes.io/service-account-token   4         5m
builder-token-sjvw5        kubernetes.io/service-account-token   4         5m

The deployment config seems to be disconnected and doesn’t know the state of the running pod:

[root@ip-172-26-12-32 ~]# oc get dc
NAME             REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-app-http   0          1         0         config,image(hello-openshift:latest)
[root@ip-172-26-12-32 ~]#

Here are the steps to recover out of this situation:

# First cancel all builds - the restore seems to have triggered a new build:
[root@master1 ~]# oc cancel-build $(oc get build --no-headers | awk '{ print $1 }')
build.build.openshift.io/hello-build-http-1 marked for cancellation, waiting to be cancelled
build.build.openshift.io/hello-build-http-2 marked for cancellation, waiting to be cancelled
build.build.openshift.io/hello-build-http-1 cancelled
build.build.openshift.io/hello-build-http-2 cancelled

# Delete all builds otherwise you will get later a problem because of duplicate name:
[root@master1 ~]# oc delete build $(oc get build --no-headers | awk '{ print $1 }')
build.build.openshift.io "hello-build-http-1" deleted
build.build.openshift.io "hello-build-http-2" deleted

# Delete the project builder account - this triggers openshift to re-create the builder
[root@master1 ~]# oc delete sa builder
serviceaccount "builder" deleted
[root@master1 ~]# oc get secrets | grep builder
builder-dockercfg-vwckw    kubernetes.io/dockercfg               1         24s
builder-token-dpgj9        kubernetes.io/service-account-token   4         24s
builder-token-lt7z2        kubernetes.io/service-account-token   4         24s

# Start the build and afterwards do a rollout for the deployment config:
[root@master1 ~]# oc start-build hello-build-http
build.build.openshift.io/hello-build-http-3 started
[root@master1 ~]# oc rollout latest dc/hello-app-http
deploymentconfig.apps.openshift.io/hello-app-http rolled out

After doing all this your build- and deployment-config is back synchronised.

[root@master1 ~]# oc get dc
NAME             REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-app-http   3          1         1         config,image(hello-openshift:latest)

My feedback about Heptio Velero(Ark); apart from the restore issues with the build- and deployment-config, I find the tool great especially in scenarios where I accidently deleted a namespace or for DR where I need to recover a whole cluster. What makes the tool worth it, is actually the possibility to create snapshots from PV disks on your cloud provider.

Check out the official documentation from Heptio for more information and if you like this article please leave a comment.

March 17, 2019March 20, 2019

OpenShift Networking and Network Policies

This article is about OpenShift networking in general but I also want to look at the Kubernetes CNI feature NetworkPolicy in a bit more detail. The latest OpenShift version 3.11 comes with three SDN deployment models:

ovs-subnet – This creates a single large vxlan between all the namespace and everyone is able to talk to each other.
ovs-multitenant – As the name already says this separates the namespaces into separate vxlan’s and only resources within the namespace are able to talk to each other. You have the possibility to join or making namespaces global.
ovs-networkpolicy – The newest SDN deployment method for OpenShift to enabling micro-segmentation to control the communication between pods and namespaces.
ovs-ovn – Next generation SDN for OpenShift but not yet officially released for OpenShift. For more information visit the OpenvSwitch Github repository ovn-kubernetes.

Here an overview of the common ovs-multitenant software defined network:

On an OpenShift node the tun0 interfaces owns the default gateway and is forwarding traffic to external endpoints outside the OpenShift platform or routing internal traffic to the openvswitch overlay. Both openvswitch and iptables are central components which are very important for the networking on the platform.

Read the official OpenShift documentation managing networking or configuring the SDN for more information.

NetworkPolicy in Action

Let me first explain the example I use to test NetworkPolicy. We will have one hello-openshift pod behind service, and a busybox pod for testing the internal communication. I will create a default ingress deny policy and specifically allow tcp port 8080 to my hello-openshift pod. I am not planning to restrict the busybox pod with an egress policy, so all egress traffic is allowed.

Here you find the example yaml files to replicate the layout: busybox.yml and hello-openshift.yml

Short recap about Kubernetes service definition, they are just simple iptables entries and for this reason you cannot restrict them with NetworkPolicy.

[root@master1 ~]# iptables-save | grep 172.30.231.77
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.231.77/32 -p tcp -m comment --comment "myproject/hello-app-http:web cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.231.77/32 -p tcp -m comment --comment "myproject/hello-app-http:web cluster IP" -m tcp --dport 80 -j KUBE-SVC-LFWXBQW674LJXLPD
[root@master1 ~]#

When you install OpenShift with ovs-networkpolicy, the default policy allows all traffic within a namespace. Let’s do a first test without a custom NetworkPolicy rule to see if I am able to connect to my hello-app-http service.

[root@master1 ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (172.30.231.77:80)
  HTTP/1.1 200 OK
  Date: Tue, 19 Feb 2019 13:59:04 GMT
  Content-Length: 17
  Content-Type: text/plain; charset=utf-8
  Connection: close

[root@master1 ~]#

Now we add a default ingress deny policy to the namespace:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-all-ingress
spec:
  podSelector:
  ingress: []

After applying the default deny policy you are not able to connect to the hello-app-http service. The connection is timing out because no flows entries are defined yet in the OpenFlow table:

[root@master1 ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (172.30.231.77:80)
wget: can't connect to remote host (172.30.231.77): Connection timed out
command terminated with exit code 1
[root@master1 ~]#

Let’s add a new policy and allow tcp port 8080 and specifying a podSelector to match all pods with the label “role: web”.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-tcp8080
spec:
  podSelector:
    matchLabels:
      role: web
  ingress:
  - ports:
    - protocol: TCP
      port: 8080

This alone doesn’t do anything, you still need to patch the deployment config and add the label “role: web” to your deployment config metadata information.

oc patch dc/hello-app-http --patch '{"spec":{"template":{"metadata":{"labels":{"role":"web"}}}}}'

To rollback the previous changes simply use the ‘oc rollback dc/hello-app-http’ command.

Now let’s check the openvswitch flow table and you will see that a new flow got added with the destination of my hello-openshift pod 10.128.0.103 on port 8080.

Afterwards we try again to connect to my hello-app-http service and you see that we get a succesful connect:

[root@master1 ~]# oc exec ovs-q4p8m -n openshift-sdn -- ovs-ofctl -O OpenFlow13 dump-flows br0 | grep '10.128.0.103.*8080'
 cookie=0x0, duration=221.251s, table=80, n_packets=15, n_bytes=1245, priority=150,tcp,reg1=0x2dfc74,nw_dst=10.128.0.103,tp_dst=8080 actions=output:NXM_NX_REG2[]
[root@master1 ~]#

[root@master1 ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (172.30.231.77:80)
  HTTP/1.1 200 OK
  Date: Tue, 19 Feb 2019 14:21:57 GMT
  Content-Length: 17
  Content-Type: text/plain; charset=utf-8
  Connection: close

[root@master1 ~]#

The hello openshift container publishes two tcp ports 8080 and 8888, so finally let’s try to connect to the pod IP address on port 8888, and we will find out that I am not able to connect, the reason is that I only allowed 8080 in the policy.

[root@master1 ~]# oc exec busybox-1-wn592 -- wget -S --spider http://10.128.0.103:8888
Connecting to 10.128.0.103:8888 (10.128.0.103:8888)
wget: can't connect to remote host (10.128.0.103): Connection timed out
command terminated with exit code 1
[root@master1 ~]#

There are great posts on the RedHat OpenShift blog which you should checkout networkpolicies-and-microsegmentation and openshift-and-network-security-zones-coexistence-approaches. Otherwise I can recommend having a look at Ahmet Alp Balkan Github repository about Kubernetes network policy recipes, where you can find some good examples.

March 16, 2019March 20, 2019

How to display OpenShift/Kubernetes namespace on bash prompt

Very short but useful post about how to display the current Kubernetes namespace on the bash command prompt. I got used add an -n <namespace-name> when I execute an oc command but it is still very useful to get the current namespace displayed on the command prompt especially when troubleshooting issues to not get lost in the different platform namespaces.

Create new file ~/.oc-prompt.sh in your users home folder.

#!/bin/bash
__oc_ps1()
{
    # Get current context
    CONTEXT=$(cat ~/.kube/config 2>/dev/null| grep -o '^current-context: [^/]*' | cut -d' ' -f2)

    if [ -n "$CONTEXT" ]; then
        echo "(ocp:${CONTEXT})"
    fi
}

Add the following lines at the end of the ~/.bashrc and re-connect your terminal session.

NORMAL="\[\033[00m\]"
BLUE="\[\033[01;34m\]"
YELLOW="\[\e[1;33m\]"
GREEN="\[\e[1;32m\]"

export PS1="${BLUE}\W ${GREEN}\u${YELLOW}\$(__oc_ps1)${NORMAL} \$ "
source ~/.oc-prompt.sh

The example bash prompt showing the current OpenShift/Kubernetes namespace:

Very useful when you need to administrate a cluster with multiple namespace.

March 10, 2019March 20, 2019

Host and Container Monitoring with SysDig

After my previous articles about troubleshooting and to validate OpenShift using Ansible, I wanted to continue and show how SysDig is helping you to identify potentials issues on your nodes or container platform before they occur.

The open source version is a simple but very powerful tool to inspect your linux host via the command line but it has no capabilities to centrally monitor or store capture information. The enterprise version provides these capabilities like a web console and centrally stores metrics, it is also able to trigger remote captures without the need to connect to the host.

Sysdig Open Source

Let’s install sysdig open source, here the official SysDig installation guide.

# Host install
curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash

# Alternatively the container based install
yum -y install kernel-devel-$(uname -r)
docker pull sysdig/sysdig
docker run -i -t --name sysdig --privileged -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro sysdig/sysdig

The csysdig command is nice and user friendly menu driven interface to see real-time system call information of your host. To collect information from Kubernetes or OpenShift please use the option [-kK] like seen in the example below:

csysdig -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

For more information about how to use csysdig please have a look at the manual or watch the short Youtube video.

The main sysdig command is showing output directly in the terminal session and you are able to apply filters (chisels) to more granularly see the system calls. Like with csysdig, the option [-kK] enabled Kubernetes integration:

sysdig -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

Here some useful commands to inspect Kubernetes or OpenShift events:

# Monitor Kubernetes namespace ip communication:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (k8s.ns.name=<-NAMESPACE-NAME->)" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/e/origin/master/admin.key

# Monitor namespace and pod name, the 2nd command filters to only show GET requests:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (k8s.ns.name=<-NAMESPACE-NAME-> and k8s.pod.name=<-POD-NAME->)" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (k8s.ns.name=<-NAMESPACE-NAME-> and k8s.pod.name=<-POD-NAME->) and evt.buffer contai GET" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key 

# Monitor ns and pod names and apply chisel echo_fds:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (k8s.ns.name=<-NAMESPACE-NAME-> and k8s.pod.name=<-POD-NAME->)" -c echo_fds -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

SysDig example

This capture is an http request between an busybox pod (name: busybox-2-hjhq8 ip: 10.128.0.81) via service (name: hello-app-http ip: 172.30.43.111) to the hello-openshift pod (name: hello-app-http-1-8v57x ip: 10.128.0.77) in the namespace myproject. I use a simple “wget -S –spider http://hello-app-http/” to simulate the request:

# Command to capture ip communication in myproject namespace including dnsmasq and wget processes:
sudo sysdig -s2000 -A -pk "fd.type in (ipv4, ipv6) and (k8s.ns.name=myproject or proc.name=dnsmasq) or proc.name=wget" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

# Output:
70739 19:36:51.401062017 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
70741 19:36:51.401062878 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
70748 19:36:51.401072194 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=10.128.0.81:44993->172.26.11.254:53
70749 19:36:51.401074599 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > sendto fd=3(<4u>10.128.0.81:44993->172.26.11.254:53) size=60 tuple=NULL
71083 19:36:51.401575859 0  (host) dnsmasq (20933:20933) > recvmsg fd=6(<4u>172.26.11.254:53)
71087 19:36:51.401582008 0  (host) dnsmasq (20933:20933) < recvmsg res=60 size=60 data= hello-app-httpmyprojectsvcclusterlocal tuple=10.128.0.81:44993->172.26.11.254:53
71088 19:36:51.401584101 0  (host) dnsmasq (20933:20933) > ioctl fd=6(<4u>10.128.0.81:44993->172.26.11.254:53) request=8910 argument=7FFE208E30C0
71089 19:36:51.401586692 0  (host) dnsmasq (20933:20933) < ioctl res=0
71108 19:36:51.401623408 0  (host) dnsmasq (20933:20933) < socket fd=58(<4>)
71109 19:36:51.401624563 0  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=4(F_GETFL)
71110 19:36:51.401625584 0  (host) dnsmasq (20933:20933) < fcntl res=2(/dev/null)
71111 19:36:51.401626259 0  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=5(F_SETFL)
71112 19:36:51.401626825 0  (host) dnsmasq (20933:20933) < fcntl res=0(/dev/null)
71113 19:36:51.401627787 0  (host) dnsmasq (20933:20933) > bind fd=58(<4>)
71129 19:36:51.401680355 0  (host) dnsmasq (20933:20933) < bind res=0 addr=0.0.0.0:22969
71130 19:36:51.401681698 0  (host) dnsmasq (20933:20933) > sendto fd=58(<4u>0.0.0.0:22969) size=60 tuple=0.0.0.0:22969->127.0.0.1:53
71131 19:36:51.401715726 0  (host) dnsmasq (20933:20933) < sendto res=60 data=
hello-app-httpmyprojectsvcclusterlocal
71469 19:36:51.402632442 1  (host) dnsmasq (20933:20933) > recvfrom fd=58(<4u>127.0.0.1:53->127.0.0.1:22969) size=5131
71474 19:36:51.402636604 1  (host) dnsmasq (20933:20933) < recvfrom res=114 data=
hello-app-httpmyprojectsvcclusterlocal)<*nsdns)
hostmaster)\`tp :< tuple=127.0.0.1:53->0.0.0.0:22969
71479 19:36:51.402643363 1  (host) dnsmasq (20933:20933) > sendmsg fd=6(<4u>10.128.0.81:44993->172.26.11.254:53) size=114 tuple=172.26.11.254:53->10.128.0.81:44993
71492 19:36:51.402666311 1  (host) dnsmasq (20933:20933) < sendmsg res=114 data=
hello-app-httpmyprojectsvcclusterlocal)<*nsdns)
hostmaster)\`tp :<
71493 19:36:51.402668199 1  (host) dnsmasq (20933:20933) > close fd=58(<4u>127.0.0.1:53->127.0.0.1:22969)
71494 19:36:51.402669009 1  (host) dnsmasq (20933:20933) < close res=0
80786 19:36:51.430143868 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < sendto res=60 data= hello-app-httpmyprojectsvcclusterlocal 80793 19:36:51.430153453 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > recvfrom fd=3(<4u>10.128.0.81:44993->172.26.11.254:53) size=512
80794 19:36:51.430158626 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < recvfrom res=114 data=
hello-app-httpmyprojectsvcclusterlocal)<*nsdns)
hostmaster)\`tp :< tuple=NULL 80795 19:36:51.430160257 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4u>10.128.0.81:44993->172.26.11.254:53)
80796 19:36:51.430161712 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
80835 19:36:51.430260103 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
80838 19:36:51.430261013 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
80840 19:36:51.430269080 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=10.128.0.81:41405->172.26.11.254:53
80841 19:36:51.430271011 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > sendto fd=3(<4u>10.128.0.81:41405->172.26.11.254:53) size=60 tuple=NULL
80874 19:36:51.430433333 1  (host) dnsmasq (20933:20933) > recvmsg fd=6(<4u>10.128.0.81:44993->172.26.11.254:53)
80879 19:36:51.430439631 1  (host) dnsmasq (20933:20933) < recvmsg res=60 size=60 data= hello-app-httpmyprojectsvcclusterlocal tuple=10.128.0.81:41405->172.26.11.254:53
80881 19:36:51.430454839 1  (host) dnsmasq (20933:20933) > ioctl fd=6(<4u>10.128.0.81:41405->172.26.11.254:53) request=8910 argument=7FFE208E30C0
80885 19:36:51.430457716 1  (host) dnsmasq (20933:20933) < ioctl res=0
80895 19:36:51.430493317 1  (host) dnsmasq (20933:20933) < socket fd=58(<4>)
80896 19:36:51.430494522 1  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=4(F_GETFL)
80897 19:36:51.430495527 1  (host) dnsmasq (20933:20933) < fcntl res=2(/dev/null)
80898 19:36:51.430496189 1  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=5(F_SETFL)
80899 19:36:51.430496769 1  (host) dnsmasq (20933:20933) < fcntl res=0(/dev/null)
80900 19:36:51.430497538 1  (host) dnsmasq (20933:20933) > bind fd=58(<4>)
80913 19:36:51.430551876 1  (host) dnsmasq (20933:20933) < bind res=0 addr=0.0.0.0:64640
80914 19:36:51.430553226 1  (host) dnsmasq (20933:20933) > sendto fd=58(<4u>0.0.0.0:64640) size=60 tuple=0.0.0.0:64640->127.0.0.1:53
80922 19:36:51.430581962 1  (host) dnsmasq (20933:20933) < sendto res=60 data=
:=hello-app-httpmyprojectsvcclusterlocal
81032 19:36:51.430806106 1  (host) dnsmasq (20933:20933) > recvfrom fd=58(<4u>127.0.0.1:53->127.0.0.1:64640) size=5131
81035 19:36:51.430809074 1  (host) dnsmasq (20933:20933) < recvfrom res=76 data= :=hello-app-httpmyprojectsvcclusterlocal+o tuple=127.0.0.1:53->0.0.0.0:64640
81040 19:36:51.430818116 1  (host) dnsmasq (20933:20933) > sendmsg fd=6(<4u>10.128.0.81:41405->172.26.11.254:53) size=76 tuple=172.26.11.254:53->10.128.0.81:41405
81051 19:36:51.430840305 1  (host) dnsmasq (20933:20933) < sendmsg res=76 data=
hello-app-httpmyprojectsvcclusterlocal+o
81052 19:36:51.430842129 1  (host) dnsmasq (20933:20933) > close fd=58(<4u>127.0.0.1:53->127.0.0.1:64640)
81053 19:36:51.430842956 1  (host) dnsmasq (20933:20933) < close res=0
84676 19:36:51.436248790 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < sendto res=60 data= hello-app-httpmyprojectsvcclusterlocal 84683 19:36:51.436254334 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > recvfrom fd=3(<4u>10.128.0.81:41405->172.26.11.254:53) size=512
84684 19:36:51.436256892 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < recvfrom res=76 data= hello-app-httpmyprojectsvcclusterlocal+o tuple=NULL 84685 19:36:51.436264998 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4u>10.128.0.81:41405->172.26.11.254:53)
84686 19:36:51.436265743 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
85420 19:36:51.437492301 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
85421 19:36:51.437493337 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
86222 19:36:51.438494771 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=10.128.0.81:39656->172.30.43.111:80
86226 19:36:51.438497506 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > fcntl fd=3(<4t>10.128.0.81:39656->172.30.43.111:80) cmd=4(F_GETFL)
86228 19:36:51.438498484 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < fcntl res=2(/dev/pts/1)
86229 19:36:51.438499943 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > ioctl fd=3(<4t>10.128.0.81:39656->172.30.43.111:80) request=5401 argument=7FFDBF5E434C
86233 19:36:51.438501658 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < ioctl res=-25(ENOTTY) 86242 19:36:51.438509833 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > write fd=3(<4t>10.128.0.81:39656->172.30.43.111:80) size=105
86285 19:36:51.438557309 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < write res=105 data= GET / HTTP/1.1 Host: hello-app-http.myproject.svc.cluster.local User-Agent: Wget Connection: close 86291 19:36:51.438561615 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > read fd=3(<4t>10.128.0.81:39656->172.30.43.111:80) size=4096
107714 19:36:51.478518400 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < accept fd=6(<4t>10.128.0.81:39656->10.128.0.77:8080) tuple=10.128.0.81:39656->10.128.0.77:8080 queuepct=0 queuelen=0 queuemax=128
107772 19:36:51.478636516 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > read fd=6(<4t>10.128.0.81:39656->10.128.0.77:8080) size=4096
107773 19:36:51.478640241 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < read res=105 data= GET / HTTP/1.1 Host: hello-app-http.myproject.svc.cluster.local User-Agent: Wget Connection: close 107857 19:36:51.478817861 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > write fd=6(<4t>10.128.0.81:39656->10.128.0.77:8080) size=153
107869 19:36:51.478870349 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < write res=153 data= HTTP/1.1 200 OK Date: Sun, 10 Feb 2019 19:36:51 GMT Content-Length: 17 Content-Type: text/plain; charset=utf-8 Connection: close Hello OpenShift! 107886 19:36:51.478892928 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > close fd=6(<4t>10.128.0.81:39656->10.128.0.77:8080)
107887 19:36:51.478893676 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < close res=0
107899 19:36:51.478998208 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < read res=153 data= HTTP/1.1 200 OK Date: Sun, 10 Feb 2019 19:36:51 GMT Content-Length: 17 Content-Type: text/plain; charset=utf-8 Connection: close Hello OpenShift! 108908 19:36:51.480114626 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4t>10.128.0.81:39656->172.30.43.111:80)
108910 19:36:51.480115482 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
112966 19:36:51.488041049 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < accept fd=6(<4t>10.128.0.1:55052->10.128.0.77:8080) tuple=10.128.0.1:55052->10.128.0.77:8080 queuepct=0 queuelen=0 queuemax=128
113001 19:36:51.488096304 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) > read fd=6(<4t>10.128.0.1:55052->10.128.0.77:8080) size=4096
113002 19:36:51.488098693 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < read res=0 data= 113005 19:36:51.488105730 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) > close fd=6(<4t>10.128.0.1:55052->10.128.0.77:8080)
113006 19:36:51.488106302 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < close res=0

Below a list of some more useful sysdig cli examples:

# Sysdig Chisels and Filters:
sudo sysdig -cl

# To find out more information about a particular chisel:
sudo sysdig -i lscontainers

# To view a list of available field classes, fields and their description:
sudo sysdig -l

# Create and write sysdig trace files, 2nd option sets byte limit for trace file:
sudo sysdig -w mytrace.scap
sudo sysdig -s 8192 -w trace.scap 

# Read sysdig trace files, 2nd option read and filter based on proc.name:
sudo sysdig -r trace.scap
sudo sysdig -r trace.scap proc.name=dnsmasq

# Monitor linux processes:
sudo sysdig -c ps

# Monitor linux processes by CPU utilisation:
sudo sysdig -c topprocs_cpu

# Monitor network connections:
sudo sysdig -c netstat
sudo sysdig -c topconns
sudo sysdig -c topprocs_net

# Monitor system file i/o:
sudo sysdig -c echo_fds
sudo sysdig -c topprocs_file

# Troubleshoot system performance:
sudo sysdig -c bottlenecks

# Monitor process execution time
sudo sysdig -c proc_exec_time 

# Monitor network i/o performance
sudo sysdig -c netlower 1

# Watch log entries
sudo sysdig -c spy_logs

# Monitor http requests:
sudo sysdig -c httplog    
sudo sysdig -c httptop [Print Top HTTP Requests]

SysDig Monitor Enterprise

The paid enterprise version provides a web console to centrally access metrics and events from your fleet of monitored nodes.

You can run SysDig enterprise directly on OpenShift as DaemonSet and deploy the agent to all nodes in the cluster. For more detailed information about Kubernetes or OpenShift installation, read the official documentation.

oc adm new-project sysdig-agent --node-selector='app=sysdig-agent'
oc project sysdig-agent
oc label node --all "app=sysdig-agent"
oc create serviceaccount sysdig-agent
oc adm policy add-scc-to-user privileged -n sysdig-agent -z sysdig-agent
oc adm policy add-cluster-role-to-user cluster-reader -n sysdig-agent -z sysdig-agent

wget https://raw.githubusercontent.com/draios/sysdig-cloud-scripts/master/agent_deploy/kubernetes/sysdig-agent-daemonset-v2.yaml
wget https://raw.githubusercontent.com/draios/sysdig-cloud-scripts/master/agent_deploy/kubernetes/sysdig-agent-configmap.yaml
oc create secret generic sysdig-agent --from-literal=access-key=<-YOUR-ACCESS-KEY->

# Edit sysdig-agent-daemonset-v2.yaml to uncomment the line: serviceAccount: sysdig-agent and edit sysdig-agent-configmap.yaml to uncomment the line: new_k8s: true
# This allows kube-state-metrics to be automatically detected, monitored, and displayed in Sysdig Monitor. 
# Edit sysdig-agent-configmap.yaml to uncomment the line: k8s_cluster_name: and add your cluster name.

oc create -f sysdig-agent-daemonset-v2.yaml
oc create -f sysdig-agent-configmap.yaml

SysDig is a great tool to monitor and even further provides you the possibility to troubleshoot in depth your linux hosts and container platforms.