Getting started with OpenShift 4.0 Container Platform

I had a first look at OpenShift 4.0 and I wanted to share some information from what I have seen so far. The installation of the cluster is super easy and RedHat did a lot to improve the overall experience of the installation process to the previous OpenShift v3.x Ansible based installation and moving towards ephemeral cluster deployments.

There are a many changes under the hood and it’s not as obvious as Bootkube for the self-hosted/healing control-plane, MachineSets and the many internal operators to install and manage the OpenShift components ( api serverscheduler, controller manager, cluster-autoscalercluster-monitoringweb-consolednsingressnetworkingnode-tuning, and authentication ).

For the OpenShift 4.0 developer preview you need an RedHat account because you require a pull-secret for the cluster installation. For more information please visit:

First we need to download the openshift-installer binary:

mv openshift-install-linux-amd64 openshift-install
chmod +x openshift-install

Then we create the install-configuration, it is required that you already have AWS account credentials and an Route53 DNS domain set-up:

$ ./openshift-install create install-config
INFO Platform aws
INFO AWS Access Key ID *********
INFO AWS Secret Access Key [? for help] *********
INFO Writing AWS credentials to "/home/centos/.aws/credentials" (
INFO Region eu-west-1
INFO Base Domain
INFO Cluster Name cluster1
INFO Pull Secret [? for help] *********

Let’s look at the install-config.yaml

apiVersion: v1beta4
- name: worker
  platform: {}
  replicas: 3
  name: master
  platform: {}
  replicas: 3
  creationTimestamp: null
  name: ew1
  - cidr:
    hostPrefix: 23
  networkType: OpenShiftSDN
    region: eu-west-1
pullSecret: '{"auths":{...}'

Now we can continue to create the OpenShift v4 cluster which takes around 30mins to complete. At the end of the openshift-installer you see the auto-generate credentials to connect to the cluster:

$ ./openshift-install create cluster
INFO Consuming "Install Config" from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 30m0s for the Kubernetes API at
INFO API v1.12.4+0ba401e up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster at to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO Run 'export KUBECONFIG=/home/centos/auth/kubeconfig' to manage the cluster with 'oc', the OpenShift CLI.
INFO The cluster is ready when 'oc login -u kubeadmin -p jMTSJ-F6KYy-mVVZ4-QVNPP' succeeds (wait a few minutes).
INFO Access the OpenShift web-console here:
INFO Login to the console with user: kubeadmin, password: jMTSJ-F6KYy-mVVZ4-QVNPP

The web-console has a very clean new design which I really like in addition to all the great improvements.

Under administration -> cluster settings you can explore the new auto-upgrade functionality of OpenShift 4.0:

You choose the new version to upgrade and everything else happens in the background which is a massive improvement to OpenShift v3.x where you had to run the ansible installer for this.

In the background the cluster operator upgrades the different platform components one by one.

Slowly you will see that the components move to the new build version.

Finished cluster upgrade:

You can only upgrade from one version 4.0.0-0.9 to the next version 4.0.0-0.10. It is not possible to upgrade and go straight from x-0.9 to x-0.11.

But let’s deploy the Google Hipster Shop example and expose the frontend-external service for some more testing:

oc login -u kubeadmin -p jMTSJ-F6KYy-mVVZ4-QVNPP --insecure-skip-tls-verify=true
oc new-project myproject
oc create -f
oc expose svc frontend-external

Getting the hostname for the exposed service:

$ oc get route
NAME                HOST/PORT                                                   PATH      SERVICES            PORT      TERMINATION   WILDCARD
frontend-external             frontend-external   http                    None

Use the browser to connect to our Hipster Shop:

It’s also very easy to destroy the cluster as it is to create it, as you seen previously:

$ ./openshift-install destroy cluster
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-01d27db162fa45402
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-057f593640067efc0
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-05e821b451bead18f
INFO Disassociated                                 IAM instance profile="arn:aws:iam::552276840222:instance-profile/ocp4-bgx4c-worker-profile" arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff name=ocp4-bgx4c-worker-profile role=ocp4-bgx4c-worker-role
INFO Deleted                                       IAM instance profile="arn:aws:iam::552276840222:instance-profile/ocp4-bgx4c-worker-profile" arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff name=0xc00090f9a8
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-00b5eedc186ba26a7" id=i-00b5eedc186ba26a7
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:security-group/sg-016d4c7d435a1c97f" id=sg-016d4c7d435a1c97f
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:subnet/subnet-076348368858e9a82" id=subnet-076348368858e9a82
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:vpc/vpc-00c611ae1b9b8e10a" id=vpc-00c611ae1b9b8e10a
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:dhcp-options/dopt-0ce8b6a1c31e0ceac" id=dopt-0ce8b6a1c31e0ceac

The install experience is great for OpenShift 4.0 which makes it very easy for everyone to create and get started quickly with an enterprise container platform. From the operational perspective I still need to see how to run the new platform because all the operators are great and makes it an easy to use cluster but what happens when one of the operators goes rogue and debugging this I am most interested in.

Over the coming weeks I will look into more detail around OpenShift 4.0 and the different new features, I am especially interested in Service Mesh.

Typhoon Kubernetes Distribution

I stumbled across a very interesting Kubernetes distribution called Typhoon which runs a self-hosted control-plane using Bootkube running on CoreOS. Typhoon uses Terraform to deploy the required instances on various cloud providers or on bare-metal servers. I really like the concept of a minimal Kubernetes distribution and a simple bootstrap to deploy a full featured cluster in a few minutes. Check out the official Typhoon website or their Github repository for more information.

To install Typhoon I followed the documentation, everything is pretty simple with a bit of Terraform knowledge. Here’s my Github repository with my cluster configuration:

Before you start you need to install Terraform v0.11.x and terraform-provider-ct, and setup a AWS Route53 domain for the Kubernetes cluster.

I created a new subdomain on Route53 and configured delegation on CloudFlare for the domain.

Let’s checkout the configuration, first the which I have modified slightly because I use Jenkins to deploy the Kubernetes cluster.

module "aws-cluster" {
  source = "git::"

  providers = {
    aws = "aws.default"
    local = "local.default"
    null = "null.default"
    template = "template.default"
    tls = "tls.default"

  # AWS
  cluster_name = "typhoon"
  dns_zone     = "${var.dns}"
  dns_zone_id  = "${var.dns_id}"

  # configuration
  ssh_authorized_key = "${var.ssh_key}"
  asset_dir          = "./.secrets/clusters/typhoon"

  # optional
  worker_count = 2
  worker_type  = "t3.small"

In the I have only added S3 to be used for the Terraform backend state but otherwise I’ve left the defaults.

provider "aws" {
  version = "~> 1.13.0"
  alias   = "default"

  region  = "eu-west-1"

terraform {
  backend "s3" {
    bucket = "techbloc-terraform-data"
    key    = "openshift-311"
    region = "eu-west-1"


I added a file for the DNS and SSH variables.

variable "dns" {
variable "dns_id" {
variable "ssh_key" {

Let’s have a quick look at my simple Jenkins pipeline to deploy Typhoon Kubernetes. Apart from installing Kubernetes I am deploying the Nginx Ingress controller and  Heapster addons for the cluster. I’ve also added an example application I have used previously after deploying the cluster.

pipeline {
    agent any
    environment {
        AWS_ACCESS_KEY_ID = credentials('AWS_ACCESS_KEY_ID')
        TF_VAR_dns = credentials('TF_VAR_dns')
        TF_VAR_dns_id = credentials('TF_VAR_dns_id')
        TF_VAR_ssh_key = credentials('TF_VAR_ssh_key')
    stages {
        stage('Prepare workspace') {
            steps {
                sh 'rm -rf *'
                git branch: 'aws', url: ''
                sh 'terraform init'
        stage('terraform apply') {
            steps {
                sshagent (credentials: ['fcdca8fa-aab9-3846-832f-4756392b7e2c']) {
                    sh 'terraform apply -auto-approve'
                    sh 'sleep 30'
        stage('deploy nginx-ingress and heapster') {
            steps {    
                sh 'kubectl apply -R -f ./nginx-ingress/ --kubeconfig=./.secrets/clusters/typhoon/auth/kubeconfig'
                sh 'kubectl apply -R -f ./heapster/ --kubeconfig=./.secrets/clusters/typhoon/auth/kubeconfig'
                sh 'sleep 30'
        stage('deploy example application') {
            steps {    
                sh 'kubectl apply -f ./example/hello-kubernetes.yml --kubeconfig=./.secrets/clusters/typhoon/auth/kubeconfig'
        stage('Run terraform destroy') {
            steps {
                input 'Run terraform destroy?'
        stage('terraform destroy') {
            steps {
                sshagent (credentials: ['fcdca8fa-aab9-3846-832f-4756392b7e2c']) {
                    sh 'terraform destroy -force'

Let’s start the Jenkins pipeline:

Let’s check if I can access the hello-kubernetes application. For everyone who is interested, this is the link to the Github repository for the hello-kubernetes example application I have used.

I really like the Typhoon Kubernetes distribution and the work that went into it to create a easy way for everyone to install a Kubernetes cluster and start using it in a few minutes. I also find the way they’ve used Terraform and Bootkube to deploy the platform on CoreOS very inspiring and it gave me some ideas how I can make use of it for production clusters.

I actually like CoreOS and the easy bootstrapping with Terraform and Bootkube which I have not used before, I’ve always deployed OpenShift/Kubernetes on either RedHat or CentOS with Ansible, and find it a very interesting way to deploy a Kubernetes platform.

OpenShift Networking and Network Policies

This article is about OpenShift networking in general but I also want to look at the Kubernetes CNI feature NetworkPolicy in a bit more detail. The latest OpenShift version 3.11 comes with three SDN deployment models:

  • ovs-subnet – This creates a single large vxlan between all the namespace and everyone is able to talk to each other.
  • ovs-multitenant – As the name already says this separates the namespaces into separate vxlan’s and only resources within the namespace are able to talk to each other. You have the possibility to join or making namespaces global.
  • ovs-networkpolicy – The newest SDN deployment method for OpenShift to enabling micro-segmentation to control the communication between pods and namespaces.
  • ovs-ovn – Next generation SDN for OpenShift but not yet officially released for OpenShift. For more information visit the OpenvSwitch Github repository ovn-kubernetes.

Here an overview of the common ovs-multitenant software defined network:

On an OpenShift node the tun0 interfaces owns the default gateway and is forwarding traffic to external endpoints outside the OpenShift platform or routing internal traffic to the openvswitch overlay. Both openvswitch and iptables are central components which are very important for the networking  on the platform.

Read the official OpenShift documentation managing networking or configuring the SDN for more information.

NetworkPolicy in Action

Let me first explain the example I use to test NetworkPolicy. We will have one hello-openshift pod behind service, and a busybox pod for testing the internal communication. I will create a default ingress deny policy and specifically allow tcp port 8080 to my hello-openshift pod. I am not planning to restrict the busybox pod with an egress policy, so all egress traffic is allowed.

Here you find the example yaml files to replicate the layout: busybox.yml and hello-openshift.yml

Short recap about Kubernetes service definition, they are just simple iptables entries and for this reason you cannot restrict them with NetworkPolicy.

[[email protected] ~]# iptables-save | grep
-A KUBE-SERVICES ! -s -d -p tcp -m comment --comment "myproject/hello-app-http:web cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d -p tcp -m comment --comment "myproject/hello-app-http:web cluster IP" -m tcp --dport 80 -j KUBE-SVC-LFWXBQW674LJXLPD
[[email protected] ~]#

When you install OpenShift with ovs-networkpolicy, the default policy allows all traffic within a namespace. Let’s do a first test without a custom NetworkPolicy rule to see if I am able to connect to my hello-app-http service.

[[email protected] ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (
  HTTP/1.1 200 OK
  Date: Tue, 19 Feb 2019 13:59:04 GMT
  Content-Length: 17
  Content-Type: text/plain; charset=utf-8
  Connection: close

[[email protected] ~]#

Now we add a default ingress deny policy to the namespace:

kind: NetworkPolicy
  name: deny-all-ingress
  ingress: []

After applying the default deny policy you are not able to connect to the hello-app-http service. The connection is timing out because no flows entries are defined yet in the OpenFlow table:

[[email protected] ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (
wget: can't connect to remote host ( Connection timed out
command terminated with exit code 1
[[email protected] ~]#

Let’s add a new policy and allow tcp port 8080 and specifying a podSelector to match all pods with the label “role: web”.

kind: NetworkPolicy
  name: allow-tcp8080
      role: web
  - ports:
    - protocol: TCP
      port: 8080

This alone doesn’t do anything, you still need to patch the deployment config and add the label “role: web” to your deployment config metadata information.

oc patch dc/hello-app-http --patch '{"spec":{"template":{"metadata":{"labels":{"role":"web"}}}}}'

To rollback the previous changes simply use the ‘oc rollback dc/hello-app-http’ command.

Now let’s check the openvswitch flow table and you will see that a new flow got added with the destination of my hello-openshift pod on port 8080.

Afterwards we try again to connect to my hello-app-http service and you see that we get a succesful connect:

[[email protected] ~]# oc exec ovs-q4p8m -n openshift-sdn -- ovs-ofctl -O OpenFlow13 dump-flows br0 | grep '*8080'
 cookie=0x0, duration=221.251s, table=80, n_packets=15, n_bytes=1245, priority=150,tcp,reg1=0x2dfc74,nw_dst=,tp_dst=8080 actions=output:NXM_NX_REG2[]
[[email protected] ~]#
[[email protected] ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (
  HTTP/1.1 200 OK
  Date: Tue, 19 Feb 2019 14:21:57 GMT
  Content-Length: 17
  Content-Type: text/plain; charset=utf-8
  Connection: close

[[email protected] ~]#

The hello openshift container publishes two tcp ports 8080 and 8888, so finally let’s try to connect to the pod IP address on port 8888, and we will find out that I am not able to connect, the reason is that I only allowed 8080 in the policy.

[[email protected] ~]# oc exec busybox-1-wn592 -- wget -S --spider
Connecting to (
wget: can't connect to remote host ( Connection timed out
command terminated with exit code 1
[[email protected] ~]#

There are great posts on the RedHat OpenShift blog which you should checkout networkpolicies-and-microsegmentation and openshift-and-network-security-zones-coexistence-approaches. Otherwise I can recommend having a look at Ahmet Alp Balkan Github repository about Kubernetes network policy recipes, where you can find some good examples.

Host and Container Monitoring with SysDig

After my previous articles about troubleshooting and to validate OpenShift using Ansible, I wanted to continue and show how SysDig is helping you to identify potentials issues on your nodes or container platform before they occur.

The open source version is a simple but very powerful tool to inspect your linux host via the command line but it has no capabilities to centrally monitor or store capture information. The enterprise version provides these capabilities like a web console and centrally stores metrics, it is also able to trigger remote captures without the need to connect to the host.

Sysdig Open Source

Let’s install sysdig open source, here the official SysDig installation guide.

# Host install
curl -s | sudo bash

# Alternatively the container based install
yum -y install kernel-devel-$(uname -r)
docker pull sysdig/sysdig
docker run -i -t --name sysdig --privileged -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro sysdig/sysdig

The csysdig command is nice and user friendly menu driven interface to see real-time system call information of your host. To collect information from Kubernetes or OpenShift please use the option [-kK] like seen in the example below:

csysdig -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

For more information about how to use csysdig please have a look at the manual or watch the short Youtube video.

The main sysdig command is showing output directly in the terminal session and you are able to apply filters (chisels) to more granularly see the system calls. Like with csysdig, the option [-kK] enabled Kubernetes integration:

sysdig -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

Here some useful commands to inspect Kubernetes or OpenShift events:

# Monitor Kubernetes namespace ip communication:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (<-NAMESPACE-NAME->)" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/e/origin/master/admin.key

# Monitor namespace and pod name, the 2nd command filters to only show GET requests:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (<-NAMESPACE-NAME-> and<-POD-NAME->)" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (<-NAMESPACE-NAME-> and<-POD-NAME->) and evt.buffer contai GET" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key 

# Monitor ns and pod names and apply chisel echo_fds:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (<-NAMESPACE-NAME-> and<-POD-NAME->)" -c echo_fds -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

SysDig example

This capture is an http request between an busybox pod (name: busybox-2-hjhq8 ip: via service (name: hello-app-http ip: to the hello-openshift pod (name: hello-app-http-1-8v57x ip: in the namespace myproject. I use a simple “wget -S –spider http://hello-app-http/” to simulate the request:

# Command to capture ip communication in myproject namespace including dnsmasq and wget processes:
sudo sysdig -s2000 -A -pk "fd.type in (ipv4, ipv6) and ( or or" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

# Output:
70739 19:36:51.401062017 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
70741 19:36:51.401062878 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
70748 19:36:51.401072194 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=>
70749 19:36:51.401074599 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > sendto fd=3(<4u>> size=60 tuple=NULL
71083 19:36:51.401575859 0  (host) dnsmasq (20933:20933) > recvmsg fd=6(<4u>
71087 19:36:51.401582008 0  (host) dnsmasq (20933:20933) < recvmsg res=60 size=60 data= hello-app-httpmyprojectsvcclusterlocal tuple=>
71088 19:36:51.401584101 0  (host) dnsmasq (20933:20933) > ioctl fd=6(<4u>> request=8910 argument=7FFE208E30C0
71089 19:36:51.401586692 0  (host) dnsmasq (20933:20933) < ioctl res=0
71108 19:36:51.401623408 0  (host) dnsmasq (20933:20933) < socket fd=58(<4>)
71109 19:36:51.401624563 0  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=4(F_GETFL)
71110 19:36:51.401625584 0  (host) dnsmasq (20933:20933) < fcntl res=2(/dev/null)
71111 19:36:51.401626259 0  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=5(F_SETFL)
71112 19:36:51.401626825 0  (host) dnsmasq (20933:20933) < fcntl res=0(/dev/null)
71113 19:36:51.401627787 0  (host) dnsmasq (20933:20933) > bind fd=58(<4>)
71129 19:36:51.401680355 0  (host) dnsmasq (20933:20933) < bind res=0 addr=
71130 19:36:51.401681698 0  (host) dnsmasq (20933:20933) > sendto fd=58(<4u> size=60 tuple=>
71131 19:36:51.401715726 0  (host) dnsmasq (20933:20933) < sendto res=60 data=
71469 19:36:51.402632442 1  (host) dnsmasq (20933:20933) > recvfrom fd=58(<4u>> size=5131
71474 19:36:51.402636604 1  (host) dnsmasq (20933:20933) < recvfrom res=114 data=
hostmaster)\`tp :< tuple=>
71479 19:36:51.402643363 1  (host) dnsmasq (20933:20933) > sendmsg fd=6(<4u>> size=114 tuple=>
71492 19:36:51.402666311 1  (host) dnsmasq (20933:20933) < sendmsg res=114 data=
hostmaster)\`tp :<
71493 19:36:51.402668199 1  (host) dnsmasq (20933:20933) > close fd=58(<4u>>
71494 19:36:51.402669009 1  (host) dnsmasq (20933:20933) < close res=0
80786 19:36:51.430143868 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < sendto res=60 data= hello-app-httpmyprojectsvcclusterlocal 80793 19:36:51.430153453 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > recvfrom fd=3(<4u>> size=512
80794 19:36:51.430158626 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < recvfrom res=114 data=
hostmaster)\`tp :< tuple=NULL 80795 19:36:51.430160257 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4u>>
80796 19:36:51.430161712 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
80835 19:36:51.430260103 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
80838 19:36:51.430261013 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
80840 19:36:51.430269080 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=>
80841 19:36:51.430271011 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > sendto fd=3(<4u>> size=60 tuple=NULL
80874 19:36:51.430433333 1  (host) dnsmasq (20933:20933) > recvmsg fd=6(<4u>>
80879 19:36:51.430439631 1  (host) dnsmasq (20933:20933) < recvmsg res=60 size=60 data= hello-app-httpmyprojectsvcclusterlocal tuple=>
80881 19:36:51.430454839 1  (host) dnsmasq (20933:20933) > ioctl fd=6(<4u>> request=8910 argument=7FFE208E30C0
80885 19:36:51.430457716 1  (host) dnsmasq (20933:20933) < ioctl res=0
80895 19:36:51.430493317 1  (host) dnsmasq (20933:20933) < socket fd=58(<4>)
80896 19:36:51.430494522 1  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=4(F_GETFL)
80897 19:36:51.430495527 1  (host) dnsmasq (20933:20933) < fcntl res=2(/dev/null)
80898 19:36:51.430496189 1  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=5(F_SETFL)
80899 19:36:51.430496769 1  (host) dnsmasq (20933:20933) < fcntl res=0(/dev/null)
80900 19:36:51.430497538 1  (host) dnsmasq (20933:20933) > bind fd=58(<4>)
80913 19:36:51.430551876 1  (host) dnsmasq (20933:20933) < bind res=0 addr=
80914 19:36:51.430553226 1  (host) dnsmasq (20933:20933) > sendto fd=58(<4u> size=60 tuple=>
80922 19:36:51.430581962 1  (host) dnsmasq (20933:20933) < sendto res=60 data=
81032 19:36:51.430806106 1  (host) dnsmasq (20933:20933) > recvfrom fd=58(<4u>> size=5131
81035 19:36:51.430809074 1  (host) dnsmasq (20933:20933) < recvfrom res=76 data= :=hello-app-httpmyprojectsvcclusterlocal+o tuple=>
81040 19:36:51.430818116 1  (host) dnsmasq (20933:20933) > sendmsg fd=6(<4u>> size=76 tuple=>
81051 19:36:51.430840305 1  (host) dnsmasq (20933:20933) < sendmsg res=76 data=
81052 19:36:51.430842129 1  (host) dnsmasq (20933:20933) > close fd=58(<4u>>
81053 19:36:51.430842956 1  (host) dnsmasq (20933:20933) < close res=0
84676 19:36:51.436248790 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < sendto res=60 data= hello-app-httpmyprojectsvcclusterlocal 84683 19:36:51.436254334 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > recvfrom fd=3(<4u>> size=512
84684 19:36:51.436256892 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < recvfrom res=76 data= hello-app-httpmyprojectsvcclusterlocal+o tuple=NULL 84685 19:36:51.436264998 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4u>>
84686 19:36:51.436265743 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
85420 19:36:51.437492301 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
85421 19:36:51.437493337 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
86222 19:36:51.438494771 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=>
86226 19:36:51.438497506 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > fcntl fd=3(<4t>> cmd=4(F_GETFL)
86228 19:36:51.438498484 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < fcntl res=2(/dev/pts/1)
86229 19:36:51.438499943 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > ioctl fd=3(<4t>> request=5401 argument=7FFDBF5E434C
86233 19:36:51.438501658 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < ioctl res=-25(ENOTTY) 86242 19:36:51.438509833 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > write fd=3(<4t>> size=105
86285 19:36:51.438557309 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < write res=105 data= GET / HTTP/1.1 Host: hello-app-http.myproject.svc.cluster.local User-Agent: Wget Connection: close 86291 19:36:51.438561615 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > read fd=3(<4t>> size=4096
107714 19:36:51.478518400 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < accept fd=6(<4t>> tuple=> queuepct=0 queuelen=0 queuemax=128
107772 19:36:51.478636516 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > read fd=6(<4t>> size=4096
107773 19:36:51.478640241 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < read res=105 data= GET / HTTP/1.1 Host: hello-app-http.myproject.svc.cluster.local User-Agent: Wget Connection: close 107857 19:36:51.478817861 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > write fd=6(<4t>> size=153
107869 19:36:51.478870349 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < write res=153 data= HTTP/1.1 200 OK Date: Sun, 10 Feb 2019 19:36:51 GMT Content-Length: 17 Content-Type: text/plain; charset=utf-8 Connection: close Hello OpenShift! 107886 19:36:51.478892928 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > close fd=6(<4t>>
107887 19:36:51.478893676 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < close res=0
107899 19:36:51.478998208 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < read res=153 data= HTTP/1.1 200 OK Date: Sun, 10 Feb 2019 19:36:51 GMT Content-Length: 17 Content-Type: text/plain; charset=utf-8 Connection: close Hello OpenShift! 108908 19:36:51.480114626 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4t>>
108910 19:36:51.480115482 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
112966 19:36:51.488041049 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < accept fd=6(<4t>> tuple=> queuepct=0 queuelen=0 queuemax=128
113001 19:36:51.488096304 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) > read fd=6(<4t>> size=4096
113002 19:36:51.488098693 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < read res=0 data= 113005 19:36:51.488105730 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) > close fd=6(<4t>>
113006 19:36:51.488106302 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < close res=0

Below a list of some more useful sysdig cli examples:

# Sysdig Chisels and Filters:
sudo sysdig -cl

# To find out more information about a particular chisel:
sudo sysdig -i lscontainers

# To view a list of available field classes, fields and their description:
sudo sysdig -l

# Create and write sysdig trace files, 2nd option sets byte limit for trace file:
sudo sysdig -w mytrace.scap
sudo sysdig -s 8192 -w trace.scap 

# Read sysdig trace files, 2nd option read and filter based on
sudo sysdig -r trace.scap
sudo sysdig -r trace.scap

# Monitor linux processes:
sudo sysdig -c ps

# Monitor linux processes by CPU utilisation:
sudo sysdig -c topprocs_cpu

# Monitor network connections:
sudo sysdig -c netstat
sudo sysdig -c topconns
sudo sysdig -c topprocs_net

# Monitor system file i/o:
sudo sysdig -c echo_fds
sudo sysdig -c topprocs_file

# Troubleshoot system performance:
sudo sysdig -c bottlenecks

# Monitor process execution time
sudo sysdig -c proc_exec_time 

# Monitor network i/o performance
sudo sysdig -c netlower 1

# Watch log entries
sudo sysdig -c spy_logs

# Monitor http requests:
sudo sysdig -c httplog    
sudo sysdig -c httptop [Print Top HTTP Requests] 

SysDig Monitor Enterprise

The paid enterprise version provides a web console to centrally access metrics and events from your fleet of monitored nodes.

You can run SysDig enterprise directly on OpenShift as DaemonSet and deploy the agent to all nodes in the cluster. For more detailed information about Kubernetes or OpenShift installation, read the official documentation.

oc adm new-project sysdig-agent --node-selector='app=sysdig-agent'
oc project sysdig-agent
oc label node --all "app=sysdig-agent"
oc create serviceaccount sysdig-agent
oc adm policy add-scc-to-user privileged -n sysdig-agent -z sysdig-agent
oc adm policy add-cluster-role-to-user cluster-reader -n sysdig-agent -z sysdig-agent

oc create secret generic sysdig-agent --from-literal=access-key=<-YOUR-ACCESS-KEY->

# Edit sysdig-agent-daemonset-v2.yaml to uncomment the line: serviceAccount: sysdig-agent and edit sysdig-agent-configmap.yaml to uncomment the line: new_k8s: true
# This allows kube-state-metrics to be automatically detected, monitored, and displayed in Sysdig Monitor. 
# Edit sysdig-agent-configmap.yaml to uncomment the line: k8s_cluster_name: and add your cluster name.

oc create -f sysdig-agent-daemonset-v2.yaml
oc create -f sysdig-agent-configmap.yaml

SysDig is a great tool to monitor and even further provides you the possibility to troubleshoot in depth your linux hosts and container platforms.

OpenShift Container Platform Troubleshooting Guide

On the first look OpenShift/Kubernetes seems like a very complex platform but once you start to get to know the different components and what they are doing, you will see it gets easier and easier. The purpose of this article to give you an every day guide based on my experience on how to successfully troubleshoot issues on OpenShift.

  • OpenShift service logging
# OpenShift 3.1 to OpenShift 3.9:

# OpenShift 3.10 and later versions:
/etc/origin/master/master.env # for API and Controllers

The log levels for the OpenShift services can be controlled via the –loglevel parameter in the service options.

0 – Errors and warnings only
2 – Normal information
4 – Debugging information
6 – API- debugging information (request / response)
8 – Body API debugging information

For example add or edit the line in /etc/sysconfig/atomic-openshift-node to OPTIONS=’–loglevel=4′ and afterward restart the service with systemctl restart atomic-openshift-node.

Viewing OpenShift service logs:

# OpenShift 3.1 to OpenShift 3.9:
journalctl -u atomic-openshift-master-api
journalctl -u atomic-openshift-master-controllers
journalctl -u atomic-openshift-node
journalctl -u etcd # or 'etcd_container' for containerized install

# OpenShift 3.10 and later versions:
/usr/local/bin/master-logs api api
/usr/local/bin/master-logs controllers controllers
/usr/local/bin/master-logs etcd etcd
journalctl -u atomic-openshift-node
  • Docker service logging

Change the docker daemon log level and add the parameter –log-level for the OPTIONS variable in dockers service file located in /etc/sysconfig/docker.

The available log levels are: ( debug, info, warn, error, fatal )

See the example below; to enable debug logging in /etc/sysconfig/docker to set log level equal to debug (After making the changes on the docker service you need to will restart with systemctl restart docker.):

OPTIONS='--insecure-registry= --selinux-enabled --log-level=debug'
  • OC command logging

The oc and oadm command also accept a loglevel option that can help get additional information. Value between 6 and 8 will provide extensive logging, API requests (loglevel 6), API headers (loglevel 7) and API responses received (loglevel 8):

oc whoami --loglevel=8
  • OpenShift SkyDNS

SkyDNS is the internal service discovery for OpenShift and DNS is important for OpenShift to function:

# Test full qualified cluster domain name
nslookup docker-registry.default.svc.cluster.local
# OR
dig +short docker-registry.default.svc.cluster.local

# Check if clusterip match the previous result
oc get svc/docker-registry -n default

# Test short name
nslookup docker-registry.default.svc
nslookup <endpoint-name>.<project-name>.svc

If short name doesn’t work look out if cluster.local is missing in dns search suffix. If resolution doesn’t work at all before enable debug logging, check if Dnsmasq service running and correctly configured. OpenShift uses a dispatcher script to maintain the DNS configuration of a node.

Add the options “–logspec ‘dns=10’” to the /etc/sysconfig/atomic-openshift-node service configuration on a node running skydns and restart the atomic-openshift-node service afterwards. There will then be skydns debug information in the journalctl logs.

OPTIONS="--loglevel=2 --logspec dns*=10"
  • OpenShift Master API and Web Console

In the following example, the is used by the internal cluster, and the is used by external clients

# Run the following commands on any node host
curl -k

# The OpenShift API service runs on all master instances. To see the status of the service, view the master-api pods in the kube-system project:
oc get pod -n kube-system -l
oc get pod -n kube-system -o wide
curl -k --insecure https://$HOSTNAME:8443/healthz
  • OpenShift Controller role

The OpenShift Container Platform controller service is available on all master nodes. The service runs in active/passive mode, which means it should only be running on one master.

# Verify the master host running the controller service
oc get -n kube-system cm openshift-master-controllers -o yaml
    • OpenShift Certificates

During the installation of OpenShift the playbooks generates a CA to sign every certificate in the cluster. One of the most common issues are expired node certificates. Below are a list of important certificate files:

# Is the OpenShift Certificate Authority, and it signs every other certificate unless specified otherwise.

# Contains a bundle with the current and the old CA's (if exists) to trust them all. If there has been only one ca.crt, then this file is the same as ca.crt.

# The internal API, also known as cluster internal address or the variable masterURL here all the internal components authenticates to access the API, such as nodes, routers and other services.

# Master controller certificate authenticates to kubernetes as a client using the admin.kubeconfig

# Node certificates
/etc/origin/node/ca.crt                   # to be able to trust the API, a copy of masters CA bundle is placed in:
/etc/origin/node/server.crt               # to secure this communication
/etc/origin/node/system:node:{fqdn}.crt   # Nodes needs to authenticate to the Kubernetes API as a client. 

# Etcd certificates
/etc/etcd/ca.crt                          # is the etcd CA, it is used to sign every certificate.
/etc/etcd/server.crt                      # is used by the etcd to listen to clients.
/etc/etcd/peer.crt                        # is used by etcd to authenticate as a client.

# Master certificates to auth to etcd
/etc/origin/master/master.etcd-ca.crt     # is a copy of /etc/etcd/ca.crt. Used to trust the etcd cluster.
/etc/origin/master/master.etcd-client.crt # is used to authenticate as a client of the etcd cluster.

# Services ca certificate. All self-signed internal certificates are signed by this CA.

Here’s an example to check the validity of the master server certificate:

cat /etc/origin/master/master.server.crt | openssl x509 -text | grep -i Validity -A2
# OR
openssl x509 -enddate -noout -in /etc/origin/master/master.server.crt

It’s worth checking the documentation about how to re-deploy certificates on OpenShift.

  • OpenShift etcd

On the etcd node (master) set source to etcd.conf file for most of the needed variables.

source /etc/etcd/etcd.conf
export ETCDCTL_API=3

# Set endpoint variable to include all etcd endpoints
ETCD_ALL_ENDPOINTS=` etcdctl  --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=fields   member list | awk '/ClientURL/{printf "%s%s",sep,$3; sep=","}'`

# Cluster status and health checks
etcdctl  --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=table  member list
etcdctl  --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_ALL_ENDPOINTS  --write-out=table endpoint status
etcdctl  --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_ALL_ENDPOINTS endpoint health

Check etcd database key entries:

etcdctl  --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints="https://$(hostname):2379" get / --prefix --keys-only
  • OpenShift Registry

To get detailed information about the pods running the internal registry run the following command:

oc get pods -n default | grep registry | awk '{ print $1 }' | xargs -i oc describe pod {}

For a basic health check that the internal registry is running and responding you need to “curl” the /healthz path. Normally this should return a 200 HTTP response:

Registry=$(oc get svc docker-registry -n default -o 'jsonpath={.spec.clusterIP}:{.spec.ports[0].port}')

curl -vk $Registry/healthz
# OR
curl -vk https://$Registry/healthz

If a persistent volume is attached to the registry make sure that the registry can write to the volume.

oc project default 
oc rsh `oc get pods -o name -l docker-registry`

$ touch /registry/test-file
$ ls -la /registry/ 
$ rm /registry/test-file
$ exit

If the registry is insecure make sure you have edited the /etc/sysconfig/docker file and add –insecure-registry to the OPTIONS parameter on the nodes.

For more information about testing the internal registry please have a look at the documentation about Accessing the Registry.

  • OpenShift Router 

To increase the log level for OpenShift router pod, set loglevel=4 in the container args:

# Increase logging level
oc patch dc -n default router -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args", "value":["--loglevel=4"]}]' --type=json 

# View logs
oc logs <router-pod-name> -n default

# Remove logging change 
oc patch dc -n default router -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/args", "value":["--loglevel=4"]}]' --type=json

OpenShift router image version 3.3 and later, the logging for http requests can be forwarded to an external syslog server:

oc set env dc/router ROUTER_SYSLOG_ADDRESS=<syslog-server-ip> ROUTER_LOG_LEVEL=debug

If you are facing issues with ingress routes to your application run the below command to collect more information:

oc logs dc/router -n default
oc get dc/router -o yaml -n default
oc get route <route-name> -n <project-name> 
oc get endpoints --all-namespaces 
oc exec -it <router-pod-name> -- ls -la 
oc exec -it <router-pod-name> -- find /var/lib/haproxy -regex ".*\(.map\|config.*\|.json\)" -print -exec cat {} \; > haproxy_configs_and_maps

Check if your application domain is / and dig for an ANSWER containing the load balancer VIP address:

dig \*

Confirm that certificates are being severed out correctly by running the following:

echo -n | openssl s_client -connect :443 -servername 2>&1 | openssl x509 -noout -text
curl -kv 
  • OpenShift SDN

Please checkout the official Troubleshooting OpenShift SDN documentation

To get OpenFlow table export, connect to the openvswitch container and run following command:

docker exec openvswitch ovs-ofctl -O OpenFlow13 dump-flows br0
  • OpenShift Namespace events

Useful to collect events from the namespace to identify pod creation issues before you did in the container logs:

oc get events [-n |--all-namespaces]

In the default namespace you find relevant events for monitoring or auditing a cluster, such as Node and resource events related to the OpenShift platform.

  • OpenShift Pod and Container Logs

Container/pod logs can be viewed using the OpenShift oc command line. Add option “-p” to print the logs for the previous instance of the container in a pod if it exists and add option “-f” to stream the logs:

oc logs <pod-name> [-f]

The logs are saved to the worker nodes disk where the container/pod is running and it is located at:

For setting the log file limits for containers on a worker node the –log-opt can be configured with max-size and max-file so that a containers logs are rolled over:

# cat /etc/sysconfig/docker 
OPTIONS='--insecure-registry= --selinux-enabled --log-opt max-size=50m --log-opt max-file=5'

# Restart docker service for the changes to take effect.
systemctl restart docker 

To remove all logs from a given container run the following commands:

cat /dev/null > /var/lib/docker/containers/<container-id>/<container-id>-json.log
# OR
cat /dev/null >  $(docker inspect --format='{{.LogPath}}' <container-id> )

To generate a list of the largest files run the following commands:

# Log files
find /var/lib/docker/ -name "*.log" -exec ls -sh {} \; | sort -n -r | head -20

# All container files
du -aSh /var/lib/docker/ | sort -n -r | head -n 10

Finding out the veth# interface of a docker container and use tcpdump to capture traffic more easily. The iflink of the container is the same as the ifindex of the veth#. You can get the iflink of the container as follows:

docker exec -it <container-name>  bash -c 'cat /sys/class/net/eth0/iflink'

# Let's say that the results in 14, then grep for 14
grep -l 14 /sys/class/net/veth*/ifindex

# Which will give a unique result on the worker node

Here a simple bash script to get the container and veth id’s:

for container in $(docker ps -q); do
    iflink=`docker exec -it $container bash -c 'cat /sys/class/net/eth0/iflink'`
    iflink=`echo $iflink|tr -d '\r'`
    veth=`grep -l $iflink /sys/class/net/veth*/ifindex`
    veth=`echo $veth|sed -e 's;^.*net/\(.*\)/ifindex$;\1;'`
    echo $container:$veth
  • OpenShift Builder Pod Logs

If you want to troubleshoot a particular build of “myapp” you can view logs with:

oc logs [bc/|dc/]<name> [-f]

To increase the logging level add a BUILD_LOGLEVEL environment variable to the BuildConfig strategy:

    - name: "BUILD_LOGLEVEL"
      value: "5"

I hope you found this article useful and that it helped you troubleshoot OpenShift. Please let me know what you think and leave a comment.