Running Istio Service Mesh on OpenShift

In the Kubernetes/OpenShift community everyone is talking about the Istio service mesh, so I wanted to share my experience of installing Istio and running a sample microservice application on OpenShift 3.11 and 4.0. Service mesh on OpenShift is still at least a few months away from being generally available for production use, but this gives you the chance to start testing and exploring Istio. I have found good documentation about installing Istio on OCP and OKD; have a look there for more information.

To install Istio on OpenShift 3.11 you need to apply the node and master prerequisites you see below; for OpenShift 4.0 and above you can skip these steps and go directly to the istio-operator installation:

sudo bash -c 'cat << EOF > /etc/origin/master/master-config.patch
admissionConfig:
  pluginConfig:
    MutatingAdmissionWebhook:
      configuration:
        apiVersion: apiserver.config.k8s.io/v1alpha1
        kubeConfigFile: /dev/null
        kind: WebhookAdmission
    ValidatingAdmissionWebhook:
      configuration:
        apiVersion: apiserver.config.k8s.io/v1alpha1
        kubeConfigFile: /dev/null
        kind: WebhookAdmission
EOF'
        
sudo cp -p /etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml.prepatch
sudo bash -c 'oc ex config patch /etc/origin/master/master-config.yaml.prepatch -p "$(cat /etc/origin/master/master-config.patch)" > /etc/origin/master/master-config.yaml'
sudo su -
master-restart api
master-restart controllers
exit       

sudo bash -c 'cat << EOF > /etc/sysctl.d/99-elasticsearch.conf 
vm.max_map_count = 262144
EOF'

sudo sysctl vm.max_map_count=262144

The Istio installation is straightforward. Start by installing the istio-operator:

oc new-project istio-operator
oc new-app -f https://raw.githubusercontent.com/Maistra/openshift-ansible/maistra-0.9/istio/istio_community_operator_template.yaml --param=OPENSHIFT_ISTIO_MASTER_PUBLIC_URL=<-master-public-hostname->

Verify the operator deployment:

oc logs -n istio-operator $(oc -n istio-operator get pods -l name=istio-operator --output=jsonpath='{.items..metadata.name}')

Once the operator is running we can start deploying Istio components by creating a custom resource:

cat << EOF >  ./istio-installation.yaml
apiVersion: "istio.openshift.com/v1alpha1"
kind: "Installation"
metadata:
  name: "istio-installation"
  namespace: istio-operator
EOF

oc create -n istio-operator -f ./istio-installation.yaml

Watch the Istio installation progress, which might take a while to complete:

oc get pods -n istio-system -w

# The installation of the core components is finished when you see:
...
openshift-ansible-istio-installer-job-cnw72   0/1       Completed   0         4m
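If you would rather script this check than watch the output, the pod listing can be filtered for the installer job. This is only a minimal sketch: it assumes the job pod keeps the openshift-ansible-istio-installer-job prefix shown above and that STATUS is the third column, and the function name installer_completed is made up for this example.

```shell
# Succeeds (exit 0) when the `oc get pods` listing on stdin contains the
# installer job pod in status Completed. Hypothetical helper, not part of oc.
installer_completed() {
  awk '$1 ~ /^openshift-ansible-istio-installer-job-/ && $3 == "Completed" { found = 1 }
       END { exit(found ? 0 : 1) }'
}

# Sample line copied from the output above. On a real cluster you would run:
#   oc get pods -n istio-system | installer_completed
sample='openshift-ansible-istio-installer-job-cnw72   0/1       Completed   0         4m'
if printf '%s\n' "$sample" | installer_completed; then
  echo "installer job completed"
fi
```

Wrapped in a loop with a sleep, the same function could be used to block a script until the core components are installed.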

Afterwards, to finish off the Istio installation, we need to install the Kiali web console:

bash <(curl -L https://git.io/getLatestKialiOperator)
oc get route -n istio-system -l app=kiali

Verifying that all Istio components are running:

$ oc get pods -n istio-system
NAME                                          READY     STATUS      RESTARTS   AGE
elasticsearch-0                               1/1       Running     0          9m
grafana-74b5796d94-4ll5d                      1/1       Running     0          9m
istio-citadel-db879c7f8-kfxfk                 1/1       Running     0          11m
istio-egressgateway-6d78858d89-58lsd          1/1       Running     0          11m
istio-galley-6ff54d9586-8r7cl                 1/1       Running     0          11m
istio-ingressgateway-5dcf9fdf4b-4fjj5         1/1       Running     0          11m
istio-pilot-7ccf64f659-ghh7d                  2/2       Running     0          11m
istio-policy-6c86656499-v45zr                 2/2       Running     3          11m
istio-sidecar-injector-6f696b8495-8qqjt       1/1       Running     0          11m
istio-telemetry-686f78b66b-v7ljf              2/2       Running     3          11m
jaeger-agent-k4tpz                            1/1       Running     0          9m
jaeger-collector-64bc5678dd-wlknc             1/1       Running     0          9m
jaeger-query-776d4d754b-8z47d                 1/1       Running     0          9m
kiali-5fd946b855-7lw2h                        1/1       Running     0          2m
openshift-ansible-istio-installer-job-cnw72   0/1       Completed   0          13m
prometheus-75b849445c-l7rlr                   1/1       Running     0          11m

Let’s deploy a sample microservice application, the Google Hipster Shop. It consists of multiple microservices, which makes it great for testing Istio:

# Create new project
oc new-project hipster-shop

# Set permissions to allow Istio to deploy the Envoy proxy sidecar container
oc adm policy add-scc-to-user anyuid -z default -n hipster-shop
oc adm policy add-scc-to-user privileged -z default -n hipster-shop

# Create Hipster Shop deployments and Istio services
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/istio-hipster-shop.yml
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/istio-manifest.yml

# Wait and check that all pods are running before creating the load generator
oc get pods -n hipster-shop -w

# Create load generator deployment
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/istio-loadgenerator.yml

As you can see below, each pod has a sidecar container with the Istio Envoy proxy, which handles the pod traffic:

[centos@ip-172-26-1-167 ~]$ oc get pods
NAME                                     READY     STATUS    RESTARTS   AGE
adservice-7894dbfd8c-g4m9v               2/2       Running   0          49m
cartservice-758d66c648-79fj4             2/2       Running   4          49m
checkoutservice-7b9dc8b755-h2b2v         2/2       Running   0          49m
currencyservice-7b5c5f48fc-gtm9x         2/2       Running   0          49m
emailservice-79578566bb-jvwbw            2/2       Running   0          49m
frontend-6497c5f748-5fc4f                2/2       Running   0          49m
loadgenerator-764c5547fc-sw6mg           2/2       Running   0          40m
paymentservice-6b989d657c-klp4d          2/2       Running   0          49m
productcatalogservice-5bfbf4c77c-cw676   2/2       Running   0          49m
recommendationservice-c947d84b5-svbk8    2/2       Running   0          49m
redis-cart-79d84748cf-cvg86              2/2       Running   0          49m
shippingservice-6ccb7d8ff7-66v8m         2/2       Running   0          49m
[centos@ip-172-26-1-167 ~]$
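To spot pods where the sidecar was not injected, the READY column can be checked automatically. The sketch below assumes each workload has exactly one application container, so a total of fewer than two containers means the Envoy sidecar is missing. The function name pods_missing_sidecar and the pod legacy-app-1-abcde are hypothetical and only serve as an illustration.

```shell
# Prints the names of pods whose READY column (ready/total) reports fewer than
# two containers in total. Reads `oc get pods` output from stdin; the header
# line is skipped because its second field is not of the form N/M.
pods_missing_sidecar() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ { split($2, ready, "/"); if (ready[2] + 0 < 2) print $1 }'
}

# The adservice line is taken from the listing above; legacy-app-1-abcde is a
# made-up pod without a sidecar. On a cluster you would run:
#   oc get pods -n hipster-shop | pods_missing_sidecar
pods_missing_sidecar <<'EOF'
NAME                                     READY     STATUS    RESTARTS   AGE
adservice-7894dbfd8c-g4m9v               2/2       Running   0          49m
legacy-app-1-abcde                       1/1       Running   0          5m
EOF
```

For the listing above the function prints nothing, because every Hipster Shop pod shows 2/2.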

The Kiali web console shows which microservices are part of the service mesh and how they are connected, which gives you a great level of detail about the traffic flows:

Detailed traffic flow view:

The Istio installation comes with Jaeger, an open source tracing tool to monitor and troubleshoot transactions:

Enough about this, let’s connect to our cool Hipster Shop. Happy shopping:

Additionally, there is another example, the Istio Bookinfo application, if you want to try something smaller and less complex:

oc new-project myproject

oc adm policy add-scc-to-user anyuid -z default -n myproject
oc adm policy add-scc-to-user privileged -z default -n myproject

oc apply -n myproject -f https://raw.githubusercontent.com/Maistra/bookinfo/master/bookinfo.yaml
oc apply -n myproject -f https://raw.githubusercontent.com/Maistra/bookinfo/master/bookinfo-gateway.yaml
export GATEWAY_URL=$(oc get route -n istio-system istio-ingressgateway -o jsonpath='{.spec.host}')
curl -o /dev/null -s -w "%{http_code}\n" http://$GATEWAY_URL/productpage

curl -o destination-rule-all.yaml https://raw.githubusercontent.com/istio/istio/release-1.0/samples/bookinfo/networking/destination-rule-all.yaml
oc apply -f destination-rule-all.yaml

curl -o destination-rule-all-mtls.yaml https://raw.githubusercontent.com/istio/istio/release-1.0/samples/bookinfo/networking/destination-rule-all-mtls.yaml
oc apply -f destination-rule-all-mtls.yaml

oc get destinationrules -o yaml

I hope this is a useful article for getting started with Istio service mesh on OpenShift.

Getting started with OpenShift 4.0 Container Platform

I had a first look at OpenShift 4.0 and wanted to share what I have seen so far. The installation of the cluster is super easy; Red Hat did a lot to improve the installation experience compared to the previous Ansible-based installation of OpenShift v3.x, and is moving towards ephemeral cluster deployments.

There are many changes under the hood, and not all of them are obvious, such as Bootkube for the self-hosted/self-healing control plane, MachineSets, and the many internal operators that install and manage the OpenShift components (api server, scheduler, controller manager, cluster-autoscaler, cluster-monitoring, web-console, dns, ingress, networking, node-tuning, and authentication).

For the OpenShift 4.0 developer preview you need a Red Hat account because the cluster installation requires a pull secret. For more information please visit: https://cloud.openshift.com/clusters/install

First we need to download the openshift-installer binary:

wget https://github.com/openshift/installer/releases/download/v0.16.1/openshift-install-linux-amd64
mv openshift-install-linux-amd64 openshift-install
chmod +x openshift-install

Then we create the install configuration. You need to have AWS account credentials and a Route53 DNS domain set up already:

$ ./openshift-install create install-config
INFO Platform aws
INFO AWS Access Key ID *********
INFO AWS Secret Access Key [? for help] *********
INFO Writing AWS credentials to "/home/centos/.aws/credentials" (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
INFO Region eu-west-1
INFO Base Domain paas.domain.com
INFO Cluster Name cluster1
INFO Pull Secret [? for help] *********

Let’s look at the install-config.yaml:

apiVersion: v1beta4
baseDomain: paas.domain.com
compute:
- name: worker
  platform: {}
  replicas: 3
controlPlane:
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: cluster1
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: eu-west-1
pullSecret: '{"auths":{...}}'

Now we can create the OpenShift v4 cluster, which takes around 30 minutes to complete. At the end of the installer output you see the auto-generated credentials to connect to the cluster:

$ ./openshift-install create cluster
INFO Consuming "Install Config" from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 30m0s for the Kubernetes API at https://api.cluster1.paas.domain.com:6443...
INFO API v1.12.4+0ba401e up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster at https://api.cluster1.paas.domain.com:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO Run 'export KUBECONFIG=/home/centos/auth/kubeconfig' to manage the cluster with 'oc', the OpenShift CLI.
INFO The cluster is ready when 'oc login -u kubeadmin -p jMTSJ-F6KYy-mVVZ4-QVNPP' succeeds (wait a few minutes).
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.cluster1.paas.domain.com
INFO Login to the console with user: kubeadmin, password: jMTSJ-F6KYy-mVVZ4-QVNPP

The web-console has a very clean new design which I really like in addition to all the great improvements.

Under administration -> cluster settings you can explore the new auto-upgrade functionality of OpenShift 4.0:

You choose the new version to upgrade to and everything else happens in the background, which is a massive improvement over OpenShift v3.x, where you had to run the Ansible installer for this.

In the background the cluster operator upgrades the different platform components one by one.

Slowly you will see that the components move to the new build version.

Finished cluster upgrade:

You can only upgrade from one version to the next, for example from 4.0.0-0.9 to 4.0.0-0.10. It is not possible to go straight from x-0.9 to x-0.11.

But let’s deploy the Google Hipster Shop example and expose the frontend-external service for some more testing:

oc login -u kubeadmin -p jMTSJ-F6KYy-mVVZ4-QVNPP https://api.cluster1.paas.domain.com:6443 --insecure-skip-tls-verify=true
oc new-project myproject
oc create -f https://raw.githubusercontent.com/berndonline/openshift-ansible/master/examples/hipster-shop.yml
oc expose svc frontend-external

Getting the hostname for the exposed service:

$ oc get route
NAME                HOST/PORT                                                   PATH      SERVICES            PORT      TERMINATION   WILDCARD
frontend-external   frontend-external-myproject.apps.cluster1.paas.domain.com             frontend-external   http                    None

Use the browser to connect to our Hipster Shop:

Destroying the cluster is as easy as creating it:

$ ./openshift-install destroy cluster
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-01d27db162fa45402
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-057f593640067efc0
INFO Disassociated                                 arn="arn:aws:ec2:eu-west-1:552276840222:route-table/rtb-083e2da5d1183efa7" id=rtbassoc-05e821b451bead18f
INFO Disassociated                                 IAM instance profile="arn:aws:iam::552276840222:instance-profile/ocp4-bgx4c-worker-profile" arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff name=ocp4-bgx4c-worker-profile role=ocp4-bgx4c-worker-role
INFO Deleted                                       IAM instance profile="arn:aws:iam::552276840222:instance-profile/ocp4-bgx4c-worker-profile" arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff name=0xc00090f9a8
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-0f64a911b1ffa3eff" id=i-0f64a911b1ffa3eff
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:instance/i-00b5eedc186ba26a7" id=i-00b5eedc186ba26a7
...
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:security-group/sg-016d4c7d435a1c97f" id=sg-016d4c7d435a1c97f
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:subnet/subnet-076348368858e9a82" id=subnet-076348368858e9a82
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:vpc/vpc-00c611ae1b9b8e10a" id=vpc-00c611ae1b9b8e10a
INFO Deleted                                       arn="arn:aws:ec2:eu-west-1:552276840222:dhcp-options/dopt-0ce8b6a1c31e0ceac" id=dopt-0ce8b6a1c31e0ceac

The install experience for OpenShift 4.0 is great and makes it very easy for everyone to get started quickly with an enterprise container platform. From an operational perspective I still need to see how to run the new platform. The operators are great and make the cluster easy to use, but what I am most interested in is what happens when one of the operators goes rogue and how to debug it.

Over the coming weeks I will look in more detail at OpenShift 4.0 and its new features. I am especially interested in Service Mesh.

OpenShift Networking and Network Policies

This article is about OpenShift networking in general, but I also want to look at the Kubernetes CNI feature NetworkPolicy in a bit more detail. The latest OpenShift version 3.11 comes with three SDN deployment models, and a fourth is on the way:

  • ovs-subnet – This creates a single large VXLAN between all the namespaces and everyone is able to talk to each other.
  • ovs-multitenant – As the name says, this separates the namespaces into separate VXLANs, and only resources within the same namespace are able to talk to each other. You can join namespaces or make them global.
  • ovs-networkpolicy – The newest SDN deployment method for OpenShift, which enables micro-segmentation to control the communication between pods and namespaces.
  • ovs-ovn – The next-generation SDN, not yet officially released for OpenShift. For more information visit the Open vSwitch GitHub repository ovn-kubernetes.

Here is an overview of the common ovs-multitenant software-defined network:

On an OpenShift node the tun0 interface owns the default gateway and forwards traffic to external endpoints outside the OpenShift platform or routes internal traffic to the openvswitch overlay. Both openvswitch and iptables are central components of the networking on the platform.

Read the official OpenShift documentation on managing networking or configuring the SDN for more information.

NetworkPolicy in Action

Let me first explain the example I use to test NetworkPolicy. We will have one hello-openshift pod behind a service, and a busybox pod for testing the internal communication. I will create a default ingress deny policy and specifically allow TCP port 8080 to my hello-openshift pod. I am not planning to restrict the busybox pod with an egress policy, so all egress traffic is allowed.

Here you find the example yaml files to replicate the layout: busybox.yml and hello-openshift.yml

A short recap about Kubernetes service definitions: they are just simple iptables entries, and for this reason you cannot restrict them with NetworkPolicy.

[root@master1 ~]# iptables-save | grep 172.30.231.77
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.231.77/32 -p tcp -m comment --comment "myproject/hello-app-http:web cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.231.77/32 -p tcp -m comment --comment "myproject/hello-app-http:web cluster IP" -m tcp --dport 80 -j KUBE-SVC-LFWXBQW674LJXLPD
[root@master1 ~]#

When you install OpenShift with ovs-networkpolicy, the default policy allows all traffic within a namespace. Let’s do a first test without a custom NetworkPolicy rule to see if I am able to connect to my hello-app-http service.

[root@master1 ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (172.30.231.77:80)
  HTTP/1.1 200 OK
  Date: Tue, 19 Feb 2019 13:59:04 GMT
  Content-Length: 17
  Content-Type: text/plain; charset=utf-8
  Connection: close

[root@master1 ~]#

Now we add a default ingress deny policy to the namespace:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-all-ingress
spec:
  podSelector:
  ingress: []

After applying the default deny policy you are no longer able to connect to the hello-app-http service. The connection times out because no flow entries are defined yet in the OpenFlow table:

[root@master1 ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (172.30.231.77:80)
wget: can't connect to remote host (172.30.231.77): Connection timed out
command terminated with exit code 1
[root@master1 ~]#

Let’s add a new policy that allows TCP port 8080 and uses a podSelector to match all pods with the label “role: web”.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-tcp8080
spec:
  podSelector:
    matchLabels:
      role: web
  ingress:
  - ports:
    - protocol: TCP
      port: 8080

This alone doesn’t do anything. You still need to patch the deployment config and add the label “role: web” to the pod template metadata of your deployment config.

oc patch dc/hello-app-http --patch '{"spec":{"template":{"metadata":{"labels":{"role":"web"}}}}}'

To roll back the previous changes simply use the ‘oc rollback dc/hello-app-http’ command.
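Since an allow policy like this differs only in its name, label and port, it can be generated from a small template when you want to experiment with other ports. The following is a sketch under that assumption; the function name allow_tcp_policy is hypothetical, and the manifest it prints mirrors the allow-tcp8080 policy shown earlier.

```shell
# Prints a NetworkPolicy manifest that allows ingress on one TCP port to pods
# carrying the given role label.
# Arguments: 1 = policy name, 2 = value of the role label, 3 = TCP port.
allow_tcp_policy() {
  cat <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: $1
spec:
  podSelector:
    matchLabels:
      role: $2
  ingress:
  - ports:
    - protocol: TCP
      port: $3
EOF
}

# Reproduces the allow-tcp8080 policy; pipe the output into `oc create -f -`
# to apply it on a cluster.
allow_tcp_policy allow-tcp8080 web 8080
```

Calling it with a different name and port, for example allow-tcp8888 and 8888, would produce a second policy for the other port of the hello-openshift container.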

Now let’s check the openvswitch flow table. You will see that a new flow was added with the destination of my hello-openshift pod 10.128.0.103 on port 8080.

Afterwards we try again to connect to the hello-app-http service and see that the connection is successful:

[root@master1 ~]# oc exec ovs-q4p8m -n openshift-sdn -- ovs-ofctl -O OpenFlow13 dump-flows br0 | grep '10.128.0.103.*8080'
 cookie=0x0, duration=221.251s, table=80, n_packets=15, n_bytes=1245, priority=150,tcp,reg1=0x2dfc74,nw_dst=10.128.0.103,tp_dst=8080 actions=output:NXM_NX_REG2[]
[root@master1 ~]#
[root@master1 ~]# oc exec busybox-1-wn592 -- wget -S --spider http://hello-app-http
Connecting to hello-app-http (172.30.231.77:80)
  HTTP/1.1 200 OK
  Date: Tue, 19 Feb 2019 14:21:57 GMT
  Content-Length: 17
  Content-Type: text/plain; charset=utf-8
  Connection: close

[root@master1 ~]#
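The grep on the flow table can be wrapped in a small check that answers whether an allow flow exists for a given pod IP and port. This is a rough sketch that relies on the field order and the space before actions= seen in the ovs-ofctl output above; the function name has_allow_flow is made up for this example.

```shell
# Succeeds when the OpenFlow dump on stdin contains a flow whose destination
# matches the given pod IP ($1) and TCP port ($2). The trailing space in the
# pattern prevents port 808 from matching tp_dst=8080.
has_allow_flow() {
  grep -Fq -- "nw_dst=$1,tp_dst=$2 "
}

# Flow line copied from the dump above. On a node you would pipe in the real dump:
#   oc exec ovs-q4p8m -n openshift-sdn -- ovs-ofctl -O OpenFlow13 dump-flows br0 | has_allow_flow 10.128.0.103 8080
flow=' cookie=0x0, duration=221.251s, table=80, n_packets=15, n_bytes=1245, priority=150,tcp,reg1=0x2dfc74,nw_dst=10.128.0.103,tp_dst=8080 actions=output:NXM_NX_REG2[]'
printf '%s\n' "$flow" | has_allow_flow 10.128.0.103 8080 && echo "flow for port 8080 found"
printf '%s\n' "$flow" | has_allow_flow 10.128.0.103 8888 || echo "no flow for port 8888"
```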

The hello-openshift container publishes two TCP ports, 8080 and 8888. Finally, let’s try to connect to the pod IP address on port 8888. The connection fails because I only allowed 8080 in the policy.

[root@master1 ~]# oc exec busybox-1-wn592 -- wget -S --spider http://10.128.0.103:8888
Connecting to 10.128.0.103:8888 (10.128.0.103:8888)
wget: can't connect to remote host (10.128.0.103): Connection timed out
command terminated with exit code 1
[root@master1 ~]#

There are great posts on the Red Hat OpenShift blog which you should check out: networkpolicies-and-microsegmentation and openshift-and-network-security-zones-coexistence-approaches. I can also recommend having a look at Ahmet Alp Balkan’s GitHub repository of Kubernetes network policy recipes, where you can find some good examples.

Host and Container Monitoring with SysDig

After my previous articles about troubleshooting and validating OpenShift using Ansible, I wanted to continue and show how SysDig helps you to identify potential issues on your nodes or container platform before they occur.

The open source version is a simple but very powerful tool to inspect your Linux host via the command line, but it has no capabilities to centrally monitor or store capture information. The enterprise version adds these capabilities: it provides a web console, stores metrics centrally, and is able to trigger remote captures without the need to connect to the host.

Sysdig Open Source

Let’s install sysdig open source; here is the official SysDig installation guide.

# Host install
curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash

# Alternatively the container based install
yum -y install kernel-devel-$(uname -r)
docker pull sysdig/sysdig
docker run -i -t --name sysdig --privileged -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro sysdig/sysdig

The csysdig command is a nice, user-friendly, menu-driven interface to see real-time system call information of your host. To collect information from Kubernetes or OpenShift use the options [-kK] as seen in the example below:

csysdig -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

For more information about how to use csysdig please have a look at the manual or watch the short YouTube video.

The main sysdig command shows its output directly in the terminal session, and you can apply filters and chisels to see the system calls at a finer granularity. As with csysdig, the options [-kK] enable the Kubernetes integration:

sysdig -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

Here are some useful commands to inspect Kubernetes or OpenShift events:

# Monitor Kubernetes namespace ip communication:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (k8s.ns.name=<-NAMESPACE-NAME->)" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

# Monitor namespace and pod name, the 2nd command filters to only show GET requests:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (k8s.ns.name=<-NAMESPACE-NAME-> and k8s.pod.name=<-POD-NAME->)" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (k8s.ns.name=<-NAMESPACE-NAME-> and k8s.pod.name=<-POD-NAME->) and evt.buffer contains GET" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

# Monitor ns and pod names and apply chisel echo_fds:
sudo sysdig -A -s8192 "fd.type in (ipv4, ipv6) and (k8s.ns.name=<-NAMESPACE-NAME-> and k8s.pod.name=<-POD-NAME->)" -c echo_fds -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key
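The filter strings in these commands follow the same pattern, so they can be assembled by a small helper instead of being edited by hand. This is just a sketch: the function name k8s_ip_filter is hypothetical, and the namespace and pod names are example values.

```shell
# Builds the sysdig filter used above: IP traffic limited to a namespace ($1)
# and, optionally, to a single pod ($2).
k8s_ip_filter() {
  filter="fd.type in (ipv4, ipv6) and (k8s.ns.name=$1"
  if [ -n "${2:-}" ]; then
    filter="$filter and k8s.pod.name=$2"
  fi
  printf '%s)\n' "$filter"
}

# Print the namespace-only and the namespace-plus-pod variant.
k8s_ip_filter myproject
k8s_ip_filter myproject busybox-2-hjhq8
```

The result can be passed to sysdig as a quoted argument, for example sudo sysdig -A -s8192 "$(k8s_ip_filter myproject)" followed by the same -k and -K options as above.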

SysDig example

This capture is an HTTP request from a busybox pod (name: busybox-2-hjhq8 ip: 10.128.0.81) via a service (name: hello-app-http ip: 172.30.43.111) to the hello-openshift pod (name: hello-app-http-1-8v57x ip: 10.128.0.77) in the namespace myproject. I use a simple “wget -S --spider http://hello-app-http/” to simulate the request:

# Command to capture ip communication in myproject namespace including dnsmasq and wget processes:
sudo sysdig -s2000 -A -pk "fd.type in (ipv4, ipv6) and (k8s.ns.name=myproject or proc.name=dnsmasq) or proc.name=wget" -k https://localhost:8443 -K /etc/origin/master/admin.crt:/etc/origin/master/admin.key

# Output:
70739 19:36:51.401062017 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
70741 19:36:51.401062878 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
70748 19:36:51.401072194 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=10.128.0.81:44993->172.26.11.254:53
70749 19:36:51.401074599 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > sendto fd=3(<4u>10.128.0.81:44993->172.26.11.254:53) size=60 tuple=NULL
71083 19:36:51.401575859 0  (host) dnsmasq (20933:20933) > recvmsg fd=6(<4u>172.26.11.254:53)
71087 19:36:51.401582008 0  (host) dnsmasq (20933:20933) < recvmsg res=60 size=60 data= hello-app-httpmyprojectsvcclusterlocal tuple=10.128.0.81:44993->172.26.11.254:53
71088 19:36:51.401584101 0  (host) dnsmasq (20933:20933) > ioctl fd=6(<4u>10.128.0.81:44993->172.26.11.254:53) request=8910 argument=7FFE208E30C0
71089 19:36:51.401586692 0  (host) dnsmasq (20933:20933) < ioctl res=0
71108 19:36:51.401623408 0  (host) dnsmasq (20933:20933) < socket fd=58(<4>)
71109 19:36:51.401624563 0  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=4(F_GETFL)
71110 19:36:51.401625584 0  (host) dnsmasq (20933:20933) < fcntl res=2(/dev/null)
71111 19:36:51.401626259 0  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=5(F_SETFL)
71112 19:36:51.401626825 0  (host) dnsmasq (20933:20933) < fcntl res=0(/dev/null)
71113 19:36:51.401627787 0  (host) dnsmasq (20933:20933) > bind fd=58(<4>)
71129 19:36:51.401680355 0  (host) dnsmasq (20933:20933) < bind res=0 addr=0.0.0.0:22969
71130 19:36:51.401681698 0  (host) dnsmasq (20933:20933) > sendto fd=58(<4u>0.0.0.0:22969) size=60 tuple=0.0.0.0:22969->127.0.0.1:53
71131 19:36:51.401715726 0  (host) dnsmasq (20933:20933) < sendto res=60 data=
hello-app-httpmyprojectsvcclusterlocal
71469 19:36:51.402632442 1  (host) dnsmasq (20933:20933) > recvfrom fd=58(<4u>127.0.0.1:53->127.0.0.1:22969) size=5131
71474 19:36:51.402636604 1  (host) dnsmasq (20933:20933) < recvfrom res=114 data=
hello-app-httpmyprojectsvcclusterlocal)<*nsdns)
hostmaster)\`tp :< tuple=127.0.0.1:53->0.0.0.0:22969
71479 19:36:51.402643363 1  (host) dnsmasq (20933:20933) > sendmsg fd=6(<4u>10.128.0.81:44993->172.26.11.254:53) size=114 tuple=172.26.11.254:53->10.128.0.81:44993
71492 19:36:51.402666311 1  (host) dnsmasq (20933:20933) < sendmsg res=114 data=
hello-app-httpmyprojectsvcclusterlocal)<*nsdns)
hostmaster)\`tp :<
71493 19:36:51.402668199 1  (host) dnsmasq (20933:20933) > close fd=58(<4u>127.0.0.1:53->127.0.0.1:22969)
71494 19:36:51.402669009 1  (host) dnsmasq (20933:20933) < close res=0
80786 19:36:51.430143868 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < sendto res=60 data= hello-app-httpmyprojectsvcclusterlocal
80793 19:36:51.430153453 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > recvfrom fd=3(<4u>10.128.0.81:44993->172.26.11.254:53) size=512
80794 19:36:51.430158626 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < recvfrom res=114 data=
hello-app-httpmyprojectsvcclusterlocal)<*nsdns)
hostmaster)\`tp :< tuple=NULL
80795 19:36:51.430160257 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4u>10.128.0.81:44993->172.26.11.254:53)
80796 19:36:51.430161712 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
80835 19:36:51.430260103 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
80838 19:36:51.430261013 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
80840 19:36:51.430269080 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=10.128.0.81:41405->172.26.11.254:53
80841 19:36:51.430271011 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > sendto fd=3(<4u>10.128.0.81:41405->172.26.11.254:53) size=60 tuple=NULL
80874 19:36:51.430433333 1  (host) dnsmasq (20933:20933) > recvmsg fd=6(<4u>10.128.0.81:44993->172.26.11.254:53)
80879 19:36:51.430439631 1  (host) dnsmasq (20933:20933) < recvmsg res=60 size=60 data= hello-app-httpmyprojectsvcclusterlocal tuple=10.128.0.81:41405->172.26.11.254:53
80881 19:36:51.430454839 1  (host) dnsmasq (20933:20933) > ioctl fd=6(<4u>10.128.0.81:41405->172.26.11.254:53) request=8910 argument=7FFE208E30C0
80885 19:36:51.430457716 1  (host) dnsmasq (20933:20933) < ioctl res=0
80895 19:36:51.430493317 1  (host) dnsmasq (20933:20933) < socket fd=58(<4>)
80896 19:36:51.430494522 1  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=4(F_GETFL)
80897 19:36:51.430495527 1  (host) dnsmasq (20933:20933) < fcntl res=2(/dev/null)
80898 19:36:51.430496189 1  (host) dnsmasq (20933:20933) > fcntl fd=58(<4>) cmd=5(F_SETFL)
80899 19:36:51.430496769 1  (host) dnsmasq (20933:20933) < fcntl res=0(/dev/null)
80900 19:36:51.430497538 1  (host) dnsmasq (20933:20933) > bind fd=58(<4>)
80913 19:36:51.430551876 1  (host) dnsmasq (20933:20933) < bind res=0 addr=0.0.0.0:64640
80914 19:36:51.430553226 1  (host) dnsmasq (20933:20933) > sendto fd=58(<4u>0.0.0.0:64640) size=60 tuple=0.0.0.0:64640->127.0.0.1:53
80922 19:36:51.430581962 1  (host) dnsmasq (20933:20933) < sendto res=60 data=
:=hello-app-httpmyprojectsvcclusterlocal
81032 19:36:51.430806106 1  (host) dnsmasq (20933:20933) > recvfrom fd=58(<4u>127.0.0.1:53->127.0.0.1:64640) size=5131
81035 19:36:51.430809074 1  (host) dnsmasq (20933:20933) < recvfrom res=76 data= :=hello-app-httpmyprojectsvcclusterlocal+o tuple=127.0.0.1:53->0.0.0.0:64640
81040 19:36:51.430818116 1  (host) dnsmasq (20933:20933) > sendmsg fd=6(<4u>10.128.0.81:41405->172.26.11.254:53) size=76 tuple=172.26.11.254:53->10.128.0.81:41405
81051 19:36:51.430840305 1  (host) dnsmasq (20933:20933) < sendmsg res=76 data=
hello-app-httpmyprojectsvcclusterlocal+o
81052 19:36:51.430842129 1  (host) dnsmasq (20933:20933) > close fd=58(<4u>127.0.0.1:53->127.0.0.1:64640)
81053 19:36:51.430842956 1  (host) dnsmasq (20933:20933) < close res=0
84676 19:36:51.436248790 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < sendto res=60 data= hello-app-httpmyprojectsvcclusterlocal
84683 19:36:51.436254334 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > recvfrom fd=3(<4u>10.128.0.81:41405->172.26.11.254:53) size=512
84684 19:36:51.436256892 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < recvfrom res=76 data= hello-app-httpmyprojectsvcclusterlocal+o tuple=NULL
84685 19:36:51.436264998 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4u>10.128.0.81:41405->172.26.11.254:53)
84686 19:36:51.436265743 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
85420 19:36:51.437492301 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < socket fd=3(<4>)
85421 19:36:51.437493337 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > connect fd=3(<4>)
86222 19:36:51.438494771 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < connect res=0 tuple=10.128.0.81:39656->172.30.43.111:80
86226 19:36:51.438497506 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > fcntl fd=3(<4t>10.128.0.81:39656->172.30.43.111:80) cmd=4(F_GETFL)
86228 19:36:51.438498484 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < fcntl res=2(/dev/pts/1)
86229 19:36:51.438499943 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > ioctl fd=3(<4t>10.128.0.81:39656->172.30.43.111:80) request=5401 argument=7FFDBF5E434C
86233 19:36:51.438501658 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < ioctl res=-25(ENOTTY)
86242 19:36:51.438509833 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > write fd=3(<4t>10.128.0.81:39656->172.30.43.111:80) size=105
86285 19:36:51.438557309 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < write res=105 data= GET / HTTP/1.1 Host: hello-app-http.myproject.svc.cluster.local User-Agent: Wget Connection: close
86291 19:36:51.438561615 1 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > read fd=3(<4t>10.128.0.81:39656->172.30.43.111:80) size=4096
107714 19:36:51.478518400 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < accept fd=6(<4t>10.128.0.81:39656->10.128.0.77:8080) tuple=10.128.0.81:39656->10.128.0.77:8080 queuepct=0 queuelen=0 queuemax=128
107772 19:36:51.478636516 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > read fd=6(<4t>10.128.0.81:39656->10.128.0.77:8080) size=4096
107773 19:36:51.478640241 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < read res=105 data= GET / HTTP/1.1 Host: hello-app-http.myproject.svc.cluster.local User-Agent: Wget Connection: close
107857 19:36:51.478817861 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > write fd=6(<4t>10.128.0.81:39656->10.128.0.77:8080) size=153
107869 19:36:51.478870349 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < write res=153 data= HTTP/1.1 200 OK Date: Sun, 10 Feb 2019 19:36:51 GMT Content-Length: 17 Content-Type: text/plain; charset=utf-8 Connection: close Hello OpenShift!
107886 19:36:51.478892928 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) > close fd=6(<4t>10.128.0.81:39656->10.128.0.77:8080)
107887 19:36:51.478893676 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11185:7) < close res=0
107899 19:36:51.478998208 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < read res=153 data= HTTP/1.1 200 OK Date: Sun, 10 Feb 2019 19:36:51 GMT Content-Length: 17 Content-Type: text/plain; charset=utf-8 Connection: close Hello OpenShift!
108908 19:36:51.480114626 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) > close fd=3(<4t>10.128.0.81:39656->172.30.43.111:80)
108910 19:36:51.480115482 0 busybox-2-hjhq8 (4d84d98d46f1) wget (84856:26) < close res=0
112966 19:36:51.488041049 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < accept fd=6(<4t>10.128.0.1:55052->10.128.0.77:8080) tuple=10.128.0.1:55052->10.128.0.77:8080 queuepct=0 queuelen=0 queuemax=128
113001 19:36:51.488096304 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) > read fd=6(<4t>10.128.0.1:55052->10.128.0.77:8080) size=4096
113002 19:36:51.488098693 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < read res=0 data=
113005 19:36:51.488105730 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) > close fd=6(<4t>10.128.0.1:55052->10.128.0.77:8080)
113006 19:36:51.488106302 0 hello-app-http-1-8v57x (5145dc0ea61e) hello-openshift (11183:6) < close res=0

Below is a list of some more useful sysdig CLI examples:

# List the available sysdig chisels:
sudo sysdig -cl

# To find out more information about a particular chisel:
sudo sysdig -i lscontainers

# To view a list of available field classes, fields and their description:
sudo sysdig -l

# Create and write sysdig trace files; the 2nd option captures up to 8192 bytes of each I/O buffer (snaplen):
sudo sysdig -w mytrace.scap
sudo sysdig -s 8192 -w trace.scap 

# Read sysdig trace files; the 2nd option filters the events by proc.name:
sudo sysdig -r trace.scap
sudo sysdig -r trace.scap proc.name=dnsmasq

# Monitor Linux processes:
sudo sysdig -c ps

# Monitor Linux processes by CPU utilisation:
sudo sysdig -c topprocs_cpu

# Monitor network connections:
sudo sysdig -c netstat
sudo sysdig -c topconns
sudo sysdig -c topprocs_net

# Monitor system file i/o:
sudo sysdig -c echo_fds
sudo sysdig -c topprocs_file

# Troubleshoot system performance:
sudo sysdig -c bottlenecks

# Monitor process execution time:
sudo sysdig -c proc_exec_time

# Monitor network I/O performance (calls slower than 1 ms):
sudo sysdig -c netlower 1

# Watch log entries:
sudo sysdig -c spy_logs

# Monitor HTTP requests; the 2nd command prints the top HTTP requests:
sudo sysdig -c httplog
sudo sysdig -c httptop
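
The capture and read-back options above combine well with filters. Here is a minimal sketch, assuming the process names seen in the trace earlier (wget, dnsmasq, hello-openshift). The helper `join_filter`, the `RUN_CAPTURE` switch and the file name `dns-http.scap` are made-up names for illustration only:

```shell
#!/usr/bin/env bash
# join_filter is a hypothetical helper: it joins its arguments with " or "
# to build a single sysdig filter expression.
join_filter() {
  local out="" cond
  for cond in "$@"; do
    if [ -z "$out" ]; then
      out="$cond"
    else
      out="$out or $cond"
    fi
  done
  printf '%s\n' "$out"
}

FILTER="$(join_filter proc.name=wget proc.name=dnsmasq proc.name=hello-openshift)"
echo "Filter: $FILTER"

# The capture only runs when RUN_CAPTURE=1 is set (hypothetical switch),
# so the script is safe to source on a machine without sysdig.
if [ "${RUN_CAPTURE:-0}" = "1" ]; then
  # Stop after 10 seconds (-M) and keep up to 8192 bytes per I/O buffer (-s)
  sudo sysdig -M 10 -s 8192 -w dns-http.scap "$FILTER"
  # Read the trace back and show only connect events
  sudo sysdig -r dns-http.scap "evt.type=connect"
fi
```

Filtering at capture time keeps the trace file small; filtering again on read lets you slice the same file in different ways without repeating the capture.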

SysDig Monitor Enterprise

The paid enterprise version provides a web console to centrally access metrics and events from your fleet of monitored nodes.

You can run SysDig Enterprise directly on OpenShift as a DaemonSet, which deploys the agent to every node in the cluster. For more details about the Kubernetes or OpenShift installation, read the official documentation.

oc adm new-project sysdig-agent --node-selector='app=sysdig-agent'
oc project sysdig-agent
oc label node --all "app=sysdig-agent"
oc create serviceaccount sysdig-agent
oc adm policy add-scc-to-user privileged -n sysdig-agent -z sysdig-agent
oc adm policy add-cluster-role-to-user cluster-reader -n sysdig-agent -z sysdig-agent

wget https://raw.githubusercontent.com/draios/sysdig-cloud-scripts/master/agent_deploy/kubernetes/sysdig-agent-daemonset-v2.yaml
wget https://raw.githubusercontent.com/draios/sysdig-cloud-scripts/master/agent_deploy/kubernetes/sysdig-agent-configmap.yaml
oc create secret generic sysdig-agent --from-literal=access-key=<-YOUR-ACCESS-KEY->

# Edit sysdig-agent-daemonset-v2.yaml to uncomment the line: serviceAccount: sysdig-agent and edit sysdig-agent-configmap.yaml to uncomment the line: new_k8s: true
# This allows kube-state-metrics to be automatically detected, monitored, and displayed in Sysdig Monitor. 
# Edit sysdig-agent-configmap.yaml to uncomment the line: k8s_cluster_name: and add your cluster name.

oc create -f sysdig-agent-daemonset-v2.yaml
oc create -f sysdig-agent-configmap.yaml
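
The manual edits described in the comments above could also be scripted before running the two `oc create` commands. This is only a sketch: it assumes the commented lines in the downloaded files have the form `<indent># key: value`, which may not match the upstream files exactly, so check the result before applying it. The helper name `uncomment_key` is hypothetical:

```shell
#!/usr/bin/env bash
# uncomment_key FILE KEY (hypothetical helper):
# removes a leading "#" (and one optional space) in front of KEY,
# keeps the indentation, and leaves a FILE.bak backup behind.
uncomment_key() {
  local file="$1" key="$2"
  sed -i.bak -E "s/^([[:space:]]*)#[[:space:]]?(${key})/\1\2/" "$file"
}

# Only touch the files if they have been downloaded to the current directory.
if [ -f sysdig-agent-daemonset-v2.yaml ] && [ -f sysdig-agent-configmap.yaml ]; then
  uncomment_key sysdig-agent-daemonset-v2.yaml 'serviceAccount: sysdig-agent'
  uncomment_key sysdig-agent-configmap.yaml 'new_k8s: true'
  # The cluster name itself still has to be filled in by hand afterwards.
  uncomment_key sysdig-agent-configmap.yaml 'k8s_cluster_name:'
fi
```

A `diff` between each file and its `.bak` copy shows exactly which lines changed.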

SysDig is a great tool for monitoring, and beyond that it lets you troubleshoot your Linux hosts and container platforms in depth.

Deploy OpenShift 3.11 Container Platform on Google Cloud Platform using Terraform

Over the past few days I have converted the OpenShift 3.11 infrastructure from Amazon AWS to run on Google Cloud Platform. I have kept a similar VPC network layout and set of instances to run OpenShift.

Before you start, create a project on Google Cloud Platform, then create a service account, generate a private key for it, and download the credentials as a JSON file.

Create the new project:

Create the service account:

Give the service account compute admin and storage object creator permissions:
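
The same three steps can be done from the command line instead of the web console. The following is a rough sketch using the gcloud CLI; the project ID `my-openshift-project` and the service account name `terraform` are placeholders chosen for illustration, and the project still needs billing enabled before Compute Engine can be used:

```shell
#!/usr/bin/env bash
# Hypothetical placeholder names; replace them with your own.
PROJECT_ID="my-openshift-project"
SA_NAME="terraform"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Creates the project, the service account, the two role bindings
# (compute admin and storage object creator) and the JSON key.
create_gcp_prereqs() {
  local role
  gcloud projects create "$PROJECT_ID" || return 1
  gcloud iam service-accounts create "$SA_NAME" \
    --display-name "Terraform" --project "$PROJECT_ID" || return 1
  for role in roles/compute.admin roles/storage.objectCreator; do
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member "serviceAccount:${SA_EMAIL}" --role "$role" || return 1
  done
  gcloud iam service-accounts keys create ./credentials.json \
    --iam-account "$SA_EMAIL" || return 1
}

# Run this once gcloud is installed and authenticated:
# create_gcp_prereqs
```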

Then create a storage bucket for the Terraform backend state and assign the correct bucket permission to the terraform service account:

Bucket permissions:
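
As a command-line alternative, the bucket and its permission could be created with gsutil. This sketch assumes that object admin on the bucket is the permission the Terraform backend needs, since it has to read, write and delete state objects; the bucket name, project ID and service account are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical placeholder names; replace them with your own.
PROJECT_ID="my-openshift-project"
SA_EMAIL="terraform@${PROJECT_ID}.iam.gserviceaccount.com"
BUCKET="my-terraform-state-bucket"

# Creates the state bucket in europe-west3 and grants the service account
# object admin on it (assumed role, see the note above).
create_state_bucket() {
  gsutil mb -p "$PROJECT_ID" -l europe-west3 "gs://${BUCKET}" || return 1
  gsutil iam ch "serviceAccount:${SA_EMAIL}:objectAdmin" "gs://${BUCKET}" || return 1
}

# Run this once gsutil is installed and authenticated:
# create_state_bucket
```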

To start, clone my openshift-terraform GitHub repository and check out the google-dev branch:

git clone https://github.com/berndonline/openshift-terraform.git
cd ./openshift-terraform/ && git checkout google-dev

Add your previously downloaded credentials JSON file:

cat << EOF > ./credentials.json
{
  "type": "service_account",
  "project_id": "<--your-project-->",
  "private_key_id": "<--your-key-id-->",
  "private_key": "-----BEGIN PRIVATE KEY-----

...

}
EOF
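
Because this key grants access to the whole project, a quick sanity check and restrictive file permissions are worth considering before Terraform uses the file. A small sketch, with `check_credentials` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# check_credentials [FILE] (hypothetical helper):
# verifies that FILE exists and looks like a service account key,
# then restricts it to the current user.
check_credentials() {
  local file="${1:-./credentials.json}"
  if [ ! -f "$file" ]; then
    echo "missing $file" >&2
    return 1
  fi
  if ! grep -q '"type": *"service_account"' "$file"; then
    echo "$file does not look like a service account key" >&2
    return 1
  fi
  chmod 600 "$file"
}

# Example: check_credentials ./credentials.json
```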

There are a few things you need to modify in the main.tf and variables.tf before you can start:

...
terraform {
  backend "gcs" {
    bucket    = "<--your-bucket-name-->"
    prefix    = "openshift-311"
    credentials = "credentials.json"
  }
}
...
...
variable "gcp_region" {
  description = "Google Compute Platform region to launch servers."
  default     = "europe-west3"
}
variable "gcp_project" {
  description = "Google Compute Platform project name."
  default     = "<--your-project-name-->"
}
variable "gcp_zone" {
  type = "string"
  default = "europe-west3-a"
  description = "The zone to provision into"
}
...
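
If you prefer not to edit variables.tf, Terraform also reads variable values from environment variables named `TF_VAR_<variable name>`, which take precedence over the defaults shown above. A minimal sketch; the project name is a placeholder and `zone_in_region` is a hypothetical helper that only checks the naming convention:

```shell
#!/usr/bin/env bash
# Hypothetical project name; region and zone match the defaults in variables.tf.
export TF_VAR_gcp_project="my-openshift-project"
export TF_VAR_gcp_region="europe-west3"
export TF_VAR_gcp_zone="europe-west3-a"

# zone_in_region REGION ZONE: succeeds when ZONE is REGION plus "-<letter>".
zone_in_region() {
  case "$2" in
    "$1"-?) return 0 ;;
    *) return 1 ;;
  esac
}

zone_in_region "$TF_VAR_gcp_region" "$TF_VAR_gcp_zone" \
  || echo "zone ${TF_VAR_gcp_zone} is not in region ${TF_VAR_gcp_region}" >&2

# The backend block cannot use variables, but the bucket can be passed at init time:
# terraform init -backend-config="bucket=<--your-bucket-name-->"
```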

Add the environment variables needed for the CloudFlare DNS changes and the OpenShift demo user:

export TF_VAR_email='<-YOUR-CLOUDFLARE-EMAIL-ADDRESS->'
export TF_VAR_token='<-YOUR-CLOUDFLARE-TOKEN->'
export TF_VAR_domain='<-YOUR-CLOUDFLARE-DOMAIN->'
export TF_VAR_htpasswd='<-YOUR-OPENSHIFT-DEMO-USER-HTPASSWD->'

Let’s create the infrastructure and afterwards verify the created resources on GCP.

terraform init && terraform apply -auto-approve

VPC and public and private subnets in region europe-west3:

Created instances:

Created load balancers for master and infra nodes:

Copy the SSH key and the ansible-hosts file to the bastion host, from where you run the Ansible OpenShift playbooks.

scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./helper_scripts/id_rsa centos@$(terraform output bastion):/home/centos/.ssh/
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -r ./inventory/ansible-hosts  centos@$(terraform output bastion):/home/centos/ansible-hosts

I recommend waiting a few minutes as the cloud-init script prepares the bastion host. Afterwards continue with the pre and install playbooks. You can connect to the bastion host and run the playbooks directly.
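
Instead of waiting a fixed time, the wait can be scripted with a small retry loop. This sketch assumes that the presence of the `/openshift-ansible` directory on the bastion host is a good enough signal that cloud-init has finished, which may not hold in every case; `retry` and `bastion_ready` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY CMD... (hypothetical helper):
# runs CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds
# between attempts. Returns 1 when all attempts fail.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Checks over SSH whether the playbook directory already exists on the bastion.
bastion_ready() {
  ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    -i ./helper_scripts/id_rsa -l centos "$(terraform output bastion)" \
    "test -d /openshift-ansible"
}

# Example: poll every 10 seconds for up to 5 minutes
# retry 30 10 bastion_ready && echo "bastion is ready"
```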

ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-pre.yml -i ~/ansible-hosts"
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ./helper_scripts/id_rsa -l centos $(terraform output bastion) -A "cd /openshift-ansible/ && ansible-playbook ./playbooks/openshift-install.yml -i ~/ansible-hosts"

After the installation is completed, continue to create your project and applications:

When you are finished with the testing, run terraform destroy.

terraform destroy -auto-approve

Please share your feedback and leave a comment.