How to validate OpenShift using Ansible

You might have seen my previous article about the OpenShift troubleshooting guide. With this blog post I want to show how to validate that an OpenShift container platform is fully functional. This is less of an issue when you do a fresh install but becomes more important when you apply changes or do in-place upgrades your cluster. And because we all like automation; I show how to do this with Ansible in an automated way.

Let’s jump right into it and look at the different steps. Here the links to the Ansible role and playbook for more details:

  • Prepare workspace – create a temp directories and copy admin.kubeconfig there.
  • Check node state – run command to check for Not Ready nodes:
  • oc get nodes --no-headers=true | grep -v ' Ready' | true
    
  • Check node scheduling – run command to check for nodes where scheduling is disabled:
  • oc get nodes --no-headers=true | grep 'SchedulingDisabled' | awk '{ print $1 }'
    
  • Check master certificates – check validity of master API, controller and etcd certificates:
  • cat /etc/origin/master/ca.crt | openssl x509 -text | grep -i Validity -A2
    cat /etc/origin/master/master.server.crt | openssl x509 -text | grep -i Validity -A2
    cat /etc/origin/master/admin.crt | openssl x509 -text | grep -i Validity -A2
    cat /etc/etcd/ca.crt | openssl x509 -text | grep -i Validity -A2
    
  • Check nodes certificates – check validity of worker node certificate. This needs to run on all compute and infra nodes, not masters:
  • cat /etc/origin/node/server.crt | openssl x509 -text | grep -i Validity -A2
    
  • Check etcd health – run command for cluster health check:
  • /usr/bin/etcdctl --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --ca-file /etc/etcd/ca.crt -C https://{{ hostname.stdout }}:2379 cluster-health
    
  • Check important default projects for failed pods – look out for failed pods:
  • oc get pods -o wide --no-headers=true -n {{ item }} | grep -v " Running\|Completed" || true
    
  • Check registry health – run GET docker-registry healthz path and expect http 200:
  • curl -kv https://{{ registry_ip.stdout }}/healthz
    
  • Check SkyDNS resolution – try to resolve internal hostnames:
  • nslookup docker-registry.default.svc.cluster.local
    nslookup docker-registry.default.svc
    
  • Check upstream DNS resolution – try and resolve cluster external dns names.
  • Create test project
  • Run persistent volume test – create busybox container and claim volume
    1. apply imagestream, deploymentconfig and persistent volume claim configuration
    2. synchronise testfile to container pv
    3. check content of testfile
    4. delete testfile
  • Run test build – run the following steps to create multiple application pods on the OpenShift cluster:
    1. apply buildconfig, imagestream, deploymentconfig, service and route configuration
    2. check pods are running
    3. get route hostnames
    4. connect to routes and show output
    5. trigger new-build and check new build is created
    6. check if pods are running
    7. connect to routes and show output
  • Delete test project
  • Delete workspace folder – at the end the validation the role deletes the temporary folder and all its contents.

Next we need to run the playbook and see the output below:

PLAY [Check OpenShift cluster installation] ***********************************************************************************************************************

TASK [check : make temp directory] ********************************************************************************************************************************
changed: [master1]

TASK [check : create temp directory] ******************************************************************************************************************************
ok: [master1]

TASK [check : create template folder] *****************************************************************************************************************************
changed: [master1]

TASK [check : create test file folder] ****************************************************************************************************************************
changed: [master1]

TASK [check : get hostname] ***************************************************************************************************************************************
changed: [master1]

TASK [check : copy admin config] **********************************************************************************************************************************
ok: [master1]

TASK [check : check for not ready nodes] **************************************************************************************************************************
changed: [master1]

TASK [check : not ready nodes] ************************************************************************************************************************************
ok: [master1] => {
    "notready.stdout_lines": []
}

TASK [check : check for scheduling disabled nodes] ****************************************************************************************************************
changed: [master1]

TASK [check : scheduling disabled nodes] **************************************************************************************************************************
ok: [master1] => {
    "schedulingdisabled.stdout_lines": []
}

TASK [check : validate master certificates] ***********************************************************************************************************************
changed: [master1] => (item=cat /etc/origin/master/ca.crt | openssl x509 -text | grep -i Validity -A2)
changed: [master1] => (item=cat /etc/origin/master/master.server.crt | openssl x509 -text | grep -i Validity -A2)
changed: [master1] => (item=cat /etc/origin/master/admin.crt | openssl x509 -text | grep -i Validity -A2)
changed: [master1] => (item=cat /etc/etcd/ca.crt | openssl x509 -text | grep -i Validity -A2)

TASK [check : ca certificate] *************************************************************************************************************************************
ok: [master1] => {
    "msg": [
        [
            "        Validity",
            "            Not Before: Jan 31 17:13:52 2019 GMT",
            "            Not After : Jan 30 17:13:53 2024 GMT"
        ],
        [
            "        Validity",
            "            Not Before: Jan 31 17:13:53 2019 GMT",
            "            Not After : Jan 30 17:13:54 2021 GMT"
        ],
        [
            "        Validity",
            "            Not Before: Jan 31 17:13:53 2019 GMT",
            "            Not After : Jan 30 17:13:54 2021 GMT"
        ],
        [
            "        Validity",
            "            Not Before: Jan 31 17:11:09 2019 GMT",
            "            Not After : Jan 30 17:11:09 2024 GMT"
        ]
    ]
}

TASK [check : check etcd state] ***********************************************************************************************************************************
changed: [master1]

TASK [check : show etcd state] ************************************************************************************************************************************
ok: [master1] => {
    "etcdstate.stdout_lines": [
        "member 335450512aab5650 is healthy: got healthy result from https://172.26.7.132:2379",
        "cluster is healthy"
    ]
}

TASK [check : check default openshift-infra and logging projects for failed pods] *********************************************************************************
ok: [master1] => (item=default)
ok: [master1] => (item=kube-system)
ok: [master1] => (item=kube-service-catalog)
ok: [master1] => (item=openshift-logging)
ok: [master1] => (item=openshift-infra)
ok: [master1] => (item=openshift-console)
ok: [master1] => (item=openshift-web-console)
ok: [master1] => (item=openshift-monitoring)
ok: [master1] => (item=openshift-node)
ok: [master1] => (item=openshift-sdn)

TASK [check : failed failedpods] **********************************************************************************************************************************
ok: [master1] => {
    "msg": [
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        []
    ]
}

TASK [check : get container registry ip] **************************************************************************************************************************
changed: [master1]

TASK [check : check container registry health] ********************************************************************************************************************
ok: [master1]

TASK [check : check internal SysDNS resolution for cluster.local] *************************************************************************************************
changed: [master1] => (item=docker-registry.default.svc.cluster.local)
changed: [master1] => (item=docker-registry.default.svc)

TASK [check : check external DNS upstream resolution] *************************************************************************************************************
changed: [master1] => (item=www.google.com)
changed: [master1] => (item=www.google.co.uk)
changed: [master1] => (item=www.google.de)

TASK [check : create test project] ********************************************************************************************************************************
changed: [master1]

TASK [check : run test persistent volume] *************************************************************************************************************************
included: /var/jenkins_home/workspace/openshift/ansible/roles/check/tasks/pv.yml for master1

TASK [check : create sequence number list for pv] *****************************************************************************************************************
ok: [master1] => (item=1)
ok: [master1] => (item=2)
ok: [master1] => (item=3)
ok: [master1] => (item=4)
ok: [master1] => (item=5)

TASK [check : copy build templates] *******************************************************************************************************************************
changed: [master1] => (item={u'dest': u'busybox.yml', u'src': u'busybox.j2'})
changed: [master1] => (item={u'dest': u'pv.yml', u'src': u'pv.j2'})

TASK [check : copy testfile] **************************************************************************************************************************************
changed: [master1]

TASK [check : create pvs] *****************************************************************************************************************************************
ok: [master1]

TASK [check : deploy busybox pod] *********************************************************************************************************************************
changed: [master1]

TASK [check : check if all pods are running] **********************************************************************************************************************
FAILED - RETRYING: check if all pods are running (10 retries left).
changed: [master1] => (item=busybox)

TASK [check : get busybox pod name] *******************************************************************************************************************************
changed: [master1]

TASK [check : sync testfile to pod] *******************************************************************************************************************************
changed: [master1]

TASK [check : check testfile in pv] *******************************************************************************************************************************
changed: [master1]

TASK [check : delete testfile in pv] ******************************************************************************************************************************
changed: [master1]

TASK [check : run test build] *************************************************************************************************************************************
included: /var/jenkins_home/workspace/openshift/ansible/roles/check/tasks/build.yml for master1

TASK [check : create sequence number list for hello openshift] ****************************************************************************************************
ok: [master1] => (item=0)
ok: [master1] => (item=1)
ok: [master1] => (item=2)
ok: [master1] => (item=3)
ok: [master1] => (item=4)
ok: [master1] => (item=5)
ok: [master1] => (item=6)
ok: [master1] => (item=7)
ok: [master1] => (item=8)
ok: [master1] => (item=9)

TASK [check : create pod list] ************************************************************************************************************************************
ok: [master1] => (item=[u'0', {u'svc': u'http'}])
ok: [master1] => (item=[u'1', {u'svc': u'http'}])
ok: [master1] => (item=[u'2', {u'svc': u'http'}])
ok: [master1] => (item=[u'3', {u'svc': u'http'}])
ok: [master1] => (item=[u'4', {u'svc': u'http'}])
ok: [master1] => (item=[u'5', {u'svc': u'http'}])
ok: [master1] => (item=[u'6', {u'svc': u'http'}])
ok: [master1] => (item=[u'7', {u'svc': u'http'}])
ok: [master1] => (item=[u'8', {u'svc': u'http'}])
ok: [master1] => (item=[u'9', {u'svc': u'http'}])

TASK [check : copy build templates] *******************************************************************************************************************************
changed: [master1] => (item={u'dest': u'hello-openshift.yml', u'src': u'hello-openshift.j2'})

TASK [check : deploy hello openshift pods] ************************************************************************************************************************
changed: [master1]

TASK [check : check if all pods are running] **********************************************************************************************************************
FAILED - RETRYING: check if all pods are running (10 retries left).
changed: [master1] => (item={u'name': u'hello-http-0'})
FAILED - RETRYING: check if all pods are running (10 retries left).
changed: [master1] => (item={u'name': u'hello-http-1'})
changed: [master1] => (item={u'name': u'hello-http-2'})
changed: [master1] => (item={u'name': u'hello-http-3'})
changed: [master1] => (item={u'name': u'hello-http-4'})
changed: [master1] => (item={u'name': u'hello-http-5'})
changed: [master1] => (item={u'name': u'hello-http-6'})
changed: [master1] => (item={u'name': u'hello-http-7'})
changed: [master1] => (item={u'name': u'hello-http-8'})
changed: [master1] => (item={u'name': u'hello-http-9'})

TASK [check : get hello openshift pod hostnames] ******************************************************************************************************************
changed: [master1]

TASK [check : convert check_route string to json] *****************************************************************************************************************
ok: [master1]

TASK [check : set query to get pod hostname] **********************************************************************************************************************
ok: [master1]

TASK [check : get hostname list] **********************************************************************************************************************************
ok: [master1]

TASK [check : connect to route via curl] **************************************************************************************************************************
changed: [master1] => (item=hello-http-0-test.paas.domain.com)
changed: [master1] => (item=hello-http-1-test.paas.domain.com)
changed: [master1] => (item=hello-http-2-test.paas.domain.com)
changed: [master1] => (item=hello-http-3-test.paas.domain.com)
changed: [master1] => (item=hello-http-4-test.paas.domain.com)
changed: [master1] => (item=hello-http-5-test.paas.domain.com)
changed: [master1] => (item=hello-http-6-test.paas.domain.com)
changed: [master1] => (item=hello-http-7-test.paas.domain.com)
changed: [master1] => (item=hello-http-8-test.paas.domain.com)
changed: [master1] => (item=hello-http-9-test.paas.domain.com)

TASK [check : set json query] *************************************************************************************************************************************
ok: [master1]

TASK [check : show route http response] ***************************************************************************************************************************
ok: [master1] => {
    "msg": [
        "hello-http-0",
        "hello-http-1",
        "hello-http-2",
        "hello-http-3",
        "hello-http-4",
        "hello-http-5",
        "hello-http-6",
        "hello-http-7",
        "hello-http-8",
        "hello-http-9"
    ]
}

TASK [check : trigger new build] **********************************************************************************************************************************
changed: [master1]

TASK [check : check new build is created] *************************************************************************************************************************
changed: [master1]

TASK [check : check if all pods with new build are running] *******************************************************************************************************
FAILED - RETRYING: check if all pods with new build are running (10 retries left).
FAILED - RETRYING: check if all pods with new build are running (9 retries left).
changed: [master1] => (item={u'name': u'hello-http-0'})
changed: [master1] => (item={u'name': u'hello-http-1'})
changed: [master1] => (item={u'name': u'hello-http-2'})
changed: [master1] => (item={u'name': u'hello-http-3'})
changed: [master1] => (item={u'name': u'hello-http-4'})
changed: [master1] => (item={u'name': u'hello-http-5'})
changed: [master1] => (item={u'name': u'hello-http-6'})
changed: [master1] => (item={u'name': u'hello-http-7'})
changed: [master1] => (item={u'name': u'hello-http-8'})
changed: [master1] => (item={u'name': u'hello-http-9'})

TASK [check : connect to route via curl] **************************************************************************************************************************
changed: [master1] => (item=hello-http-0-test.paas.domain.com)
changed: [master1] => (item=hello-http-1-test.paas.domain.com)
changed: [master1] => (item=hello-http-2-test.paas.domain.com)
changed: [master1] => (item=hello-http-3-test.paas.domain.com)
changed: [master1] => (item=hello-http-4-test.paas.domain.com)
changed: [master1] => (item=hello-http-5-test.paas.domain.com)
changed: [master1] => (item=hello-http-6-test.paas.domain.com)
changed: [master1] => (item=hello-http-7-test.paas.domain.com)
changed: [master1] => (item=hello-http-8-test.paas.domain.com)
changed: [master1] => (item=hello-http-9-test.paas.domain.com)

TASK [check : show route http response] ***************************************************************************************************************************
ok: [master1] => {
    "msg": [
        "hello-http-0",
        "hello-http-1",
        "hello-http-2",
        "hello-http-3",
        "hello-http-4",
        "hello-http-5",
        "hello-http-6",
        "hello-http-7",
        "hello-http-8",
        "hello-http-9"
    ]
}

TASK [check : delete test project] ********************************************************************************************************************************
changed: [master1]

TASK [check : delete temp directory] ******************************************************************************************************************************
ok: [master1]

PLAY RECAP ********************************************************************************************************************************************************
master1                : ok=52   changed=30   unreachable=0    failed=0

The cluster validation playbook successfully finished without errors and this is just a simple way to do a basic check of your OpenShift platform. Check out the OpenShift documentation about environment_health_checks.

Automate Ansible AWX configuration using Tower-CLI

Some time has gone by since my article about Getting started with Ansible AWX (Open Source Tower version) , and I wanted to continue focusing on AWX and show how to automate the configuration of an AWX Tower server.

Before we configure AWX we should install the tower-cli. You can find more information about the Tower CLI here: https://github.com/ansible/tower-cli. I also recommend having a look at the tower-cli documentation: https://tower-cli.readthedocs.io/en/latest/

sudo pip install ansible-tower-cli

The tower-cli is very useful when you want to monitor the running jobs. The web console is not that great when it comes to large playbook and is pretty slow at showing the running job state. See below the basic configuration before you start using the tower-cli:

berndonline@lab:~$ tower-cli config host 94.130.51.22
Configuration updated successfully.
berndonline@lab:~$ tower-cli login admin
Password:
{
 "id": 1,
 "type": "o_auth2_access_token",
 "url": "/api/v2/tokens/1/",
 "created": "2018-09-15T17:41:23.942572Z",
 "modified": "2018-09-15T17:41:23.955795Z",
 "description": "Tower CLI",
 "user": 1,
 "refresh_token": null,
 "application": null,
 "expires": "3018-01-16T17:41:23.937872Z",
 "scope": "write"
}
Configuration updated successfully.
berndonline@lab:~$ 

But now let’s continue and show how we can use the tower-cli to configure and monitor Ansible AWX Tower.

Create a project:

tower-cli project create --name "My Project" --description "My project description" --organization "Default" --scm-type "git" --scm-url "https://github.com/ansible/ansible-tower-samples"

Create an inventory:

tower-cli inventory create --name "My Inventory" --organization "Default"

Add hosts to an inventory:

tower-cli host create --name "localhost" --inventory "My Inventory" --variables "ansible_connection: local"

Create credentials:

tower-cli credential create --name "My Credential" --credential-type "Machine" --user "admin"

Create a Project Job Template:

tower-cli job_template create --name "My Job Template" --project "My Project" --inventory "My Inventory" --job-type "run" --credential "My Credential" --playbook "hello_world.yml" --verbosity "default"

After we successfully created everything let’s now run the job template and monitor the output via the tower-cli:

tower-cli job launch --job-template "My Job Template"
tower-cli job monitor <ID>

Command line output:

berndonline@lab:~$ tower-cli job launch --job-template "My Job Template"
Resource changed.
== ============ =========================== ======= =======
id job_template           created           status  elapsed
== ============ =========================== ======= =======
26           15 2018-10-12T12:22:48.599748Z pending 0.0
== ============ =========================== ======= =======
berndonline@lab:~$ tower-cli job monitor 26
------Starting Standard Out Stream------


PLAY [Hello World Sample] ******************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Hello Message] ***********************************************************
ok: [localhost] => {
    "msg": "Hello World!"
}

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0

------End of Standard Out Stream--------
Resource changed.
== ============ =========================== ========== =======
id job_template           created             status   elapsed
== ============ =========================== ========== =======
26           15 2018-10-12T12:22:48.599748Z successful 8.861
== ============ =========================== ========== =======
berndonline@lab:~$

With the tower-cli commands we can write a simple playbook using the Ansible Shell module.

Playbook site.yml:

---
- hosts: localhost
  gather_facts: 'no'

  tasks:
    - name: Add tower project
      shell: |
        tower-cli project create \
        --name "My Project" \
        --description "My project description" \
        --organization "Default" \
        --scm-type "git" \
        --scm-url "https://github.com/ansible/ansible-tower-samples"

    - name: Add tower inventory
      shell: |
        tower-cli inventory create \
        --name "My Inventory" \
        --organization "Default"

    - name: Add host to inventory
      shell: |
        tower-cli host create \
        --name "localhost" \
        --inventory "My Inventory" \
        --variables "ansible_connection: local"
    
    - name: Add credential
      shell: |
        tower-cli credential create \
        --name "My Credential" \
        --credential-type "Machine" \
        --user "admin"
        
    - name: wait 15 seconds to pull project SCM content
      wait_for: timeout=15
      delegate_to: localhost
 
    - name: Add job template
      shell: |
        tower-cli job_template create \
        --name "My Job Template" \
        --project "My Project" \
        --inventory "My Inventory" \
        --job-type "run" \
        --credential "My Credential" \
        --playbook "hello_world.yml" \
        --verbosity "default"

Let’s run the playbook:

berndonline@lab:~/awx-provision$ ansible-playbook site.yml

PLAY [localhost] **************************************************************************************************************************************************

TASK [Add tower project] ******************************************************************************************************************************************
changed: [localhost]

TASK [Add tower inventory] ****************************************************************************************************************************************
changed: [localhost]

TASK [Add host to inventory] **************************************************************************************************************************************
changed: [localhost]

TASK [Add credential] *********************************************************************************************************************************************
changed: [localhost]

TASK [wait 15 seconds to pull project SCM content] ****************************************************************************************************************
ok: [localhost -> localhost]

TASK [Add job template] *******************************************************************************************************************************************
changed: [localhost]

PLAY RECAP ********************************************************************************************************************************************************
localhost : ok=6 changed=5 unreachable=0 failed=0

berndonline@lab:~/awx-provision$

If you like this article, please share your feedback and leave a comment.

Getting started with Ansible AWX (Open Source Tower version)

Ansible released AWX a few weeks ago, an open source (community supported) version of their commercial Ansible Tower product. This is a web-based graphical interface to manage Ansible playbooks, inventories, and schedule jobs to run playbooks.

The Github repository you find here: https://github.com/ansible/awx

Let’s start with the installation of Ansible AWX, very easy because everything is dockerized and see the install guide for more information.

Modify the inventory file under the installer folder and change the Postgres data folder which is otherwise located under /tmp, also change Postgres DB username and password if needed. I would recommend binding AWX to localhost and put an Nginx reverse proxy in front with SSL encryption.

Changes in the inventory file:

postgres_data_dir=/var/lib/postgresql/data/
host_port=127.0.0.1:8052

Start the build of the Docker container:

ansible-playbook -i inventory install.yml

After the Ansible Playbook run completes, you see the following Docker container:

berndonline@lab:~/awx/installer$ docker ps
CONTAINER ID        IMAGE                     COMMAND                  CREATED             STATUS              PORTS                                NAMES
26a73c91cb04        ansible/awx_task:latest   "/tini -- /bin/sh ..."   2 days ago          Up 24 hours         8052/tcp                             awx_task
07774696a7f2        ansible/awx_web:latest    "/tini -- /bin/sh ..."   2 days ago          Up 24 hours         127.0.0.1:8052->8052/tcp             awx_web
981f4f02c759        memcached:alpine          "docker-entrypoint..."   2 days ago          Up 24 hours         11211/tcp                            memcached
4f4a3141b54d        rabbitmq:3                "docker-entrypoint..."   2 days ago          Up 24 hours         4369/tcp, 5671-5672/tcp, 25672/tcp   rabbitmq
faf07f7b4682        postgres:9.6              "docker-entrypoint..."   2 days ago          Up 24 hours         5432/tcp                             postgres
berndonline@lab:~/awx/installer$

Install Nginx:

sudo apt-get update
sudo apt-get install nginx
sudo rm /etc/nginx/sites-enabled/default

Create Nginx vhosts configuration:

sudo vi /etc/nginx/sites-available/awx
server {
    listen 443 ssl;
    server_name awx.domain.com;

    ssl on;
    ssl_certificate /etc/nginx/ssl/awx.domain.com-cert.pem;
    ssl_certificate_key /etc/nginx/ssl/awx.domain.com-key.pem;

    location / {
        proxy_pass http://127.0.0.1:8052;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Create symlink in sites enable to point to awx config:

sudo ln -s /etc/nginx/sites-available/awx /etc/nginx/sites-enabled/awx

Reload Nginx to apply configuration:

sudo systemctl reload nginx

Afterwards you are able to login with username “admin” and password “password”:

I created a simple job for testing with AWX, you first start to create a project, credentials and inventories. The project points to your Git repository:

Under the job you configure which project, credentials and inventories to use:

Once saved you can manually trigger the job, it first pulls the latest playbook from your version control repository and afterwards executes the configured Ansible playbook:

The job details look very similar if you run an playbook on the CLI:

Ansible AWX is a very useful tool if you need to manage different Ansible playbooks and do job scheduling if you are not already using other tools like Jenkins or Gitlab-CI. But even then it is a good addition to use AWX to run ad-hoc playbooks.

Check out my new articles about Automate Ansible AWX configuration using Tower-CLI and Build Ansible Tower Container.

Leave a comment