Why Productising Your Platform Is Critical for Engineering Success

It’s been a while since I’ve written anything publicly—like many in platform engineering, I’ve been head down, building, scaling and supporting. But after presenting at the OpenShift Commons Gathering alongside KubeCon 2025 in London, I felt it was time to share some of that journey—particularly the importance of product thinking in platform engineering.

In today’s cloud-native world, building an internal platform isn’t enough. If your platform isn’t treated like a product—with real users, feedback loops and continuous iteration—it will fail to gain adoption, cause friction and ultimately slow down delivery.

I’ve seen this firsthand. In 2020, I led the design and build of a multi-tenant Kubernetes platform—long before “platform as a product” was a common concept. What started as a response to infrastructure bottlenecks became a mature self-service platform that is now used globally by dozens of engineering teams.

This blog isn’t just about what we built—it’s about why thinking like a product team is the foundation of modern platform engineering success.

The Case for Product Thinking in Platform Engineering

Too often internal platforms are built with an infrastructure-first mindset: lots of YAML, few users and almost no UX. But modern engineering teams expect more:

  • APIs that just work

  • Documentation that’s up to date

  • Rapid self-service delivery

  • Immediate feedback

  • A clear path from idea to production

That’s why your platform needs to be compelling, self-service, and accessible—just like a great product.

How Success Looks in Practice

In 2020, my team and I set out to solve serious pain points:

  • 4–8 month compute lead times

  • TicketOps and manual change gates

  • External dependencies and missing knowledge

  • No clear self-service paths

  • Frequent config drift, duplication and lack of consistency

We didn’t want to patch these problems—we wanted to reimagine how a platform should work.

The result was our internal Platform Product, a multi-tenant OpenShift-based Kubernetes platform built as a product, not just infrastructure.

Here’s what made it successful:

1. Self-Service and GitOps by Default

Developers create and manage their own workloads without opening tickets. Platform features are exposed via Kubernetes-native APIs and Custom Resources. All requests are declarative and go through GitOps with automated validation.

2. Multi-Tenancy with Guard Rails

Each team is onboarded via a simple pull request that provisions their namespace and registers ownership. All changes are subject to automated checks using validating webhooks, object schema linting and RBAC enforcement.

3. Focus on Developer Experience

We designed the platform to be the path of least resistance:

  • Free sandbox environments

  • Pre-filled onboarding templates

  • Built-in best practices

  • Documentation and a developer portal

  • Persistent support channels and short feedback loops

4. Observability and Policy Everywhere

The product is highly secure (PCI-DSS compliant), fully observable and tightly integrated with cloud infrastructure. Each feature includes sensible defaults and field-tested configuration to help users succeed without needing deep Kubernetes expertise.

The Benefits of Productising Your Platform

Treating your platform like a product brings clear, measurable advantages:

Platform as Infrastructure Platform as a Product
Ticket-driven operations Self-service automation
One-size-fits-all features Iterative roadmap based on user needs
Inconsistent experience Standardized onboarding & guardrails
Siloed knowledge Published docs & support channels
Siloed teams Transparent ownership & feedback

In our case, this mindset has helped:

  • Onboard tenants in minutes, not weeks

  • Cut incident volume due to misconfiguration

  • Empower teams to deploy faster and more safely

  • Scale the platform globally with a lean team

Final Thoughts: Platform Teams Need Product DNA

Success in platform engineering isn’t just about Kubernetes, pipelines or YAML. It’s about building something people want to use. That’s what product thinking brings to the table.

If you’re leading or contributing to an internal platform today, start thinking like a product team:

  • Understand your users deeply

  • Design for ease of use and autonomy

  • Invest in documentation, automation and support

  • Create feedback loops and iterate often

You’re not just shipping infrastructure—you’re building a product that enables every other product in your company. Treat it like one.

Using Kubernetes Impersonate (sudo) for least-privilege

It has become very easy and simple to deploy Kubernetes services using the various cloud offerings like EKS or GKE, after you created your cluster and have the cluster-admin privileges to apply changes as you like. This model is great for development because you can start consuming Kubernetes services right away but this doesn’t work well for production clusters and gets more challenging when running PCI compliant workloads.

I want to explain a bit how to apply a least-privilege principle for Elastic Kubernetes Services (EKS) using the AWS integrated IAM. The diagram below is a simple example showing two IAM roles for admin and reader privileges for AWS resources. On the Kubernetes cluster the IAM roles are bound to the k8s cluster-admin and reader roles. The k8s sudoer role allows to impersonate cluster-admin privileges for cluster readers:

Normally you would add your DevOps team to the IAM reader role. This way the DevOps team has the default read permissions for AWS and Kubernetes resources but they can also elevate Kubernetes permissions to cluster-admin level when required without having full access to the AWS resources.

Let’s look at the EKS aws-auth ConfigMap where you need to define the IAM role mapping for admin and reader to internal Kubernetes groups:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::xxx:role/admin
      username: cluster-admin
      groups:
        - system:masters
    - rolearn: arn:aws:iam::xxx:role/reader
      username: cluster-reader
      groups:
        - cluster-reader
    - rolearn: arn:aws:iam::555555555555:role/devel-worker-nodes-NodeInstanceRole-74RF4UBDUKL6
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    []

The system:masters group is a Kubernetes default role and rolebinding and requires no additional configuration. For the cluster-reader we need to apply a ClusterRole and a ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-reader
rules:
- apiGroups:
  - ""
  resources:
  - componentstatuses
  - nodes
  - nodes/status
  - persistentvolumeclaims/status
  - persistentvolumes
  - persistentvolumes/status
  - pods/binding
  - pods/eviction
  - podtemplates
  - securitycontextconstraints
  - services/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - controllerrevisions
  - daemonsets/status
  - deployments/status
  - replicasets/status
  - statefulsets/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  - customresourcedefinitions/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apiregistration.k8s.io
  resources:
  - apiservices
  - apiservices/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs/status
  - jobs/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets/status
  - deployments/status
  - horizontalpodautoscalers
  - horizontalpodautoscalers/status
  - ingresses/status
  - jobs
  - jobs/status
  - podsecuritypolicies
  - replicasets/status
  - replicationcontrollers
  - storageclasses
  - thirdpartyresources
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - events.k8s.io
  resources:
  - events
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets/status
  - podsecuritypolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - rolebindings
  - roles
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - settings.k8s.io
  resources:
  - podpresets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  - volumeattachments/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - scheduling.k8s.io
  resources:
  - priorityclasses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  - certificatesigningrequests/approval
  - certificatesigningrequests/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - authorization.k8s.io
  resources:
  - localsubjectaccessreviews
  - selfsubjectaccessreviews
  - selfsubjectrulesreviews
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - podsecuritypolicyreviews
  - podsecuritypolicyselfsubjectreviews
  - podsecuritypolicysubjectreviews
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - nodes/spec
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes/stats
  verbs:
  - create
  - get
- nonResourceURLs:
  - '*'
  verbs:
  - get
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses/status
  verbs:
  - get
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses/status
  verbs:
  - list
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses/status
  verbs:
  - watch
- apiGroups:
  - node.k8s.io
  resources:
  - runtimeclasses
  verbs:
  - get
- apiGroups:
  - node.k8s.io
  resources:
  - runtimeclasses
  verbs:
  - list
- apiGroups:
  - node.k8s.io
  resources:
  - runtimeclasses
  verbs:
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - csidrivers
  verbs:
  - get
- apiGroups:
  - storage.k8s.io
  resources:
  - csidrivers
  verbs:
  - list
- apiGroups:
  - storage.k8s.io
  resources:
  - csidrivers
  verbs:
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - csinodes
  verbs:
  - get
- apiGroups:
  - storage.k8s.io
  resources:
  - csinodes
  verbs:
  - list
- apiGroups:
  - storage.k8s.io
  resources:
  - csinodes
  verbs:
  - watch
- apiGroups:
  - operators.coreos.com
  resources:
  - clusterserviceversions
  - catalogsources
  - installplans
  - subscriptions
  - operatorgroups
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - packages.operators.coreos.com
  resources:
  - packagemanifests
  - packagemanifests/icon
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - packages.operators.coreos.com
  resources:
  - packagemanifests
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - configmaps
  - endpoints
  - persistentvolumeclaims
  - pods
  - replicationcontrollers
  - replicationcontrollers/scale
  - serviceaccounts
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - bindings
  - events
  - limitranges
  - namespaces/status
  - pods/log
  - pods/status
  - replicationcontrollers/status
  - resourcequotas
  - resourcequotas/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - controllerrevisions
  - daemonsets
  - deployments
  - deployments/scale
  - replicasets
  - replicasets/scale
  - statefulsets
  - statefulsets/scale
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - deployments/scale
  - ingresses
  - networkpolicies
  - replicasets
  - replicasets/scale
  - replicationcontrollers/scale
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - list
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - watch
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - resourcequotausages
  verbs:
  - get
  - list
  - watch

After you created the ClusterRole you need to create the ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: cluster-reader

To give a cluster-reader impersonate permissions you need to create the sudoer ClusterRole with the right to impersonate system:admin:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sudoer
rules:
- apiGroups:
  - ""
  resourceNames:
  - system:admin
  resources:
  - systemusers
  - users
  verbs:
  - impersonate
- apiGroups:
  - ""
  resourceNames:
  - system:masters
  resources:
  - groups
  - systemgroups
  verbs:
  - impersonate

Create the ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sudoer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sudoer
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: cluster-reader

For a cluster-reader to impersonate and get cluster-admin privileges you use the following kubectl options –as-group and –as:

kubectl get nodes --as-group system:masters --as system:admin

You want to restrict the membership of the IAM admin role as much as possible as everyone should only use the read permissions to not accidentally delete Kubernetes or AWS resources.

Running WordPress on OpenShift

This here is just a simple example deploying a web application like WordPress on an OpenShift cluster.

I am doing this via the command line because it is much quicker but you can also do this via the WebUI. First, we need to log in:

[vagrant@origin-master ~]$ oc login https://console.paas.domain.com:8443
The server is using a certificate that does not match its hostname: x509: certificate is not valid for any names, but wanted to match console.paas.domain.com
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): y

Authentication required for https://console.paas.domain.com:8443 (openshift)
Username: demo
Password:
Login successful.

You don't have any projects. You can try to create a new project, by running

    oc new-project 

[vagrant@origin-master ~]$

OpenShift tells us that we have no project, so let’s create a new project with the name testing:

[vagrant@origin-master ~]$ oc new-project testing
Now using project "testing" on server "https://console.paas.domain.com:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
[vagrant@origin-master ~]$

After we have created the project we need to start creating the first app, in our case for WordPress we need MySQL for the database. This is just an example because the MySQL is not using persistent storage and I use sample DB information it created:

[vagrant@origin-master ~]$ oc new-app mysql-ephemeral
--> Deploying template "openshift/mysql-ephemeral" to project testing2

     MySQL (Ephemeral)
     ---------
     MySQL database service, without persistent storage. For more information about using this template, including OpenShift considerations, see https://github.com/sclorg/mysql-container/blob/master/5.7/README.md.

     WARNING: Any data stored will be lost upon pod destruction. Only use this template for testing

     The following service(s) have been created in your project: mysql.

            Username: user1M5
            Password: 5KmiOFJ4UH2UIKbG
       Database Name: sampledb
      Connection URL: mysql://mysql:3306/

     For more information about using this template, including OpenShift considerations, see https://github.com/sclorg/mysql-container/blob/master/5.7/README.md.

     * With parameters:
        * Memory Limit=512Mi
        * Namespace=openshift
        * Database Service Name=mysql
        * MySQL Connection Username=user1M5
        * MySQL Connection Password=5KmiOFJ4UH2UIKbG # generated
        * MySQL root user Password=riPsYFaVEpHBYAWf # generated
        * MySQL Database Name=sampledb
        * Version of MySQL Image=5.7

--> Creating resources ...
    secret "mysql" created
    service "mysql" created
    deploymentconfig "mysql" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose svc/mysql'
    Run 'oc status' to view your app.
[vagrant@origin-master ~]$

Next, let us create a PHP app and pulling the latest WordPress install from Github.

[vagrant@origin-master ~]$ oc new-app php~https://github.com/wordpress/wordpress
-->; Found image fa73ae7 (5 days old) in image stream "openshift/php" under tag "7.0" for "php"

    Apache 2.4 with PHP 7.0
    -----------------------
    PHP 7.0 available as docker container is a base platform for building and running various PHP 7.0 applications and frameworks. PHP is an HTML-embedded scripting language. PHP attempts to make it easy for developers to write dynamically generated web pages. PHP also offers built-in database integration for several commercial and non-commercial database management systems, so writing a database-enabled webpage with PHP is fairly simple. The most common use of PHP coding is probably as a replacement for CGI scripts.

    Tags: builder, php, php70, rh-php70

    * A source build using source code from https://github.com/wordpress/wordpress will be created
      * The resulting image will be pushed to image stream "wordpress:latest"
      * Use 'start-build' to trigger a new build
    * This image will be deployed in deployment config "wordpress"
    * Ports 8080/tcp, 8443/tcp will be load balanced by service "wordpress"
      * Other containers can access this service through the hostname "wordpress"

-->; Creating resources ...
    imagestream "wordpress" created
    buildconfig "wordpress" created
    deploymentconfig "wordpress" created
    service "wordpress" created
-->; Success
    Build scheduled, use 'oc logs -f bc/wordpress' to track its progress.
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose svc/wordpress'
    Run 'oc status' to view your app.
[vagrant@origin-master ~]$

Last but not least I need to expose the PHP WordPress app:

[vagrant@origin-master ~]$ oc expose service wordpress
route "wordpress" exposed
[vagrant@origin-master ~]$

Here is a short overview of how it looks in the OpenShift web console, you see the deployed MySQL pod and the PHP pod which run WordPress:

Next, you open the URL http://wordpress-testing.paas.domain.com/ and see the WordPress install menu where you start configuring the database connection using the info I got from OpenShift when I created the MySQL pod:

I can customise the DB connection settings if I want to have a more permanent solution:

A voilà, your WordPress is deployed in OpenShift:

This is just a very simple example but it took me less than a minute to deploy the web application. Read also my other post about Deploying OpenShift Origin Cluster using Ansible.

Please share your feedback.

Leave a comment

Deploy OpenShift 3.7 Origin Cluster with Ansible

Something completely different to my more network related posts, this time it is about Platform as a Service with OpenShift Origin. There is a big push for containerized platform services from development.

I was testing the official OpenShift Origin Ansible Playbook to install a small 5 node cluster and created an OpenShift Vagrant environment for this.

Cluster overview:

I recommend having a look at the official RedHat OpenShift documentation to understand the architecture because it is quite a complex platform.

As a pre-requisite, you need to install the vagrant hostmanager because Openshift needs to resolve hostnames and I don’t want to install a separate DNS server. Here you find more information: https://github.com/devopsgroup-io/vagrant-hostmanager

vagrant plugin install vagrant-hostmanager

sudo bash -c 'cat << EOF > /etc/sudoers.d/vagrant_hostmanager2
Cmnd_Alias VAGRANT_HOSTMANAGER_UPDATE = /bin/cp <your-home-folder>/.vagrant.d/tmp/hosts.local /etc/hosts
%sudo ALL=(root) NOPASSWD: VAGRANT_HOSTMANAGER_UPDATE
EOF'

Next, clone my Vagrant repository and the official OpenShift Origin ansible:

git clone [email protected]:berndonline/openshift-origin-vagrant.git
git clone [email protected]:openshift/openshift-ansible.git

Let’s start first by booting the OpenShift vagrant environment:

cd openshift-origin-vagrant/
./vagrant_up.sh

The vagrant host manager will update dynamically the /etc/hosts file on both the Guest and the Host machine:

...
## vagrant-hostmanager-start id: 55ed9acf-25e9-4b19-bfab-e0812a292dc0
10.255.1.81	origin-master

10.255.1.231	origin-etcd

10.255.1.182	origin-infra

10.255.1.72	origin-node-1

10.255.1.145	origin-node-2

## vagrant-hostmanager-end
...

Let’s have a quick look at the OpenShift inventory file. This has settings for the different node types and custom OpenShift and Vagrant variables. You need to modify a few things like public hostname and default subdomain:

OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=vagrant
ansible_become=yes

deployment_type=origin
openshift_release=v3.7.0
containerized=true
openshift_install_examples=true
enable_excluders=false
openshift_check_min_host_memory_gb=4
openshift_disable_check=docker_image_availability,docker_storage,disk_availability

# use htpasswd authentication with demo/demo
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_htpasswd_users={'demo': '$apr1$.MaA77kd$Rlnn6RXq9kCjnEfh5I3w/.'}

# put the router on dedicated infra node
openshift_hosted_router_selector='region=infra'
openshift_master_default_subdomain=origin.paas.domain.com

# put the image registry on dedicated infra node
openshift_hosted_registry_selector='region=infra'

# project pods should be placed on primary nodes
osm_default_node_selector='region=primary'

# Vagrant variables
ansible_port='22' 
ansible_user='vagrant'
ansible_ssh_private_key_file='/home/berndonline/.vagrant.d/insecure_private_key'

[masters]
origin-master  openshift_public_hostname="console.paas.domain.com"

[etcd]
origin-etcd

[nodes]
# master needs to be included in the node to be configured in the SDN
origin-master
origin-infra openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
origin-node-[1:2] openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

Now that we are ready, we need to check out the latest release and execute the Ansible Playbook:

cd openshift-ansible/
git checkout release-3.7
ansible-playbook ./playbooks/byo/config.yml -i ../openshift-origin-vagrant/inventory

The playbook takes forever to run, so do something else for the next 10 to 15 mins.

...

PLAY RECAP **********************************************************************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0
origin-etcd                : ok=147  changed=47   unreachable=0    failed=0
origin-infra               : ok=202  changed=61   unreachable=0    failed=0
origin-master              : ok=561  changed=224  unreachable=0    failed=0
origin-node                : ok=201  changed=61   unreachable=0    failed=0


INSTALLER STATUS ****************************************************************************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : Complete
Hosted Install             : Complete
Service Catalog Install    : Complete

Sunday 21 January 2018  20:55:16 +0100 (0:00:00.011)       0:11:56.549 ********
===============================================================================
etcd : Pull etcd container ---------------------------------------------------------------------------------------------------------------------------------- 79.51s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------------------------------------------------------------------- 31.54s
openshift_node : Pre-pull node image when containerized ----------------------------------------------------------------------------------------------------- 31.28s
template_service_broker : Verify that TSB is running -------------------------------------------------------------------------------------------------------- 30.87s
docker : Install Docker ------------------------------------------------------------------------------------------------------------------------------------- 30.41s
docker : Install Docker ------------------------------------------------------------------------------------------------------------------------------------- 26.32s
openshift_cli : Pull CLI Image ------------------------------------------------------------------------------------------------------------------------------ 23.03s
openshift_service_catalog : wait for api server to be ready ------------------------------------------------------------------------------------------------- 21.32s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------------------------------------------------------------------- 16.27s
restart master api ------------------------------------------------------------------------------------------------------------------------------------------ 10.69s
restart master controllers ---------------------------------------------------------------------------------------------------------------------------------- 10.62s
openshift_node : Start and enable node ---------------------------------------------------------------------------------------------------------------------- 10.42s
openshift_node : Start and enable node ---------------------------------------------------------------------------------------------------------------------- 10.30s
openshift_master : Start and enable master api on first master ---------------------------------------------------------------------------------------------- 10.21s
openshift_master : Start and enable master controller service ----------------------------------------------------------------------------------------------- 10.19s
os_firewall : Install iptables packages --------------------------------------------------------------------------------------------------------------------- 10.15s
os_firewall : Wait 10 seconds after disabling firewalld ----------------------------------------------------------------------------------------------------- 10.07s
os_firewall : need to pause here, otherwise the iptables service starting can sometimes cause ssh to fail --------------------------------------------------- 10.05s
openshift_node : Pre-pull node image when containerized ------------------------------------------------------------------------------------------------------ 7.85s
openshift_service_catalog : oc_process ----------------------------------------------------------------------------------------------------------------------- 7.44s

To publish both the openshift_public_hostname and openshift_master_default_subdomain, I have a Nginx reverse proxy running and publish 8443 from the origin-master and 80, 443 from the origin-infra nodes.

Here a Nginx example:

server {
  listen 8443 ssl;
  listen [::]:8443 ssl;
  server_name console.paas.domain.com;

  ssl on;
  ssl_certificate /etc/nginx/ssl/paas.domain.com-cert.pem;
  ssl_certificate_key /etc/nginx/ssl/paas.domain.com-key.pem;

  access_log  /var/log/nginx/openshift-console_access.log;
  error_log   /var/log/nginx/openshift-console_error.log;

location / {
  proxy_pass https://10.255.1.81:8443;
  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection 'upgrade';
  proxy_set_header Host $host;
  proxy_cache_bypass $http_upgrade;

  }
}

I will try to write more about OpenShift and Platform as a Service and how to deploy small applications like WordPress.

Have fun testing OpenShift and please share your feedback.

Leave a comment

Ansible Playbook for VyOS and BGP Routing

I am currently looking into different possibilities for Open Source alternatives to commercial routers from Cisco or Juniper to use in Amazon AWS Transit VPCs. One option is to completely build the software router by myself with a Debian Linux, FRR (Free Range Routing) and StrongSwan, read my post about the self-build software router: Open Source Routing GRE over IPSec with StrongSwan and Cisco IOS-XE

A few years back I was working with Juniper JunOS routers and I thought I’d give VyOS a try because the command line which is very similar.

I replicated the same Vagrant topology for my Ansible Playbook for Cisco BGP Routing Topology but used VyOS instead of Cisco.

Network overview:

Here are the repositories for the Vagrant topology https://github.com/berndonline/vyos-lab-vagrant and the Ansible Playbook https://github.com/berndonline/vyos-lab-provision.

The Ansible Playbook site.yml is very simple, using the Ansible vyos_system for changing the hostname and the module vyos_config for interface and routing configuration:

---

- hosts: all

  connection: local
  user: '{{ ansible_ssh_user }}'
  gather_facts: 'no'

  roles:
    - hostname
    - interfaces
    - routing

Here is an example from host_vars rtr-1.yml:

---

hostname: rtr-1
domain_name: lab.local

loopback:
  dum0:
    alias: dummy loopback0
    address: 10.255.0.1
    mask: /32

interfaces:
  eth1:
    alias: connection rtr-2
    address: 10.0.255.1
    mask: /30

  eth2:
    alias: connection rtr-3
    address: 10.0.255.5
    mask: /30

bgp:
  asn: 65001
  neighbor:
    - {address: 10.0.255.2, remote_as: 65000}
    - {address: 10.0.255.6, remote_as: 65000}
  networks:
    - {network: 10.0.255.0, mask: /30}
    - {network: 10.0.255.4, mask: /30}
    - {network: 10.255.0.1, mask: /32}
  maxpath: 2

The template interfaces.j2 for the interface configuration:

{% if loopback is defined %}
{% for port, value in loopback.items() %}
set interfaces dummy {{ port }} address '{{ value.address }}{{ value.mask }}'
set interfaces dummy {{ port }} description '{{ value.alias }}'
{% endfor %}
{% endif %}

{% if interfaces is defined %}
{% for port, value in interfaces.items() %}
set interfaces ethernet {{ port }} address '{{ value.address }}{{ value.mask }}'
set interfaces ethernet {{ port }} description '{{ value.alias }}'
{% endfor %}
{% endif %}

This is the template routing.j2 for the routing configuration:

{% if bgp is defined %}
{% if bgp.maxpath is defined %}
set protocols bgp {{ bgp.asn }} maximum-paths ebgp '{{ bgp.maxpath }}'
{% endif %}
{% for item in bgp.neighbor %}
set protocols bgp {{ bgp.asn }} neighbor {{ item.address }} ebgp-multihop '2'
set protocols bgp {{ bgp.asn }} neighbor {{ item.address }} remote-as '{{ item.remote_as }}'
{% endfor %}
{% for item in bgp.networks %}
set protocols bgp {{ bgp.asn }} network '{{ item.network }}{{ item.mask }}'
{% endfor %}
set protocols bgp {{ bgp.asn }} parameters router-id '{{ loopback.dum0.address }}'
{% endif %}

The output of the running Ansible Playbook:

PLAY [all] *********************************************************************

TASK [hostname : write hostname and domain-name] *******************************
changed: [rtr-3]
changed: [rtr-2]
changed: [rtr-4]
changed: [rtr-1]

TASK [interfaces : write interfaces config] ************************************
changed: [rtr-4]
changed: [rtr-1]
changed: [rtr-3]
changed: [rtr-2]

TASK [routing : write routing config] ******************************************
changed: [rtr-2]
changed: [rtr-4]
changed: [rtr-3]
changed: [rtr-1]

PLAY RECAP *********************************************************************
rtr-1                      : ok=3    changed=3    unreachable=0    failed=0   
rtr-2                      : ok=3    changed=3    unreachable=0    failed=0   
rtr-3                      : ok=3    changed=3    unreachable=0    failed=0   
rtr-4                      : ok=3    changed=3    unreachable=0    failed=0   

Like in all my other Ansible Playbooks I use some kind of validation, a simple ping check vyos_check_icmp.yml to see if the configuration is correctly deployed:

---

- hosts: all

  connection: local
  user: '{{ ansible_ssh_user }}'
  gather_facts: 'no'

  tasks:
    - name: validate connection from rtr-1
      vyos_command:
        commands: 'ping {{ item }} count 4'
      when: "'rtr-1' in inventory_hostname"
      with_items:
        - '10.0.255.2'
        - '10.0.255.6'

    - name: validate connection from rtr-2
      vyos_command:
        commands: 'ping {{ item }} count 4'
      when: "'rtr-2' in inventory_hostname"
      with_items:
        - '10.0.255.1'
        - '10.0.254.1'
        - '10.0.253.2'
...

The output of the icmp validation Playbook:

PLAY [all] *********************************************************************

TASK [validate connection from rtr-1] ******************************************
skipping: [rtr-3] => (item=10.0.255.2) 
skipping: [rtr-3] => (item=10.0.255.6) 
skipping: [rtr-2] => (item=10.0.255.2) 
skipping: [rtr-2] => (item=10.0.255.6) 
skipping: [rtr-4] => (item=10.0.255.2) 
skipping: [rtr-4] => (item=10.0.255.6) 
ok: [rtr-1] => (item=10.0.255.2)
ok: [rtr-1] => (item=10.0.255.6)

TASK [validate connection from rtr-2] ******************************************
skipping: [rtr-3] => (item=10.0.255.1) 
skipping: [rtr-3] => (item=10.0.254.1) 
skipping: [rtr-1] => (item=10.0.255.1) 
skipping: [rtr-3] => (item=10.0.253.2) 
skipping: [rtr-1] => (item=10.0.254.1) 
skipping: [rtr-1] => (item=10.0.253.2) 
skipping: [rtr-4] => (item=10.0.255.1) 
skipping: [rtr-4] => (item=10.0.254.1) 
skipping: [rtr-4] => (item=10.0.253.2) 
ok: [rtr-2] => (item=10.0.255.1)
ok: [rtr-2] => (item=10.0.254.1)
ok: [rtr-2] => (item=10.0.253.2)

TASK [validate connection from rtr-3] ******************************************
skipping: [rtr-1] => (item=10.0.255.5) 
skipping: [rtr-1] => (item=10.0.254.5) 
skipping: [rtr-2] => (item=10.0.255.5) 
skipping: [rtr-1] => (item=10.0.253.1) 
skipping: [rtr-2] => (item=10.0.254.5) 
skipping: [rtr-2] => (item=10.0.253.1) 
skipping: [rtr-4] => (item=10.0.255.5) 
skipping: [rtr-4] => (item=10.0.254.5) 
skipping: [rtr-4] => (item=10.0.253.1) 
ok: [rtr-3] => (item=10.0.255.5)
ok: [rtr-3] => (item=10.0.254.5)
ok: [rtr-3] => (item=10.0.253.1)

TASK [validate connection from rtr-4] ******************************************
skipping: [rtr-3] => (item=10.0.254.2) 
skipping: [rtr-3] => (item=10.0.254.6) 
skipping: [rtr-1] => (item=10.0.254.2) 
skipping: [rtr-1] => (item=10.0.254.6) 
skipping: [rtr-2] => (item=10.0.254.2) 
skipping: [rtr-2] => (item=10.0.254.6) 
ok: [rtr-4] => (item=10.0.254.2)
ok: [rtr-4] => (item=10.0.254.6)

TASK [validate bgp connection from rtr-1] **************************************
skipping: [rtr-3] => (item=10.255.0.4) 
skipping: [rtr-2] => (item=10.255.0.4) 
skipping: [rtr-4] => (item=10.255.0.4) 
ok: [rtr-1] => (item=10.255.0.4)

TASK [validate bgp connection from rtr-4] **************************************
skipping: [rtr-3] => (item=10.255.0.1) 
skipping: [rtr-1] => (item=10.255.0.1) 
skipping: [rtr-2] => (item=10.255.0.1) 
ok: [rtr-4] => (item=10.255.0.1)

PLAY RECAP *********************************************************************
rtr-1                      : ok=2    changed=0    unreachable=0    failed=0   
rtr-2                      : ok=1    changed=0    unreachable=0    failed=0   
rtr-3                      : ok=1    changed=0    unreachable=0    failed=0   
rtr-4                      : ok=2    changed=0    unreachable=0    failed=0   

As you see, the configuration is successfully deployed and BGP connectivity between the nodes.

Leave a comment