Network Monitoring with Prometheus and Cumulus Linux

As promised in my previous article Install Prometheus and Grafana, this post is about how to monitor Cumulus Linux switches with Prometheus.

Let’s start directly by installing the Prometheus Node_Exporter:

sudo useradd --no-create-home --shell /bin/false node_exporter

tar xvf node_exporter-0.15.1.linux-amd64.tar.gz
sudo cp node_exporter-0.15.1.linux-amd64/node_exporter /usr/local/bin
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

sudo bash -c 'cat << EOF > /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF'

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl status node_exporter

Check that the Node_Exporter service is correctly running and listing on tcp 9100 for the Prometheus server to collect the metrics from the switches:

[email protected]:~$ sudo systemctl status node_exporter
● node_exporter.service - Node Exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled)
   Active: active (running) since Thu 2018-03-22 13:41:26 UTC; 958ms ago
 Main PID: 5620 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─5620 /usr/local/bin/node_exporter

Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - sockstat" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - bcache" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - hwmon" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - cpu" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - stat" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - timex" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - textfile" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - conntrack" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg=" - edac" source="node_exporter.go:52"
Mar 22 13:41:26 spine-2 node_exporter[5620]: time="2018-03-22T13:41:26Z" level=info msg="Listening on :9100" source="node_exporter.go:76"
[email protected]:~$

I created a simple dashboard in Grafana for the switches running Cumulus Linux, where you can find important metrics like throughput of the network interfaces, CPU load, Memory and disk related information:

On the top right corner you can select the switch where you want to see metrics from:

You can also have a central monitoring dashboard where all performance metrics are shown:

Here are detailed views with information about all interfaces from the different switch groups:

This is a very simple solution to monitor your Cumulus Linux switches and in combination with Cumulus NetQ enough to monitor your switch fabric.

FYI, I have used the following virtual topology BGP EVPN and VXLAN with Cumulus Linux.

Please share your feedback and leave a comment.

Install Prometheus and Grafana

Moving away from Cisco and using Open Networking whitebox switches with Cumulus Linux made me think about performance monitoring. In the past I was a fan of Solarwinds NPM but the traditional SNMP based monitoring is pretty outdated and not standard anymore when using Linux based operating systems. I was exploring different other options and came across Prometheus and Grafana.

This is post about how to install Prometheus and Grafana on a central monitoring server, the next post will be about how to integrate Cumulus Linux switches and report metrics to Prometheus and then visualise them with Grafana.

Let’s start installing Prometheus base packages:

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

cd ~
curl -LO https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
tar xvf prometheus-2.0.0.linux-amd64.tar.gz
sudo cp prometheus-2.0.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.0.0.linux-amd64/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo cp -r prometheus-2.0.0.linux-amd64/consoles /etc/prometheus
sudo cp -r prometheus-2.0.0.linux-amd64/console_libraries /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
rm -rf prometheus-2.0.0.linux-amd64.tar.gz prometheus-2.0.0.linux-amd64

sudo touch /etc/prometheus/prometheus.yml 
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml

sudo bash -c 'cat << EOF > /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF'

We have now installed the Prometheus base package but to collect metrics you also need to install the Prometheus Node Exporter:

sudo useradd --no-create-home --shell /bin/false node_exporter

cd ~
curl -LO https://github.com/prometheus/node_exporter/releases/download/v0.15.1/node_exporter-0.15.1.linux-amd64.tar.gz
tar xvf node_exporter-0.15.1.linux-amd64.tar.gz
sudo cp node_exporter-0.15.1.linux-amd64/node_exporter /usr/local/bin
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
rm -rf node_exporter-0.15.1.linux-amd64.tar.gz node_exporter-0.15.1.linux-amd64

sudo bash -c 'cat << EOF > /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF'

Configure Prometheus and define node_exporter targets:

sudo bash -c 'cat << EOF > /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']  
EOF'

Start services and access the web console:

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl start node_exporter

Access the Prometheus web console via http://localhost:9090:

Under “Status -> Targets” you can check if the services state is up:

Let’s continue by installing Grafana:

curl https://packagecloud.io/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packagecloud.io/grafana/stable/debian/ stretch main"
sudo apt-get update
sudo apt-get install grafana
sudo systemctl start grafana-server
sudo systemctl status grafana-server
sudo systemctl enable grafana-server

Now you can access Grafana via http://localhost:3000/. I would recommend putting a Ngnix reverse proxy in-front for SSL encryption.

In the web console we need to configure the data source and point it to Prometheus. To do that go to “settings” and select “data source”:

You should import the following Prometheus dashboard for Grafana otherwise you need to manually configure your dashboard:

For the install of Prometheus and the Node_Exporter I will write two Ansible roles which I will share later. Read my new post about Network Monitoring with Prometheus and Cumulus Linux!

Please share your feedback and leave a comment.

Cumulus Networks NetQ telemetry-based validation system

I had some time to play around with the new NetQ tool from Cumulus which checks your Cumulus Linux switch fabric.

I did some testing with my Cumulus Layer 2 Fabric example: Ansible Playbook for Cumulus Linux (Layer 2 Fabric)

You need to download the NetQ VM from Cumulus as VMware or VirtualBox template: here

It is a great tool to centrally check your Cumulus switches and keep history about changes in your environment. NetQ can send out notification about changes in your fabric which is nice because you are always up-to-date what is going on in your network.

Installing NetQ agent on a Cumulus Linux Switch:

[email protected]:~$ sudo apt-get update
[email protected]:~$ sudo apt-get install cumulus-netq -y

Configuring the NetQ Agent on a switch:

[email protected]:~$ sudo systemctl restart rsyslog
[email protected]:~$ netq add server 192.168.100.133
[email protected]:~$ netq agent restart

I will write a small Ansible script in the next days to automate the agent installation and configuration.

Connect to Cumulus NetQ VM and check agent connectivity

[email protected]:~$ netq-shell

Welcome to Cumulus (R) NetQ Command Line Interface
TIP: Type `netq help` to get started.

[email protected]:/$ netq show agents
Node     Status    Sys Uptime    Agent Uptime
-------  --------  ------------  --------------
leaf-1   Fresh     1h ago        1h ago
leaf-2   Fresh     1h ago        1h ago
spine-1  Fresh     1h ago        1h ago
spine-2  Fresh     1h ago        1h ago
[email protected]:/$

Basic Show Commands:

[email protected]:/$ netq show clag
Matching CLAG session records are:
Node             Peer             SysMac            State Backup #Links #Dual Last Changed
---------------- ---------------- ----------------- ----- ------ ------ ----- --------------
leaf-1           leaf-2(P)        44:38:39:ff:40:93 up    up     1      1     8m ago
leaf-2(P)        leaf-1           44:38:39:ff:40:93 up    up     1      1     8m ago
spine-1(P)       spine-2          44:38:39:ff:40:94 up    up     1      1     8m ago
spine-2          spine-1(P)       44:38:39:ff:40:94 up    up     1      1     9m ago
[email protected]:/$
 
[email protected]:/$ netq show lldp
LLDP peer info for *:*
Node     Interface    LLDP Peer    Peer Int    Last Changed
-------  -----------  -----------  ----------  --------------
leaf-1   eth0         cumulus      eth0        1h ago
leaf-1   eth0         leaf-2       eth0        1h ago
leaf-1   eth0         spine-1      eth0        1h ago
leaf-1   eth0         spine-2      eth0        1h ago
leaf-1   swp1         spine-1      swp1        1h ago
leaf-1   swp11        leaf-2       swp11       9m ago
leaf-1   swp2         spine-2      swp1        1h ago
leaf-2   eth0         cumulus      eth0        1h ago
leaf-2   eth0         leaf-1       eth0        1h ago
leaf-2   eth0         spine-1      eth0        1h ago
leaf-2   eth0         spine-2      eth0        1h ago
leaf-2   swp1         spine-2      swp2        1h ago
leaf-2   swp11        leaf-1       swp11       8m ago
leaf-2   swp2         spine-1      swp2        1h ago
spine-1  eth0         cumulus      eth0        1h ago
spine-1  eth0         leaf-1       eth0        1h ago
spine-1  eth0         leaf-2       eth0        1h ago
spine-1  eth0         spine-2      eth0        1h ago
spine-1  swp1         leaf-1       swp1        1h ago
spine-1  swp11        spine-2      swp11       1h ago
spine-1  swp2         leaf-2       swp2        8m ago
spine-2  eth0         cumulus      eth0        1h ago
spine-2  eth0         leaf-1       eth0        1h ago
spine-2  eth0         leaf-2       eth0        1h ago
spine-2  eth0         spine-1      eth0        1h ago
spine-2  swp1         leaf-1       swp2        1h ago
spine-2  swp11        spine-1      swp11       1h ago
spine-2  swp2         leaf-2       swp1        8m ago
[email protected]:/$
 
[email protected]:/$ netq show interfaces type bond
Matching interface records are:
Node             Interface        Type     State Details                     Last Changed
---------------- ---------------- -------- ----- --------------------------- --------------
leaf-1           bond1            bond     up    Slave: swp1(spine-1:swp1),  10m ago
                                                 Slave: swp2(spine-2:swp1),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
leaf-1           peerlink         bond     up    Slave: swp11(leaf-2:swp11), 10m ago
                                                 VLANs: , PVID: 0,
                                                 Master: peerlink, MTU: 1500
leaf-2           bond1            bond     up    Slave: swp1(spine-2:swp2),  10m ago
                                                 Slave: swp2(spine-1:swp2),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
leaf-2           peerlink         bond     up    Slave: swp11(leaf-1:swp11), 10m ago
                                                 VLANs: , PVID: 0,
                                                 Master: peerlink, MTU: 1500
spine-1          bond1            bond     up    Slave: swp1(leaf-1:swp1),   10m ago
                                                 Slave: swp2(leaf-2:swp2),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
spine-1          peerlink         bond     up    Slave: swp11(spine-2:swp11) 1h ago
                                                 , VLANs:  100-199,
                                                 PVID: 1, Master: bridge,
                                                 MTU: 1500
spine-2          bond1            bond     up    Slave: swp1(leaf-1:swp2),   10m ago
                                                 Slave: swp2(leaf-2:swp1),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
spine-2          peerlink         bond     up    Slave: swp11(spine-1:swp11) 1h ago
                                                 , VLANs:  100-199,
                                                 PVID: 1, Master: bridge,
                                                 MTU: 1500
[email protected]:/$
 
[email protected]:/$ netq show ip routes
Matching IP route records are:
Origin Table            IP               Node             Nexthops                   Last Changed
------ ---------------- ---------------- ---------------- -------------------------- ----------------
1      default          169.254.1.0/30   leaf-1           peerlink.4093              11m ago
1      default          169.254.1.0/30   leaf-2           peerlink.4093              11m ago
1      default          169.254.1.0/30   spine-1          peerlink.4094              1h ago
1      default          169.254.1.0/30   spine-2          peerlink.4094              1h ago
1      default          169.254.1.1/32   leaf-1           peerlink.4093              11m ago
1      default          169.254.1.1/32   spine-1          peerlink.4094              1h ago
1      default          169.254.1.2/32   leaf-2           peerlink.4093              11m ago
1      default          169.254.1.2/32   spine-2          peerlink.4094              1h ago
1      default          192.168.100.0/24 leaf-1           eth0                       1h ago
1      default          192.168.100.0/24 leaf-2           eth0                       1h ago
1      default          192.168.100.0/24 spine-1          eth0                       1h ago
1      default          192.168.100.0/24 spine-2          eth0                       1h ago
1      default          192.168.100.205/ spine-1          eth0                       1h ago
                        32
1      default          192.168.100.206/ spine-2          eth0                       1h ago
                        32
1      default          192.168.100.207/ leaf-1           eth0                       1h ago
                        32
1      default          192.168.100.208/ leaf-2           eth0                       1h ago
                        32
0      vrf-prod         0.0.0.0/0        spine-1          Blackhole                  1h ago
0      vrf-prod         0.0.0.0/0        spine-2          Blackhole                  1h ago
1      vrf-prod         10.1.0.0/24      spine-1          bridge.100                 1h ago
1      vrf-prod         10.1.0.0/24      spine-2          bridge.100                 1h ago
1      vrf-prod         10.1.0.252/32    spine-1          bridge.100                 1h ago
1      vrf-prod         10.1.0.253/32    spine-2          bridge.100                 1h ago
1      vrf-prod         10.1.0.254/32    spine-1          bridge.100                 1h ago
1      vrf-prod         10.1.0.254/32    spine-2          bridge.100                 1h ago
1      vrf-prod         10.1.1.0/24      spine-1          bridge.101                 1h ago
1      vrf-prod         10.1.1.0/24      spine-2          bridge.101                 1h ago
1      vrf-prod         10.1.1.252/32    spine-1          bridge.101                 1h ago
1      vrf-prod         10.1.1.253/32    spine-2          bridge.101                 1h ago
1      vrf-prod         10.1.1.254/32    spine-1          bridge.101                 1h ago
1      vrf-prod         10.1.1.254/32    spine-2          bridge.101                 1h ago
1      vrf-prod         10.1.2.0/24      spine-1          bridge.102                 1h ago
1      vrf-prod         10.1.2.0/24      spine-2          bridge.102                 1h ago
1      vrf-prod         10.1.2.252/32    spine-1          bridge.102                 1h ago
1      vrf-prod         10.1.2.253/32    spine-2          bridge.102                 1h ago
1      vrf-prod         10.1.2.254/32    spine-1          bridge.102                 1h ago
1      vrf-prod         10.1.2.254/32    spine-2          bridge.102                 1h ago
[email protected]:/$

See Changes in Switch Fabric:

[email protected]:/$ netq leaf-1 show interfaces type bond
Matching interface records are:
Node             Interface        Type     State Details                     Last Changed
---------------- ---------------- -------- ----- --------------------------- --------------
leaf-1           bond1            bond     up    Slave: swp1(spine-1:swp1),  2s ago
                                                 Slave: swp2(spine-2:swp1),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
leaf-1           peerlink         bond     up    Slave: swp11(leaf-2:swp11), 21m ago
                                                 VLANs: , PVID: 0,
                                                 Master: peerlink, MTU: 1500
[email protected]:/$
 
[email protected]:~$ sudo ifdown bond1
[email protected]:~$
 
[email protected]:/$ netq leaf-1 show interfaces type bond
Matching interface records are:
Node             Interface        Type     State Details                     Last Changed
---------------- ---------------- -------- ----- --------------------------- --------------
leaf-1           peerlink         bond     up    Slave: swp11(leaf-2:swp11), 22m ago
                                                 VLANs: , PVID: 0,
                                                 Master: peerlink, MTU: 1500
[email protected]:/$
 
[email protected]:/$ netq leaf-1 show interfaces type bond changes
Matching interface records are:
Node             Interface        Type     State Details                     DbState Last Changed
---------------- ---------------- -------- ----- --------------------------- ------- --------------
leaf-1           bond1            bond     down  VLANs: , PVID: 0,           Del     21s ago
                                                 Master: bridge, MTU: 1500
leaf-1           bond1            bond     down  Slave: swp1(),              Add     21s ago
                                                 Slave: swp2(),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500
leaf-1           bond1            bond     up    Slave: swp1(spine-1:swp1),  Add     1m ago
                                                 Slave: swp2(spine-2:swp1),
                                                 VLANs:  100-199, PVID: 1,
                                                 Master: bridge, MTU: 1500 

More information you can find in the Cumulus NetQ documentation: https://docs.cumulusnetworks.com/display/NETQ/NetQ

Uptime – simple http monitoring utility

I found an very interesting http monitoring tool called Uptime using node.js and mongoDB. I directly installed Uptime on one of my Linux servers and from the first look I find it really cool 🙂 before you start you need to get node.js and mongoDB installed on your server and the rest is then very easy.

Ones the Uptime is running you can access the web interface and create the first checks, here some screenshots:

Here you create your http checks and define some settings:

Detailed check overview with graphs:

If you are interested then have a look here: http://fzaninotto.github.com/uptime/