nmon - top on steroids


What is nmon?

nmon is best described (IMHO) as top on steroids. In fact, in the AIX world (where I came from), their version of top (topas) became nmon. nmon is free (the source code has now been released). It works to varying degrees on most Linux-like OSes.


nmon History

In 2003, Nigel Griffiths published the first public version of nmon. Nigel worked for IBM. In his work on AIX, he found topas insufficient for really seeing what was going on. He wrote nmon to allow him to see more information about what was going on in the OS.

When he first published it at IBM developerWorks, it was available only as a binary for AIX. As with all developerWorks projects, it came with no support. nmon worked by using hooks in the AIX kernel to access performance metrics.

There was a desire for nmon to work on Linux (by admins that lived in a mixed OS world). To enable nmon to work with Linux, all of the information acquisition code had to change. Fortunately, most of the needed data was available in /proc on a Linux system. Once Nigel added this support he released both AIX and Linux binaries of nmon.

In 2009, the source code was released and the Linux nmon project was formed. This is an open source project that has continued the development of nmon.


The Technology of nmon

With over 10 years of development, nmon is very stable. nmon has two distinct modes: interactive and capture. These two modes make nmon a very powerful tool in the arsenal of a 'nix admin.


In interactive mode, nmon uses curses to display a screen that will remind you of Linux top. Various keystrokes alter the display. A ? or h can be used to get a list of the various commands. These keystrokes mostly work as toggles (press once to enable, a second time to disable). Think of the nmon screen as a porthole into the OS and the keystrokes as choosing what things are visible. Take note that case matters.

        h   = Online help information
        r   = Machine type, machine name, cache details and OS version + LPAR
        c   = CPU by processor stats with bar graphs
        l   = long term CPU (over 75 snapshots) with bar graphs
        m   = Memory stats
        V   = Virtual Memory and Swap stats
        k   = Kernel Internal stats
        n   = Network stats and errors
        N   = NFS Network File System
        d   = Disk I/O Graphs
        D   = Disk I/O Stats
        o   = Disk I/O Map (one character per disk showing how busy it is)
        b   = black and white mode (or use -b option)
        .   = minimum mode i.e. only busy disks and processes

        key --- Other Controls ---
        +   = double the screen refresh time
        -   = halves the screen refresh time
        q   = quit (also x, e or control-C)
        0   = reset peak counts to zero (peak = ">")
        space = refresh screen now

When you start nmon up it displays a mostly blank screen (unless you passed it switches or set up defaults). Most nmon users get in the habit of hitting certain keys as soon as they start nmon (I press ckmn).

Pressing c shows a simple bar graph of how "busy" each core is as well as the globbed together system. Please note that it is possible for the sums to exceed 100%. The information is taken from /proc and the OS does not guarantee that these percentages are exact. You can see a longer term view of this information by pressing l. Pressing a 0 will reset the peaks.
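
To give a feel for where such a percentage can come from, here is a minimal sketch that derives a busy figure from two samples of the aggregate "cpu" line in /proc/stat. This is my own illustration, not nmon's actual code. The function names are made up, and I only count the idle column as "not busy" (so I/O wait counts as busy here).

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate cpu line found")


def cpu_busy_percent(before, after):
    """Busy percentage between two 'cpu ...' lines taken some time apart.

    Columns after the label are cumulative ticks:
    user nice system idle iowait irq softirq steal [guest guest_nice].
    Only the first eight are summed, since guest time is already
    included in user time.
    """
    old = [int(x) for x in before.split()[1:9]]
    new = [int(x) for x in after.split()[1:9]]
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[3]  # fourth column is idle ticks
    return 100.0 * (total - idle) / total
```

To use it on a live Linux box, call read_cpu_line(), sleep for a second, call it again, and pass both lines to cpu_busy_percent(). Because the kernel updates these counters independently, small rounding oddities between samples are normal.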

Pressing the m key shows information about memory and swap usage. A lot of neat stuff can be found here if you are fighting a machine or an application that has memory issues.

The k key is my favorite. This tells nmon to show kernel related information. Here is where you find your normal uptime information. You also get RunQueue, ContextSwitch, Forks and Interrupts. RunQueue tells you how many processes were waiting for cpu over the last refresh interval (on average). As long as the RunQueue is not greater than the number of cores you have, the machine will feel "snappy." ContextSwitch tells you how many times the cpu was switched from one process to another. I feel like this gives me a real feel for how busy the machine is. Forks tells you how many processes were forked in the last time period. Interrupts tells you how many interrupts were handled by the system in the last interval. I sometimes find these numbers scary...
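
On Linux, counters of this kind are exposed in /proc/stat as the ctxt (context switches), processes (forks) and intr (interrupts) lines. The sketch below turns two samples into per-second rates. Whether nmon reads exactly these fields is my assumption, and the function names are hypothetical.

```python
def kernel_counters(stat_text):
    """Extract selected counters from the text of /proc/stat.

    ctxt, processes and intr are cumulative since boot (for intr, the
    first number on the line is the grand total). procs_running is an
    instantaneous value.
    """
    wanted = {"ctxt", "processes", "intr", "procs_running"}
    found = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] in wanted:
            found[parts[0]] = int(parts[1])
    return found


def per_second(before, after, seconds):
    """Rate of change of the cumulative counters over the interval."""
    return {
        key: (after[key] - before[key]) / seconds
        for key in ("ctxt", "processes", "intr")
    }
```

Sampling the file twice, a few seconds apart, and feeding both results to per_second() gives numbers comparable to the ContextSwitch, Forks and Interrupts figures described above.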

The n key gives you some brief network information. You get receive and transmit information for each network interface.

The V key will give you lots of information about virtual memory and how it is behaving.

The r key will give you information about the machine and the OS.

The N key will provide information about NFS. If you have an NFS server that you are concerned about this may help.

The d and D keys are related. Both show you disk information. Lower case shows a graph and uppercase shows raw statistics. If you have a lot of disks this can be long. The . key causes the displays to only show disks that are doing something. The o key puts a single line display that shows busy percentage for the first 63 disks.

The b key sets the display to black and white (removes ANSI color sequences).

The t key invokes something that looks similar to the view you are used to seeing in top. This view can be sorted 4 different ways: 1 is the Basic display, 3 sorts by cpu usage, 4 sorts by memory usage and 5 sorts by number of I/O operations.

With all of these different keystrokes you can dynamically change your view until you see what you are looking for. As you can see, nmon provides a lot of potential information in one interface.


The interactive mode is neat. In fact many people are content to use nmon in only this way. I believe the thing that really makes nmon a top notch tool is that it can be run in capture mode. In this mode, nmon does not display to the screen. Instead it writes performance information out to a text file.

By default the data is written out in comma separated value format. This could be loaded into a spreadsheet. You can specify a number of parameters like interval, number of snapshots, filename and disk related limits.

The output can seem big. In capture mode, my default is to take a snapshot every 15 minutes for a day. I then start a new file each day. With these settings, my output files are 50-60k a day (they do compress really nicely).
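
The arithmetic behind "every 15 minutes for a day" is simple but easy to get wrong when you change the interval. Here is a small sketch that works out the snapshot count and builds a command line using nmon's -f (write a capture file), -s (seconds between snapshots) and -c (number of snapshots) options. The helper name is made up for illustration.

```python
def capture_args(interval_s, hours=24):
    """Build an nmon capture-mode command line for the given span.

    The snapshot count is the number of whole intervals that fit
    into the requested number of hours.
    """
    count = int(hours * 3600 // interval_s)
    return ["nmon", "-f", "-s", str(interval_s), "-c", str(count)]
```

For my settings, capture_args(900) yields nmon -f -s 900 -c 96. The resulting list could be handed to subprocess.run() from a wrapper script.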

What do you do with these files? You can pull one up in less and just browse. Maybe something will catch your eye. You can load it into a spreadsheet for a more structured view. There is even a set of Excel macros that can graph the data. I find myself grepping for a certain type of info and looking at its trend.
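
Grepping gets you the raw lines, but pairing them with their timestamps takes a little more. The sketch below assumes the layout I see in my capture files: the first line for a section (for example CPU_ALL) names the columns, each ZZZZ line maps a snapshot id such as T0001 to a time, and later lines for the section carry that snapshot id. The SAMPLE text is made up purely for illustration, and the function name is hypothetical.

```python
import csv
import io

# Made-up sample in the assumed capture-file layout (not real data).
SAMPLE = """\
AAA,progname,nmon
CPU_ALL,CPU Total node01,User%,Sys%,Wait%,Idle%,Busy,CPUs
ZZZZ,T0001,00:00:01,01-JAN-2015
CPU_ALL,T0001,1.0,0.5,0.0,98.5,,4
ZZZZ,T0002,00:15:01,01-JAN-2015
CPU_ALL,T0002,10.0,2.0,1.0,87.0,,4
"""


def read_section(text, section):
    """Return [(time, {column: value}), ...] for one section."""
    times = {}      # snapshot id -> time of day
    header = None   # column names, taken from the section's first line
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if not rec:
            continue
        if rec[0] == "ZZZZ" and len(rec) >= 3:
            times[rec[1]] = rec[2]
        elif rec[0] == section:
            if header is None:
                header = rec[2:]
            else:
                values = dict(zip(header, rec[2:]))
                rows.append((times.get(rec[1], rec[1]), values))
    return rows
```

With this, a trend is one list comprehension away, for example [(t, float(v["User%"])) for t, v in read_section(SAMPLE, "CPU_ALL")]. Values are left as strings because some columns can be empty.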

Anything more than this requires work though.


nmon Addons

Like most useful tools, nmon has grown a number of addons that make it even more powerful. Most are oriented around what to do with the captured data.


Addon: nmon Analyser

nmon Analyser is an Excel spreadsheet with macros that can take the nmon capture files and display graphs. You have to use Windows Excel and you have to enable all macros (have fun).

In spite of these drawbacks, this can be a very useful tool.


Addon: nmonVisualizer

nmonVisualizer is a Java application that does about the same thing the Excel spreadsheet does except that it works across platforms and does not require Excel. It is not beautiful, but it works. For dealing with a single machine at a time this may be all that you need.
Try This one


Addon: Java nmon Analyzer

This is another Java program that will display graphs of nmon data. This one is a little prettier than nmonVisualizer. It is slower and shows less data.
Try This one


Addon: pyNmonAnalyzer

nmon produces a csv file that is a little painful to load directly into a spreadsheet. pyNmonAnalyzer was designed to preprocess the nmon file to make it easier to load and make sense of. It also claims to build static html based reports with graphs from the data.

This one does not produce the most beautiful output but it is the only one I have found that lends itself to batch processing.

It takes a few steps to install all the parts for this package to work. See the HPC nmon archive section below.


Addon: HPC nmon Archive

The goal was to create a central place to archive nmon data for all machines. In addition to this, the dream included automated generation of graphs from the raw data. Here was the basic plan:


Addon: HPC nmon Archive - Client Install

Installing nmon was not hard but led to a little work. It turns out that I have 32 and 64 bit systems ranging from RHEL/CentOS 4 through 7. This meant I had to use cat /etc/redhat-release and uname -a to determine which nmon binary to install (older OSes needed nmon compiled without some modern libraries).
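
That decision can be scripted. The following sketch takes the text of /etc/redhat-release and the machine type (what uname -m or Python's platform.machine() reports) and returns a binary name. The naming scheme nmon_<arch>_rhel<major> is invented for this example; substitute whatever the downloaded binaries are actually called.

```python
import re


def pick_nmon_binary(release_text, machine):
    """Choose a binary name from the release string and machine type.

    release_text looks like 'CentOS release 6.5 (Final)'.
    The returned name follows a hypothetical naming scheme.
    """
    match = re.search(r"release\s+(\d+)", release_text)
    if match is None:
        raise ValueError("cannot find a release number in: %r" % release_text)
    major = int(match.group(1))
    # Anything that is not x86_64 is treated as 32-bit x86 here.
    arch = "x86_64" if machine == "x86_64" else "x86"
    return "nmon_%s_rhel%d" % (arch, major)
```

On a node this could be called as pick_nmon_binary(open("/etc/redhat-release").read(), platform.machine()) after importing platform.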

I chose to install nmon in /usr/local/bin and to leave the executable named the way it was from the download site. In the same directory I created a symlink to it called nmon (to make invoking it consistent across all machines).

I chose to store the log files (nmon data) in /var/log/nmon. Since I planned to run nmon as nobody, I had to make nobody the owner of this directory. The goal was that every day a new file would be created here. The old file(s) would be uploaded to the archive server and then compressed. I probably should add some code to delete them after N days. The script that I use to do this is called log-nmon.bash (download).
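
The compress step of that nightly routine might look like the sketch below. It is not log-nmon.bash itself (that is a bash script), just an illustration of the idea in Python. It assumes capture files end in .nmon and carry a date stamp in their name, which is how nmon -f names files on my machines (hostname_YYMMDD_HHMM.nmon). The upload step is left out.

```python
import gzip
import shutil
from pathlib import Path


def compress_old_logs(log_dir, today_stamp):
    """gzip every .nmon file whose name lacks today's date stamp.

    today_stamp is a string such as '150102' (YYMMDD).
    Returns the names of the compressed files that were created.
    """
    created = []
    for path in sorted(Path(log_dir).glob("*.nmon")):
        if today_stamp in path.name:
            continue  # still being written by today's nmon run
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # remove the uncompressed original
        created.append(target.name)
    return created
```

A call such as compress_old_logs("/var/log/nmon", date.today().strftime("%y%m%d")) (with date imported from datetime) would fit the directory layout described above. Deleting archives after N days would be a natural addition.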

Since I wanted to run this each night just after midnight, putting a link in /etc/cron.daily was not acceptable (since it runs about 4:00 AM). I chose to create the file /etc/cron.d/nmon-script that contains the following line:

0 0 * * * nobody /usr/local/bin/log-nmon.bash

To make sure all permissions were fine, I used this line to test log-nmon.bash:

su -s /bin/bash -c '/usr/local/bin/log-nmon.bash' nobody


Addon: HPC nmon Archive - Archive Server Install

Setting up the server took a number of steps:

  1. install pyNmonAnalyzer
  2. create nmon user
  3. setup/configure rsync daemon
  4. create cname and configure apache
  5. create cron script to autogenerate html for "new" data


1) Install pyNmonAnalyzer

Since pyNmonAnalyzer is Python, installing it was not going to be trivial. After quite a bit of experimenting, I discovered that the next four steps will install pyNmonAnalyzer on a CentOS 6.x system:

  1. yum install python-pip
  2. pip install pyNmonAnalyzer
  3. yum install python-argparse
  4. yum install python-ipython

This could be the end of this step. Instead I tweaked pyNmonAnalyzer to create additional graphs. I had to edit two Python scripts and add some code to do this. pip installed pyNmonAnalyzer in /usr/lib/python2.6/site-packages/pynmonanalyzer/. I updated the following files:

You can search for IWT to find my changes/alterations.

After making changes, I needed to compile these scripts. I used the following commands:

python
>>> import py_compile
>>> py_compile.compile('pyNmonPlotter.py')
>>> py_compile.compile('pyNmonAnalyzer.py')
>>> <ctrl-d>


2) Create nmon user

The first step was to use the adduser command to add a user called nmon. In the user's account, I created two subdirectories:


3) Setup/configure rsync Daemon

The server I was using already had rsync running as a daemon. The goal was to provide a unique writable place for each node to dump its data. I wound up writing a script to generate the /etc/rsyncd.conf that used a stub sample file and a data file listing all hosts that need to be processed (actually the code does stanzas for nmon and for rpms [a different talk]). Here is the script, its stub and sample data file:

The process of adding a new node is as follows:

The last step actually does something also. Along with generating the /etc/rsyncd.conf file, it also creates the place for the node to upload its files with appropriate ownership. Also notice that the conf file only allows nodes to upload files. They cannot list what is there or pull files down.
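
To make the idea concrete, here is a rough sketch of a generator that appends one upload-only module per host to a stub. It is not my actual script. The module naming (nmon-<host>) and the base directory are placeholders I made up for this example. The options themselves (path, hosts allow, read only, write only, list) are standard rsyncd.conf parameters.

```python
def rsyncd_conf(stub, hosts, base_dir="/home/nmon/data"):
    """Return rsyncd.conf text: the stub plus one stanza per host.

    base_dir and the 'nmon-<host>' module names are hypothetical.
    Each module accepts uploads only from its own host and can be
    neither listed nor read back.
    """
    stanzas = []
    for host in hosts:
        stanzas.append("\n".join([
            "[nmon-%s]" % host,
            "    path = %s/%s" % (base_dir, host),
            "    hosts allow = %s" % host,
            "    read only = no",
            "    write only = yes",
            "    list = no",
        ]))
    return stub.rstrip("\n") + "\n\n" + "\n\n".join(stanzas) + "\n"
```

A real version would also create each path and set its ownership, as described above, before writing the file out.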


4) Create cname and Configure Apache

The next step was to create the cname nmon.hpc.lsu.edu pointing to the server. Contact your local TSP for help.

Then Apache needed to be configured. I added the following text to the /etc/httpd/conf.d/virtual-hosts.conf file:

<VirtualHost 204.90.43.213:80>
  ServerName nmon.hpc.lsu.edu
  DocumentRoot /home/nmon/public_html

  <Directory /home/nmon/public_html>
    Options FollowSymLinks
  </Directory>
</VirtualHost>


5) Create cron script to autogenerate html

I needed a script that would invoke pyNmonAnalyzer in just the perfect way and that would process all "new" data files that showed up. Basically the script walks the data directory. When it finds a data file it checks to see if the corresponding html stuff exists. If it does not exist, it is created.

This gets a little complicated in that pyNmonAnalyzer creates an html file and a directory with all the corresponding graphs (as image files) in it. Each time it is run, I then rename stuff and move it to the correct place. The magic script below is /root/bin/nmon-make-plot.bash and has a symbolic link in /etc/cron.daily to automatically execute it each day.

nmon-make-plot.bash (download)
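
The "walk the data directory and look for missing html" part can be sketched in a few lines. This is an illustration in Python rather than the bash script itself. It assumes data files end in .nmon and that the html for data/<node>/<name>.nmon lives at html/<node>/<name>.html, which is a layout I picked for the example.

```python
from pathlib import Path


def unprocessed(data_dir, html_dir):
    """Return the .nmon files that have no matching .html report yet.

    The mirrored directory layout between data_dir and html_dir is
    an assumption made for this sketch.
    """
    missing = []
    for nmon_file in sorted(Path(data_dir).rglob("*.nmon")):
        relative = nmon_file.relative_to(data_dir).with_suffix(".html")
        if not (Path(html_dir) / relative).exists():
            missing.append(nmon_file)
    return missing
```

Each path returned would then be handed to pyNmonAnalyzer, followed by the renaming and moving described above.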

Unfortunately, there is still one more step. pyNmonAnalyzer processes nmon files and uses report.config to tell it what to do. This means each node needs a report.config file. Even more annoying, it is not "easy" to generically create. I find myself working from a base file and editing each one manually.

The file includes graph names and the devices to plot. On the ethernet graphs you need to list eth0 or eth1 or whatever devices you wish. You can figure this out by doing an ifconfig on the node in question.

The disk graphs are more complicated. nmon will autodetect many devices. For instance you might have a sda1 (/boot) and a sda2 (physical volume containing one or more logical volumes). These logical volumes are seen as device mapper disks (say dm-0 is / and dm-1 is /home). By default nmon logs I/O info for sda1, sda2, dm-0 and dm-1. Plotting sda2 is meaningless (it is the union of dm-0 and dm-1 traffic).

You can sort this out by grep DISK nmon.txt | head. This will give you a list of all devices that nmon is saving data on. Now you can use fdisk -l and dmsetup ls to deduce which partitions you care about. You can then update the report.config appropriately.
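
The same information can be pulled out programmatically. The sketch below assumes that the first DISKBUSY line in a capture file is the header and that its device names start at the third comma separated field, which matches what I see in my files. The function names are made up, and keeping only dm- devices is just one possible policy.

```python
def logged_disks(text, section="DISKBUSY"):
    """Return the device names from the header line of a disk section."""
    for line in text.splitlines():
        fields = line.strip().split(",")
        if fields[0] == section:
            return fields[2:]  # first matching line is the header
    return []


def device_mapper_only(devices):
    """Keep only device mapper entries such as dm-0."""
    return [name for name in devices if name.startswith("dm-")]
```

Which dm- device corresponds to which logical volume still has to be worked out with dmsetup ls, as described above.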

Here is a sample report.config file:

CPU_ALL=user,sys,wait{stackedGraph: true, fillGraph: true}
DISKBUSY=dm-0,dm-1,dm-2{}
DISKREAD=dm-0,dm-1,dm-2{}
DISKWRITE=dm-0,dm-1,dm-2{}
DISKXFER=dm-0,dm-1,dm-2{}
DISKBSIZE=dm-0,dm-1,dm-2{}
MEM=memtotal,active{}
NET=eth1{}
NETPACKET=eth1{}
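
Since every node's file follows the same pattern, a small generator can take some of the pain out of the manual editing. This sketch simply reproduces the layout of the sample above from a list of disks and a list of network interfaces. The function name is hypothetical and the fixed CPU_ALL and MEM lines are copied from my sample.

```python
def report_config(disks, nics):
    """Build report.config text in the same layout as the sample above."""
    disk_list = ",".join(disks)
    nic_list = ",".join(nics)
    lines = ["CPU_ALL=user,sys,wait{stackedGraph: true, fillGraph: true}"]
    for section in ("DISKBUSY", "DISKREAD", "DISKWRITE", "DISKXFER", "DISKBSIZE"):
        lines.append("%s=%s{}" % (section, disk_list))
    lines.append("MEM=memtotal,active{}")
    for section in ("NET", "NETPACKET"):
        lines.append("%s=%s{}" % (section, nic_list))
    return "\n".join(lines) + "\n"
```

Calling report_config(["dm-0", "dm-1", "dm-2"], ["eth1"]) gives back the sample file shown above.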

Please feel free to visit nmon.hpc.lsu.edu to see my current instance of the dream this section outlines.


Links

Here are some useful links: