Writing custom nagios check scripts (plugins)

This document is the third in my series on Nagios. The first discussed what Nagios is and the second covered its installation. This one discusses writing custom check scripts (plugins), and a fourth is a howto on creating event handlers.

One of the beauties of Nagios is the large number of freely available plugins online, at http://exchange.nagios.org/directory/Plugins

However, in the unlikely event that you cannot find what you need, it is quite easy to write your own custom plugins.

In this howto I will show you how to write a plugin that monitors the number of huge pages in use on your system, raising a warning once a certain threshold is breached and a critical alert once a higher threshold is reached. These thresholds are passed into the script as parameters, so that you can reuse the plugin across multiple systems with differing threshold requirements.

First I will show you the whole script, and then afterwards explain its operation:

#!/bin/ksh
#
# Author: Matthew Harman
# Date: 08/03/12
# Purpose: Checks the hugepages in use are within limits
#

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1.00 $' | sed -e 's/[^0-9.]//g'`

# utils.sh defines STATE_OK, STATE_WARNING, STATE_CRITICAL
# and STATE_UNKNOWN, which are used as exit codes below
. "$PROGPATH/utils.sh"

print_usage() {
	echo "Usage: check_huge_pages <warning%> <critical%>"
	echo "e.g. check_huge_pages 80 90"
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 2 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi

# Read the total and free huge page counts from /proc/meminfo
TOTALPAGES=`awk '/^HugePages_Total:/ {print $2}' /proc/meminfo`
FREEPAGES=`awk '/^HugePages_Free:/ {print $2}' /proc/meminfo`

# With no huge pages configured the percentage below would
# divide by zero, so report UNKNOWN instead
if [ -z "$TOTALPAGES" ] || [ "$TOTALPAGES" -eq 0 ]; then
    echo "Huge Pages unknown: no huge pages configured"
    exit $STATE_UNKNOWN
fi

# Work out the percentage used (integer arithmetic, rounds down)
USED_SPACE=$(( (TOTALPAGES - FREEPAGES) * 100 / TOTALPAGES ))

# Check critical first, so a value above both thresholds is critical
if [ $USED_SPACE -gt $2 ] ; then
   echo "Huge Pages free: $FREEPAGES critical ${USED_SPACE}% used"
   exit $STATE_CRITICAL
elif [ $USED_SPACE -gt $1 ] ; then
   echo "Huge Pages free: $FREEPAGES warning ${USED_SPACE}% used"
   exit $STATE_WARNING
else
   echo "Huge Pages free: $FREEPAGES ok ${USED_SPACE}% used"
   exit $STATE_OK
fi

Working down the script from the top, we first define some information about the script and then source utils.sh. This file, installed with the standard Nagios plugins, sets up the standard Nagios variables such as $STATE_OK, $STATE_WARNING, $STATE_CRITICAL and $STATE_UNKNOWN, which we use as return values later in the script. We then define a usage function, which explains what parameters the script expects: check_huge_pages followed by the warning percentage (as a number) and then the critical percentage (again as a number), for example:

./check_huge_pages 80 90
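utils.sh normally sits in the libexec directory alongside the plugins. If you want the plugin to keep working where that file is missing, a minimal sketch like the one below could replace the plain `. $PROGPATH/utils.sh` line. It assumes only the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which are the values utils.sh defines; the fallback itself is a hypothetical addition, not part of the original script.

```shell
# Use the script's own directory if PROGPATH has not been set already.
PROGPATH=${PROGPATH:-$(dirname "$0")}

# Prefer the utils.sh shipped with the Nagios plugins; otherwise fall
# back to the standard plugin return codes (hypothetical addition).
if [ -r "$PROGPATH/utils.sh" ]; then
    . "$PROGPATH/utils.sh"
else
    STATE_OK=0
    STATE_WARNING=1
    STATE_CRITICAL=2
    STATE_UNKNOWN=3
fi
```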

Next we check how many parameters have been passed to the script. If there are fewer than two, the script cannot know which thresholds to use, so it prints the usage message, telling the user how to call it correctly, and exits with $STATE_UNKNOWN.
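The script only checks that two arguments were supplied. A stricter check, sketched below on the assumption that thresholds are whole-number percentages and that the warning level should be lower than the critical level, would also catch mistakes such as `check_huge_pages 90 80` or `check_huge_pages 8o 90`. The function names are hypothetical.

```shell
# Succeed only if $1 is a non-empty string of digits.
is_whole_number() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeed only if both thresholds are whole numbers and warning < critical.
valid_thresholds() {
    is_whole_number "$1" && is_whole_number "$2" && [ "$1" -lt "$2" ]
}

# In the plugin this could sit just after the argument-count check:
#   if ! valid_thresholds "$1" "$2"; then
#       print_usage
#       exit $STATE_UNKNOWN
#   fi
```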

We are now onto the actual workings of the script. We first read the total number of huge pages and the number of free huge pages from /proc/meminfo. We then calculate the percentage in use as (total - free) * 100 / total, rounded down to a whole number, so that it is in the same format as the thresholds passed to the script.
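Both steps can be tried out in isolation with a pair of hypothetical helper functions. `meminfo_field` reads one field from a meminfo-format file (defaulting to /proc/meminfo, so it can also be pointed at a test file), and `huge_pages_used_pct` applies the same integer formula as the script. For example, with 512 huge pages configured and 128 free, 384 are in use, and 384 * 100 / 512 = 75.

```shell
# Print the numeric value of one field (e.g. HugePages_Total) from a
# meminfo-format file; the file defaults to /proc/meminfo.
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print the percentage of huge pages in use, rounded down:
# (total - free) * 100 / total. Fails if total is 0, where the
# percentage is undefined.
huge_pages_used_pct() {
    total=$1
    free=$2
    if [ "$total" -eq 0 ]; then
        return 1
    fi
    echo $(( (total - free) * 100 / total ))
}

# Examples:
#   huge_pages_used_pct 512 128    # prints 75
#   huge_pages_used_pct "$(meminfo_field HugePages_Total)" "$(meminfo_field HugePages_Free)"
```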

Finally we compare the calculated value with the thresholds. The critical threshold is checked first: if usage is above it, the script returns $STATE_CRITICAL. Otherwise, if usage is above the warning threshold it returns $STATE_WARNING, and if not it returns $STATE_OK. Each branch also prints a one-line status message, which Nagios displays as the service's status information.
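The decision logic can be expressed as a small hypothetical function, which makes the boundary behaviour easy to check. Because the script uses -gt (strictly greater than), a usage exactly equal to a threshold does not trigger that state: with thresholds of 80 and 90, 80% is still OK and 90% is only a warning.

```shell
# Mirror the plugin's comparison: critical is checked first, and both
# tests are strictly "greater than". Prints the state name and returns
# the matching Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL).
classify_usage() {
    used=$1
    warn=$2
    crit=$3
    if [ "$used" -gt "$crit" ]; then
        echo CRITICAL
        return 2
    elif [ "$used" -gt "$warn" ]; then
        echo WARNING
        return 1
    else
        echo OK
        return 0
    fi
}

# Examples matching the command-line tests in this article (40% in use):
#   classify_usage 40 50 60    # OK
#   classify_usage 40 30 50    # WARNING
#   classify_usage 40 30 35    # CRITICAL
```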

That completes the check script itself. Place the script file in the libexec directory of the Nagios installation and make sure the nagios user has execute permission on it.
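As a sketch, assuming the /usr/local/nagios/libexec path used in the nrpe.cfg line below and that you are running as root, the copy and the permission change can be done in one step with install(1). The `install_plugin` function name is hypothetical; mode 755 lets every user, including nagios, read and execute the script.

```shell
# Copy a plugin into the Nagios libexec directory with mode 755
# (rwxr-xr-x) so the nagios user can execute it.
# The default destination is an assumption; pass your own as $2.
install_plugin() {
    src=$1
    dest=${2:-/usr/local/nagios/libexec}
    install -m 755 "$src" "$dest/" || return 1
    ls -l "$dest/$(basename "$src")"
}

# Usage (as root):
#   install_plugin ./check_huge_pages
```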

The nagios client file (nrpe.cfg).

Next, edit the nrpe.cfg file in the Nagios etc directory on the client machine, i.e. the machine being monitored. The file should already exist, so simply append the following line to the end of it:

command[check_huge_pages]=/usr/local/nagios/libexec/check_huge_pages 80 90

This line defines an NRPE command named check_huge_pages that calls the script we have just created; change the path if Nagios is installed in a different location on your system. The values 80 and 90 at the end of the line are the parameters passed to the script, i.e. it will warn above 80% and go critical above 90% used. Change these values to suit your environment. If NRPE runs as a standalone daemon, restart it so that it picks up the new command.
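If you script this step, it is worth making the append idempotent so that running it twice does not leave a duplicate command definition. A minimal sketch follows; the function name and the default /usr/local/nagios/etc/nrpe.cfg path are assumptions, so adjust them for your installation.

```shell
# Append a line to nrpe.cfg only if the exact same line is not already
# present. grep -x matches whole lines, -F treats the text literally
# (so the [ ] in the command name are not regex brackets), -q is quiet.
add_nrpe_command() {
    line=$1
    cfg=${2:-/usr/local/nagios/etc/nrpe.cfg}
    grep -qxF -- "$line" "$cfg" 2>/dev/null || printf '%s\n' "$line" >> "$cfg"
}

# Usage:
#   add_nrpe_command 'command[check_huge_pages]=/usr/local/nagios/libexec/check_huge_pages 80 90'
```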

This completes the configuration on the client end, so we can run three tests:

First run the script from the command line as the nagios user, passing in suitable values to get an OK response. For example, if 40% of your huge pages are currently in use, run the command below. The plugin's message is discarded so that only the exit status, printed by echo $?, is shown:

/usr/local/nagios/libexec/check_huge_pages 50 60 > /dev/null; echo $?
0

As you can see, the script returned a value of 0, i.e. $STATE_OK.

We can now repeat the test to generate a warning response:

/usr/local/nagios/libexec/check_huge_pages 30 50 > /dev/null; echo $?
1

As you can see, the script returned a value of 1, i.e. $STATE_WARNING.

Finally, let's test to generate a critical response:

/usr/local/nagios/libexec/check_huge_pages 30 35 > /dev/null; echo $?
2

As you can see, the script returned a value of 2, i.e. $STATE_CRITICAL.
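Rather than reading the raw exit status, you could wrap the call in a small hypothetical helper that runs any plugin and translates its exit code into the Nagios state name, using the standard codes 0 to 3.

```shell
# Run a plugin with its arguments, discard its output, and print the
# Nagios state that corresponds to its exit code.
run_check() {
    rc=0
    "$@" > /dev/null 2>&1 || rc=$?
    case $rc in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo UNEXPECTED ;;
    esac
}

# Usage, as the nagios user:
#   run_check /usr/local/nagios/libexec/check_huge_pages 30 50
```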

Server side configuration file:

We can now move to the Nagios server and edit the object configuration file that defines the services for the machine being monitored. Add the following service definition to that file:

define service{
	use generic-service
	host_name your_host_name
	service_description Huge Page usage
	check_command check_nrpe!check_huge_pages
}

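Replace your_host_name with the client's host name as defined in your Nagios configuration. The check_command line tells Nagios to run its check_nrpe command, which asks the NRPE daemon on the client to execute the check_huge_pages command defined in nrpe.cfg. This assumes check_nrpe is already defined as a command on the server.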

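Finally, check the configuration files by running nagios with its -v (verify) option, e.g. /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg, adjusting the paths for your installation. If this reports 0 errors, restart Nagios.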

You should now find that huge page usage is being monitored through Nagios.

That concludes writing custom Nagios check scripts. As you can see, it really is quite a simple task!


Matthew Harman

Matthew’s day job sees him running the Oracle and Server Team for the European side of a multi-national Automotive Outsourcing company headquartered in Warren, Michigan.

Matthew is a keen open-source advocate and is always happy to help others, through blogging and other means.

Matthew is always keen to connect on LinkedIn with like-minded professionals.



All rights reserved ©
