This document is the third in my series on Nagios. In the first post I discussed what Nagios is, and the second covered its installation. This one discusses writing custom check scripts (plugins). Finally, a fourth post is a howto on creating event handlers.
One of the beauties of Nagios is the large number of freely available plugins online: http://exchange.nagios.org/directory/Plugins
However, in the unlikely event that you cannot find what you require, it is quite easy to write your own custom plugins.
In this howto I will show you how to write a plugin that monitors the number of huge pages in use on your system, raising a warning once a certain threshold is breached and a critical alert once a higher threshold is reached. These thresholds are passed into the script as parameters, so you can reuse the plugin across multiple systems with differing threshold requirements.
First I will show you the whole script, and then explain how it works:
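The following is an illustrative sketch of such a plugin, not necessarily the author's script. It assumes Linux exposes HugePages_Total and HugePages_Free in /proc/meminfo, treats "in use" as Total minus Free (reserved pages are ignored), and uses hypothetical -w and -c flags for the warning and critical page counts. It follows the standard plugin conventions: a single line of output (here with performance data after the `|`) and exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```shell
#!/bin/bash
# check_hugepages (hypothetical name): minimal Nagios plugin sketch.
# Assumes /proc/meminfo provides HugePages_Total and HugePages_Free;
# "in use" is taken as Total - Free (reserved pages are ignored).

# Standard Nagios plugin exit codes.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# evaluate USED WARN CRIT: print one status line (with perfdata) and
# return the exit code Nagios expects for that state.
evaluate() {
  local used=$1 warn=$2 crit=$3
  local perf="hugepages_used=${used};${warn};${crit}"
  if (( used >= crit )); then
    echo "CRITICAL - ${used} huge pages in use | ${perf}"
    return $STATE_CRITICAL
  elif (( used >= warn )); then
    echo "WARNING - ${used} huge pages in use | ${perf}"
    return $STATE_WARNING
  fi
  echo "OK - ${used} huge pages in use | ${perf}"
  return $STATE_OK
}

main() {
  local warn="" crit="" opt OPTIND=1
  # -w and -c are hypothetical flag names for the warning/critical page counts.
  while getopts "w:c:" opt; do
    case $opt in
      w) warn=$OPTARG ;;
      c) crit=$OPTARG ;;
      *) ;;  # getopts already reports unknown options on stderr
    esac
  done

  # Both thresholds must be whole numbers, and warning must be below critical.
  if ! [[ $warn =~ ^[0-9]+$ && $crit =~ ^[0-9]+$ ]] || (( warn >= crit )); then
    echo "UNKNOWN - usage: $0 -w <warn_pages> -c <crit_pages> (warn < crit)"
    return $STATE_UNKNOWN
  fi

  local total free
  total=$(awk '/^HugePages_Total:/ {print $2}' /proc/meminfo 2>/dev/null)
  free=$(awk '/^HugePages_Free:/ {print $2}' /proc/meminfo 2>/dev/null)
  if [[ -z $total || -z $free ]]; then
    echo "UNKNOWN - cannot read huge page counters from /proc/meminfo"
    return $STATE_UNKNOWN
  fi

  evaluate $(( total - free )) "$warn" "$crit"
}

# The script's exit status is main's return code, which Nagios reads.
main "$@"
```

Saved under a name such as check_hugepages (hypothetical) and made executable, it could be run as `./check_hugepages -w 400 -c 500`. Keeping the threshold logic in its own function makes it easy to test without touching /proc/meminfo.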
This is the second post in the series on Nagios. In the first post, I explained what Nagios is.
This post goes through the installation and initial configuration. The third post in the series covers writing custom monitoring scripts, and finally a fourth post covers creating event handlers.
So here goes with the installation and configuration!
Before installing Nagios we need to make sure the server has the following prerequisite packages installed:
C/C++ development libraries
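Before compiling, you can confirm that a C/C++ build toolchain is on the PATH. A minimal sketch, assuming gcc, g++ and make are the tools you need (adjust the list for your distribution); `missing_tools` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: report which build tools are missing before compiling Nagios.
# The tool list (gcc, g++, make) is an assumption, not an official list.

# missing_tools TOOL...: print each tool not found on PATH; return 1 if any are missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return $status
}

if missing=$(missing_tools gcc g++ make); then
  echo "All build tools found."
else
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing build tools:" $missing
fi
```

Install anything it reports with your distribution's package manager before continuing.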
Download Nagios from www.nagios.org/download
Click on the Get Nagios Core link
Download the latest stable release as a .tar.gz file and place it on the server, in a suitable directory, e.g. /tmp
For the purposes of this guide, I will download version 3.3.1
NOTE: The rest of the installation will continue as the root user.
Extract the file:
#> cd /tmp
#> tar -xvzf nagios-3.3.1.tar.gz
Enter the nagios directory:
#> cd nagios
Create a nagios user:
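One way to do this step is sketched below, assuming a Linux system with the standard useradd, groupadd and usermod tools. The extra nagcmd group is an assumption borrowed from the common Nagios quickstart layout, where it is typically used to let the web interface submit external commands; drop it if you do not need that. The helper names `run` and `create_nagios_user` are hypothetical, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/bash
# Sketch: create the nagios user (real changes require root).
# The nagcmd group is an assumed convention from the Nagios quickstart guides.
# Default to a dry run; set DRY_RUN=0 (as root) to actually make the changes.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: execute a command, or just print it when DRY_RUN=1.
run() {
  if [[ $DRY_RUN == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

create_nagios_user() {
  # Only create the account and group if they do not exist, so reruns are safe.
  if ! id nagios >/dev/null 2>&1; then
    run useradd -m nagios
  fi
  if ! getent group nagcmd >/dev/null 2>&1; then
    run groupadd nagcmd
  fi
  # Add nagios to nagcmd as a supplementary group (-a appends, -G names the group).
  run usermod -a -G nagcmd nagios
}

create_nagios_user
```

Run it once as-is to review the commands, then as root with `DRY_RUN=0` to apply them. You may also want to set a password with `passwd nagios`.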
It is a fact of life that an unmanaged Linux server will eventually do something to bite you. For example, it will generate enough logs to fill up a filesystem, preventing your running application from continuing to work. Or a process will suddenly die, and again your users are unable to work.
You can also guarantee that if this does not cause embarrassment, e.g. with a system down during core business hours, it will happen at the most inconvenient time, e.g. in the very early hours of the morning, giving you an unwanted early-morning wake-up call.
So what can you do to prevent this? The answer lies in automated server monitoring. Monitoring software lets you create a series of checks against each server, along with a set of thresholds. When a check falls outside its defined thresholds, the software can notify you via email or SMS, so that the fault can be fixed before it becomes a more serious problem that users notice. Advanced configuration even allows event handlers to be created, which can automate fixes and save you from even touching the server.