Custom Script for Nagios monitoring with Graph (NagiosGraph & Pnp4Nagios)
Nagios provides some default plug-ins (scripts), and a few of them (such as Ping and Current Users) generate performance data in a format that NagiosGraph and Pnp4Nagios support by default. In production, however, you may sometimes need to write your own custom plug-in for your requirements and show its output both in the Nagios web frontend and in graphs (NagiosGraph or Pnp4Nagios).
I will show you how to write a custom plug-in for Nagios and represent its data as a graph, using two examples:
- Memory analysis plug-in to be shown in NagiosGraph
- Log file checking plug-in to be shown in Pnp4Nagios
Memory analysis plug-in: This is a shell script that captures the free memory, the used memory, and the percentage of memory used.
Create a shell script in the Nagios plug-in directory:
sudo vi check_overall_mem.sh
TODO: put the script or link to github
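Since the script itself is not included here, the following is only a minimal sketch of what such a plug-in could look like. It is not the original check_overall_mem.sh: the function names and the use of /proc/meminfo are my assumptions. Only the wording of the output line follows the MEMALL pattern that the NagiosGraph map rule in this post expects.

```shell
#!/bin/bash
# Hypothetical sketch, not the author's check_overall_mem.sh.

# Build the plug-in status line from total and free memory (both in kB).
format_mem_status() {
    local total_kb="$1"
    local free_kb="$2"
    local used_kb=$(( total_kb - free_kb ))
    local pct=$(( used_kb * 100 / total_kb ))
    echo "MEMALL ${pct} percentage ${used_kb} used ${free_kb} free"
}

# Read one field (for example MemTotal) from a /proc/meminfo style file.
read_meminfo_kb() {
    local field="$1"
    local file="${2:-/proc/meminfo}"
    awk -v key="${field}:" '$1 == key { print $2 }' "$file"
}

# Demo with fixed numbers so the result is predictable.
format_mem_status 8000000 2000000

# On a Linux host the live values could be passed in instead:
#   format_mem_status "$(read_meminfo_kb MemTotal)" "$(read_meminfo_kb MemFree)"
```

A real plug-in would normally also finish with an exit code of 0, 1, or 2 (OK, WARNING, CRITICAL) depending on thresholds; that part is left out of this sketch.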
Now give it executable permission with chmod +x check_overall_mem.sh
Now define the command for this script in any Nagios config file. We will use a config file named mixed.config:
# check Memory Process
define command {
    command_name    check_overall_mem
    command_line    /usr/lib/nagios/plugins/check_overall_mem.sh
}
Now use this command in your host config file as shown below:
define service {
    use                     generic-service,nagiosgraph   ; Name of service template to use
    host_name               localhost
    service_description     Overall-Memory
    check_command           check_overall_mem
}
Next, modify your NagiosGraph map file so that it picks up the performance data of this plug-in and shows the graph.
Edit the NagiosGraph map file located in /usr/local/nagiosgraph/etc:
sudo vi /usr/local/nagiosgraph/etc/map
And add the following rule for our memory plug-in:
# Memory Usage Custom
(/output:MEMALL (\d+) percentage (\d+) used (\d+) free/
and push @s, ['MemAll',
    ['Percentage', GAUGE, $1],
    ['Used', GAUGE, $2],
    ['Free', GAUGE, $3] ]);
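Before restarting Nagios, it can help to check that the plug-in's output line really has the shape the map rule is looking for. The sketch below is only an approximation: it uses grep -E with [0-9]+ in place of Perl's \d+, and the function name matches_memall_rule is made up for this example.

```shell
# Return success if a plug-in output line has the shape the MEMALL map rule expects.
# NagiosGraph prefixes the plug-in output with "output:" before applying map rules,
# so the same prefix is added here.
matches_memall_rule() {
    printf 'output:%s\n' "$1" |
        grep -Eq 'output:MEMALL [0-9]+ percentage [0-9]+ used [0-9]+ free'
}

if matches_memall_rule "MEMALL 75 percentage 6000000 used 2000000 free"; then
    echo "rule would match"
else
    echo "rule would not match"
fi
```

If the check fails for the real plug-in output, the map rule will most likely not capture any values and no graph data will be stored.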
Now restart Nagios, go to the Services tab, and click the graph icon to open the graph in a new tab. You should now see the memory graph.
Log file checking plug-in: This shell script searches a log file for a particular pattern (passed as an argument) and reports the number of matches since the last run, along with the last matching line. It also searches the archived logs. We will use this script to find ERROR messages in the log file.
Create a shell script in the Nagios plug-in directory:
sudo vi check_logging.sh
TODO: put the script or link to github
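As the script is not included here, this is a rough sketch of the core idea only: count the lines that match a pattern among those appended since the previous run. It is not the original check_logging.sh. It takes a log file, a pattern, and a single state file (a simplification of the arguments used in the command definition in this post), it does not look at archived logs, and the name count_new_matches and the perfdata label matches are my own choices.

```shell
# Hypothetical sketch, not the author's check_logging.sh.
# state_file remembers how many lines of the log were already processed.
count_new_matches() {
    local log_file="$1" pattern="$2" state_file="$3"
    local seen=0 total matches last
    if [ -s "$state_file" ]; then
        seen=$(cat "$state_file")
    fi
    total=$(( $(wc -l < "$log_file") ))
    # If the log was rotated or truncated, start again from the top.
    if [ "$total" -lt "$seen" ]; then
        seen=0
    fi
    # grep -c exits non-zero on zero matches, so "|| true" keeps the count of 0.
    matches=$(tail -n +"$(( seen + 1 ))" "$log_file" | grep -c -- "$pattern" || true)
    # Keep the last matching line; strip pipes so they cannot break the perfdata.
    last=$(tail -n +"$(( seen + 1 ))" "$log_file" | grep -- "$pattern" | tail -n 1 | tr -d '|')
    echo "$total" > "$state_file"
    echo "${matches} new ${pattern} lines, last: ${last:-none} | matches=${matches}"
}

# Demo on a temporary log file.
log=$(mktemp)
state="${log}.state"
printf 'INFO start\nERROR disk full\n' > "$log"
count_new_matches "$log" ERROR "$state"
printf 'ERROR net down\nERROR db down\nINFO ok\n' >> "$log"
count_new_matches "$log" ERROR "$state"
rm -f "$log" "$state"
```

The second call only looks at the three lines appended after the first call, which is the "since last run" behaviour described above.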
Now give it executable permission with chmod +x check_logging.sh
Now define the command for this script in any Nagios config file. We will use a config file named mixed.config:
# Check Log file
define command {
    command_name    check_logging
    command_line    /usr/lib/nagios/plugins/check_logging.sh /home/kuntal/Kuntal/test.log ERROR /home/kuntal/Kuntal/linecount /home/kuntal/Kuntal/countercount
}
Note: Please go through the script to find the arguments it requires.
Now use this command in your host config file as shown below:
define service {
    use                     generic-service,srv-pnp   ; Name of service template to use
    host_name               localhost
    service_description     Check Logging
    check_command           check_logging
}
Now you need to create a command file in the check_commands directory of Pnp4Nagios:
sudo vi /etc/pnp4nagios/check_commands/check_logging
#
# Adapt the Template if check_command should not be the PNP Template
#
# check_command check_nrpe!check_disk!20%!10%
# ________0__________|          |      |  |
# ________1_____________________|      |  |
# ________2____________________________|  |
# ________3_______________________________|
#
CUSTOM_TEMPLATE = 0
#
# Change the RRD Datatype based on the check_command Name.
# Defaults to GAUGE.
#
# DATATYPE = COUNTER
Now create a custom PHP template for your logging command:
sudo vi /etc/pnp4nagios/templates/check_logging.php
<?php
$ds_name[1] = "$NAGIOS_AUTH_SERVICEDESC";
$opt[1] = "--vertical-label \"$UNIT[1]\" --title \"$hostname / $servicedesc\" ";
$def[1] = rrd::def("var1", $RRDFILE[1], $DS[1], "AVERAGE");
if ($WARN[1] != "") {
    $def[1] .= "HRULE:$WARN[1]#FFFF00 ";
}
if ($CRIT[1] != "") {
    $def[1] .= "HRULE:$CRIT[1]#FF0000 ";
}
$def[1] .= rrd::line1("var1", "#99ccff", "$NAME[1]");
$def[1] .= rrd::gprint("var1", array("LAST", "AVERAGE", "MAX"), "%6.2lf");
?>
Note: The command file in the check_commands directory and the custom PHP template in the templates directory of Pnp4Nagios must have exactly the same name as the Nagios command for which you want to create the graph. Also note that Pnp4Nagios requires performance data in the format that Nagios specifies for plug-ins.
In short, the Nagios plug-in script output syntax must be as follows:
TEXT_OUTPUT_SEEN_ON_NAGIOS_WEB | label=value[UOM];[warn];[crit];[min];[max]
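To make the format a bit more concrete, here is a small sketch of how a shell plug-in could assemble such a line. The helper names perfdata_item and plugin_output are made up for this example, and the status text and thresholds are sample values, not output from the scripts in this post.

```shell
# Build one perfdata item: label=value[UOM];[warn];[crit];[min];[max]
perfdata_item() {
    local label="$1" value="$2"
    local uom="${3:-}" warn="${4:-}" crit="${5:-}" min="${6:-}" max="${7:-}"
    local item="${label}=${value}${uom};${warn};${crit};${min};${max}"
    # Trailing empty fields (and their semicolons) can be dropped.
    while [ "${item%;}" != "$item" ]; do
        item="${item%;}"
    done
    echo "$item"
}

# Join the human-readable text and the perfdata with a pipe.
plugin_output() {
    echo "$1 | $2"
}

plugin_output "Check Logging OK - 3 matches" "$(perfdata_item matches 3 "" 5 10)"
```

Here warn is 5 and crit is 10, while the unit, min, and max are left empty, so the perfdata part comes out as matches=3;5;10.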
Custom metric with Ganglia
Custom metrics can be sent to Ganglia through the gmetric utility.
For example, we can send the number of users currently logged in on the host machine as follows:
Move to the gmetric directory (you can find gmetric in /usr/bin on Ubuntu):
cd /usr/bin
Now execute:
./gmetric --name Current_User --value `who | wc -l` --type int32
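Here,
name => name of the graph
value => value to be shown in the graph
type => data type of the value (int32 here)
units => unit shown for the value on the graph (optional, passed with --units; not used in the command above)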
You can also create and run a script that pushes the custom Current_User metric to Ganglia every 10 seconds:
while true; do
    /usr/bin/gmetric --name Current_User --value `who | wc -l` --type int32
    sleep 10
done
Just like the Current_User count, you can send any other custom metric to Ganglia (memory analysis, log analysis error count, etc.) using the same syntax and the gmetric utility.
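If you end up sending several metrics, a tiny wrapper can save some typing. This is just a sketch under my own assumptions: the function name send_metric, the DRY_RUN switch, and the Error_Count metric are invented for illustration. Only the --name, --value, and --type flags come from the gmetric command used above.

```shell
# Path to gmetric; /usr/bin/gmetric is the Ubuntu location mentioned above.
GMETRIC="${GMETRIC:-/usr/bin/gmetric}"

# Send one metric, or only print the command when DRY_RUN=1.
send_metric() {
    local name="$1" value="$2" type="${3:-int32}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$GMETRIC --name $name --value $value --type $type"
    else
        "$GMETRIC" --name "$name" --value "$value" --type "$type"
    fi
}

# Dry run, so nothing is sent to Ganglia; set DRY_RUN=0 to send for real.
DRY_RUN=1
send_metric Error_Count 4
```

With the dry run you can check the generated command line on a machine that has no Ganglia installed before putting the call into a loop like the one above.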
You will now see the graph for Current_User in Ganglia.
Common issues:
1) Pnp4Nagios XML file not found Error – No Graphs
The problem:
Let's say we have written a custom Nagios plug-in as a Bash script and configured the Nagios server together with Pnp4Nagios to create graphs. When we try to open the Pnp4Nagios performance data graphs for the custom plug-in, we get the error "Pnp4Nagios XML file not found. Read FAQ online".
The cause:
To get a Pnp4Nagios performance data graph, every Nagios plug-in script must produce output that Pnp4Nagios can parse correctly. You can read more about this HERE and HERE. So the cause of the "Pnp4Nagios XML file not found" error is incorrect plug-in output.
In short, the Nagios plug-in script output syntax must be as follows:
TEXT_OUTPUT_SEEN_ON_NAGIOS_WEB | label=value[UOM];[warn];[crit];[min];[max]
All of the data after the pipe (|) is hidden in the Nagios web GUI. It is needed only for the Pnp4Nagios performance data graphs and is not visible to Nagios web GUI users. However, it is visible if you run the plug-in script locally from the command line on the Nagios client or the Nagios server. Note that warn, crit, min, and max are optional.
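A quick way to see what Nagios shows and what Pnp4Nagios receives is to split a plug-in's output at the pipe yourself. The helpers below (status_text and perfdata_part) are hypothetical names for this sketch, and the sample line is made up.

```shell
# Text part: everything before the first pipe, trailing spaces removed.
status_text() {
    printf '%s\n' "${1%%|*}" | sed 's/[[:space:]]*$//'
}

# Perfdata part: everything after the first pipe, or empty if there is no pipe.
perfdata_part() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" | sed 's/^[[:space:]]*//' ;;
        *)     printf '\n' ;;
    esac
}

sample="Check Logging OK - 2 matches | matches=2;5;10"
echo "text:     $(status_text "$sample")"
echo "perfdata: $(perfdata_part "$sample")"

# Output without a pipe carries no perfdata at all.
if [ -z "$(perfdata_part "Check Logging OK - 2 matches")" ]; then
    echo "no perfdata: nothing for Pnp4Nagios to graph"
fi
```

If the perfdata part of your own plug-in's output is empty, that is a likely reason for the missing graph.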
Check the output of the check_logging script above against this format.
Earlier, the script produced output like: [screenshot of the earlier output]
After being modified for PNP4Nagios, it produces output like: [screenshot of the modified output]
2) PHP Warning: include_once(./version.php): failed to open stream...
If you get something like this while configuring Ganglia metrics with Nagios and testing the PHP scripts:
PHP Warning: include_once(./version.php): failed to open stream: No such file or directory in /var/www/html/ganglia/conf.php on line 7
then modify your /usr/share/ganglia-webfrontend/conf.php to include the full path to version.php, i.e.:
include_once "/usr/share/ganglia-webfrontend/version.php";
3) Nagios unable to send external command:
Problem: This error appears when you try to re-schedule the next check of a service from the Nagios web UI.
Solution:
sudo dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
sudo dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
sudo vi /etc/nagios3/nagios.cfg
(change check_external_commands=0 to check_external_commands=1)
sudo /etc/init.d/nagios3 restart
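If you prefer not to edit nagios.cfg by hand, the same change can be scripted with sed. The sketch below runs against a temporary copy so that it is safe to try; the function name enable_external_commands is made up, and on the real /etc/nagios3/nagios.cfg you would need to run it with sudo.

```shell
# Flip check_external_commands from 0 to 1 and confirm the result.
# sed -i.bak keeps a backup copy next to the edited file.
enable_external_commands() {
    local cfg="$1"
    sed -i.bak 's/^check_external_commands=0/check_external_commands=1/' "$cfg"
    grep -q '^check_external_commands=1' "$cfg"
}

# Demo on a temporary file that stands in for nagios.cfg.
cfg=$(mktemp)
printf 'log_file=/var/log/nagios3/nagios.log\ncheck_external_commands=0\n' > "$cfg"
if enable_external_commands "$cfg"; then
    grep '^check_external_commands' "$cfg"
fi
rm -f "$cfg" "${cfg}.bak"
```

The function returns a non-zero status if the setting is still not 1 afterwards, for example because the line is missing or commented out.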