Friday, 11 July 2014

Part 4: BigData Cluster Monitoring (Custom Script for Nagios monitoring with Graph)

Custom Script for Nagios monitoring with Graph (NagiosGraph & Pnp4Nagios)

Nagios ships with some default plug-ins (scripts), and a few of them (such as Ping and Current Users) already emit performance data in a format that NagiosGraph and Pnp4Nagios understand. In production, however, you will sometimes need to write your own plug-in and show its output both in the Nagios web front end and in graphs (NagiosGraph or Pnp4Nagios).
I will show you how to write a custom plug-in for Nagios and graph its data, using two examples:
  1. Memory analysis plug-in to be shown in NagiosGraph
  2. Log file checking plug-in to be shown in Pnp4Nagios

Memory analysis plug-in: This is a shell script that captures the free memory, the used memory and the percentage of memory used.
Create a shell script in the Nagios plug-in directory (/usr/lib/nagios/plugins in this setup):
sudo vi check_overall_mem.sh
TODO: put the script or link to github
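
Until the original script is linked, here is a minimal sketch of what such a plug-in could look like. It is not the author's script. It assumes a Linux host with /proc/meminfo, counts memory as used when it is not reported as MemFree, reports values in MB, and applies no warning or critical thresholds. The function names are made up for illustration. The wording of the status line follows the MEMALL map rule used later in this post.

```shell
#!/bin/sh
# Hypothetical sketch of check_overall_mem.sh; not the author's original script.

# Print the status line for the given total and free memory (both in kB).
# Output shape: "MEMALL <pct> percentage <used MB> used <free MB> free"
format_mem_line() {
    local total_kb="$1" free_kb="$2"
    local used_kb=$(( total_kb - free_kb ))
    local pct=$(( used_kb * 100 / total_kb ))
    echo "MEMALL ${pct} percentage $(( used_kb / 1024 )) used $(( free_kb / 1024 )) free"
}

# Read one field (in kB) from /proc/meminfo, e.g. MemTotal or MemFree.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

main() {
    local total free
    total=$(meminfo_kb MemTotal 2>/dev/null)
    free=$(meminfo_kb MemFree 2>/dev/null)
    if [ -z "$total" ] || [ -z "$free" ]; then
        echo "MEMALL UNKNOWN - cannot read /proc/meminfo"
        return 3   # Nagios UNKNOWN
    fi
    format_mem_line "$total" "$free"
    return 0       # Nagios OK; this sketch applies no thresholds
}

# The script's exit status is the return value of main.
main
```

Whether buffers and cache should count as used memory is a design choice; this sketch simply treats everything that is not MemFree as used.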

Now make the script executable with chmod +x check_overall_mem.sh

Now define the command for this script in any Nagios config file. We will use a config file named mixed.config

# check Memory Process
define command {
command_name check_overall_mem
command_line /usr/lib/nagios/plugins/check_overall_mem.sh
}

Now use this command in your host config file as shown below:
define service{
        use                             generic-service,nagiosgraph         ; Name of service template to use
        host_name                       localhost
        service_description             Overall-Memory
        check_command                   check_overall_mem
        }

Next, modify the NagiosGraph map file so that it picks up the output of this plug-in and graphs it.
The map file is located in /usr/local/nagiosgraph/etc:
sudo vi /usr/local/nagiosgraph/etc/map

And add the following rule for our memory plugin:
#Memory Usage Custom
(/output:MEMALL (\d+) percentage (\d+) used (\d+) free/
and push @s, ['MemAll',
        ['Percentage', GAUGE, $1],
        ['Used', GAUGE, $2],
        ['Free', GAUGE, $3] ]);

Now restart Nagios, go to the Services tab and click the graph icon, which opens the graph in a new tab. You should see the memory graph there.

Log file checking plug-in: This shell script searches a log file for a particular pattern (passed as an argument) and shows the number of matches found since the last run, along with the last matching line. The script also searches the archived log. We will use it to find ERROR messages in the log file.

Create a shell script in the Nagios plug-in directory:
sudo vi check_logging.sh
TODO: put the script or link to github
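
Until the original script is linked, the following is a minimal sketch of a log-checking plug-in of this kind. It is not the author's script. It assumes the four arguments used in the command definition in this post mean: log file, search pattern, a state file holding the last line number read, and a file holding the latest match count. It does not search archived logs and always returns OK. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of check_logging.sh; not the author's original script.
# Usage: check_logging.sh LOGFILE PATTERN LINECOUNT_FILE COUNTER_FILE

check_logging() {
    local logfile="$1" pattern="$2" linecount_file="$3" counter_file="$4"
    local last_line total_lines new_lines matches last_match

    if [ ! -r "$logfile" ]; then
        echo "LOG UNKNOWN - cannot read $logfile"
        return 3   # Nagios UNKNOWN
    fi

    # Line number reached by the previous run (0 on the first run).
    last_line=$(cat "$linecount_file" 2>/dev/null)
    last_line=${last_line:-0}
    total_lines=$(wc -l < "$logfile" | tr -d ' ')

    # If the log shrank, assume it was rotated and start from the top.
    if [ "$total_lines" -lt "$last_line" ]; then
        last_line=0
    fi

    # Only look at lines added since the previous run.
    new_lines=$(tail -n "+$(( last_line + 1 ))" "$logfile")
    matches=$(printf '%s\n' "$new_lines" | grep -c -- "$pattern")
    # Replace any pipe in the matched line so it cannot break the perfdata part.
    last_match=$(printf '%s\n' "$new_lines" | grep -- "$pattern" | tail -n 1 | tr '|' ' ')

    echo "$total_lines" > "$linecount_file"
    echo "$matches" > "$counter_file"

    # Text before the pipe is shown in Nagios; the part after it is perfdata.
    echo "LOG OK - $matches new match(es) for '$pattern'. Last: ${last_match:-none} | matches=$matches"
    return 0
}

# Run the check only when arguments are given, so the file can also be sourced.
[ "$#" -eq 0 ] || check_logging "$@"
```

A production version would normally map the match count to WARNING or CRITICAL exit codes (1 or 2); that is left out here to keep the sketch short.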

Now make the script executable with chmod +x check_logging.sh

Now define the command for this script in any Nagios config file. We will use a config file named mixed.config

#Check Log file
define command {
command_name check_logging
command_line /usr/lib/nagios/plugins/check_logging.sh /home/kuntal/Kuntal/test.log ERROR /home/kuntal/Kuntal/linecount /home/kuntal/Kuntal/countercount
}
Note: Please read through the script to find out which arguments it expects.

Now use this command in your host config file as shown below:
define service{
        use                             generic-service,srv-pnp         ; Name of service template to use
        host_name                       localhost
        service_description             Check Logging
        check_command                   check_logging
        }

Now you need to create a command file in the check_commands directory of Pnp4Nagios

sudo vi /etc/pnp4nagios/check_commands/check_logging
#
# Adapt the Template if check_command should not be the PNP Template
#
# check_command check_nrpe!check_disk!20%!10%
# ________0__________|          |      |  |
# ________1_____________________|      |  |
# ________2____________________________|  |
# ________3_______________________________|
#
CUSTOM_TEMPLATE = 0
#
# Change the RRD Datatype based on the check_command Name.
# Defaults to GAUGE.
#
# DATATYPE = COUNTER

Now create a custom PHP template for your logging command:
sudo vi /etc/pnp4nagios/templates/check_logging.php

<?php

$ds_name[1] = "$NAGIOS_AUTH_SERVICEDESC";
$opt[1] = "--vertical-label \"$UNIT[1]\" --title \"$hostname / $servicedesc\" ";

$def[1]  = rrd::def("var1", $RRDFILE[1], $DS[1], "AVERAGE");

if ($WARN[1] != "") {
    $def[1] .= "HRULE:$WARN[1]#FFFF00 ";
}
if ($CRIT[1] != "") {
    $def[1] .= "HRULE:$CRIT[1]#FF0000 ";      
}
$def[1] .= rrd::line1("var1", "#99ccff", "$NAME[1]") ;
$def[1] .= rrd::gprint("var1", array("LAST", "AVERAGE", "MAX"), "%6.2lf");

?>

Note: The command file in the check_commands directory and the custom PHP template in the templates directory of Pnp4Nagios must have exactly the same name as the Nagios command for which you want to create the graph. Also note that Pnp4Nagios requires performance data in the format that Nagios specifies for plug-ins.
In short, the Nagios plug-in script output syntax must be as follows:
TEXT_OUTPUT_SEEN_ON_NAGIOS_WEB | label=value[UOM];[warn];[crit];[min];[max]
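
To make the syntax concrete, here is a small sketch of a shell helper that assembles such an output line. The helper name plugin_output and the sample values are hypothetical; they are not part of Nagios or Pnp4Nagios.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nagios or Pnp4Nagios.
# Builds "TEXT | label=value[UOM];warn;crit;min;max" from its arguments.
# Arguments: text label value [uom] [warn] [crit] [min] [max]
plugin_output() {
    text=$1; label=$2; value=$3
    uom=${4:-}; warn=${5:-}; crit=${6:-}; min=${7:-}; max=${8:-}
    perf="${label}=${value}${uom};${warn};${crit};${min};${max}"
    # Trailing empty fields may be dropped, so strip trailing semicolons.
    while [ "${perf%;}" != "$perf" ]; do
        perf=${perf%;}
    done
    echo "${text} | ${perf}"
}

# Example: 3 matches, warning threshold 5, critical threshold 10, no unit.
plugin_output "LOG OK - 3 new ERROR lines" errors 3 "" 5 10
```

The example call prints "LOG OK - 3 new ERROR lines | errors=3;5;10". Labels that contain spaces would need single quotes around them, which this sketch does not handle.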


Custom metric with Ganglia
Custom metrics can be sent to Ganglia through the gmetric utility.

For example, we can publish the number of users currently logged in to the host machine as follows:

Move to the gmetric directory (you can find gmetric in /usr/bin on Ubuntu):
cd /usr/bin

Now execute:
./gmetric --name Current_User --value `who | wc -l` --type int32

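Here,
name => Name of the metric (used as the graph title)
value => Value to be shown in the graph
type => Data type of the value (int32 here)
units => Optional unit label for the value on the graph (passed with --units; not used in this example)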

You can also put this in a script that pushes the custom Current_User metric to Ganglia every 10 seconds:

while true;
do /usr/bin/gmetric --name Current_User --value `who | wc -l` --type int32;
sleep 10; done

Just like the Current_User count, you can pass any other custom metric to Ganglia (memory analysis, log analysis error count, etc.) using the same syntax and the gmetric utility.

You will now see the graph for Current_User in Ganglia.
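
Along the same lines, a memory metric could be pushed with gmetric. The sketch below is only an illustration: the metric name Mem_Used_Percent and both function names are made up, it assumes a Linux /proc/meminfo, and it treats everything that is not MemFree as used. The gmetric options it relies on are --name, --value, --type and --units.

```shell
#!/bin/sh
# Hypothetical sketch; the metric name Mem_Used_Percent is an illustrative choice.

# Percentage of memory in use, computed from MemTotal and MemFree in a
# meminfo-style file (defaults to /proc/meminfo).
mem_used_percent() {
    awk '$1 == "MemTotal:" { total = $2 }
         $1 == "MemFree:"  { free = $2 }
         END { if (total > 0) printf "%d\n", (total - free) * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Send one sample to Ganglia; print the command instead when gmetric is absent.
push_mem_metric() {
    pct=$(mem_used_percent "$@")
    [ -n "$pct" ] || return 1
    if command -v gmetric >/dev/null 2>&1; then
        gmetric --name Mem_Used_Percent --value "$pct" --type int32 --units percent
    else
        echo "gmetric --name Mem_Used_Percent --value $pct --type int32 --units percent"
    fi
}
```

Calling push_mem_metric inside the same while/sleep loop used for Current_User would publish a new sample every 10 seconds.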

  
Common issues:
1) Pnp4Nagios XML file not found Error – No Graphs

The problem:

Let’s say we have written a custom Nagios plug-in as a Bash script and configured the Nagios server together with Pnp4Nagios to create graphs. When we try to open the Pnp4Nagios performance data graph for the custom plug-in, we get the error “Pnp4Nagios XML file not found. Read FAQ online”.

The cause:

To draw a performance data graph, Pnp4Nagios needs every Nagios plug-in script to emit output in a format it understands. So the cause of the “Pnp4Nagios XML file not found” error is plug-in output in the wrong format.
In short, the Nagios plug-in script output syntax must be as follows:
TEXT_OUTPUT_SEEN_ON_NAGIOS_WEB | label=value[UOM];[warn];[crit];[min];[max]
All of the data after the pipe (|) is hidden in the Nagios web GUI. It is needed only for the Pnp4Nagios performance data graphs and is not visible to Nagios web GUI users. It is still visible when you run the plug-in script from the command line on the Nagios client or the Nagios server. Note that warn, crit, min and max are optional.

See the output of the check_logging script above for the correct format:
Earlier, the script was producing output like:

After being modified for Pnp4Nagios, it produces output like:

2) PHP Warning:  include_once(./version.php): failed to open stream...
If you get something like this
            PHP Warning:  include_once(./version.php): failed to open stream: No such file or directory in /var/www/html/ganglia/conf.php on line 7

while configuring Ganglia metrics with Nagios and testing the PHP scripts, then modify your /usr/share/ganglia-webfrontend/conf.php to include the full path to version.php, i.e.
include_once "/usr/share/ganglia-webfrontend/version.php";

3) Nagios unable to send external commands:
Problem: This happens when you try to re-schedule the next check of a service from the Nagios web UI.
Solution:
sudo dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
sudo dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
sudo vi /etc/nagios3/nagios.cfg
Change check_external_commands=0 to check_external_commands=1

sudo /etc/init.d/nagios3 restart
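
If you prefer not to edit nagios.cfg by hand, the change to check_external_commands can be scripted. The following is a sketch with a hypothetical helper name; it assumes the setting appears at the start of a line as check_external_commands=0, and it keeps a backup copy of the file with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper: enable external commands in a Nagios main config file.
# A backup of the original file is kept next to it with a .bak suffix.
enable_external_commands() {
    cfg=$1
    sed -i.bak 's/^check_external_commands=0/check_external_commands=1/' "$cfg"
    # Succeed only if the setting is now enabled.
    grep -q '^check_external_commands=1' "$cfg"
}
```

Run as root, enable_external_commands /etc/nagios3/nagios.cfg makes the change; Nagios still has to be restarted afterwards.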
