Friday 11 July 2014

Part 1: Big Data Cluster Monitoring (Nagios and Ganglia Setup)

I) Nagios Overview
Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes.
Designed with scalability and flexibility in mind, Nagios gives you the peace of mind that comes from knowing your organization's business processes won't be affected by unknown outages.
It provides instant awareness of your organization's mission-critical IT infrastructure, allowing you to detect and repair problems and mitigate future issues before they affect end users and customers.

What Nagios Provides

By using Nagios, you can:
· Plan for infrastructure upgrades before outdated systems cause failures
· Respond to issues at the first sign of a problem
· Automatically fix problems when they are detected
· Coordinate technical team responses
· Ensure your organization's SLAs are being met
· Ensure IT infrastructure outages have a minimal effect on your organization's bottom line
· Monitor your entire infrastructure and business processes

How It Works

Monitoring

IT staff configure Nagios to monitor critical IT infrastructure components, including system metrics, network protocols, applications, services, servers, and network infrastructure.
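
Each of these checks is carried out by a small plugin program that reports its result through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The following is a minimal sketch of such a check in Python. The function names evaluate_load and main and the threshold values are hypothetical, chosen only for illustration; they are not part of Nagios or its plugin packages.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_load(load1, warn=5.0, crit=10.0):
    """Map a 1-minute load average to an (exit_code, status_line) pair.

    The thresholds are illustrative defaults, not recommendations.
    """
    if load1 >= crit:
        code = CRITICAL
    elif load1 >= warn:
        code = WARNING
    else:
        code = OK
    # Text after the '|' is performance data: label=value;warn;crit
    line = "LOAD {} - load1={:.2f} | load1={:.2f};{};{}".format(
        LABELS[code], load1, load1, warn, crit
    )
    return code, line


def main():
    """Print one status line and return the exit code for Nagios."""
    try:
        load1 = os.getloadavg()[0]  # available on Unix-like systems
    except (AttributeError, OSError):
        print("LOAD UNKNOWN - could not read load average")
        return UNKNOWN
    code, line = evaluate_load(load1)
    print(line)
    return code

# When saved as a plugin script, finish with: sys.exit(main())
```

Nagios reads the first line of output as the status text and uses the exit code to decide whether the service is healthy, so the same pattern works for any metric you can measure from a script.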

Alerting

Nagios sends alerts when critical infrastructure components fail and recover, providing administrators with notice of important events. Alerts can be delivered via email, SMS, or custom script.

Response

IT staff can acknowledge alerts and begin resolving outages and investigating security alerts immediately. Alerts can be escalated to different groups if alerts are not acknowledged in a timely manner.

Reporting

Reports provide a historical record of outages, events, notifications, and alert response for later review. Availability reports help ensure your SLAs are being met.

Maintenance

Scheduled downtime prevents alerts during scheduled maintenance and upgrade windows.

Planning

Trending and capacity planning graphs and reports allow you to identify necessary infrastructure upgrades before failures occur.


II) Ganglia Overview

What Is Ganglia?
Put simply, "Ganglia is a real-time cluster monitoring tool that collects information from each computer in the cluster and provides an interactive way to view the performance of the individual computers and of the cluster as a whole."

· It is a highly scalable monitoring system for high-performance computing.
· It can monitor a single system, a cluster of systems, or a grid of clusters.
· It uses XML for data representation.
· It uses RRDtool for data storage and visualization.
· The implementation of Ganglia is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world.
· It has been used to link clusters across university campuses and around the world, and can scale to handle clusters with 2000 nodes.

Like other monitoring tools, Ganglia only provides a way to view the performance of the cluster, not to control it.

Architecture of Ganglia:
The Ganglia system consists of two daemons, gmond and gmetad, a PHP-based web frontend, and two other utilities, gmetric and gstat.

Gmond:
Gmond runs on every node of the cluster and gathers information such as CPU, memory, network, disk, and swap usage.

Gmetad:
Gmetad runs on the head node. It gathers data from all other nodes and stores it in a round-robin database. It can poll multiple clusters and aggregate their metrics. The web frontend also uses it to generate the UI.
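
To make the round-robin idea concrete, here is a small sketch in Python of a fixed-size series in which each new sample overwrites the oldest once the store is full. The class name RoundRobinSeries and the sample values are hypothetical; this only illustrates the concept and does not reproduce RRDtool's actual file format or its consolidation functions.

```python
from collections import deque


class RoundRobinSeries:
    """Fixed-size series: once full, each new sample overwrites the oldest."""

    def __init__(self, slots):
        # A deque with maxlen silently drops the oldest entry when full.
        self._samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Record one (timestamp, value) sample."""
        self._samples.append((timestamp, value))

    def values(self):
        """Return the retained values, oldest first."""
        return [value for _, value in self._samples]

    def average(self):
        """Average of the retained values, or None if empty."""
        vals = self.values()
        return sum(vals) / len(vals) if vals else None


series = RoundRobinSeries(slots=4)
for step, cpu in enumerate([10, 20, 30, 40, 50, 60]):
    series.update(step * 15, cpu)  # one sample every 15 seconds

print(series.values())   # only the four most recent samples remain
print(series.average())
```

Because the store never grows, disk usage stays constant no matter how long the cluster is monitored; the trade-off is that old samples are lost unless they are consolidated into coarser averages first.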

PHP Web Frontend:
The Ganglia web frontend provides a view of the gathered information via real-time dynamic web pages. Most importantly, it displays Ganglia data in a meaningful way for system administrators and computer users. It should be installed on the same machine as gmetad.


III) Nagios Setup

First, install the following on the monitoring server:
sudo apt-get install nagios3 nagios-nrpe-plugin

You will be asked to enter a password for the nagiosadmin user. The user's credentials are stored in /etc/nagios3/htpasswd.users.
To change the password for the nagiosadmin user enter:
sudo htpasswd /etc/nagios3/htpasswd.users nagiosadmin
To add a user:
sudo htpasswd /etc/nagios3/htpasswd.users steve
Configuration Overview
There are a couple of directories containing Nagios configuration and check files.
/etc/nagios3: contains configuration files for the operation of the nagios daemon, CGI files, hosts, etc.
/etc/nagios-plugins: houses configuration files for the service checks.
/etc/nagios: on the remote host contains the nagios-nrpe-server configuration files.
/usr/lib/nagios/plugins/: where the check binaries are stored.

Follow these steps to configure a remote host with NRPE on Ubuntu systems:
sudo apt-get install openssl nagios-nrpe-server nagios-plugins nagios-plugins-basic nagios-plugins-standard
Then edit the NRPE configuration file to define the plugin checks and performance data you want to send to the monitoring server:
sudo vi /etc/nagios/nrpe.cfg
Restart NRPE:
sudo /etc/init.d/nagios-nrpe-server restart
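
Two things in nrpe.cfg usually matter most: the allowed_hosts line, which must list the monitoring server, and the command[...] lines, which name the checks the server may run. The sketch below, in Python, parses nrpe.cfg-style text and checks both. The sample excerpt, the address 192.0.2.10 (standing in for the monitoring server), and the helper names parse_nrpe_cfg and is_allowed are assumptions made for illustration.

```python
import re

# Hypothetical excerpt of /etc/nagios/nrpe.cfg; 192.0.2.10 stands in
# for the monitoring server's address.
SAMPLE_NRPE_CFG = """\
# NRPE sample settings
server_port=5666
allowed_hosts=127.0.0.1,192.0.2.10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
"""

COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]$")


def parse_nrpe_cfg(text):
    """Return (settings, commands) parsed from nrpe.cfg-style text."""
    settings, commands = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key, value = key.strip(), value.strip()
        match = COMMAND_RE.match(key)
        if match:
            commands[match.group("name")] = value
        else:
            settings[key] = value
    return settings, commands


def is_allowed(settings, server_ip):
    """True if server_ip is listed verbatim in allowed_hosts."""
    hosts = [h.strip() for h in settings.get("allowed_hosts", "").split(",")]
    return server_ip in hosts


settings, commands = parse_nrpe_cfg(SAMPLE_NRPE_CFG)
print(sorted(commands))
print(is_allowed(settings, "192.0.2.10"))
```

This check only matches addresses listed literally; it does not interpret network ranges, so treat it as a quick sanity check rather than a full validator.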

Nagios Email
sudo apt-get install sendemail

IV) Ganglia Setup

Installation of ganglia on master node:
sudo apt-get install ganglia-monitor rrdtool gmetad ganglia-webfrontend
The above command installs gmond, gmetad, and the Ganglia web UI on the node. The ganglia-webfrontend package also installs the required Apache server and PHP modules. To serve Ganglia from Apache, copy the apache.conf file from /etc/ganglia-webfrontend/ to /etc/apache2/sites-enabled/:
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf

The /etc/ganglia-webfrontend/apache.conf file defines a simple alias that maps /ganglia to /usr/share/ganglia-webfrontend.

Installation of ganglia on other nodes:
sudo apt-get install ganglia-monitor
The above command installs the Ganglia monitor (gmond).

Gmond configuration on master node:
Ganglia supports two types of configuration: multicast and unicast. Here I take an example cluster and configure Ganglia in unicast mode. The cluster is named "ARC", with 10.10.87.90 as the master node and 10.10.87.91 as a slave node.

globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30
}
cluster {
  name = "ARC"
  owner = "BigDataOwner"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  host = 10.10.87.90
  port = 8649
  ttl = 1
}
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649
}
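
The tcp_accept_channel is what gmetad (or any other client) connects to: gmond answers a TCP connection on that port with an XML dump of the cluster state. The following Python sketch shows how such a dump could be read and parsed. The sample XML is hand-written and heavily trimmed, the host names node90 and node91 and the metric values are invented for illustration, and the helper names fetch_gmond_xml and metric_by_host are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample of a gmond XML dump. Host names and
# metric values are invented; a real dump carries many more attributes.
SAMPLE_XML = """\
<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="ARC" OWNER="BigDataOwner">
    <HOST NAME="node90" IP="10.10.87.90">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=" "/>
      <METRIC NAME="mem_free" VAL="2048000" TYPE="float" UNITS="KB"/>
    </HOST>
    <HOST NAME="node91" IP="10.10.87.91">
      <METRIC NAME="load_one" VAL="1.30" TYPE="float" UNITS=" "/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the full XML dump that gmond sends on its TCP accept channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break  # gmond closes the connection after the dump
            chunks.append(data)
    return b"".join(chunks)


def metric_by_host(xml_data, metric_name):
    """Return {host IP: numeric value} for one metric in a gmond dump."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                result[host.get("IP")] = float(metric.get("VAL"))
    return result


# Parsing the sample; against a live node you would instead pass
# fetch_gmond_xml("10.10.87.90") as the first argument.
print(metric_by_host(SAMPLE_XML, "load_one"))
```

Reading the dump by hand like this is a convenient way to confirm that the slave nodes are actually reporting to the master before gmetad and the web frontend are involved.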


Gmond configuration on other nodes (Slaves):
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30
}

cluster {
  name = "ARC"
  owner = "BigDataOwner"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  # mcast_join = 239.2.11.71
  host = 10.10.87.90
  port = 8649
  ttl = 1
}
tcp_accept_channel {
  port = 8649
}


Gmetad Configuration:

data_source "ARC" 15 10.10.87.90:8649

The gmetad configuration defines each data source by its cluster name, polling interval, and the IP address and port of a running gmond. In the line above, "ARC" is the cluster name, 15 is the interval in seconds at which gmetad polls for metrics, and 10.10.87.90:8649 is the gmond IP address and port on the head node.
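
As a way to double-check a data_source line before restarting gmetad, the following Python sketch splits one into its parts. It assumes the usual layout of the line (a quoted name, an optional polling interval, then one or more address[:port] entries) and falls back to 15 seconds and port 8649 when those are omitted. The function name parse_data_source is hypothetical.

```python
import shlex

DEFAULT_INTERVAL = 15  # seconds; used when the line gives no interval
DEFAULT_PORT = 8649    # used when an address has no :port suffix


def parse_data_source(line):
    """Split a gmetad data_source line into name, interval, and sources."""
    tokens = shlex.split(line)  # handles the quoted cluster name
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: {!r}".format(line))
    name, rest = tokens[1], tokens[2:]
    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():  # optional polling interval comes first
        interval = int(rest[0])
        rest = rest[1:]
    if not rest:
        raise ValueError("no gmond address given: {!r}".format(line))
    sources = []
    for address in rest:
        host, _, port = address.partition(":")
        sources.append((host, int(port) if port else DEFAULT_PORT))
    return {"name": name, "interval": interval, "sources": sources}


print(parse_data_source('data_source "ARC" 15 10.10.87.90:8649'))
```

Listing more than one address on the line gives gmetad alternative nodes to ask for the same cluster's data, which helps if the first node is down.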
