EzDevInfo.com

Nagios interview questions

Top frequently asked Nagios interview questions

Nagios NRPE: Command not defined

In my nrpe_local.cfg I added the following command:

command[check_mycommand]=/usr/lib/nagios/plugins/check_command 30 35

and then restarted the NRPE daemon.

When I execute this command using nrpe I'm getting the following error:

NRPE: Command 'check_mycommand' not defined

I used the following command to execute it:

/usr/lib/nagios/plugins/check_nrpe -H hostname -c check_mycommand

I can't find any clue as to why.

My nrpe_local.cfg contains 10 other commands, and they all work properly.
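One thing worth ruling out is whether the definition is really present, and spelled as expected, in the file the daemon reads. The Python sketch below parses `command[name]=command_line` entries out of NRPE-style configuration text and returns the names it finds. The function name `defined_commands` and the sample text are illustrative assumptions; only the `command[...]` syntax is taken from the configuration line above.

```python
import re

# Matches lines of the form: command[name]=command line
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def defined_commands(config_text):
    """Return a dict mapping command names to command lines (hypothetical helper)."""
    commands = {}
    for line in config_text.splitlines():
        # Skip comment lines so commented-out definitions are not reported.
        if line.lstrip().startswith("#"):
            continue
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


# Made-up sample text; in practice this would be the contents of the config file.
sample = (
    "# local NRPE commands\n"
    "command[check_mycommand]=/usr/lib/nagios/plugins/check_command 30 35\n"
)
print(sorted(defined_commands(sample)))
```

If the name shows up here but the daemon still reports it as undefined, the daemon is probably reading a different file than the one that was edited.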

Source: (StackOverflow)

Nagios: CRITICAL - Socket timeout after 10 seconds

I've been running nagios for about two years, but recently this problem started appearing with one of my services.

I'm getting

CRITICAL - Socket timeout after 10 seconds

for a check_http -H my.host.com -f follow -u /abc/def check, which used to work fine. No other services are reporting this problem. The remote site is up and healthy, and I can do a wget http://my.host.com/abc/def from the Nagios server, and it downloads the response just fine. Also, check_http -H my.host.com -f follow works just fine, i.e. things break only when I use the -u argument. I also tried passing a different user agent string, which made no difference, and increasing the timeout, with no luck. I tried -v, but all I get is:

GET /abc/def HTTP/1.0
User-Agent: check_http/v1861 (nagios-plugins 1.4.11)
Connection: close
Host: my.host.com


CRITICAL - Socket timeout after 10 seconds

... which does not tell me what's going wrong.

Any ideas how I could resolve this?

Thanks!

Source: (StackOverflow)

How to print out Nagios Service UP Time Percentage from Nagios-Report Perl Module

I can print out the host UP time percentage with the Nagios-Report Perl module using the following code:

#!/usr/bin/perl
use strict ;
use Nagios::Report ;
my $x = Nagios::Report->new(q<local_cgi localhost nagiosadmin>)
  or die "Can't construct Nagios::Report object." ;
$x->mkreport(
    [ qw(HOST_NAME PERCENT_TOTAL_TIME_UP) ],

    sub {
        my %F = @_; my $u = $F{PERCENT_TOTAL_TIME_UP}; $u =~ s/%//;
    },
    0,

    sub {
        my $F = shift @_ ;
    }
) ;
$x->debug_dump ;

But how can I print out only the service UP time percentage? I mean output only the percentage value.

I tried many options but couldn't get it right.

Source: (StackOverflow)

Changing Process Name using Shell for nagios monitoring with check_procs

I have a Python script that starts a process I want to monitor with Nagios. When I run the script and then run ps -ef on my Ubuntu EC2 instance, the process shows up as python <filename>.py --arguments. To monitor that process with check_procs, we need to supply a process name, and here the process name is 'python'.

/usr/lib/nagios/plugins/check_procs -C python

It reports that one python process is running. This is fine when I'm running a single Python process. But if I'm running multiple Python scripts and want to monitor only a few of them, I have to give the name of that particular process. If I pass the Python script name to the command above, it throws an error. So I want to mask the whole python <filename>.py --arguments command line with some other name that I can then pass to check_procs.

If anyone has any ideas, please let me know. I have checked other Stack Overflow questions, which suggest changing the Python process name using setproctitle, but I want to do it from the shell.

Regards,

Sanket

Source: (StackOverflow)

How can I install DBD::Pg if postgres is not installed?

I have separate servers running Postgres and Nagios. I want to use "psql_replication_check.pl" with Nagios to monitor the Postgres replication status. This check script requires the DBD::Pg module to connect to the database, and the DBD::Pg installation asks for the path to pg_config.

# perl Makefile.PL
Configuring DBD::Pg 2.17.1
Path to pg_config?

I don't have permission to install Postgres on the Nagios machine. Has anyone solved this issue before?

I have CentOS 5.4 on both systems.

Source: (StackOverflow)

Integrate different Nagios webservers

I have several sites with 4 to 5 servers at each location. Each location has one monitoring server running Nagios. Now I want to create a central location that combines all the Nagios services running at each site. Can anyone please point me to some documentation for this kind of setup?

Source: (StackOverflow)

SNMP OID for network traffic

I'm working on a script that will monitor traffic on specific hosts from Nagios. I have studied some existing scripts and gathered almost all the information I need, but I have run into a problem identifying the OIDs for the traffic counters. I wanted to use IF-MIB::ifOutOctets.1 and IF-MIB::ifInOctets.1 to get the outgoing and incoming traffic, but when I tested with the following line:

snmpwalk -v 1 -c public myComputer OID

I got the same result for both OIDs, and that doesn't seem right. I'm wondering if there are other variables I could try instead of the ones I'm using now.

It would be useful even if you could point me to where I can find some information on the IF-MIB, because I can get all the values with snmpwalk but I don't know how to interpret them.
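On interpreting the values: ifInOctets and ifOutOctets are cumulative byte counters per interface, and the trailing .1 selects the interface whose ifIndex is 1. On many hosts that is the loopback interface, where bytes in and bytes out are naturally equal; walking IF-MIB::ifDescr shows which index belongs to which interface. A traffic rate comes from the difference between two samples. The Python sketch below assumes 32-bit counters that wrap at 2^32 and at most one wrap between samples; the function name and the sample numbers are made up for illustration.

```python
COUNTER32_MODULUS = 2 ** 32  # ifInOctets / ifOutOctets are 32-bit counters


def octet_rate_bps(first_sample, second_sample, interval_seconds):
    """Bits per second between two octet-counter readings (hypothetical helper).

    Assumes the counter wrapped at most once between the two readings.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # The modulo handles a counter that wrapped past 2**32 between samples.
    delta_octets = (second_sample - first_sample) % COUNTER32_MODULUS
    return delta_octets * 8 / interval_seconds


# Made-up readings taken 10 seconds apart.
print(octet_rate_bps(1000, 2000, 10))
```

On fast links a 32-bit counter can wrap more than once between polls, so a short polling interval matters for this approach.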

Source: (StackOverflow)

Equal strings are not equal in Perl

Please help me to understand one strange problem in equality of strings. This is the code I'm talking about:

my $test=undef;
foreach my $List (@o_descrL) {
  if (!($test)) {
    $test = defined($o_noreg)
      ? $descr_d eq $List
      : $descr_d =~ /$List/i;
    printf("$descr_d = $List\t\t==> $test\n");
  }
}

Unfortunately I didn't write it, but I have to understand it. $List is always "SQL Server (C4)", while $descr_d changes according to the current item in the array. Part of the printed output is here:

Power = SQL Server (C4)         ==>
SQL Server (C4) = SQL Server (C4)               ==>
SNMP Service = SQL Server (C4)          ==>
Network Connections = SQL Server (C4)           ==>

As you can see, strings in the second line of the output are equal. So why isn't $test true?

EDIT: I've printed some more output and found that the strings compare equal when $descr_d eq $List is used, but not when $descr_d =~ $List is used. Could you please explain what is actually being assigned to the $test variable? I don't understand what defined() ? : means here.

EDIT2: For a string "SQL Server Agent" the script works just fine, there is a problem only when (C4) is attached. Quite strange, isn't it?
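The ternary assigns the result of eq to $test when $o_noreg is defined, and the result of the regex match otherwise; a false result prints as an empty string. In the regex branch, the parentheses in "SQL Server (C4)" are not literal characters but a capture group, so the pattern looks for "SQL Server C4" and cannot match a string that contains the parentheses. The same behaviour can be reproduced in Python, used here only as a neutral illustration; the helper name `regex_match` is an assumption, and `re.escape` plays the role of Perl's `\Q...\E`.

```python
import re


def regex_match(pattern, text, literal=False):
    """Case-insensitive search; with literal=True, metacharacters are escaped."""
    if literal:
        # Comparable to /\Q$List\E/i in Perl: match the pattern text literally.
        pattern = re.escape(pattern)
    return re.search(pattern, text, re.IGNORECASE) is not None


name = "SQL Server (C4)"
print(regex_match(name, name))                # unescaped: "(C4)" is a group
print(regex_match(name, name, literal=True))  # escaped: parentheses are literal
print(regex_match("SQL Server Agent", "SQL Server Agent"))  # no metacharacters
```

This also fits the observation in EDIT2: "SQL Server Agent" contains no regex metacharacters, so the unescaped pattern matches it as written.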

Source: (StackOverflow)

AWK: Are these statements required?

I have the following line in a Nagios bash script. It is being used to get the up and down error rates for the specified network cards:

if=`awk -v interface="$INTERFACE" '$1 ~ "^" interface ":" { split($0, a, /: */); $0 = a[2]; print $3 " " $11 }' /proc/net/dev`

I've never worked with awk before today, so I'm finding my way a bit.

As I see it, we pass the value of $INTERFACE into the awk script as interface, and then filter for lines beginning with interface: (e.g. eth0:). Then we split the line using a colon followed by optional spaces as the separator. Then, for some reason, we assign the second entry in the array to $0 before actually extracting the values we want.

It seems to me that the statements split($0, a, /: */) and $0 = a[2] are unnecessary, but I may be wrong! Does assigning a[2] to $0 change anything when we then refer to $3 and $11? I've tried the script without the first two statements and the output is the same, but perhaps there's a corner case I've missed...

Thanks in advance

Rich
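Assigning to $0 makes awk re-split the record, so after $0 = a[2] the field numbers count from the first value after the colon. That matters because the interface name and the first counter are not always separated by whitespace: on some kernels a wide byte count sits directly against the colon. The Python sketch below mimics both variants of field splitting on two made-up lines (the counter values are invented, and the helper names are assumptions), one with a space after the colon and one without.

```python
import re


def naive_fields(line):
    """Mimic awk's default splitting and return ($3, $11) without re-splitting."""
    fields = line.split()
    return fields[2], fields[10]


def resplit_fields(line):
    """Mimic: split($0, a, /: */); $0 = a[2]; then return ($3, $11)."""
    rest = re.split(r": *", line)[1]  # a[2] in awk's 1-based numbering
    fields = rest.split()
    return fields[2], fields[10]


# Made-up /proc/net/dev style lines: 8 receive counters, then 8 transmit counters.
spaced = "  eth0: 1000 20 3 0 0 0 0 0 5000 40 7 0 0 0 0 0"
packed = "  eth0:1000 20 3 0 0 0 0 0 5000 40 7 0 0 0 0 0"

for label, line in (("spaced", spaced), ("packed", packed)):
    print(label, "naive:", naive_fields(line), "resplit:", resplit_fields(line))
```

With the re-split, $3 and $11 land on the receive and transmit error counters in both layouts. Without it, the fields shift by one whenever a space follows the colon, so identical output in a quick test may just mean the shifted fields happened to hold the same values.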

Source: (StackOverflow)

Nagios not sending emails

I want to set up Nagios to send email notifications. I can send email notifications manually by clicking "Send custom service notification" in the Nagios web interface. The notification is created, and the email is sent and delivered successfully. But Nagios doesn't send notifications automatically. I tested this by disabling ping replies on the machine (echo 1 >/proc/sys/net/ipv4/icmp_echo_ignore_all). Nagios sets the PING service to the CRITICAL state, but doesn't send a notification email.

These are my config files:

Part of templates.cfg

define contact{
        name                            generic-contact     ; The name of this contact template
        service_notification_period     24x7            ; service notifications can be sent anytime
        host_notification_period        24x7            ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s     ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s       ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}

Part of contacts.cfg

define contact{
        contact_name                    nagiosadmin     ; Short name of user
        use                             generic-contact     ; Inherit default values from generic-contact template (defined above)
        alias                           Nagios Admin        ; Full name of user
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        email                           MY-EMAIL@gmail.com      ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosadmin
}

generic-host_nagios2.cfg

define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1       ; Host notifications are enabled
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        check_command                   check-host-alive
        max_check_attempts              10
        notification_interval           1
        notification_period             24x7
        notification_options            d,u,r,f,s
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

generic-service_nagios2.cfg

define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           1       ; Re-notify every 1 time unit while the problem persists (0 would notify only once)
        is_volatile                     0
        check_period                    24x7
        normal_check_interval           5
        retry_check_interval            1
        max_check_attempts              4
        notification_period             24x7
        notification_options            w,u,c,r,f,s
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

How can I get Nagios to send notification emails automatically?

Source: (StackOverflow)

cannot find ssl libraries?

I am trying to install the NRPE plugin on Ubuntu 12.04, but I am having an issue with the SSL libraries. I tried installing the "libcurl3-openssl-dev" package, but when I try to compile the NRPE plugin after installing it, I get an error saying "cannot find ssl libraries".

Could anyone please shed some light on this?

Thanking you,

Regards, Gaurav.

Source: (StackOverflow)

Nagios notification when no message is received within 48 hours

In Nagios it is easy to check that a log message occurred in the last 48 hours and raise an alarm. What I would like, though, is to configure Nagios to raise an alarm when a specific message did not occur within 48 hours.

Can anyone point me in the right direction?

I am using the "Check WMI Plus" plugin (no agent required) in order to check the event log on a windows box.
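One way to frame this is to invert the test: track when the message was last seen and alert once that timestamp is too old. The Python sketch below shows only the decision logic, using the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). How the last-seen time is obtained from the event log is left out, and the function name, parameters, and message wording are assumptions for illustration.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes

MAX_AGE_SECONDS = 48 * 3600  # 48 hours


def check_last_seen(last_seen_epoch, now_epoch, max_age_seconds=MAX_AGE_SECONDS):
    """Return (exit_code, message) based on the age of the last matching message.

    last_seen_epoch is None when the message has never been observed.
    """
    if last_seen_epoch is None:
        return CRITICAL, "CRITICAL: message has never been seen"
    age = now_epoch - last_seen_epoch
    if age < 0:
        # A timestamp in the future usually points at clock or parsing trouble.
        return UNKNOWN, "UNKNOWN: last-seen time is in the future"
    if age > max_age_seconds:
        return CRITICAL, "CRITICAL: no message for %d seconds" % age
    return OK, "OK: last message %d seconds ago" % age


# Made-up timestamps: the message was last seen 50 hours before "now".
code, message = check_last_seen(last_seen_epoch=0, now_epoch=50 * 3600)
print(code, message)
```

A real plugin would print the message and exit with the returned code. Nagios's own freshness checking (check_freshness with freshness_threshold) is another route when the message arrives as a passive check result.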

Source: (StackOverflow)

Monitoring WCF with nagios

I'm not familiar with Nagios and I'm still only halfway through the plugins and documentation, but our client currently uses it and wants to use it to monitor our WCF service too. Is there a way for Nagios to consume WCF methods, or at least monitor the errors thrown by the WCF service?

Source: (StackOverflow)

Replacing Nagios HTTP with custom (select/poll driven) daemon?

I have a Nagios configuration which performs a number of tests on a few hundred nodes; one of these is a variant of check_http. Nagios isn't built with --enable-embedded-perl (ePN), but we'll be changing that soon. Even with ePN enabled, I'm concerned about the model where each execution of this Perl HTTP+SSL check handles only a single target.

I'd like to write a simple select() (or poll() / epoll()) driven daemon which creates connections to multiple targets concurrently, reads the responses, and emits them in a form that Nagios can use as if they were passive check results.

Is there a guide to how one could accomplish this? What's the interface or API for providing batched check updates to Nagios?

One hack I'm considering would be to have my daemon update a Redis store (with a key for each target, and a short expiration time) and replace check_http with a very small, lightweight GET of the local Redis instance on the key. The GET would either return the actual results for Nagios or a "(nil)" response, which would be treated as if the HTTP connection had timed out.

However, I'm also a bit skeptical of my idea, since I'd think someone has already built something like this by now.

(BTW: I'm ready to be convinced to switch to something like Icinga or Zabbix or Zenoss or OpenNMS ... pretty much anything that will scale better).
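For batched updates, Nagios accepts passive results through its external command file: each result is one PROCESS_SERVICE_CHECK_RESULT line of the form `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The Python sketch below formats such lines and writes a batch of them in one go. The helper names, the sample hosts, and the command file path are assumptions for illustration; the real path is whatever command_file is set to in nagios.cfg, and the services need passive checks enabled.

```python
import time

# Assumed location; the actual path is set by command_file in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1, 2 or 3")
    stamp = int(time.time()) if timestamp is None else int(timestamp)
    # Each command must fit on a single line, so flatten embedded newlines.
    flat_output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        stamp, host, service, return_code, flat_output)


def submit(lines, command_file=COMMAND_FILE):
    """Write a batch of command lines to the external command file."""
    with open(command_file, "w") as handle:
        handle.writelines(lines)


# Made-up results, as a polling daemon might collect them in one pass.
batch = [
    passive_service_result("web01", "HTTP", 0, "OK: 200 in 0.12s", timestamp=1),
    passive_service_result("web02", "HTTP", 2, "CRITICAL: timeout", timestamp=1),
]
print("".join(batch), end="")
```

`submit` is not called in the example because it needs a running Nagios instance reading the command file.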

Source: (StackOverflow)

Nagios/NRPE giving a "No output returned from plugin" error

Getting a "No output returned from plugin" error message from a Nagios/NRPE script

1) Running Nagios v3.2.3 and NRPE v2.12

2) The script:

   OK_STATE=0
   UNAME=$(/bin/uname -r)    # capture the kernel release string
   echo "OK: Kernel Version=$UNAME"
   exit $OK_STATE

3) Command line results on the Nagios server using NRPE

Same OK results for both the root and nagios users:

[nagios@cmonmm03 libexec]$ ./check_nrpe -H dappsi01b.dev.screenscape.local -c check_kernel
OK: Kernel Version=2.6.18-194.11.3.el5

When I run the check_kernel.sh script on the machine's local command line, it works there too.

Any thoughts or known solutions would be appreciated.

Thank you

Source: (StackOverflow)