Nagios interview questions
Top frequently asked Nagios interview questions
In my nrpe_local.cfg I added the following command:
command[check_mycommand]=/usr/lib/nagios/plugins/check_command 30 35
and then restarted the NRPE daemon.
When I execute this command through NRPE, I get the following error:
NRPE: Command 'check_mycommand' not defined
I used the following command to execute it:
/usr/lib/nagios/plugins/check_nrpe -H hostname -c check_mycommand
I can't find any clue as to the cause. There are 10 more commands defined in my nrpe_local.cfg, and they all work properly.
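One way to narrow this down is to list the command names that a given config file really defines and compare them with the name passed to check_nrpe -c. The sketch below is only an illustration: the function name defined_commands and the sample text are assumptions, and it understands nothing beyond the command[name]=... line form shown above (no include directives, no other settings).

```python
import re

# Matches NRPE command definitions of the form: command[name]=command line
COMMAND_LINE = re.compile(r"^\s*command\[([^\]]+)\]=(.*)$")


def defined_commands(config_text):
    """Return {command_name: command_line} for every definition in the text.

    Blank lines and lines starting with '#' are skipped.
    """
    commands = {}
    for line in config_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = COMMAND_LINE.match(line)
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands


# Hypothetical file contents, for illustration only.
sample = """
# local overrides
command[check_mycommand]=/usr/lib/nagios/plugins/check_command 30 35
#command[check_disabled]=/usr/lib/nagios/plugins/check_disabled
"""

found = defined_commands(sample)
print(sorted(found))
```

If the name shows up in the file but the daemon still reports it as not defined, one possibility worth checking is that the running daemon reads a different file than the one that was edited.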
Source: (StackOverflow)
I've been running Nagios for about two years, but recently this problem started appearing with one of my services.
I'm getting
CRITICAL - Socket timeout after 10 seconds
for a check_http -H my.host.com -f follow -u /abc/def check, which used to work fine. No other services are reporting this problem. The remote site is up and healthy, and I can run wget http://my.host.com/abc/def from the Nagios server, and it downloads the response just fine. Also, running check_http -H my.host.com -f follow works just fine, i.e. it's only when I use the -u argument that things break. I also tried passing a different user agent string, which made no difference, and increasing the timeout, with no luck. I tried with -v, but all I get is:
GET /abc/def HTTP/1.0
User-Agent: check_http/v1861 (nagios-plugins 1.4.11)
Connection: close
Host: my.host.com
CRITICAL - Socket timeout after 10 seconds
... which does not tell me what's going wrong.
Any ideas how I could resolve this?
Thanks!
Source: (StackOverflow)
I can print the host uptime percentage with the Nagios::Report Perl module using the following code:
#!/usr/bin/perl
use strict ;
use Nagios::Report ;
my $x = Nagios::Report->new(q<local_cgi localhost nagiosadmin>)
    or die "Can't construct Nagios::Report object." ;
$x->mkreport(
    [ qw(HOST_NAME PERCENT_TOTAL_TIME_UP) ],
    sub {
        my %F = @_; my $u = $F{PERCENT_TOTAL_TIME_UP}; $u =~ s/%//;
    },
    0,
    sub {
        my $F = shift @_ ;
    }
) ;
$x->debug_dump ;
But how can I print only the service uptime percentage? I mean, output only the percentage value.
I tried many options but couldn't get it right.
Source: (StackOverflow)
I have a Python script that starts a process which I want to monitor using Nagios. When I run that script and then run ps -ef on my Ubuntu EC2 instance, it shows the process as python <filename>.py --arguments. For Nagios to monitor that process using check_procs, we need to supply the process name. Here the process name becomes 'python'.
/usr/lib/nagios/plugins/check_procs -C python
It reports that one python process is running. This is fine when I'm running one Python process. But if I'm running multiple Python scripts and want to monitor only a few, then I have to give that particular process name. If I give the Python script name in the above command, it throws an error. So I want to mask the whole python <filename>.py --arguments string with some other name, so that I can give that new name to check_procs.
If anyone has any idea, please let me know. I have checked other Stack Overflow questions which suggest changing the Python process name using setproctitle, but I want to do it from the shell.
Regards,
Sanket
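An alternative to renaming the process is to let check_procs match on the arguments as well as the command name. The sketch below only builds such a command line; the helper name check_procs_argv and the script name worker.py are made up for illustration, and the same arguments can be typed directly in a shell. It assumes the interpreter shows up under the command name python, which may be python3 or similar on other systems.

```python
import shlex

CHECK_PROCS = "/usr/lib/nagios/plugins/check_procs"  # path taken from the question


def check_procs_argv(script_name, interpreter="python", critical_range="1:"):
    """Build a check_procs command line that matches one script.

    -C matches the command name exactly, -a matches a substring of the
    process arguments, and -c 1: goes critical unless at least one
    matching process exists.
    """
    return [CHECK_PROCS, "-C", interpreter, "-a", script_name, "-c", critical_range]


argv = check_procs_argv("worker.py")  # "worker.py" is a hypothetical script name
print(" ".join(shlex.quote(part) for part in argv))
```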
Source: (StackOverflow)
I have separate servers running PostgreSQL and Nagios. I want to use "psql_replication_check.pl" with Nagios to monitor the PostgreSQL replication status. This check script requires the DBD::Pg module to connect to the database. The DBD::Pg installation asks for the path to pg_config:
#perl Makefile.PL
Configuring DBD::Pg 2.17.1
Path to pg_config?
I don't have permission to install PostgreSQL on the Nagios machine. Has anyone fixed this issue before?
I have CentOS 5.4 on both systems.
Source: (StackOverflow)
I have several sites, each running 4 to 5 servers. Every location has one monitoring server with Nagios. Now I want to create a central location that combines all the Nagios services running at each location. Can anyone please point me to some documentation for this kind of setup?
Source: (StackOverflow)
I'm working on a script that will monitor traffic on specific hosts from Nagios. I have studied some existing scripts and gathered almost all the info I need, but I have run into a problem identifying the OIDs needed for the traffic. I wanted to use IF-MIB::ifOutOctets.1 and IF-MIB::ifInOctets.1 to get the incoming and outgoing traffic, but when I tested with the following line:
snmpwalk -v 1 -c public myComputer OID
I got the same result for both OIDs, and that doesn't seem right. I'm wondering if there are other variables I could try instead of the ones I'm using now.
It would be useful even if you could point me to where I could find some info on the IF-MIB, because I can get all the values with snmpwalk but I don't know how to interpret them.
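On interpreting the values: ifInOctets and ifOutOctets are running 32-bit octet counters, so a single reading says little, and traffic is derived from the difference between two readings. The sketch below shows that arithmetic; the function name and the sample numbers are illustrative assumptions. Separately, interface index 1 is often the loopback interface on Linux hosts, where both counters are equal by nature; whether that applies to this host is an assumption the question does not confirm.

```python
COUNTER32_MODULUS = 2 ** 32  # ifInOctets / ifOutOctets are 32-bit counters


def bits_per_second(previous_octets, current_octets, interval_seconds):
    """Convert two octet-counter samples into an average rate in bits/s.

    Assumes the counter wrapped at most once between the two samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # The modulo keeps the difference correct across a counter wrap.
    delta = (current_octets - previous_octets) % COUNTER32_MODULUS
    return delta * 8 / interval_seconds


# Made-up samples taken 60 seconds apart.
rate = bits_per_second(1_000_000, 1_750_000, 60)
# Made-up samples taken 10 seconds apart, with a wrap in between.
wrapped_rate = bits_per_second(4_294_967_000, 704, 10)
```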
Source: (StackOverflow)
Please help me to understand one strange problem in equality of strings.
This is the code I'm talking about:
my $test=undef;
foreach my $List (@o_descrL) {
    if (!($test)) {
        $test = defined($o_noreg)
            ? $descr_d eq $List
            : $descr_d =~ /$List/i;
        printf("$descr_d = $List\t\t==> $test\n");
    }
}
Unfortunately I didn't write it, but I have to understand it. $List is always "SQL Server (C4)", while $descr_d changes according to the current item in the array. Part of the printed output is here:
Power = SQL Server (C4) ==>
SQL Server (C4) = SQL Server (C4) ==>
SNMP Service = SQL Server (C4) ==>
Network Connections = SQL Server (C4) ==>
As you can see, the strings in the second line of the output are equal. So why isn't $test true?
EDIT: I've printed some more output and found that the comparison succeeds with $descr_d eq $List, but not with $descr_d =~ $List. Could you please explain what is actually being assigned to the $test variable? I don't understand what defined() ? : means here.
EDIT2: For the string "SQL Server Agent" the script works just fine; the problem appears only when "(C4)" is appended. Quite strange, isn't it?
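The behaviour is consistent with the parentheses being read as a regex capture group rather than as literal characters. The sketch below reproduces the same effect in Python, not Perl; the function matches and its exact flag are illustrative stand-ins for the ternary and for defined($o_noreg), not the original script.

```python
import re

needle = "SQL Server (C4)"


def matches(descr, wanted, exact):
    """Mirror the Perl ternary: exact comparison if the flag is set,
    otherwise a case-insensitive regex search with wanted as the pattern."""
    if exact:
        return descr == wanted
    return re.search(wanted, descr, re.IGNORECASE) is not None


# Used as a pattern, "(C4)" is a group, so the pattern looks for the text
# "SQL Server C4" (no parentheses) and fails on the needle itself.
as_regex = matches("SQL Server (C4)", needle, exact=False)

# String equality is not affected by metacharacters.
as_equality = matches("SQL Server (C4)", needle, exact=True)

# re.escape makes every metacharacter literal, the Python counterpart
# of Perl's \Q...\E or quotemeta.
escaped = matches("SQL Server (C4)", re.escape(needle), exact=False)

print(as_regex, as_equality, escaped)
```

A name without metacharacters, such as "SQL Server Agent", matches either way, which fits the observation in EDIT2.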
Source: (StackOverflow)
I have the following line in a Nagios bash script. It is being used to get the up and down error rates for the specified network cards:
if=`awk -v interface="$INTERFACE" '$1 ~ "^" interface ":" { split($0, a, /: */); $0 = a[2]; print $3 " " $11 }' /proc/net/dev`
I've never worked with awk before today, so I'm finding my way a bit.
As I see it, we pass the value of $INTERFACE into the awk script as interface, and then filter for lines beginning with interface: (e.g. eth0:). Then we split the line using a colon followed by optional spaces as the separator. Then, for some reason, we assign the second entry in the array (a[2]) to $0 before actually extracting the values we want.
It seems to me that the statements split($0, a, /: */) and $0 = a[2] are unnecessary, but I may be wrong! Does assigning a[2] to $0 change anything when we then refer to $3 and $11? I've tried the script without the first two statements and the output is the same, but perhaps there's a corner case I've missed...
Thanks in advance
Rich
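A possible corner case is whether a space follows the colon. Assigning to $0 makes awk split the fields again, so after $0 = a[2] the counters are numbered from 1 no matter how the interface name was glued to them. The sketch below mimics both variants in Python; the sample lines are made up (the counters are simply 1 to 16 so each value shows its own position), and the idea that some kernels print the first counter directly after the colon is an assumption, not something the question states.

```python
def error_counts(line):
    """Return (rx_errs, tx_errs) from one /proc/net/dev interface line.

    Splitting on the first colon mirrors split($0, a, /: */); $0 = a[2]
    in the awk one-liner.
    """
    _name, _, counters = line.partition(":")
    fields = counters.split()
    return fields[2], fields[10]  # awk's $3 and $11 after the reassignment


def naive_counts(line):
    """What $3 and $11 pick up when the line is split on whitespace only."""
    fields = line.split()
    return fields[2], fields[10]


# Made-up sample lines for illustration.
spaced = "  eth0: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16"
glued = "  eth0:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16"

for sample in (spaced, glued):
    print(error_counts(sample), naive_counts(sample))
```

With the colon split, both layouts give the same pair. Without it, the result depends on whether the first counter is attached to the interface name, which would explain identical output on one machine and shifted fields on another.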
Source: (StackOverflow)
I want to set up Nagios to send email notifications.
I can send notifications manually by clicking "Send custom service notification" in the Nagios web interface. The notification is created and the email is sent and delivered successfully.
But Nagios doesn't send notifications automatically. I tested this by turning off ping replies on the machine (echo 1 >/proc/sys/net/ipv4/icmp_echo_ignore_all). Nagios sets the PING service to CRITICAL state, but doesn't send a notification email.
These are my config files:
Part of templates.cfg
define contact{
    name                            generic-contact ; The name of this contact template
    service_notification_period     24x7 ; service notifications can be sent anytime
    host_notification_period        24x7 ; host notifications can be sent anytime
    service_notification_options    w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
    host_notification_options       d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
    service_notification_commands   notify-service-by-email ; send service notifications via email
    host_notification_commands      notify-host-by-email ; send host notifications via email
    register                        0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
}
Part of contacts.cfg
define contact{
    contact_name                    nagiosadmin ; Short name of user
    use                             generic-contact ; Inherit default values from generic-contact template (defined above)
    alias                           Nagios Admin ; Full name of user
    service_notification_options    w,u,c,r,f,s
    host_notification_options       d,u,r,f,s
    email                           MY-EMAIL@gmail.com ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
}
define contactgroup{
    contactgroup_name               admins
    alias                           Nagios Administrators
    members                         nagiosadmin
}
generic-host_nagios2.cfg
define host{
    name                            generic-host ; The name of this host template
    notifications_enabled           1 ; Host notifications are enabled
    event_handler_enabled           1 ; Host event handler is enabled
    flap_detection_enabled          1 ; Flap detection is enabled
    failure_prediction_enabled      1 ; Failure prediction is enabled
    process_perf_data               1 ; Process performance data
    retain_status_information       1 ; Retain status information across program restarts
    retain_nonstatus_information    1 ; Retain non-status information across program restarts
    check_command                   check-host-alive
    max_check_attempts              10
    notification_interval           1
    notification_period             24x7
    notification_options            d,u,r,f,s
    contact_groups                  admins
    register                        0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
generic-service_nagios2.cfg
define service{
    name                            generic-service ; The 'name' of this service template
    active_checks_enabled           1 ; Active service checks are enabled
    passive_checks_enabled          1 ; Passive service checks are enabled/accepted
    parallelize_check               1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
    obsess_over_service             1 ; We should obsess over this service (if necessary)
    check_freshness                 0 ; Default is to NOT check service 'freshness'
    notifications_enabled           1 ; Service notifications are enabled
    event_handler_enabled           1 ; Service event handler is enabled
    flap_detection_enabled          1 ; Flap detection is enabled
    failure_prediction_enabled      1 ; Failure prediction is enabled
    process_perf_data               1 ; Process performance data
    retain_status_information       1 ; Retain status information across program restarts
    retain_nonstatus_information    1 ; Retain non-status information across program restarts
    notification_interval           1 ; Only send notifications on status change by default.
    is_volatile                     0
    check_period                    24x7
    normal_check_interval           5
    retry_check_interval            1
    max_check_attempts              4
    notification_period             24x7
    notification_options            w,u,c,r,f,s
    contact_groups                  admins
    register                        0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
How can I force nagios to send notification emails?
Source: (StackOverflow)
I am trying to install the NRPE plugin on Ubuntu 12.04, but I am running into a problem with the SSL libraries. I installed the "libcurl3-openssl-dev" package, but when I try to compile the NRPE plugin afterwards, it fails with "cannot find ssl libraries".
Could anyone please shed some light on this?
Thanking you,
Regards,
Gaurav.
Source: (StackOverflow)
In Nagios it is easy to check that a LogMessage happened in the last 48 hours and sound an alarm. What I would like, though, is to instead configure Nagios to sound an alarm when a specific message did not occur within 48 hours.
Can anyone point me in the right direction?
I am using the "Check WMI Plus" plugin (no agent required) to check the event log on a Windows box.
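The general idea is to invert the threshold: count the matching events inside the window and treat a count of zero as the alarm. The sketch below shows only that logic with the standard plugin exit codes; the function name and timestamps are assumptions, and how Check WMI Plus itself is told to do this is not covered here.

```python
from datetime import datetime, timedelta

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def absence_status(event_times, now, window_hours=48):
    """Return (exit_code, message), alarming when NO event is recent.

    event_times: datetimes at which the expected log message was seen.
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = [t for t in event_times if cutoff <= t <= now]
    if recent:
        return OK, "OK - %d matching event(s) in the last %d hours" % (
            len(recent), window_hours)
    return CRITICAL, "CRITICAL - no matching event in the last %d hours" % window_hours


# Made-up timestamps for illustration.
now = datetime(2024, 1, 10, 12, 0)
seen_yesterday = absence_status([datetime(2024, 1, 9, 12, 0)], now)
seen_last_week = absence_status([datetime(2024, 1, 1, 12, 0)], now)
```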
Source: (StackOverflow)
I'm not familiar with Nagios and I'm still only halfway through the plugins and documentation, but our client currently uses it and wants to use it to monitor our WCF services too. Is there a way for Nagios to consume WCF methods, or at least to monitor the errors thrown by the WCF services?
Source: (StackOverflow)
I have a Nagios configuration which performs a number of tests on a few hundred nodes; one of these is a variant of check_http. It isn't configured with --enable-embedded-perl (ePN), but we'll be changing that soon. Even with ePN enabled I'm concerned about the model where each execution of this Perl HTTP+SSL check handles only a single target.
I'd like to write a simple select() (or poll() / epoll()) driven daemon which creates connections to multiple targets concurrently, reads the results and spits out results in a form that's useable to Nagios as if it were results from a passive check.
Is there a guide to how one could accomplish this? What's the interface or API for providing batched check updates to Nagios?
One hack I'm considering would be to have my daemon update a Redis store (with a key for each target, and a short expiration time) and replace check_http with a very small, lightweight GET of the local Redis instance on the key (the GET would either get the actual results for Nagios or a "(nil)" response, which would be treated as if the HTTP connection had timed out).
However, I'm also a bit skeptical of my idea, since I'd think someone has already built something like this by now.
(BTW: I'm ready to be convinced to switch to something like Icinga or Zabbix or Zenoss or OpenNMS ... pretty much anything that will scale better).
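For batched passive results, Nagios accepts PROCESS_SERVICE_CHECK_RESULT lines written to its external command file. The sketch below formats and submits such lines; the function names and the host and service values are illustrative assumptions, and it presumes that external commands are enabled and that the target services accept passive checks.

```python
import time


def passive_result_line(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command."""
    if timestamp is None:
        timestamp = int(time.time())
    # A newline would end the command early, so flatten the plugin output.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit_batch(results, command_file):
    """Write a batch of (host, service, code, output) results.

    command_file is whatever the command_file directive in nagios.cfg
    points at; the caller supplies it, no default path is assumed here.
    """
    with open(command_file, "w") as pipe:
        for host, service, code, output in results:
            pipe.write(passive_result_line(host, service, code, output))


# Hypothetical host and service names, fixed timestamp for a stable example.
line = passive_result_line("web01", "HTTP", 0, "OK - 200", timestamp=1700000000)
```

A select()-driven daemon could collect results for many targets and hand them to submit_batch in one pass, instead of forking one plugin per target.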
Source: (StackOverflow)
Getting a "No output returned from plugin" error message from a Nagios/NRPE script
1) Running Nagios v3.2.3 and NRPE v2.12
2) The script:
OK_STATE=0
UNAME=`/bin/uname -r`
echo "OK: Kernel Version=$UNAME"
exit $OK_STATE
3) Command line results on the Nagios Server using NRPE
- Same OK results for both the root and nagios users:
[nagios@cmonmm03 libexec]$ ./check_nrpe -H dappsi01b.dev.screenscape.local -c check_kernel
OK: Kernel Version=2.6.18-194.11.3.el5
When I run the check_kernel.sh script on the machine's local command line, it works there too.
Any thoughts or known solutions would be appreciated.
Thank you
Source: (StackOverflow)