NRPE interview questions
Top NRPE frequently asked interview questions
I installed Nagios-NRPE on a Gentoo virtual machine.
When I tried to start NRPE using /etc/init.d/nrpe start, I got the following error:
ERROR: nrpe does not have a start function.
However, I do not get this error on other Gentoo virtual machines on which I have installed Nagios-NRPE.
What might be causing this error?
Source: (StackOverflow)
I have the NRPE daemon running under xinetd on an Amazon EC2 instance, and the Nagios server on my local machine.
Running check_nrpe -H [amazon public IP] gives this error:
CHECK_NRPE: Error - Could not complete SSL handshake.
Both NRPE installations are the same version, and both were compiled with this option:
./configure --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/i386-linux-gnu/
The "allowed hosts" entry contains my local IP address.
What could be causing this error now?
Source: (StackOverflow)
I am using Nagios XI. I issued the following command from the Nagios server:
nagiossrv root [libexec] > check_nrpe -H 128.19.5.131 -t 30 -c check_users -w 5 -c 10
It gives me the following error:
-bash: check_nrpe: command not found
I have also added the IP address of the Nagios server (nagiossrv) to the /usr/local/nagios/etc/nrpe.cfg file on the host (128.19.5.131).
What is the issue?
Source: (StackOverflow)
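On the "command not found" question above: bash only runs a bare command name if it finds it in a directory listed in $PATH, and the current directory is normally not on root's PATH, so sitting in libexec is not enough. Running ./check_nrpe from that directory, or the full path /usr/local/nagios/libexec/check_nrpe, avoids this. Separately, that command line passes -c twice; in check_nrpe's usage text -c names the remote command, and extra arguments go through -a. A small Python sketch of the same lookup bash performs, plus an assumed plugin directory as fallback (find_plugin is a hypothetical helper name):

```python
import os
import shutil


def find_plugin(name, extra_dirs=("/usr/local/nagios/libexec",)):
    """Return the full path of an executable plugin, or None if not found.

    $PATH is searched first (this is all bash does for a bare command name);
    extra_dirs is an assumed list of plugin directories to try afterwards.
    """
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        # Must be a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_plugin("check_nrpe"))
```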
I have the following struct, from the NRPE daemon code in C:
typedef struct packet_struct {
    int16_t  packet_version;
    int16_t  packet_type;
    uint32_t crc32_value;
    int16_t  result_code;
    char     buffer[1024];
} packet;
I want to send this data format to the C daemon from Python. The CRC is calculated while crc32_value is 0, and is then put into the struct. My Python code to do this is as follows:
import struct
import zlib
cmd = b'_NRPE_CHECK'
pkt = struct.pack('hhIh1024s', 2, 1, 0, 0, cmd)
# pkt has length of 1034, as it should
checksum = zlib.crc32(pkt) & 0xFFFFFFFF
pkt = struct.pack('hhIh1024s', 2, 1, checksum, 0, cmd)
socket.send(....)
The daemon receives these values: version=2 type=1 crc=FE4BBC49 result=0
but it calculates crc=3731C3FD.
The actual C code to compute the CRC is:
https://github.com/KristianLyng/nrpe/blob/master/src/utils.c
and it is called via:
calculate_crc32((char *)packet, sizeof(packet));
When I ported those two functions to Python, I got the same result that zlib.crc32 returns.
Is my struct.pack call correct? Why does my CRC computation differ from the server's?
Source: (StackOverflow)
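A likely explanation for the CRC mismatch above, hedged because it depends on the compiler: sizeof(packet) in C includes trailing padding. The fields add up to 2+2+4+2+1024 = 1034 bytes, but because the struct contains a 4-byte uint32_t, a typical compiler pads it to a multiple of 4, i.e. 1036 bytes, and calculate_crc32 then hashes all 1036. Python's struct module never adds trailing padding, so struct.calcsize('hhIh1024s') stays at 1034. The sketch below assumes 1036 bytes, zeroed padding, and network byte order for the header fields; build_nrpe_v2_packet is a hypothetical helper name, not part of NRPE:

```python
import struct
import zlib

# Assumptions (not taken from the NRPE source shown in the question):
#  - the C compiler pads `packet` to a 4-byte boundary, so sizeof(packet) == 1036;
#  - header fields travel in network byte order ("!"), as the daemon is assumed
#    to convert them with ntohs/ntohl.
# "2x" appends the two padding bytes explicitly, set to zero.
NRPE_V2_FORMAT = "!hhIh1024s2x"
PACKET_VERSION_2 = 2
QUERY_PACKET = 1  # packet_type value used in the question


def build_nrpe_v2_packet(command: bytes) -> bytes:
    """Build a version-2 NRPE query packet with its CRC filled in (sketch)."""
    if len(command) >= 1024:
        raise ValueError("command does not fit in the 1024-byte buffer")
    # Pass 1: crc32_value = 0, mirroring the C code before it hashes the packet.
    zeroed = struct.pack(NRPE_V2_FORMAT, PACKET_VERSION_2, QUERY_PACKET, 0, 0, command)
    crc = zlib.crc32(zeroed) & 0xFFFFFFFF
    # Pass 2: identical bytes except for the checksum field.
    return struct.pack(NRPE_V2_FORMAT, PACKET_VERSION_2, QUERY_PACKET, crc, 0, command)


if __name__ == "__main__":
    print(struct.calcsize("hhIh1024s"), struct.calcsize(NRPE_V2_FORMAT))
```

If the daemon still disagrees, comparing sizeof(packet) printed from the C side against struct.calcsize is a quick way to confirm or rule out the padding assumption.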
I have distributed the Puppet check for Nagios available from https://github.com/liquidat/nagios-icinga-checks/blob/master/check_puppetagent
My issue is that I get different results when I execute it locally vs. via NRPE:
[root@nagios-client /]# /usr/lib64/nagios/plugins/check_puppetagent
OK: Puppet was last run 17 minutes and 9 seconds ago
vs
[root@nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.50.121 -c check_puppetagent
WARN: Puppet has never run, no /opt/puppetlabs/puppet/cache/state/last_run_summary.yaml found.
Editing the file /usr/lib64/nagios/plugins/check_puppetagent on the client and changing the line to:
summary = '/opt/puppetlabs/puppet/cache/state/last_run_summaries.yaml'
yields the expected result:
[root@nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.50.121 -c check_puppetagent
WARN: Puppet has never run, no /opt/puppetlabs/puppet/cache/state/last_run_summaries.yaml found.
So I know the correct file is being executed.
Executing it manually from the Nagios server over SSH works:
[root@nagios ~]# ssh 192.168.50.121 "/usr/lib64/nagios/plugins/check_puppetagent"
root@192.168.50.121's password:
OK: Puppet was last run 13 seconds ago
Does anyone have any ideas or suggestions for what else I can do to troubleshoot?
Source: (StackOverflow)
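One hypothesis worth checking for the Puppet question above: both working runs appear to be as root, while NRPE runs the plugin as the account set by nrpe_user in nrpe.cfg. If that account cannot search one of the directories under /opt/puppetlabs, the plugin may fail to see last_run_summary.yaml and report it as missing. A Python sketch that reports the first path component the current user cannot get through (diagnose is a hypothetical helper); run it as the NRPE user, for example with sudo -u nrpe, assuming the user is named nrpe on the client:

```python
import os


def diagnose(path):
    """Return (component, problem) for the first thing blocking access to
    path for the *current* user, or None if the file is readable.

    Directories need the search (x) bit; the file itself needs read (r).
    """
    path = os.path.abspath(path)
    current = os.sep
    for part in path.split(os.sep)[1:-1]:
        current = os.path.join(current, part)
        if not os.path.isdir(current):
            return current, "missing"
        if not os.access(current, os.X_OK):
            return current, "not searchable"
    if not os.path.exists(path):
        return path, "missing"
    if not os.access(path, os.R_OK):
        return path, "not readable"
    return None


if __name__ == "__main__":
    print(diagnose("/opt/puppetlabs/puppet/cache/state/last_run_summary.yaml"))
```

Note that os.access reports everything as accessible for root, so the sketch only tells you something when run as the unprivileged NRPE account.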
In /usr/local/nagios/etc/nrpe.cfg I added a new command, check_this_process, to the already predefined ones:
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/$
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s$
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_this_process]=/usr/local/nagios/libexec/check_procs -w 15 -c 20 -C name
This works:
define service{
    use generic-service
    host_name my_host
    service_description CPU Load
    check_command check_nrpe!check_load
}
This doesn't:
define service{
    use local-service
    host_name my_host
    service_description cron
    check_command check_nrpe!check_this_process
}
and returns: NRPE: Command 'check_this_process' not defined
Source: (StackOverflow)
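The "Command not defined" message above comes from the NRPE daemon: it has no command[check_this_process] entry in the configuration it actually loaded. Typical reasons are that a standalone daemon was not restarted after the edit, or that it was started with a different -c config file than the one that was edited. A Python sketch that lists the command[...] names defined in a given nrpe.cfg, so you can confirm the file the daemon uses contains the new entry (defined_commands is a hypothetical helper; it does not follow include= or include_dir= directives):

```python
import re

# Matches lines such as:
#   command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
# Lines starting with "#" never match because of the leading "^\s*command".
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def defined_commands(cfg_text):
    """Map command name -> command line for every active command[...] entry.

    If a name appears more than once, this sketch keeps the last one it sees.
    """
    commands = {}
    for line in cfg_text.splitlines():
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


if __name__ == "__main__":
    with open("/usr/local/nagios/etc/nrpe.cfg") as fh:
        print(sorted(defined_commands(fh.read())))
```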
I monitor servers with databases; every server has at least one and at most five databases. I use the check_oracle_health plugin.
I've created five services (check_db1, check_db2, ...), and for every host I added custom variables/macros (_DB_NAME1, _DB_NAME2, ...) depending on the number of databases on that server, so each variable can be used by the corresponding check_dbX service. Those services are assigned to the "oracle-hosts" hostgroup that I've created.
For now, to exclude the hosts that don't have five databases, I've added a host_name field with "!myhostname" to each service that should skip them. This method is tedious and requires repeating the exclusion every time I add a new host, so I want to know whether there is a simpler way to optimize this configuration, for example a generic service/check.
(I have the same problem with checking Windows disks: some hosts have only C and D drives, others have C, D, E, ...)
I hope my problem description is clear, and thank you in advance for your help.
Source: (StackOverflow)
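One way to avoid per-host exclusions in the question above is to generate the service definitions from a single inventory, so each host only gets as many database services as it has databases. A Python sketch; the inventory dictionary and the check_oracle_db command name are hypothetical placeholders for your own data and for a Nagios command that wraps check_oracle_health and takes the database name as $ARG1$:

```python
def render_oracle_services(inventory, template="generic-service",
                           command="check_oracle_db"):
    """Return Nagios service definitions, one per (host, database) pair.

    inventory maps host_name -> list of database names (hypothetical data);
    command is a hypothetical Nagios command taking the database name as $ARG1$.
    """
    blocks = []
    for host in sorted(inventory):
        for index, db_name in enumerate(inventory[host], start=1):
            blocks.append(
                "define service{\n"
                f"    use {template}\n"
                f"    host_name {host}\n"
                f"    service_description check_db{index}\n"
                f"    check_command {command}!{db_name}\n"
                "}\n"
            )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_oracle_services({"ora-host1": ["PROD"], "ora-host2": ["DEV", "QA", "TEST"]}))
```

The same idea covers the Windows disk case: keep a list of drive letters per host and emit one disk service per entry.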
I am having trouble setting a variable on a template based on the result of node.chef_environment. Currently, when I run kitchen converge, the Chef run errors out when restarting the nrpe service, because NRPE complains that allowed_hosts is blank. This is my first attempt at writing a cookbook, so please excuse anything ugly I have done.
if node.chef_environment == "development" || node.chef_environment == "qa" || node.chef_environment == "vagrant"
  node.default['allowed_hosts'] = "ipaddr"
elsif node.chef_environment == "staging" || node.chef_environment == "production"
  node.default['allowed_hosts'] = "ipaddr2"
end
case
when platform_family?("debian")
  package "nagios-nrpe-server"
  package "nagios-plugins-basic"
when platform_family?("rhel")
  package "nagios-nrpe"
  package "nagios-plugins-nrpe"
  package "net-snmp-utils"
else
  Chef::Application.fatal! "[nagios-nrpe client] unsupported platform family: #{node[:platform_family]}"
end
template "/etc/nagios/nrpe.cfg" do
  source "nrpe.cfg.erb"
  owner "root"
  group "root"
  mode "0644"
  variables(
    :allowed_hosts => node.default['allowed_hosts']
  )
end
bash "wget" do
  code <<-EOH
    cd /usr/lib/nagios/
    wget --no-check-certificate remote_source
  EOH
end
directory "/usr/lib/nagios/plugins" do
  action :delete
  recursive true
end
execute "untar plugins" do
  cwd "/usr/lib/nagios/"
  command "tar zxvf cc_sys_nrpe.tar.gz"
end
directory "/usr/lib/nagios/plugins" do
  mode "777"
  recursive true
end
file "/usr/lib/nagios/cc_sys_nrpe.tar.gz" do
  action :delete
end
service "nagios-nrpe-server" do
  supports :status => true, :restart => true, :reload => true
  action :restart
end
Source: (StackOverflow)
I have researched this problem for days, but I can't find a solution.
I have a Windows 2012 server with NSClient++ installed on it. I also have an Icinga server with the Nagios NRPE plugin installed. NSClient++ is configured to accept NRPE commands, and "allow arguments = 1" is set.
From the Icinga server, when I give this input:
/usr/lib/nagios/plugins/check_nrpe -H 192.168.1.22 -c alias_cpu
it gives this:
OK CPU Load ok.|'5m'=27%;80;90 '1m'=26%;80;90 '30s'=26%;80;90
So everything looks totally fine, but from the Icinga web interface I get this error:
/usr/lib/nagios/plugins/check_nrpe: option requires an argument -- 'a'
It looks like I just can't get the commands right. I tried every command I found on the internet, but none of them works. Also, the NSClient documentation for NRPE is outdated: it says you should use check_nt, but that command has been deprecated for over a year now, so I should use check_nrpe, but that doesn't work either.
So I created a .cfg file in /etc/icinga/objects, and I am currently using these definitions:
define host{
    use windows-servers
    host_name host.domain.com
    alias host
    address 192.168.1.22
}
define service{
    use generic-service
    host_name host.domain.com
    service_description Drive Usage
    check_command check_nrpe!alias_disk
}
define service{
    use generic-service
    host_name host.domain.com
    service_description CPU Load
    check_command check_nrpe!alias_cpu
}
On the Windows Server, the settings in the nsclient.ini are these:
[/settings/NRPE/server]
allowed hosts=172.16.0.7
allow arguments=1
port=5666
allow nasty_meta chars=1
use SSL = 1
Does anyone have an idea what is going wrong here? I am completely out of options.
Am I giving the wrong commands? Does anyone know the right commands? Or am I doing something else wrong?
Thanks!
Source: (StackOverflow)
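The error in the Icinga question above is check_nrpe's own option parser complaining that -a was given with nothing after it. One plausible cause, not confirmed since the Icinga definition of the check_nrpe command is not shown, is a command line ending in -a $ARG2$ while the services pass only one argument (check_nrpe!alias_cpu), so the macro expands to nothing and -a is left dangling. A Python sketch of the rule a working definition has to follow, appending -a only when there are arguments (check_nrpe_argv is a hypothetical helper; the plugin path is the one used in the question):

```python
def check_nrpe_argv(host, command, args=(),
                    plugin="/usr/lib/nagios/plugins/check_nrpe"):
    """Build the argument vector for a check_nrpe call.

    Empty strings (e.g. unset macros) are dropped, and "-a" is only added
    when at least one real argument remains, so it is never left bare.
    """
    argv = [plugin, "-H", host, "-c", command]
    args = [a for a in args if a != ""]
    if args:
        argv += ["-a", *args]
    return argv


if __name__ == "__main__":
    print(check_nrpe_argv("192.168.1.22", "alias_cpu"))
```

In Icinga terms that usually means a separate command definition without -a for checks that take no arguments.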
I'm using chef-solo with librarian-chef to manage my servers. Here's the structure I have locally:
Cheffile
Cheffile.lock
cookbooks
data_bags
Gemfile
Gemfile.lock
.git
.gitignore
nodes
README.md
roles
tmp
Each node in the nodes/ directory has a role defined, and I've put most of the generic attributes in the roles.
I've included the nrpe cookbook in one of the roles and it's working for the generic part:
```
  "apache" => {
    "timeout" => 5,
    "keep_alive" => 'On',
    "max_keep_alive_requests" => 100,
    "keep_alive_timeout" => 5,
    "prefork" => {
      "start_servers" => 5,
      "min_spare_servers" => 5,
      "max_spare_servers" => 10,
      "max_clients" => 100,
      "max_requests_per_child" => 1000
    }
  },
  "nrpe" => {
    "server_port" => 5666,
    "connection_timeout" => 300,
    "dont_blame_nrpe" => 1,
    "command_timeout" => 60,
    "allowed_hosts" => ["10.1.1.10,10.11.1.11"],
  }
}
override_attributes(attrs)
```
I'm now trying to use the LWRP provided by the cookbook to set up checks in the Chef-created nrpe.cfg.
Every syntax I could think of doesn't seem to work, though. knife solo bootstrap nodename either exits with syntax errors or completes, but nothing is added on the node. Any insight on how to add this:
nagios_nrpecheck 'check_load' do
  command "#{node['nagios']['plugin_dir']}/check_load"
  warning_condition '6'
  critical_condition '10'
  action :add
end
to the nrpe block of the role file above would be much appreciated.
Thanks!
Source: (StackOverflow)
I'm sending a config file to thousands of nodes. Because of some customisation, there are maybe 5 or 6 possible paths for that file (there's only one file per host, but the path can vary), and there isn't an easy way to figure out the default location with facts.
Based on this, I'm looking for a way to set the "dest" of the copy module the way we can set the "src", with a with_first_found loop.
Something like this:
copy: src=/foo/{{ ansible_hostname }}/nrpe.cfg dest="{{item}}"
with_items:
  - "/etc/nagios/nrpe.cfg"
  - "/usr/local/nagios/etc/nrpe.cfg"
  - "/usr/lib64/nagios/etc/nrpe.cfg"
  - "/usr/lib/nagios/etc/nrpe.cfg"
  - "/opt/nagios/etc/nrpe.cfg"
PS: I'm sending nrpe.cfg, so if someone knows a better way to find where the default nrpe.cfg is, that would make this a lot easier.
EDIT 1: I've managed to make it work with help from @ydaetskcoR like this:
- name: find nrpe.cfg
  stat:
    path: "{{ item }}"
  with_items:
    - "/etc/nagios/nrpe.cfg"
    - "/usr/local/nagios/etc/nrpe.cfg"
    - "/usr/lib64/nagios/etc/nrpe.cfg"
    - "/usr/lib/nagios/etc/nrpe.cfg"
    - "/opt/nagios/etc/nrpe.cfg"
  register: nrpe_stat
  no_log: True
- name: Copy nrpe.cfg
  copy: src=/foo/{{ ansible_hostname }}/nrpe.cfg dest="{{item.stat.path}}"
  when: item.stat.exists
  no_log: True
  with_items:
    - "{{nrpe_stat.results}}"
Source: (StackOverflow)
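In the Ansible question above, the stat-then-copy approach from the edit copies to every candidate that exists; if only the first match should be used, the logic is a plain first-existing search. For the PS: a standalone NRPE daemon is started with its config file as the -c argument, so on Linux that path can be read back from /proc/<pid>/cmdline. A Python sketch of both ideas (all helper names are hypothetical; under xinetd there is usually no long-running nrpe process, so the /proc scan may find nothing and the candidate list is the fallback):

```python
import os

# Candidate locations from the question; order matters for first_existing().
CANDIDATES = (
    "/etc/nagios/nrpe.cfg",
    "/usr/local/nagios/etc/nrpe.cfg",
    "/usr/lib64/nagios/etc/nrpe.cfg",
    "/usr/lib/nagios/etc/nrpe.cfg",
    "/opt/nagios/etc/nrpe.cfg",
)


def first_existing(candidates=CANDIDATES):
    """Return the first candidate that is a regular file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def config_from_cmdline(argv):
    """Return the value following "-c" in an nrpe command line, or None."""
    for flag, value in zip(argv, argv[1:]):
        if flag == "-c":
            return value
    return None


def running_nrpe_configs(proc="/proc"):
    """Config files named on the command line of running nrpe processes (Linux only)."""
    found = []
    try:
        pids = [entry for entry in os.listdir(proc) if entry.isdigit()]
    except OSError:
        return found
    for pid in pids:
        try:
            with open(os.path.join(proc, pid, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or cmdline not readable
        argv = [part.decode(errors="replace") for part in raw.split(b"\0") if part]
        if argv and os.path.basename(argv[0]) == "nrpe":
            cfg = config_from_cmdline(argv)
            if cfg:
                found.append(cfg)
    return found


if __name__ == "__main__":
    print(running_nrpe_configs() or first_existing())
```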
I am having a problem with the Nagios/NRPE service, as follows:
I have configured each of the files, but Nagios does not recognize the response delivered by NRPE on the client:
File: /etc/nagios/nrpe.cfg (client)
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/VG_opt-LV_opt
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200
File: services.cfg (server)
define service{
    use generic-service
    host_name 192.168.160.10, 192.168.160.11, 192.168.160.12
    service_description Disk Space
    notification_options w,u,c,r
    check_command check_nrpe!check_disk
}
Of course I restarted the services on the client and the server, and other checks via NRPE do return results (number of processes, CPU and RAM); the problem is only with check_disk.
The result of running check_disk locally is:
DISK OK - free space: /opt 34024 MB (76% inode=98%);| /opt=10522MB;37544;42237;0;46930
Result via web:
Disk SpaceUNKNOWN 2014-09-26 09:18:31 0d 15h 52m 53s 4/4 (No output returned from plugin)
It is even stranger because from the Nagios server I do get results:
$ ./check_nrpe -H 192.168.160.10 -c check_disk
DISK OK - free space: /opt 33389 MB (74% inode=98%);| /opt=11156MB;37544;42237;0;46930
Via web:
(No output returned from plugin)
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.15
Last Modified: 09-06-2013
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
Usage: check_nrpe -H <host> [ -b <bindaddr> ] [-4] [-6] [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
Options:
-h = Print this short help.
-l = Print licensing information.
-n = Do no use SSL
-u = Make socket timeouts return an UNKNOWN state instead of CRITICAL
<host> = The address of the host running the NRPE daemon
<bindaddr> = bind to local address
-4 = user ipv4 only
-6 = user ipv6 only
[port] = The port on which the daemon is running (default=5666)
[timeout] = Number of seconds before connection times out (default=10)
[command] = The name of the command that the remote daemon should run
[arglist] = Optional arguments that should be passed to the command. Multiple
arguments should be separated by a space. If provided, this must be
the last option supplied on the command line.
Note:
This plugin requires that you have the NRPE daemon running on the remote host.
You must also have configured the daemon to associate a specific plugin command
with the [command] option you are specifying here. Upon receipt of the
[command] argument, the NRPE daemon will run the appropriate plugin command and
send the plugin output and return code back to *this* plugin. This allows you
to execute plugins on remote hosts and 'fake' the results to make Nagios think
the plugin is being run locally.
Source: (StackOverflow)
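For the "No output returned from plugin" question above: Nagios takes the first line of a plugin's standard output as the status text (anything after | is performance data) and maps the exit code 0/1/2/3 to OK/WARNING/CRITICAL/UNKNOWN; empty output yields exactly the "(No output returned from plugin)" message seen there. Because Nagios runs the check as the nagios user with macros expanded, a useful next step is to reproduce that exact command line as that user. A simplified Python sketch of the interpretation step (run_plugin is a hypothetical helper; it ignores multi-line output and simply treats any exit code outside 0-3 as UNKNOWN):

```python
import subprocess

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=60):
    """Run a plugin and interpret its result roughly the way Nagios does.

    Returns (state, status_text, perfdata).
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    state = STATES.get(proc.returncode, "UNKNOWN")
    first_line = proc.stdout.splitlines()[0] if proc.stdout.strip() else ""
    if not first_line:
        return state, "(No output returned from plugin)", ""
    # Status text before "|", performance data after it.
    text, _, perfdata = first_line.partition("|")
    return state, text.strip(), perfdata.strip()
```

For example, run_plugin(["/usr/lib64/nagios/plugins/check_nrpe", "-H", "192.168.160.10", "-c", "check_disk"]) executed as the nagios user shows what the web interface should be seeing.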
I have a local Nagios server, and I'm trying to configure it to monitor my Tomcat 8 server with check_jvm, so I can keep track of the memory and classes used by Java.
To do so, I installed the check_nrpe plugin on the client and configured it, but I'm getting an 'odd' error.
If I call the plugin on the client from my server, it answers correctly, even when using check_jvm commands as a parameter.
But when I configure Nagios to run the check on its own, the web interface returns "CHECK_NRPE: Error - Could not complete SSL handshake" for that service specifically.
This is what I have:
From my Nagios server:
# /usr/local/nagios/libexec/check_nrpe -H <client.ip>
NRPE v2.12
# /usr/local/nagios/libexec/check_nrpe -H <client.ip> -c tomcat_heap
OK 799998504 |max=2101870592;;; commited=2101870592;;; used=799998504;;;
Here tomcat_heap is the name of a command defined in nrpe.cfg on the client in order to use the check_jvm plugin:
command[tomcat_heap]=sudo /usr/local/nagios/libexec/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 1700000000 -c 2000000000
Now, back on my Nagios server, this is the service definition:
define service{
    use generic-service
    host_name lin-des
    service_description Tomcat heap
    check_command check_nrpe!tomcat_heap
}
Now, this returns a 'CHECK_NRPE: Error - Could not complete SSL handshake' on the web app.
I've checked allowed_hosts in the nrpe.cfg file, as well as /etc/xinetd.d/nrpe, and both include my Nagios server IP.
I've also checked the SELinux and iptables configuration.
I've also checked that my Nagios server and the client use the same version of the SSL libraries.
Lastly, I've checked all the permissions on /usr/local/nagios/libexec on both the server and the client, so that the nagios user owns them.
At this point I've run out of ideas, which is why I'm asking. Any ideas where the problem may be?
Source: (StackOverflow)
I am running a ruby script that uses Ruby/MySQL and net/ftp. The script is running on a Windows Vista box and is trying to create a database and ftp connection to the same remote Solaris server.
Here is the gist of the code:
require 'mysql'
require 'net/ftp'
dbh = Mysql.real_connect(db["host"], db["user"], db["pass"], db["name"])
ftp = Net::FTP.new(ftp["host"])
Now, if I run the script on the Vista box where it resides, everything works as it should. However, the script is being called from yet another server via NRPE, and that's when the error occurs.
If I set db["host"]/ftp["host"] to the fully qualified domain name of the remote server, I receive this error:
getaddrinfo: no address associated with hostname.
After receiving that error, I tried pinging the server from the script, and sure enough it failed when pinging the hostname; however, it worked when I pinged the IP address.
But if I then set db["host"]/ftp["host"] to the IP address of the remote server, I get this error:
The requested service provider could not be loaded or initialized. - socket(2)
I'm having a hard time finding any info on how to debug this, so if anyone has any ideas they will be greatly appreciated.
Thanks in advance.
Source: (StackOverflow)
I have this running:
if (open(PS_ELF, "/bin/ps -eLf|")) {
    while (<PS_ELF>) {
        if ($_ =~ m/some regex/) {
            # do some stuff
        }
    }
}
If called locally, the loop runs just fine, once for every output line of ps -eLf.
Now, if the same script is called from Nagios via NRPE, PS_ELF contains only one line (the first line output by ps).
This puzzles me; what could be the reason?
Maybe this is not limited to/caused by Nagios at all, I just included it for the sake of completeness.
I'm on SUSE Enterprise Linux 10 SP2 and Perl v5.8.8.
Source: (StackOverflow)
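One way to narrow down the Perl question above is to read the same pipe from a different language and log how many lines arrive when run locally versus via NRPE; if the count also drops under NRPE, the cause lies in how ps behaves in the NRPE environment rather than in the Perl loop. A Python sketch that reads a command's output line by line and keeps the lines matching a regex, like the loop above (matching_lines is a hypothetical helper):

```python
import re
import subprocess


def matching_lines(argv, pattern):
    """Yield each stdout line of argv that matches pattern (search, like m// in Perl)."""
    regex = re.compile(pattern)
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:  # read until EOF, one line at a time
            if regex.search(line):
                yield line.rstrip("\n")
```

For example, writing len(list(matching_lines(["/bin/ps", "-eLf"], r"."))) to a log file from both a local run and an NRPE-triggered run gives a direct comparison of how much output ps produced in each case.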