EzDevInfo.com

nrpe interview questions

Top nrpe frequently asked interview questions

* ERROR: nrpe does not have a start function

I installed Nagios-NRPE on a Gentoo virtual machine.

When I tried to start nrpe using /etc/init.d/nrpe start I got the following error:

ERROR:  nrpe does not have a start function.

However, I do not get this error on the other Gentoo virtual machines where I have installed Nagios-NRPE.

What might be causing this error?

Source: (StackOverflow)

CHECK_NRPE: Error - Could not complete SSL handshake

I have the NRPE daemon running under xinetd on an Amazon EC2 instance, and the Nagios server on my local machine.

Running check_nrpe -H [amazon public IP] gives this error:

CHECK_NRPE: Error - Could not complete SSL handshake.

Both NRPE installations are the same version. Both were compiled with this option:

./configure  --with-ssl=/usr/bin/openssl --with-ssl-lib=/usr/lib/i386-linux-gnu/

"allowed host" entry contains my local IP address.

What could be causing this error?
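One thing worth ruling out here is whether the address the daemon actually sees is listed in allowed_hosts. When the Nagios server sits behind NAT, the EC2 instance sees the server's public address, not its local one. The following is a minimal Python sketch of that membership test, assuming a comma-separated allowed_hosts value made of plain addresses or CIDR ranges. The helper name `is_allowed` and the sample addresses are made up for illustration, and this is not NRPE's own parser.

```python
import ipaddress


def is_allowed(peer_ip: str, allowed_hosts: str) -> bool:
    """Return True if peer_ip matches an entry in a comma-separated allowed_hosts value."""
    peer = ipaddress.ip_address(peer_ip)
    for entry in allowed_hosts.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            # Accepts both single addresses and CIDR ranges.
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Hostnames are not resolved in this sketch.
            continue
        if peer in network:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical values: a private LAN address versus the public address seen by the daemon.
    print(is_allowed("192.168.1.10", "127.0.0.1,203.0.113.7"))
    print(is_allowed("203.0.113.7", "127.0.0.1,203.0.113.7"))
```

Feeding it the address that shows up in the daemon's log (rather than the address configured on the server's own interface) tells you quickly whether the entry needs to change.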

Source: (StackOverflow)

Error: -bash: check_nrpe: command not found

I am using Nagios XI. I issued the following command from the Nagios server:

nagiossrv root [libexec] > check_nrpe -H 128.19.5.131 -t 30 -c check_users -w 5 -c 10

It gives me the following error:

-bash: check_nrpe: command not found

I have also added the IP address of the Nagios server (nagiossrv) to the /usr/local/nagios/etc/nrpe.cfg file at the host's (128.19.5.131) side.

What is the issue?
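The prompt shows the command being run from inside libexec, but bash only searches the directories in PATH and does not look in the current directory, so the plugin has to be called as ./check_nrpe or by its full path. The lookup rule can be sketched in Python as follows; the helper name `find_plugin` and the fallback directory argument are assumptions for illustration.

```python
import os
import shutil


def find_plugin(name, plugin_dir=None):
    """Look a plugin up the way a shell does (PATH only), then fall back to plugin_dir."""
    found = shutil.which(name)
    if found:
        return found
    if plugin_dir:
        candidate = os.path.join(plugin_dir, name)
        # Only accept a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # /usr/local/nagios/libexec is the directory used elsewhere on this page.
    print(find_plugin("check_nrpe", "/usr/local/nagios/libexec"))
```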

Source: (StackOverflow)

compute CRC of struct in Python

I have the following struct, from the NRPE daemon code in C:

typedef struct packet_struct {
  int16_t packet_version;
  int16_t packet_type;
  uint32_t crc32_value;
  int16_t result_code;
  char buffer[1024];
} packet;

I want to send this data format to the C daemon from Python. The CRC is calculated when crc32_value is 0, then it is put into the struct. My Python code to do this is as follows:

cmd = '_NRPE_CHECK'
pkt = struct.pack('hhIh1024s', 2, 1, 0, 0, cmd)
# pkt has length of 1034, as it should
checksum = zlib.crc32(pkt) & 0xFFFFFFFF
pkt = struct.pack('hhIh1024s', 2, 1, checksum, 0, cmd)
socket.send(....)

The daemon is receiving these values: version=2 type=1 crc=FE4BBC49 result=0

But it is calculating crc=3731C3FD

The actual C code to compute the CRC is:

https://github.com/KristianLyng/nrpe/blob/master/src/utils.c

and it is called via:

calculate_crc32((char *)packet, sizeof(packet));

When I ported those two functions to Python, I get the same as what zlib.crc32 returns.

Is my struct.pack call correct? Why is my CRC computation differing from the server's?
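Two differences between the Python packing and the C struct are worth checking; neither is confirmed by the question. First, struct.pack in native mode adds no padding at the end, so the packet is 1034 bytes, while sizeof(packet) in C is typically 1036 because the struct is padded to a multiple of 4 for the uint32_t member. Second, as far as I know the daemon converts the integer fields to network byte order before computing the CRC. A minimal sketch under those two assumptions (the function name `build_query` is made up):

```python
import struct
import zlib

# "!" selects network byte order with no implicit alignment;
# "2x" appends the two trailing pad bytes a C compiler typically adds.
PADDED_FMT = "!hhIh1024s2x"


def build_query(command: bytes, version: int = 2, packet_type: int = 1) -> bytes:
    """Build a query packet whose CRC covers all 1036 bytes, with crc32_value zeroed first."""
    unsigned = struct.pack(PADDED_FMT, version, packet_type, 0, 0, command)
    checksum = zlib.crc32(unsigned) & 0xFFFFFFFF
    return struct.pack(PADDED_FMT, version, packet_type, checksum, 0, command)


if __name__ == "__main__":
    pkt = build_query(b"_NRPE_CHECK")
    print(len(pkt), struct.calcsize("hhIh1024s"))
```

If the server still computes a different value, comparing len(pkt) with the sizeof(packet) the daemon was compiled with is a quick way to confirm or rule out the padding theory.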

Source: (StackOverflow)

Unsure how to troubleshoot NRPE issue

I have distributed the Puppet check for Nagios available from https://github.com/liquidat/nagios-icinga-checks/blob/master/check_puppetagent

My issue is that I get different results if I execute locally vs via NRPE:

[root@nagios-client /]# /usr/lib64/nagios/plugins/check_puppetagent
OK: Puppet was last run 17 minutes and 9 seconds ago

[root@nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.50.121 -c check_puppetagent
WARN: Puppet has never run, no /opt/puppetlabs/puppet/cache/state/last_run_summary.yaml found.

Editing the file /usr/lib64/nagios/plugins/check_puppetagent and changing the line to: summary = '/opt/puppetlabs/puppet/cache/state/last_run_summaries.yaml' on the client yields the expected result:

[root@nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.50.121 -c check_puppetagent
WARN: Puppet has never run, no /opt/puppetlabs/puppet/cache/state/last_run_summaries.yaml found.

So I know the correct file is being executed.

Executing it manually from remote works:

[root@nagios ~]# ssh 192.168.50.121 "/usr/lib64/nagios/plugins/check_puppetagent"
root@192.168.50.121's password:
OK: Puppet was last run 13 seconds ago

Does anyone have any ideas or suggestions for what else I can do to troubleshoot?
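Both working runs above were done as root, while the NRPE daemon usually runs as an unprivileged user. One possibility, which the question does not confirm, is that this user cannot traverse or read the Puppet state directory, in which case the file looks missing to the check even though it exists. A small Python sketch for testing that from the daemon's account (the helper name `describe_access` is made up for illustration):

```python
import os


def describe_access(path):
    """Report the effective user id and whether this process can see and read path."""
    return {
        "euid": os.geteuid(),
        # exists is also False when a parent directory cannot be traversed.
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
    }


if __name__ == "__main__":
    print(describe_access("/opt/puppetlabs/puppet/cache/state/last_run_summary.yaml"))
```

Running it once as root and once as the user the daemon runs under shows whether the two accounts see the file differently.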

Source: (StackOverflow)

Nagios - NRPE: Command '...' not defined

In /usr/local/nagios/etc/nrpe.cfg I added a new command check_this_process to the already pre-defined ones:

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/$
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s$
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_this_process]=/usr/local/nagios/libexec/check_procs -w 15 -c 20 -C name

This works:

define service{
        use                     generic-service
        host_name               my_host
        service_description     CPU Load
        check_command           check_nrpe!check_load
}

This doesn't:

define service{
        use                             local-service
        host_name                       my_host
        service_description             cron
        check_command                   check_nrpe!check_this_process
}

and returns: NRPE: Command 'check_this_process' not defined
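This message comes from the daemon on the monitored host, so it means the running daemon does not know the command. Common causes are that the daemon was not restarted after nrpe.cfg was edited, or that the edit went into a different nrpe.cfg than the one the daemon reads; the question does not say which applies. A minimal Python sketch for listing the commands a given config file defines, assuming the `command[name]=...` syntax shown above (the function name `defined_commands` is made up):

```python
import re

# Matches lines such as: command[check_load]=/path/to/check_load -w 15,10,5
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def defined_commands(cfg_text):
    """Return a dict mapping command names to command lines, skipping comments."""
    commands = {}
    for line in cfg_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


if __name__ == "__main__":
    with open("/usr/local/nagios/etc/nrpe.cfg") as handle:
        for name in sorted(defined_commands(handle.read())):
            print(name)
```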

Source: (StackOverflow)

check multiple databases with nagios

I monitor servers that host databases. Every server has at least one database and at most five, and I use the check_oracle_health plugin.

I've created 5 services (check_db1, check_db2...), and for every host I added custom variables/macros (_DB_NAME1, _DB_NAME2...) according to the number of databases on that server, so that each variable can be used by the matching check_dbx service. Those services are assigned to the "oracle-hosts" hostgroup that I've created.

For now, to exclude the hosts that don't have 5 databases, I've added a host_name field with "!myhostname" to each service that should skip them. This method is tedious and has to be repeated every time I add a new host, so I would like to know whether there is a simpler way to set this up, such as a generic service/check.

(I have the same problem with checking Windows disks: some hosts have only C and D drives, others have C, D, E....)

I hope my problem description is clear. Thank you in advance for your help.
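One way to avoid the per-host exclusions is to generate the service definitions from a single list of hosts and their databases, so each host only gets the checks it needs. The sketch below is an illustration in Python, not a built-in Nagios feature; the function name `render_services`, the `check_db` command name, and the sample hosts are all hypothetical.

```python
def render_services(host_dbs, template="generic-service", command="check_db"):
    """Render one Nagios service definition per (host, database) pair."""
    blocks = []
    for host, dbs in sorted(host_dbs.items()):
        for db in dbs:
            blocks.append(
                "define service{\n"
                f"        use                     {template}\n"
                f"        host_name               {host}\n"
                f"        service_description     Oracle {db}\n"
                f"        check_command           {command}!{db}\n"
                "}\n"
            )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical inventory: host name mapped to its database names.
    print(render_services({"srv1": ["DB1", "DB2"], "srv2": ["DB3"]}))
```

The same approach would cover the Windows disk case by mapping each host to its list of drive letters.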

Source: (StackOverflow)

Setting a variable based on environment in chef

I am having trouble setting a variable on a template based on the result of node.chef_environment. Currently, when I run kitchen converge, the Chef run errors out on restarting the nrpe service because NRPE complains that allowed_hosts is blank. This is my first attempt at writing a cookbook, so please excuse any ugly things that I have done.

if node.chef_environment == "development" || node.chef_environment == "qa" || node.chef_environment == "vagrant"
        node.default['allowed_hosts'] = "ipaddr"
elsif node.chef_environment == "staging" || node.chef_environment  == "production"
        node.default['allowed_hosts'] = "ipaddr2"
end

case
when platform_family?("debian")
    package "nagios-nrpe-server"
    package "nagios-plugins-basic"
when platform_family?("rhel")
    package "nagios-nrpe"
    package "nagios-plugins-nrpe"
    package "net-snmp-utils"
else
    Chef::Application.fatal! "[nagios-nrpe client] unsupported platform family: #{node[:platform_family]}"
end

template "/etc/nagios/nrpe.cfg" do
    source "nrpe.cfg.erb"
    owner "root"
    group "root"
    mode "0644"
    variables( 
        :allowed_hosts => node.default['allowed_hosts'] 
    )
end

bash "wget" do
    code <<-EOH
    cd /usr/lib/nagios/
    wget --no-check-certificate remote_source
    EOH
end

directory "/usr/lib/nagios/plugins" do
    action :delete
    recursive true
end

execute "untar plugins" do
    cwd "/usr/lib/nagios/"
    command "tar zxvf cc_sys_nrpe.tar.gz"
end

directory "/usr/lib/nagios/plugins" do
    mode "777"
    recursive true
end

file "/usr/lib/nagios/cc_sys_nrpe.tar.gz" do
    action :delete
end

service "nagios-nrpe-server" do
    supports :status => true, :restart => true, :reload => true
    action :restart
end

Source: (StackOverflow)

NSClient++ commands for NRPE with Windows Server 2012 from Icinga

I have researched this problem for days but I can't find a solution. I have a Windows 2012 server with NSClient++ installed on it. I also have an Icinga server with the Nagios NRPE plugin installed. NSClient++ is configured to accept NRPE commands, and "allow arguments = 1" is set. From the Icinga server, when I run this:

/usr/lib/nagios/plugins/check_nrpe -H 192.168.1.22 -c alias_cpu

it gives this: OK CPU Load ok.|'5m'=27%;80;90 '1m'=26%;80;90 '30s'=26%;80;90

So everything looks totally fine, but in the Icinga web interface I get this error: /usr/lib/nagios/plugins/check_nrpe: option requires an argument -- 'a'

It looks like I just can't get the commands right. I tried every command I found on the internet, but none of them works. Also, the NSClient documentation for NRPE is outdated: it says to use check_nt, but that command has been deprecated for over a year now, so I should use check_nrpe, but that doesn't work either.

So I created a .cfg file in /etc/icinga/objects and I am currently using these commands:

define host{
       use windows-servers
       host_name host.domain.com
       alias host
       address 192.168.1.22
}

define service{
        use                             generic-service
        host_name                       host.domain.com
        service_description             Drive Usage
        check_command                   check_nrpe!alias_disk
        }


define service{
        use                     generic-service
        host_name               host.domain.com
        service_description     CPU Load
        check_command           check_nrpe!alias_cpu
}

On the Windows Server, the settings in the nsclient.ini are these:

[/settings/NRPE/server]
allowed hosts=172.16.0.7
allow arguments=1
port=5666
allow nasty_meta chars=1 
use SSL = 1

Does anyone have an idea what is going wrong here? I am totally out of options now. Am I giving the wrong commands? Does anyone know the right commands? Or am I doing something else wrong? Thanks!
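The error text says that -a was passed without a value. That would happen if the check_nrpe command definition on the Icinga side always appends -a followed by an argument macro, while the services above pass only the command name. The command definition is not shown in the question, so this is an assumption. The intended behaviour, adding -a only when there are arguments, can be sketched in Python like this (the function name `build_check_nrpe_argv` is made up):

```python
def build_check_nrpe_argv(plugin, host, command, args=()):
    """Build a check_nrpe command line, adding -a only when arguments exist."""
    argv = [plugin, "-H", host, "-c", command]
    if args:
        # -a must be the last option and must be followed by at least one value.
        argv.append("-a")
        argv.extend(args)
    return argv


if __name__ == "__main__":
    plugin = "/usr/lib/nagios/plugins/check_nrpe"
    print(build_check_nrpe_argv(plugin, "192.168.1.22", "alias_cpu"))
```

In Nagios-style configuration the equivalent is usually a second command definition without -a for checks that take no arguments.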

Source: (StackOverflow)

use lwrp with chef-solo

I'm using chef-solo with librarian-chef to manage my servers. Here's the structure I have locally:

Cheffile Cheffile.lock cookbooks data_bags Gemfile Gemfile.lock .git .gitignore nodes README.md roles tmp

Each node from the nodes/ dir has a role defined and I've added most of the generic attributes in the roles.

I've included the nrpe cookbook in one of the roles and it's working for the generic part:

```
"apache" => {
    "timeout" => 5,
    "keep_alive" => 'On',
    "max_keep_alive_requests" => 100,
    "keep_alive_timeout" => 5,

    "prefork" => {
        "start_servers" => 5,
        "min_spare_servers" => 5,
        "max_spare_servers" => 10,
        "max_clients" => 100,
        "max_requests_per_child" => 1000
    }
},
"nrpe" => {
    "server_port" => 5666,
    "connection_timeout" => 300,
    "dont_blame_nrpe" => 1,
    "command_timeout" => 60,
    "allowed_hosts" => ["10.1.1.10,10.11.1.11"],
}
}
override_attributes(attrs)
```

I'm now trying to use the LWRP provided by the cookbook to set up checks in the Chef-created nrpe.cfg.

None of the syntax I could think of seems to work, though. Running knife solo bootstrap nodename either exits with syntax errors or completes without adding anything on the node. Any insight on how to add this:

nagios_nrpecheck 'check_load' do
  command "#{node['nagios']['plugin_dir']}/check_load"
  warning_condition '6'
  critical_condition '10'
  action :add
end

in the nrpe block from the role file above will be much appreciated.

Thanks!

Source: (StackOverflow)

Ansible send file to the first met destination

I'm sending a config file to thousands of nodes. Because of some customisation there are maybe 5 or 6 possible paths for that file (there's only one file per host, but the path can vary), and there isn't an easy way to figure out the default location with facts.

Based on this, I'm looking for a way to set the "dest" of the copy module the way we can set the "src", with a with_first_found loop.

Something like this:

copy: src=/foo/{{ ansible_hostname }}/nrpe.cfg dest="{{item}}"
with_items:
    - "/etc/nagios/nrpe.cfg"
    - "/usr/local/nagios/etc/nrpe.cfg"
    - "/usr/lib64/nagios/etc/nrpe.cfg"
    - "/usr/lib/nagios/etc/nrpe.cfg"
    - "/opt/nagios/etc/nrpe.cfg"

PS: I'm sending nrpe.cfg, so if someone knows a better way to find the default location of nrpe.cfg, that would make this a lot easier.

EDIT 1: I got it working with help from @ydaetskcoR, like this:

- name: find nrpe.cfg
  stat:
    path: "{{ item }}"
  with_items:
    - "/etc/nagios/nrpe.cfg"
    - "/usr/local/nagios/etc/nrpe.cfg"
    - "/usr/lib64/nagios/etc/nrpe.cfg"
    - "/usr/lib/nagios/etc/nrpe.cfg"
    - "/opt/nagios/etc/nrpe.cfg"
  register: nrpe_stat
  no_log: True

- name: Copy nrpe.cfg
  copy: src=/foo/{{ ansible_hostname }}/nrpe.cfg dest="{{item.stat.path}}"
  when: item.stat.exists
  no_log: True
  with_items:
    - "{{nrpe_stat.results}}"
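The two tasks above amount to "probe each candidate path and write to the one that exists". For comparison, the same selection logic as a minimal Python sketch; the function name `first_existing` and the injectable `exists` parameter are assumptions added for illustration and testing.

```python
import os

CANDIDATES = [
    "/etc/nagios/nrpe.cfg",
    "/usr/local/nagios/etc/nrpe.cfg",
    "/usr/lib64/nagios/etc/nrpe.cfg",
    "/usr/lib/nagios/etc/nrpe.cfg",
    "/opt/nagios/etc/nrpe.cfg",
]


def first_existing(paths, exists=os.path.exists):
    """Return the first path for which exists() is true, or None if there is none."""
    for path in paths:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(first_existing(CANDIDATES))
```

Unlike this sketch, the Ansible version copies to every candidate that exists, which is equivalent only as long as each host has a single nrpe.cfg, as the question states.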

Source: (StackOverflow)

Monitoring with Nagios disk - NRPE - Linux

I am having the following problem with the Nagios/NRPE service:

I have configured each of the files, but Nagios does not recognize the response that NRPE delivers from the client:

File: /etc/nagios/nrpe.cfg (client)

command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/VG_opt-LV_opt
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200

File: services.cfg (server)

define service{
  use                   generic-service
  host_name             192.168.160.10, 192.168.160.11, 192.168.160.12
  service_description   Disk Space
  notification_options w,u,c,r
  check_command         check_nrpe!check_disk
}

I have restarted the services on both the client and the server, and other checks via NRPE (number of processes, CPU, RAM) return results. The problem is only with check_disk.

The result obtained with the check_disk locally is:

DISK OK - free space: /opt 34024 MB (76% inode=98%);| /opt=10522MB;37544;42237;0;46930

Result via web

Disk Space    UNKNOWN    2014-09-26 09:18:31    0d 15h 52m 53s    4/4    (No output returned from plugin)

Even stranger, I do get results when I run the check from the Nagios server:

$ ./check_nrpe -H 192.168.160.10 -c check_disk
DISK OK - free space: /opt 33389 MB (74% inode=98%);| /opt=11156MB;37544;42237;0;46930

Via web:

(No output returned from plugin)
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.15
Last Modified: 09-06-2013
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
\nUsage: check_nrpe -H <host> [ -b <bindaddr> ] [-4] [-6] [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
\nOptions:
-h = Print this short help.
-l = Print licensing information.
-n = Do no use SSL
-u = Make socket timeouts return an UNKNOWN state instead of CRITICAL
<host> = The address of the host running the NRPE daemon
<bindaddr> = bind to local address
-4 = user ipv4 only
-6 = user ipv6 only
[port] = The port on which the daemon is running (default=5666)
[timeout] = Number of seconds before connection times out (default=10)
[command] = The name of the command that the remote daemon should run
[arglist] = Optional arguments that should be passed to the command. Multiple
arguments should be separated by a space. If provided, this must be
the last option supplied on the command line.
\nNote:
This plugin requires that you have the NRPE daemon running on the remote host.
You must also have configured the daemon to associate a specific plugin command
with the [command] option you are specifying here. Upon receipt of the
[command] argument, the NRPE daemon will run the appropriate plugin command and
send the plugin output and return code back to *this* plugin. This allows you
to execute plugins on remote hosts and 'fake' the results to make Nagios think
the plugin is being run locally.
\n

Source: (StackOverflow)

CHECK_NRPE: Error - Could not complete SSL handshake (web)

I have a local Nagios Server and I'm trying to configure it to monitor my tomcat8 server with check_jvm, so I can control the memory and classes used by Java.

To do so, I installed the check_nrpe plugin on the client and configured it, but I'm getting an odd error.

If I call the plugin on the client from my server, it answers correctly, even when I pass check_jvm commands as a parameter.

But when I configure Nagios to run the check on its own, the web interface shows "CHECK_NRPE: Error - Could not complete SSL handshake" for that service specifically.

This is what I have:

From my nagios server

# /usr/local/nagios/libexec/check_nrpe -H <client.ip>
NRPE v2.12
# /usr/local/nagios/libexec/check_nrpe -H <client.ip> -c tomcat_heap
OK 799998504 |max=2101870592;;; commited=2101870592;;; used=799998504;;;

Where tomcat_heap is the name of a command defined in nrpe.cfg at the client in order to use the check_jvm plugin.

command[tomcat_heap]=sudo /usr/local/nagios/libexec/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 1700000000 -c 2000000000

Now, back again on my Nagios server, this is the service definition

define service{
          use                   generic-service
          host_name             lin-des
          service_description   Tomcat heap
          check_command         check_nrpe!tomcat_heap
          }

Now, this returns a 'CHECK_NRPE: Error - Could not complete SSL handshake' on the web app.

I've checked the allowed_hosts in the nrpe.cfg file, as well as in /etc/xinetd.d/nrpe, so it includes my Nagios server IP.

I've also checked Selinux and Iptables configuration.

I've also checked that both my Nagios server, and the client share the same version of the ssl libraries.

Lastly, I've checked all the permissions on /usr/local/nagios/libexec on both the server and the client, so that the nagios user owns the files.

At this point, I ran out of ideas, and that's why I'm asking you. Any ideas on where the problem may be?

Source: (StackOverflow)

Error: The requested service provider could not be loaded or initialized. - socket(2)

I am running a ruby script that uses Ruby/MySQL and net/ftp. The script is running on a Windows Vista box and is trying to create a database and ftp connection to the same remote Solaris server.

Here is the gist of the code:

require 'mysql'
require 'net/ftp'

dbh = Mysql.real_connect(db["host"], db["user"], db["pass"], db["name"])
ftp = Net::FTP.new(ftp["host"])

Now, if I run the script from the Vista box that it resides on everything works as it should. However, the script is being called from yet another server via NRPE and that's when the error occurs.

If I set db["host"]/ftp["host"] equal to the fully qualified domain name of the remote server here is the error I receive:

getaddrinfo: no address associated with hostname.

After receiving that error I tried pinging the server from the script and sure enough it failed when trying to ping the hostname, however, it worked when I pinged the IP address.

But then if I set db["host"]/ftp["host"] to the IP address of the remote server I get this error:

The requested service provider could not be loaded or initialized. - socket(2)

I'm having a hard time finding any info on how to debug this, so if anyone has any ideas they will be greatly appreciated.

Thanks in advance.

Source: (StackOverflow)

Why does ps only return one line of output in my Perl script when I call it with Nagios?

I have this running:

if (open(PS_ELF, "/bin/ps -eLf|")) {
  while (<PS_ELF>) {
    if ($_ =~ m/some regex/) {
      # do some stuff
    }
  }
}

If called locally, the loop runs just fine, once for every output line of ps -eLf

Now if the same script is called from Nagios via NRPE, PS_ELF contains only one line (the first line output by ps).

This puzzles me; what could be the reason?

Maybe this is not limited to/caused by Nagios at all, I just included it for the sake of completeness.

I'm on SUSE Enterprise Linux 10 SP2 and perl v5.8.8.

Source: (StackOverflow)