Setup Nagios Server with
nsca-ng for DRLM

One of the methods of error reporting with DRLM is nsca-ng, there is a sample configuration on http://docs.drlm.org/en/latest/ErrorReporting.html . In this document we cover the configuration Nagios Server with nsca-ng and also DRLM Server configuration to monitor errors from DRLM Server when running backups.

Of course we assume you have a Nagios Server configured and a DRLM Server

if not, don’t worry, just take a look on the next links

How to install DRLM

How to install NAGIOS

DRLM Server configuration

Install nsca-ng-client package

$ apt install nsca-ng-client

Set up config files

At least 2 files must be configured, in thes example we’re using 3, default.conf has the default values that can be overwritten in local.conf

/usr/share/drlm/conf/default.conf

# REPORT_TYPE=nagios
# NAGIOS VARIABLES
#
# These are default values and can be overwritten in local.conf according to your NAGIOS installation and configuration.
#

NAGCMD="/usr/sbin/send_nsca"
NAGSVC="DRLM"
NAGHOST="$HOSTNAME"
NAGCONF="/etc/drlm/alerts/nagios.cfg"

Note

Keep an eye on this variable NAGSVC . We’re going to use it on the Nagios server side as a service description it must match.

/etc/drlm/local.conf

NAGSVC="DRLM_Backup"

Note

As you can see this varible was defined previously on default.conf , it is just to show you than it can be overwritten with the /etc/drlm/local.conf file, so if you want, you can dismiss this step.

/etc/drlm/alerts/nagios.cfg

#### DRLM (Disaster Recovery Linux Manager) Nagios error reporting sample configuration file.
#### Default: /etc/drlm/alerts/nagios.cfg

### identity = <string>
#   Send  the  specified  client identity to the server.
#   By default, localhost will be used.

identity = "DRLM"

### server = <string>
#   Connect and talk to the specified server address or hostname.
#   The  default server is "localhost".

server = "Cervell"

### port = <string>
#   Connect  to  the  specified  service  name or port number on the
#   server instead of using the default port (5668).

port = 5668
password = "change-me"

Where:

  • DRLM: Is the name of the DRLM Server
  • Cervell: Is the name of the Nagios Server
  • port: Is the port where the Nagios Server is listening
  • password: Is the default password on the nsca-ng-server

Nagios Server configuration

Once the DRLM Server has been configured we’ll set up the Nagios Server.

Install required packages

$ apt install nsca-ng-server

Set up nsca-ng config files

/etc/nsca-ng/nsca-ng.cfg

command_file = "/usr/local/nagios/var/rw/nagios.cmd"

listen = "Cervell" # only listen on localhost. If you use systemd this
                          # this option is overriden by the
                          # nsca-ng-server.socket file.

user = "nagios" # run as user nagios
pid_file = "/var/run/nsca-ng/nsca-ng.pid" # pid file for nsca-ng

include(/etc/nsca-ng/nsca-ng.local.cfg)

authorize "*" {
   password = "change-me"
   #
   # The original NSCA server permits all authenticated clients to submit
   # arbitrary check results.  To get this behaviour, enable the following
   # lines:
   #
           hosts = ".*"
           services = ".*"
}

Note

This config file has been reduced to only the minimum requirements , if you want to see all options check the original file /etc/nsca-ng/nsca-ng.cfg

/lib/systemd/system/nsca-ng-server.socket

[Unit]
Description=NSCA-ng Socket
Documentation=man:nsca-ng(8) man:nsca-ng.cfg(5)

[Socket]
ListenStream=5668
#BindIPv6Only=both

[Install]
WantedBy=sockets.target

Start nsca-ng-server service

$ systemctl start nsca-ng-server
  • Check the status of the service
$ systemctl status nsca-ng-server
● nsca-ng-server.service - Monitoring Command Acceptor
   Loaded: loaded (/lib/systemd/system/nsca-ng-server.service; static)
   Active: active (running) since Fri 2017-02-17 18:35:46 CET; 5s ago
     Docs: man:nsca-ng(8)
           man:nsca-ng.cfg(5)
 Main PID: 14495 (nsca-ng)
   CGroup: /system.slice/nsca-ng-server.service
           └─14495 /usr/sbin/nsca-ng -c /etc/nsca-ng/nsca-ng.cfg

Feb 17 18:35:46 cervell nsca-ng[14495]: Ignoring `-b'/`listen' when socket activated
Feb 17 18:35:46 cervell nsca-ng[14495]: nsca-ng 1.4 (OpenSSL 1.0.1t, libev 4.15 with epoll) starting up

Nagios Config files

For this kind of configuration we’re using passive checks, the error notification is set for a limited time using the variable freshness_threshold. It’s importatnt to setup the notifications in order to receibe an email in case of error , if not you could miss it.

  • Add a new service on

/usr/local/nagios/etc/objects/templates.cfg

define service{
       name passive_service
       active_checks_enabled 0
       passive_checks_enabled 1 # We want only passive checking
       flap_detection_enabled 0
       register 0 # This is a template, not a real service
       is_volatile 0
       check_period 24x7
       max_check_attempts 1
       normal_check_interval 5
       retry_check_interval 1
       check_freshness 1
       freshness_threshold                     600
       contact_groups admins
       check_command check_dummy!0
       notifications_enabled           1                       ; Service notifications are enabled
       notification_interval 10
       notification_period 24x7
       notification_options w,u,c,r
       stalking_options w,c,u
       }
  • Add the DRLM Server on a hostgroup

/usr/local/nagios/etc/object/hostgroup.cfg

define hostgroup {
      hostgroup_name  Krbulan-Servers
      alias           Servidors Test
      members         DRLM
      }
  • Add a new check

/usr/local/nagios/etc/object/commands.cfg

#NSCA-ng Command
define command{
        command_name check_dummy
        command_line $USER1$/check_dummy $ARG1$
       }
  • Define the host and service

/usr/local/nagios/etc/object/DRLM.cfg

define host{
           use Host-krbulan
           host_name DRLM
           hostgroups  Krbulan-Servers
           alias DRLM
           address 192.168.7.9
           }

define service{
           use passive_service
           service_description DRLM_Backup
           host_name DRLM
           notifications_enabled           1
           }

Warning

service_description has to match with the variable NAGSVC before configured on the DRLM server.

  • Check Nagios configuration files
$ /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.2.0
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-01-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
    Checked 16 services.
    Checked 3 hosts.
    Checked 2 host groups.
    Checked 0 service groups.
    Checked 1 contacts.
    Checked 1 contact groups.
    Checked 26 commands.
    Checked 5 time periods.
    Checked 0 host escalations.
    Checked 0 service escalations.
Checking for circular paths...
    Checked 3 hosts
    Checked 0 service dependencies
    Checked 0 host dependencies
    Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Testing the configuration

From DRLM server exec runbackup (dummy server is not online)

$ root@DRLM:~# drlm -vD runbackup -c dummy
Disaster Recovery Linux Manager 2.1.0 / Git
Using log file: /var/log/drlm/drlm-DRLM-runbackup.20170217.190926.log
ERROR: drlm:runbackup: Client dummy SSH Server is not available (SSH) aborting ...
Aborting due to an error, check /var/log/drlm/drlm-DRLM-runbackup.20170217.190926.log for details
Terminated

On Nagios

 

On the notifications we see that the mail has been send