Set Up a Nagios Server with nsca-ng for DRLM

nsca-ng is one of the error-reporting methods supported by DRLM, and a sample configuration is available at http://docs.drlm.org/en/latest/ErrorReporting.html. This document covers configuring a Nagios Server with nsca-ng, and configuring the DRLM Server so that errors raised while running backups are reported to Nagios.

Of course, we assume you already have a configured Nagios Server and a DRLM Server.

If not, don't worry; take a look at the following links:

How to install DRLM

How to install Nagios

DRLM Server configuration

Install the nsca-ng-client package

$ apt install nsca-ng-client

Set up config files

At least two files must be configured; in this example we use three. default.conf holds the default values, which can be overridden in local.conf.

/usr/share/drlm/conf/default.conf

# REPORT_TYPE=nagios
# NAGIOS VARIABLES
#
# These are default values and can be overwritten in local.conf according to your NAGIOS installation and configuration.
#

NAGCMD="/usr/sbin/send_nsca"
NAGSVC="DRLM"
NAGHOST="$HOSTNAME"
NAGCONF="/etc/drlm/alerts/nagios.cfg"

Note

Keep an eye on the NAGSVC variable. We will use it on the Nagios Server side as the service description, and the two values must match.

/etc/drlm/local.conf

NAGSVC="DRLM_Backup"

Note

As you can see, this variable was already defined in default.conf. This step only shows that it can be overridden in /etc/drlm/local.conf, so you can skip it if you want. If you skip it, NAGSVC keeps its default value DRLM, so use DRLM as the service description on the Nagios side.
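The override behaviour described above means that whichever file is read last wins. The following Python sketch models that, assuming default.conf is read before local.conf; it is not DRLM's own loader (DRLM is written in Bash), and it only handles plain KEY="value" lines without expanding variables such as $HOSTNAME.

```python
import shlex


def parse_assignments(text):
    """Parse simple KEY="value" shell assignments, skipping comments and blanks.

    Simplified sketch: no variable expansion, no multi-line values.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, rhs = line.partition("=")
        if not key.isidentifier():
            continue
        # shlex removes the surrounding quotes the way the shell would
        parts = shlex.split(rhs, comments=True)
        values[key] = parts[0] if parts else ""
    return values


def effective_config(*file_texts):
    """Merge files in load order: later files override earlier ones."""
    merged = {}
    for text in file_texts:
        merged.update(parse_assignments(text))
    return merged


default_conf = '''
NAGCMD="/usr/sbin/send_nsca"
NAGSVC="DRLM"
NAGHOST="$HOSTNAME"
NAGCONF="/etc/drlm/alerts/nagios.cfg"
'''
local_conf = 'NAGSVC="DRLM_Backup"\n'

print(effective_config(default_conf, local_conf)["NAGSVC"])  # DRLM_Backup
print(effective_config(default_conf)["NAGSVC"])              # DRLM
```

With local.conf present, the Nagios service description must be DRLM_Backup; without it, it must be DRLM.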

/etc/drlm/alerts/nagios.cfg

#### DRLM (Disaster Recovery Linux Manager) Nagios error reporting sample configuration file.
#### Default: /etc/drlm/alerts/nagios.cfg

### identity = <string>
#   Send  the  specified  client identity to the server.
#   By default, localhost will be used.

identity = "DRLM"

### server = <string>
#   Connect and talk to the specified server address or hostname.
#   The  default server is "localhost".

server = "Cervell"

### port = <string>
#   Connect  to  the  specified  service  name or port number on the
#   server instead of using the default port (5668).

port = 5668
password = "change-me"

Where:

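  • DRLM: the name of the DRLM Server, sent to the Nagios Server as the client identity
  • Cervell: the hostname of the Nagios Server
  • port: the port on which the nsca-ng server on the Nagios Server listens
  • password: the password set on the nsca-ng server (change-me is only a sample value); both sides must use the same password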
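To see what actually travels to the Nagios Server, the sketch below formats a passive service result the way the nsca-ng send_nsca client reads it from standard input: host, service, return code and plugin output separated by tabs, one result per line. It uses the NAGCMD and NAGCONF paths from default.conf; how DRLM itself builds the message is not shown in this article, so the output text here is hypothetical.

```python
import shutil
import subprocess

# Nagios plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nsca_line(host, service, code, output):
    """Format one passive service result for send_nsca's stdin:
    host<TAB>service<TAB>return_code<TAB>plugin_output<NEWLINE>."""
    for field in (host, service, output):
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{host}\t{service}\t{code}\t{output}\n"


def send_result(line, nagcmd="/usr/sbin/send_nsca",
                nagconf="/etc/drlm/alerts/nagios.cfg"):
    """Pipe a result line to send_nsca, reading the client config with -c."""
    if shutil.which(nagcmd) is None:
        raise FileNotFoundError(nagcmd)
    subprocess.run([nagcmd, "-c", nagconf], input=line, text=True, check=True)


if __name__ == "__main__":
    # Hypothetical test message; host and service must match the Nagios side.
    line = nsca_line("DRLM", "DRLM_Backup", CRITICAL, "Test error from DRLM")
    print(repr(line))
```

Calling send_result(line) on the DRLM Server would submit the result using the identity, server, port and password from nagios.cfg.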

Nagios Server configuration

Once the DRLM Server has been configured we’ll set up the Nagios Server.

Install required packages

$ apt install nsca-ng-server

Set up nsca-ng config files

/etc/nsca-ng/nsca-ng.cfg

command_file = "/usr/local/nagios/var/rw/nagios.cmd"

listen = "Cervell"        # address to listen on. If you use systemd, this
                          # option is overridden by the
                          # nsca-ng-server.socket file.

user = "nagios" # run as user nagios
pid_file = "/var/run/nsca-ng/nsca-ng.pid" # pid file for nsca-ng

include(/etc/nsca-ng/nsca-ng.local.cfg)

authorize "*" {
   password = "change-me"
   #
   # The original NSCA server permits all authenticated clients to submit
   # arbitrary check results.  To get this behaviour, enable the following
   # lines:
   #
           hosts = ".*"
           services = ".*"
}

Note

This config file has been reduced to the minimum required settings. To see all available options, check the original /etc/nsca-ng/nsca-ng.cfg shipped with the package.
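Each result accepted by nsca-ng ends up in the Nagios command_file as a Nagios external command. The sketch below builds the PROCESS_SERVICE_CHECK_RESULT command that submits a passive service result; it illustrates the format and is not nsca-ng's implementation. The submit helper is hypothetical and must run as a user allowed to write the named pipe (nagios, or a member of nagcmd in the installation described in this blog).

```python
import time


def service_result_command(host, service, code, output, now=None):
    """Build a Nagios external command submitting a passive service result."""
    ts = int(time.time() if now is None else now)
    # ';' separates the fields of an external command
    if ";" in host or ";" in service:
        raise ValueError("host and service names must not contain ';'")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def submit(command, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(command)


if __name__ == "__main__":
    print(service_result_command("DRLM", "DRLM_Backup", 2, "Backup failed",
                                 now=1487353766), end="")
```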

/lib/systemd/system/nsca-ng-server.socket

[Unit]
Description=NSCA-ng Socket
Documentation=man:nsca-ng(8) man:nsca-ng.cfg(5)

[Socket]
ListenStream=5668
#BindIPv6Only=both

[Install]
WantedBy=sockets.target
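Because the socket unit makes systemd listen on port 5668, a quick way to confirm that the DRLM Server can reach it is a plain TCP connection test. This sketch only checks that the port accepts connections; it does not test the nsca-ng handshake or the password.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hostnames
        return False
```

For example, port_open("Cervell", 5668) run on the DRLM Server should return True once the socket is active on the Nagios Server.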

Start the nsca-ng-server service

$ systemctl start nsca-ng-server
  • Check the status of the service
$ systemctl status nsca-ng-server
● nsca-ng-server.service - Monitoring Command Acceptor
   Loaded: loaded (/lib/systemd/system/nsca-ng-server.service; static)
   Active: active (running) since Fri 2017-02-17 18:35:46 CET; 5s ago
     Docs: man:nsca-ng(8)
           man:nsca-ng.cfg(5)
 Main PID: 14495 (nsca-ng)
   CGroup: /system.slice/nsca-ng-server.service
           └─14495 /usr/sbin/nsca-ng -c /etc/nsca-ng/nsca-ng.cfg

Feb 17 18:35:46 cervell nsca-ng[14495]: Ignoring `-b'/`listen' when socket activated
Feb 17 18:35:46 cervell nsca-ng[14495]: nsca-ng 1.4 (OpenSSL 1.0.1t, libev 4.15 with epoll) starting up

Nagios Config files

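This configuration uses passive checks. An error state is only kept for a limited time, controlled by the freshness_threshold variable: if no new result arrives within that time (600 seconds here), Nagios runs the service's check_command (check_dummy!0), which sets the service back to OK. It's therefore important to set up notifications so that you receive an email when an error occurs; otherwise you could miss it.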
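The passive_service template in templates.cfg sets freshness_threshold to 600 and check_command to check_dummy!0. The sketch below models how long an error stays visible under simplifying assumptions: it ignores check scheduling and Nagios's extra freshness latency, and the PassiveService class is purely illustrative.

```python
from dataclasses import dataclass

FRESHNESS_THRESHOLD = 600  # seconds, as in the passive_service template


@dataclass
class PassiveService:
    state: int = 0              # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    last_result_at: float = 0.0

    def receive(self, code, at):
        """A passive result arrives (e.g. from DRLM via nsca-ng)."""
        self.state, self.last_result_at = code, at

    def is_stale(self, now, threshold=FRESHNESS_THRESHOLD):
        return now - self.last_result_at > threshold

    def freshness_check(self, now, threshold=FRESHNESS_THRESHOLD):
        """If stale, emulate the forced check_dummy!0 run, which reports OK."""
        if self.is_stale(now, threshold):
            self.receive(0, now)
        return self.state


svc = PassiveService()
svc.receive(2, at=1000)            # DRLM reports a CRITICAL backup error
print(svc.freshness_check(1300))   # 2: still within 600 s
print(svc.freshness_check(1700))   # 0: stale, reset to OK by check_dummy!0
```

The error is visible for roughly ten minutes, which is why the email notification matters.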

  • Add a new service template to

/usr/local/nagios/etc/objects/templates.cfg

define service{
       name                     passive_service
       active_checks_enabled    0
       passive_checks_enabled   1        ; We want only passive checking
       flap_detection_enabled   0
       register                 0        ; This is a template, not a real service
       is_volatile              0
       check_period             24x7
       max_check_attempts       1
       normal_check_interval    5
       retry_check_interval     1
       check_freshness          1
       freshness_threshold      600
       contact_groups           admins
       check_command            check_dummy!0
       notifications_enabled    1        ; Service notifications are enabled
       notification_interval    10
       notification_period      24x7
       notification_options     w,u,c,r
       stalking_options         w,c,u
       }
  • Add the DRLM Server to a hostgroup

/usr/local/nagios/etc/objects/hostgroup.cfg

define hostgroup {
      hostgroup_name  Krbulan-Servers
      alias           Servidors Test
      members         DRLM
      }
  • Add a new command

/usr/local/nagios/etc/objects/commands.cfg

#NSCA-ng Command
define command{
        command_name check_dummy
        command_line $USER1$/check_dummy $ARG1$
       }
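  • Define the host and service (new .cfg files must be listed with cfg_file in nagios.cfg, or placed in a directory listed with cfg_dir, for Nagios to read them)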

/usr/local/nagios/etc/objects/DRLM.cfg

define host{
           use Host-krbulan
           host_name DRLM
           hostgroups  Krbulan-Servers
           alias DRLM
           address 192.168.7.9
           }

define service{
           use passive_service
           service_description DRLM_Backup
           host_name DRLM
           notifications_enabled           1
           }

Warning

service_description must match the NAGSVC variable configured earlier on the DRLM Server.

  • Check the Nagios configuration files
$ /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.2.0
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-01-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
    Checked 16 services.
    Checked 3 hosts.
    Checked 2 host groups.
    Checked 0 service groups.
    Checked 1 contacts.
    Checked 1 contact groups.
    Checked 26 commands.
    Checked 5 time periods.
    Checked 0 host escalations.
    Checked 0 service escalations.
Checking for circular paths...
    Checked 3 hosts
    Checked 0 service dependencies
    Checked 0 host dependencies
    Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
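Since a mismatch between NAGSVC and service_description silently breaks the alerts, a small script can compare them. This sketch assumes you have the text of /etc/drlm/local.conf and of DRLM.cfg at hand, falls back to the default.conf value DRLM, and only understands the simple formats used in this article (one host_name per service, no templates).

```python
import re


def nagsvc_from_conf(text, default="DRLM"):
    """Return the last NAGSVC assignment in a DRLM conf file, or the default."""
    matches = re.findall(r'^[ \t]*NAGSVC="?([^"\n]*)"?[ \t]*$', text, flags=re.M)
    return matches[-1] if matches else default


def service_descriptions(nagios_cfg, host):
    """Collect service_description values of services defined for host_name."""
    found = []
    for body in re.findall(r"define\s+service\s*\{(.*?)\}", nagios_cfg, flags=re.S):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # ';' starts an inline comment
            if line:
                parts = line.split(None, 1)
                directives[parts[0]] = parts[1] if len(parts) > 1 else ""
        if directives.get("host_name") == host:
            found.append(directives.get("service_description"))
    return found


local_conf = 'NAGSVC="DRLM_Backup"\n'
drlm_cfg = """
define service{
           use passive_service
           service_description DRLM_Backup
           host_name DRLM
           notifications_enabled           1
           }
"""
print(nagsvc_from_conf(local_conf) in service_descriptions(drlm_cfg, "DRLM"))  # True
```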

Testing the configuration

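After restarting Nagios to load the new configuration (systemctl restart nagios), run a backup from the DRLM Server against a client that is not online (here, dummy):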

root@DRLM:~# drlm -vD runbackup -c dummy
Disaster Recovery Linux Manager 2.1.0 / Git
Using log file: /var/log/drlm/drlm-DRLM-runbackup.20170217.190926.log
ERROR: drlm:runbackup: Client dummy SSH Server is not available (SSH) aborting ...
Aborting due to an error, check /var/log/drlm/drlm-DRLM-runbackup.20170217.190926.log for details
Terminated

On Nagios: [screenshot not available]

In the notifications, we can see that the email has been sent: [screenshot not available]

How to install Nagios 4.2.0 (Debian 8)

This process is already explained in several places, but since I have an article pending on how to configure a Nagios Server with nsca-ng, I thought I would first post some easy steps to set up Nagios from scratch. This document describes how to install Nagios Core, the Nagios Plugins and NRPE from source on Debian 8.

Prerequisites

$ sudo apt-get install wget build-essential apache2 php5 php5-gd libgd-dev unzip postfix

$ sudo wget -O nrpe-3.0.tar.gz https://github.com/NagiosEnterprises/nrpe/archive/3.0.tar.gz

$ sudo wget https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.2.0.tar.gz

$ sudo wget http://nagios-plugins.org/download/nagios-plugins-2.1.2.tar.gz

Users & Group

$ useradd nagios

$ groupadd nagcmd

$ usermod -a -G nagios,nagcmd www-data

Nagios Core Installation

$ tar zxvf nagios-4.2.0.tar.gz

$ tar zxvf nagios-plugins-2.1.2.tar.gz

$ cd nagios-4.2.0
$ ./configure --with-command-group=nagcmd --with-httpd-conf=/etc/apache2/
$ make all
$ make install
$ make install-init
$ make install-config
$ make install-commandmode
$ make install-webconf
$ cp -R contrib/eventhandlers/ /usr/local/nagios/libexec/
$ chown -R nagios:nagios /usr/local/nagios/libexec/eventhandlers

Check Installation

$ /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.2.0
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-01-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
     Checked 8 services.
     Checked 1 hosts.
     Checked 1 host groups.
     Checked 0 service groups.
     Checked 1 contacts.
     Checked 1 contact groups.
     Checked 24 commands.
     Checked 5 time periods.
     Checked 0 host escalations.
     Checked 0 service escalations.
Checking for circular paths...
     Checked 1 hosts
     Checked 0 service dependencies
     Checked 0 host dependencies
     Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0
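When you script configuration changes, it can help to restart Nagios only if this pre-flight check reports zero errors. The sketch below runs the same command as above and parses the "Total Warnings" and "Total Errors" lines; the paths are those of this source install, and relying on the text rather than the exit status is a deliberate, conservative choice.

```python
import re
import subprocess

NAGIOS_BIN = "/usr/local/nagios/bin/nagios"
NAGIOS_CFG = "/usr/local/nagios/etc/nagios.cfg"


def parse_preflight(output):
    """Extract (warnings, errors) counts from `nagios -v` output."""
    warnings = re.search(r"^Total Warnings:\s*(\d+)", output, re.M)
    errors = re.search(r"^Total Errors:\s*(\d+)", output, re.M)
    if not (warnings and errors):
        raise ValueError("pre-flight totals not found in output")
    return int(warnings.group(1)), int(errors.group(1))


def config_ok():
    """Run the pre-flight check and return True when it reports no errors."""
    result = subprocess.run([NAGIOS_BIN, "-v", NAGIOS_CFG],
                            capture_output=True, text=True)
    _, errors = parse_preflight(result.stdout)
    return errors == 0
```

For example, restart Nagios with subprocess.run(["systemctl", "restart", "nagios"]) only when config_ok() returns True.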

Configure apache2

Enable rewrite and cgi modules

$ sudo a2enmod rewrite && sudo a2enmod cgi

Copy httpd Template Virtual Host

$ sudo cp sample-config/httpd.conf /etc/apache2/sites-available/nagios4.conf

$ sudo chmod 644 /etc/apache2/sites-available/nagios4.conf

Enable Virtual Host

$ sudo a2ensite nagios4

Create user nagiosadmin for web interface

$ sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin

Restart Apache

$ systemctl restart apache2

Install Nagios Plugins

$ cd ../nagios-plugins-2.1.2/

$ ./configure --with-nagios-user=nagios --with-nagios-group=nagios

$ make

$ make install

Enable & Start Nagios

$ systemctl enable nagios
Created symlink from /etc/systemd/system/multi-user.target.wants/nagios.service to /etc/systemd/system/nagios.service.

$ systemctl start nagios

$ systemctl status nagios

● nagios.service - Nagios
   Loaded: loaded (/etc/systemd/system/nagios.service; enabled)
   Active: active (running) since Fri 2016-08-19 16:50:47 CEST; 23s ago
 Main PID: 21272 (nagios)
   CGroup: /system.slice/nagios.service
           ├─21272 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
           ├─21273 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─21274 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─21275 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─21276 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           └─21277 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

Aug 19 16:50:47 cervell nagios[21272]: nerd: Fully initialized and ready to rock!
Aug 19 16:50:47 cervell nagios[21272]: wproc: Successfully registered manager as @wproc with query handler
Aug 19 16:50:47 cervell nagios[21272]: wproc: Registry request: name=Core Worker 21276;pid=21276
Aug 19 16:50:47 cervell nagios[21272]: wproc: Registry request: name=Core Worker 21274;pid=21274
Aug 19 16:50:47 cervell nagios[21272]: wproc: Registry request: name=Core Worker 21273;pid=21273
Aug 19 16:50:47 cervell nagios[21272]: wproc: Registry request: name=Core Worker 21275;pid=21275
Aug 19 16:50:48 cervell nagios[21272]: Successfully launched command file worker with pid 21277

Log in to the web interface at http://<your-server>/nagios with the nagiosadmin user.

NRPE

Install NRPE (Server service)

$ sudo apt-get install libssl-dev
$ tar zxvf nrpe-3.0.tar.gz
$ cd nrpe-3.0/
$ ./configure
$ make all
$ make install

Create NRPE service (systemd)

$ make install-init

Alternatively, create the NRPE systemd service manually:

$ vi /etc/systemd/system/nrpe.service
[Unit]
Description=NRPE
After=nagios.service

[Install]
WantedBy=multi-user.target

[Service]
Type=forking
PIDFile=/usr/local/nagios/var/nrpe.pid
User=nagios
Group=nagios
ExecStart=/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
ExecStop=/usr/bin/killall /usr/local/nagios/bin/nrpe

Enable NRPE service

$ systemctl enable nrpe
Created symlink from /etc/systemd/system/multi-user.target.wants/nrpe.service to /etc/systemd/system/nrpe.service.

Start service

$ systemctl start nrpe

Check status

$ systemctl status nrpe
● nrpe.service - NRPE
   Loaded: loaded (/etc/systemd/system/nrpe.service; enabled)
   Active: active (running) since Mon 2016-08-22 15:48:12 CEST; 8s ago
 Main PID: 26435 (nrpe)
   CGroup: /system.slice/nrpe.service
           └─26435 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Aug 22 15:48:12 cervell nrpe[26435]: Starting up daemon
Aug 22 15:48:12 cervell nrpe[26435]: Server listening on 0.0.0.0 port 5666.
Aug 22 15:48:12 cervell nrpe[26435]: Server listening on :: port 5666.
Aug 22 15:48:12 cervell nrpe[26435]: Listening for connections on port 5666
Aug 22 15:48:12 cervell nrpe[26435]: Allowing connections from: 127.0.0.1

Add the check_nrpe command to /usr/local/nagios/etc/objects/commands.cfg

define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
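In a service definition, check_command check_nrpe!check_load names this command and passes the text after "!" as $ARG1$. Nagios fills $HOSTADDRESS$ from the host's address and $USER1$ from resource.cfg (assumed here to be /usr/local/nagios/libexec, the usual value for a source install). The sketch mimics only these macros, for illustration.

```python
def expand_check(check_command, commands, host_address,
                 user1="/usr/local/nagios/libexec"):
    """Expand a service's check_command into the command line Nagios would run."""
    name, *args = check_command.split("!")
    line = commands[name]
    macros = {"$USER1$": user1, "$HOSTADDRESS$": host_address}
    for i, arg in enumerate(args, start=1):
        macros[f"$ARG{i}$"] = arg  # check_nrpe!check_load -> $ARG1$ = check_load
    for macro, value in macros.items():
        line = line.replace(macro, value)
    return line


commands = {"check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"}
print(expand_check("check_nrpe!check_load", commands, "192.168.1.10"))
# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.10 -c check_load
```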

Config files (for a new client)

vi /usr/local/nagios/etc/objects/contacts.cfg

define contact{
        contact_name                    nagiosadmin          ; Short name of user
        use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias                           Nagios Admin         ; Full name of user
        email                           nagiosadmin@whatever.com ; ## <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        }
define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosadmin
        }

vi /usr/local/nagios/etc/objects/host-service-definitions.cfg

define host{
        name                            Host-krbulan          ; <<***** CHANGE THIS TO YOUR PREFERRED NAME ******
        use                             generic-host
        check_period                    24x7
        check_interval                  5
        retry_interval                  1
        max_check_attempts              10
        check_command                   check-host-alive
        notification_period             workhours
        notification_interval           30
        notification_options            d,u,r
        contact_groups                  admins
        register                        0
        }
define service{
        name                            Service-krbulan      ; <<***** CHANGE THIS TO YOUR PREFERRED NAME ******
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           3
        retry_check_interval            2
        contact_groups                  admins
        notification_options            w,u,c,r
        notification_interval           60
        notification_period             24x7
        register                        0
        }
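Host-krbulan and Service-krbulan are templates (register 0) that real objects pull in with use. The sketch below is a simplified model of that inheritance: directives of the object itself win, missing ones come from the template chain, and name, register and use are not inherited. The generic-host values shown are a hypothetical excerpt, and Nagios also supports features this model ignores (multiple templates, additive "+" values).

```python
NOT_INHERITED = {"name", "register", "use"}


def resolve(obj, templates):
    """Return obj's effective directives after following its `use` chain."""
    chain, seen = [], set()
    parent = obj.get("use")
    while parent:
        if parent in seen:
            raise ValueError(f"circular template reference: {parent}")
        if parent not in templates:
            raise KeyError(f"template not found: {parent}")
        seen.add(parent)
        template = templates[parent]
        chain.append(template)
        parent = template.get("use")
    effective = {}
    for template in reversed(chain):  # most generic template first
        effective.update({k: v for k, v in template.items()
                          if k not in NOT_INHERITED})
    effective.update(obj)             # the object's own directives win
    return effective


templates = {
    "generic-host": {"name": "generic-host", "notifications_enabled": "1",
                     "register": "0"},
    "Host-krbulan": {"name": "Host-krbulan", "use": "generic-host",
                     "check_command": "check-host-alive",
                     "max_check_attempts": "10", "register": "0"},
}
wopr = {"use": "Host-krbulan", "host_name": "wopr", "address": "192.168.1.10"}
print(resolve(wopr, templates)["check_command"])  # check-host-alive
```

This is also why the text after use must match the template's name exactly.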

vi /usr/local/nagios/etc/objects/hostgroup.cfg

define hostgroup {
     hostgroup_name  Krbulan-Servers                          ; <<***** CHANGE THIS TO YOUR PREFERRED NAME ******
     alias           Servidors Krbu CPD
     members         wopr
}

vi /usr/local/nagios/etc/objects/wopr.cfg # this client is named wopr; replace it with your client's name

define host{
   use Host-krbulan                  ; <<***** CHANGE THIS TO THE NAME DEFINED IN host-service-definitions.cfg ******
   host_name wopr
   hostgroups  Krbulan-Servers       ; <<***** CHANGE THIS TO THE NAME DEFINED IN hostgroup.cfg ******
   alias wopr
   address 192.168.1.10              ; <<***** CHANGE THIS TO YOUR CLIENT'S IP ******
}

define service{
   use Service-krbulan               ; <<***** CHANGE THIS TO THE NAME DEFINED IN host-service-definitions.cfg ******
   host_name wopr
   service_description Current Load
   check_command check_nrpe!check_load
}

define service{
   use Service-krbulan
   host_name wopr
   service_description Total Processes
   check_command check_nrpe!check_total_procs
}

define service {
   use Service-krbulan
   host_name wopr
   service_description Memoria
   check_command check_nrpe!check_memory   ; check_memory is not in the default nrpe.cfg, define it on the client
}
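With several NRPE checks per client, these service blocks become repetitive. A small generator can write them instead; the functions and the check list below are hypothetical, and the output uses the same directives as wopr.cfg.

```python
def service_block(host, description, nrpe_command, template="Service-krbulan"):
    """Render one Nagios service definition that runs an NRPE command."""
    return (
        "define service{\n"
        f"   use {template}\n"
        f"   host_name {host}\n"
        f"   service_description {description}\n"
        f"   check_command check_nrpe!{nrpe_command}\n"
        "}\n"
    )


def services_cfg(host, checks):
    """Join service blocks for (description, nrpe_command) pairs."""
    return "\n".join(service_block(host, d, c) for d, c in checks)


checks = [("Current Load", "check_load"),
          ("Total Processes", "check_total_procs"),
          ("Memoria", "check_memory")]
print(services_cfg("wopr", checks))
```

Redirecting the output to a file such as wopr.cfg keeps every client's definitions consistent with the template.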





The Brain Updaters Team at LinuxCon Europe – Day 1 Summary

Today was the Brain Updaters Team's first day at LinuxCon Europe 2015, held this year in Dublin.
From sunrise until practically sunset we attended interesting talks on a variety of topics, discovered some new open source projects and chatted with other conference attendees.
It was a very productive day, and we picked up knowledge that will surely be very useful for the DRLM project and for our day-to-day work as consultants.
For example, the talk «Read The F* Manual? Write a Better F* Manual» by Rich Bowen (Red Hat), who has more than 20 years of experience writing documentation for Apache and Red Hat, showed us the best practices for writing documentation and how to get the community involved in its development. It will no doubt be very useful for improving the DRLM project documentation. This talk, which at first seemed the «least» attractive in our selection, pleasantly surprised us and in the end was unanimously voted the best by the team. Really instructive and interesting!
We also learned a few "tricks" that we will surely use when programming in Bash, and cleared up some doubts about how the shell behaves, in the talk "Introduction to Advanced Bash Usage" by James Pannacciulli (Media Temple). Even though it was the last talk of the day and we were a bit tired and saturated with information, it went by quickly.
Staying on the shell theme, we attended a very instructive talk on debugging Bash, «Use 'strace' To Understand Your Shell (BASH)» by Harald König (Bosch Sensortec GmbH), so we will start using strace to debug the DRLM code and the other Bash code we write.
Another talk we liked was «Enhance OpenSSH for Fun and Security» by Julien Pivotto (Inuits): as Julien went through his talk, almost without realizing it we kept giving ourselves «homework» to investigate ideas that came up, applicable to the DRLM project and to our daily work with SSH.
In short, a great day for the BrainUpdaters team and for the DRLM project!
If we had to point out one downside: we did not like at all that, several times during the talks, announcements over the PA system interrupted speakers and attendees to say that IBM (Diamond Sponsor) was doing something somewhere in the building. We understand it is the main sponsor, but interrupting the speakers' talks is not nice at all.
That was our summary of a great first day at LinuxCon. Let's hope tomorrow is at least as good as today!!!
– The BU team, from Dublin –

Today, October 6, the Brain Updaters team will attend the following talks:


– A Beautiful Build: Releasing Linux Source Correctly – Bradley Kuhn, Software Freedom Conservancy
– System Recovery with BTRFS and Snapshots / Rollback – Thorsten Kukuk, SUSE
– Tutorial: Learning the Basics of Buildroot – Thomas Petazzoni, Free Electrons
– oVirt Integration With Foreman And Katello – Bringing Your Virtualized Data-Center Into The Next Level – Yaniv Bronheim, Red Hat
– The Devil Wears RPM: Continuous Security Integration – Ikey Doherty, Intel
– BoFs: MinnowBoard