# cd /usr/local/nagios/etc
# ls -l
-rw-rw-r-- 1 nagios nagios 12999 Apr 24 21:55 cgi.cfg
-rw-r--r-- 1 root root 50 May 12 11:55 htpasswd.users
-rw-rw-r-- 1 nagios nagios 44868 May 12 14:46 nagios.cfg
drwxrwxr-x 2 nagios nagios 4096 May 14 01:22 objects
-rw-rw---- 1 nagios nagios 1312 Apr 24 21:55 resource.cfg
nagios.cfg is the main Nagios configuration file. It holds the global parameters and pulls in the other, user-customized configuration files through cfg_file directives, e.g.
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
Let's get started with an example.
First, we define something for Nagios to monitor. The basic unit is a host, which may have many services:
/usr/local/nagios/etc/objects/localhost.cfg
define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name localhost
alias localhost
address 127.0.0.1
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
The "use" statements tell the host and service to inherit from templates defined in templates.cfg, so let's have a look. Note that "linux-server" is itself the child of another template, "generic-host".
/usr/local/nagios/etc/objects/templates.cfg
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
notification_period 24x7 ; Send host notifications at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
define host{
name linux-server ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
A template defines common parameters that would otherwise be repeated over and over by many hosts and services. So, instead of including these parameters in every host and service definition, we create a template. The template basically tells Nagios how, and how often, to check the host or service, and what to do when its state changes. Most parameters are pretty self-explanatory. For example, "check_period 24x7" and "check_interval 5" say that this host should be monitored 24 hours a day, 7 days a week, and that Nagios should check it every 5 minutes.
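To make the inheritance chain concrete, here is a minimal Python sketch of how a host ends up with its effective settings. It is only a simplified model, not how Nagios itself is implemented: the dictionaries hold a subset of the directives shown above, and the function name resolve is made up for illustration. It also reflects that the register directive is not inherited, which is why every template has to set "register 0" itself.

```python
# Simplified, hypothetical model of Nagios template inheritance.
# Each definition is a dict; "use" names the parent template.
TEMPLATES = {
    "generic-host": {
        "notifications_enabled": "1",
        "notification_period": "24x7",
        "register": "0",
    },
    "linux-server": {
        "use": "generic-host",
        "check_period": "24x7",
        "check_interval": "5",
        "notification_period": "workhours",  # overrides generic-host
        "register": "0",
    },
}


def resolve(definition, templates):
    """Return the definition with all inherited directives filled in."""
    merged = {}
    parent_name = definition.get("use")
    if parent_name is not None:
        merged.update(resolve(templates[parent_name], templates))
    # "register" is not inherited, so drop whatever the parent had.
    merged.pop("register", None)
    # Local values always win over inherited ones.
    for key, value in definition.items():
        if key not in ("use", "name"):
            merged[key] = value
    return merged


localhost = {
    "use": "linux-server",
    "host_name": "localhost",
    "alias": "localhost",
    "address": "127.0.0.1",
}

effective = resolve(localhost, TEMPLATES)
for key in sorted(effective):
    print(key, effective[key])
```

The localhost host never mentions notification_period, yet it ends up with "workhours" from linux-server rather than "24x7" from generic-host, because the closest definition wins.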
How the parameters below work may not be obvious, so I will say more about them.
"notification_options" - In which situations should Nagios send out notifications? If we don't specify any, Nagios sends notifications in all situations, which may not be what we want. In the example above, "d,u,r" means "send me notifications when the host goes DOWN, becomes UNREACHABLE, or RECOVERS from either state". Flapping means the host/service keeps switching between bad (d,u) and good (r) states; we will probably talk more about that later.
d = DOWN state
u = UNREACHABLE state
r = recoveries (OK state)
f = starts and stops flapping
s = scheduled downtime starts and ends
n = none (no host notifications will be sent out)
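As a rough illustration of how such an option string filters events, here is a small Python sketch. The event names and the functions parse_notification_options and should_notify are made up for this example; Nagios's real notification logic involves more checks (time periods, contacts, downtime and so on).

```python
# Hypothetical mapping from option letters to host event names.
HOST_OPTION_EVENTS = {
    "d": "DOWN",
    "u": "UNREACHABLE",
    "r": "RECOVERY",
    "f": "FLAPPING",
    "s": "DOWNTIME",
}


def parse_notification_options(value):
    """Turn a string such as 'd,u,r' into a set of event names."""
    letters = {part.strip() for part in value.split(",") if part.strip()}
    if "n" in letters:
        return set()  # "n" means no notifications at all
    return {HOST_OPTION_EVENTS[letter] for letter in letters}


def should_notify(event, options_value):
    """True if the event is covered by the configured options."""
    return event in parse_notification_options(options_value)


print(should_notify("DOWN", "d,u,r"))
print(should_notify("FLAPPING", "d,u,r"))
```

With "d,u,r" the DOWN event passes the filter while FLAPPING does not, which matches the linux-server template above.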
check_command check-host-alive
This is the command Nagios calls to determine the host's state. To find out what it does, we have to look at another configuration file, commands.cfg:
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
OK, what is "$USER1$"? What is "check_ping"? Again, we need to look at yet another configuration file, resource.cfg, which is quite simple:
$USER1$=/usr/local/nagios/libexec
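To see what the command line looks like once the macros are filled in, here is a small Python sketch of the substitution. It is only an illustration: the function expand_macros is made up, and the value used for $HOSTADDRESS$ is taken from the localhost definition earlier.

```python
import re


def expand_macros(command_line, macros):
    """Replace each $NAME$ with its value; unknown macros stay as-is."""
    def substitute(match):
        return macros.get(match.group(1), match.group(0))
    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


COMMAND_LINE = "$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5"

MACROS = {
    "USER1": "/usr/local/nagios/libexec",  # from resource.cfg
    "HOSTADDRESS": "127.0.0.1",            # "address" of the localhost host
}

expanded = expand_macros(COMMAND_LINE, MACROS)
print(expanded)
```

So $USER1$ is simply the directory where the plugins live, and check_ping is one of the programs in it.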
Let's now run the command: /usr/local/nagios/libexec/check_ping
# /usr/local/nagios/libexec/check_ping
check_ping: Could not parse arguments
Usage:
check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>% [-p packets] [-t timeout] [-4|-6]
A Nagios check can report WARNING or CRITICAL when it detects a problem, so in most plugins -w sets the warning threshold and -c sets the critical threshold. When a time unit is involved, it is often milliseconds, as it is for check_ping. In check_ping, "rta" is the round trip average and "pl" is the packet loss. So let's get back to the command_line:
$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
This means we ping the host 5 times (-p 5) and mark it WARNING if rta is above 3000 ms or packet loss reaches 80%, and CRITICAL if rta is above 5000 ms or packet loss reaches 100%.
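The threshold logic described above can be sketched in a few lines of Python. This is not the plugin's source code: the function names are made up, and how the exact boundary values are treated follows the description in this post, so the real check_ping may behave slightly differently at the edges.

```python
def parse_threshold(text):
    """Split '3000.0,80%' into (rta in ms, packet loss in percent)."""
    rta_text, pl_text = text.split(",")
    return float(rta_text), float(pl_text.rstrip("%"))


def classify(rta_ms, pl_percent, warning="3000.0,80%", critical="5000.0,100%"):
    """Return OK, WARNING or CRITICAL for one ping result."""
    w_rta, w_pl = parse_threshold(warning)
    c_rta, c_pl = parse_threshold(critical)
    # Check the critical thresholds first, because they are the stricter state.
    if rta_ms > c_rta or pl_percent >= c_pl:
        return "CRITICAL"
    if rta_ms > w_rta or pl_percent >= w_pl:
        return "WARNING"
    return "OK"


print(classify(12.5, 0))    # fast reply, no loss
print(classify(3500.0, 0))  # slow reply
print(classify(10.0, 100))  # every packet lost
```

With these generous thresholds the host is only considered in trouble when it is very slow or most packets are lost, which makes sense for a basic "is it alive" check.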
notification_period workhours
This time we go to "timeperiods.cfg". You will find a few examples in this file, such as work hours, specific holidays, etc.
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
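To show why a notification_period matters, here is a minimal Python sketch that checks whether a moment falls inside a time period. It only handles the simple weekday form used above. The function names are made up, and the WORKHOURS values (09:00-17:00, Monday to Friday) are an assumption for illustration; look in your own timeperiods.cfg for the real definition.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]  # datetime.weekday() order

TIMEPERIOD_24X7 = {day: "00:00-24:00" for day in DAYS}

# Assumed values, for illustration only.
WORKHOURS = {day: "09:00-17:00" for day in DAYS[:5]}


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight ('24:00' gives 1440)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_timeperiod(moment, timeperiod):
    """True if the datetime falls inside one of that day's ranges."""
    ranges = timeperiod.get(DAYS[moment.weekday()])
    if ranges is None:
        return False  # day not listed, so never inside
    now = moment.hour * 60 + moment.minute
    for time_range in ranges.split(","):
        start, end = time_range.split("-")
        if to_minutes(start) <= now < to_minutes(end):
            return True
    return False


monday_3am = datetime(2024, 5, 13, 3, 0)
print(in_timeperiod(monday_3am, TIMEPERIOD_24X7))
print(in_timeperiod(monday_3am, WORKHOURS))
```

A host that goes down at 3am on a Monday is inside 24x7 but outside the assumed work hours, so with "notification_period workhours" nobody gets woken up.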
contact_groups admins
A contact is how Nagios notifies you when there is a state change. Let's have a look at contacts.cfg:
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; this is from templates.cfg
alias Nagios Admin ; Full name of user
email your_email_address@your_domain
}
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
}
NOTE that "generic-contact" is in templates.cfg
define contact{
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
register 0
}
NOTE that "notify-host-by-email" and "notify-service-by-email" are in commands.cfg. They simply use the "/bin/mail" command that comes with the OS to send out the emails. You can certainly send notifications by means other than email. For instance, we may talk later about how to use Telegram to send out the alarms.
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
# 'notify-service-by-email' command definition
define command{
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
You may already be feeling a bit dizzy from all the jumping around between configuration files. I had that feeling at first too, but once you get used to the template style of configuration, it is actually not that difficult. Next, we will use more realistic examples, as I found that the easier way to learn Nagios.