Nagios Core Configuration - host, hostgroup, hostdependency, hostescalation
If you followed the steps in the Nagios Core installation document on this website, the predefined configuration files with host-related objects are located at:
/opt/nagios-<VERSION>/etc/objects/localhost.cfg
/opt/nagios-<VERSION>/etc/objects/printer.cfg
/opt/nagios-<VERSION>/etc/objects/switch.cfg
/opt/nagios-<VERSION>/etc/objects/windows.cfg
Preface
Nagios Core needs to know basic information about each monitored node, such as its IP address or FQDN. The “host” object holds all node-related information so that it can be used elsewhere in Nagios Core (for example, in a service definition).
“host” objects can be grouped into objects that cover a larger part of the monitored infrastructure. It is also possible to configure relations between monitored “host” objects. This lets Nagios Core correlate events and so reduce the load on the support team.
hosts
Description
The “host” object is probably one of the main objects used in Nagios Core. It usually describes a monitored node and provides information related to that node. This information can be used by other objects related to the host (such as “command” or “service” objects).
The definition of a “host” object also includes the “command” that is used for “up/down” monitoring of the host. Commonly this is an ICMP test of node availability, but it can be customized.
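As an illustration of the description above, here is a minimal sketch of a “host” object. The host name, alias and address are hypothetical, and the referenced “check-host-alive” command, “24x7” time period and “admins” contact group are assumed to exist already (the sample configuration shipped with Nagios Core normally defines them).

```cfg
# Hypothetical host; name, alias and address are placeholders.
define host{
    host_name               web01               ; short name, referenced by services and host groups
    alias                   Web server 01       ; longer description of the node
    address                 192.0.2.10          ; documentation address, replace with the real IP or FQDN
    check_command           check-host-alive    ; assumed command object used for the up/down check
    max_check_attempts      5
    check_period            24x7                ; assumed time period object
    contact_groups          admins              ; assumed contact group object
    notification_interval   30
    notification_period     24x7
}
```

Text after a semicolon is treated as a comment in object definitions, so the remarks can stay in the file.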
Location
Nagios Core ships with a predefined set of “host” objects that you can copy and modify for your own nodes.
They are located at (if you followed the Nagios Core installation described on this website):
/opt/nagios-<VERSION>/etc/objects/localhost.cfg
/opt/nagios-<VERSION>/etc/objects/printer.cfg
/opt/nagios-<VERSION>/etc/objects/switch.cfg
/opt/nagios-<VERSION>/etc/objects/windows.cfg
Location Customization
If you prefer to store your customized configuration in your own configuration file, you can define the path to that file.
Nagios Core then needs to know where to look for the customized configuration file.
To tell it, update the main Nagios Core configuration file, “nagios.cfg”.
It is possible to specify:
# Direct path to your customized configuration file
cfg_file=/<path>/<to>/<your>/<config>/<file>
# Path to the directory to search for configuration files
cfg_dir=/<path>/<to>/<your>/<config>/<dir>
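A short sketch of how these two options might look in “nagios.cfg”. The file name “my_hosts.cfg” and the directory “conf.d” are hypothetical, chosen only for illustration.

```cfg
# Hypothetical paths; adjust them to your own layout.

# Load one additional object configuration file:
cfg_file=/opt/nagios-<VERSION>/etc/objects/my_hosts.cfg

# Load every file with a .cfg extension found in this directory
# (subdirectories are processed as well):
cfg_dir=/opt/nagios-<VERSION>/etc/conf.d
```

Comments are kept on their own lines here, because in the main configuration file everything after the equals sign is read as the value. After such a change it is a good idea to verify the configuration with “nagios -v” followed by the path to “nagios.cfg” before restarting Nagios Core.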
Official documentation
In most of my documents I avoid copying the official documentation. On the other hand, at this point it is really handy, as I do not want to reinvent the wheel.
Description:
A host definition is used to define a physical server, workstation, device, etc. that resides on your network.
Definition Format:
define host{
    host_name                       host_name                 # Mandatory parameter
    alias                           alias                     # Mandatory parameter
    display_name                    display_name
    address                         address
    parents                         host_names
    hourly_value                    #
    hostgroups                      hostgroup_names
    check_command                   command_name
    initial_state                   [o,d,u]
    max_check_attempts              #                         # Mandatory parameter
    check_interval                  #
    retry_interval                  #
    active_checks_enabled           [0/1]
    passive_checks_enabled          [0/1]
    check_period                    timeperiod_name           # Mandatory parameter
    obsess_over_host|obsess         [0/1]
    check_freshness                 [0/1]
    freshness_threshold             #
    event_handler                   command_name
    event_handler_enabled           [0/1]
    low_flap_threshold              #
    high_flap_threshold             #
    flap_detection_enabled          [0/1]
    flap_detection_options          [o,d,u]
    process_perf_data               [0/1]
    retain_status_information       [0/1]
    retain_nonstatus_information    [0/1]
    contacts                        contacts                  # Mandatory parameter
    contact_groups                  contact_groups            # Mandatory parameter
    notification_interval           #
    first_notification_delay        #
    notification_period             timeperiod_name           # Mandatory parameter
    notification_options            [d,u,r,f,s]
    notifications_enabled           [0/1]
    stalking_options                [o,d,u]
    notes                           note_string
    notes_url                       url
    action_url                      url
    icon_image                      image_file
    icon_image_alt                  alt_string
    vrml_image                      image_file
    statusmap_image                 image_file
    2d_coords                       x_coord,y_coord
    3d_coords                       x_coord,y_coord,z_coord
}
Directive Descriptions:
host_name: | This directive is used to define a short name used to identify the host. It is used in host group and service definitions to reference this particular host. Hosts can have multiple services (which are monitored) associated with them. When used properly, the $HOSTNAME$ macro will contain this short name. |
alias: | This directive is used to define a longer name or description used to identify the host. It is provided in order to allow you to more easily identify a particular host. When used properly, the $HOSTALIAS$ macro will contain this alias/description. |
address: | This directive is used to define the address of the host. Normally, this is an IP address, although it could really be anything you want (so long as it can be used to check the status of the host). You can use a FQDN to identify the host instead of an IP address, but if DNS services are not available this could cause problems. When used properly, the $HOSTADDRESS$ macro will contain this address. Note: If you do not specify an address directive in a host definition, the name of the host will be used as its address. A word of caution about doing this, however - if DNS fails, most of your service checks will fail because the plugins will be unable to resolve the host name. |
display_name: | This directive is used to define an alternate name that should be displayed in the web interface for this host. If not specified, this defaults to the value you specify for the host_name directive. Note: The current CGIs do not use this option, although future versions of the web interface will. |
parents: | This directive is used to define a comma-delimited list of short names of the "parent" hosts for this particular host. Parent hosts are typically routers, switches, firewalls, etc. that lie between the monitoring host and a remote host. A router, switch, etc. which is closest to the remote host is considered to be that host's "parent". Read the "Determining Status and Reachability of Network Hosts" document located here for more information. If this host is on the same network segment as the host doing the monitoring (without any intermediate routers, etc.) the host is considered to be on the local network and will not have a parent host. Leave this value blank if the host does not have a parent host (i.e. it is on the same segment as the Nagios host). The order in which you specify parent hosts has no effect on how things are monitored. |
hourly_value: | This directive is used to represent the value of the host to your organization. The value is currently used when determining whether to send notifications to a contact. If the host's hourly value plus the hourly values of all of the host's services is greater than or equal to the contact's minimum value, the contact will be notified. For example, you could set this value and the minimum value of contacts such that a system administrator would be notified when a development server goes down, but the CIO would only be notified when the company's production ecommerce database server was down. The value could also be used as a sort criteria when generating reports or for calculating a good system administrator's bonus. The hourly value defaults to zero. |
hostgroups: | This directive is used to identify the short name(s) of the hostgroup(s) that the host belongs to. Multiple hostgroups should be separated by commas. This directive may be used as an alternative to (or in addition to) using the members directive in hostgroup definitions. |
check_command: | This directive is used to specify the short name of the command that should be used to check if the host is up or down. Typically, this command would try and ping the host to see if it is "alive". The command must return a status of OK (0) or Nagios will assume the host is down. If you leave this argument blank, the host will not be actively checked. Thus, Nagios will likely always assume the host is up (it may show up as being in a "PENDING" state in the web interface). This is useful if you are monitoring printers or other devices that are frequently turned off. The maximum amount of time that the host check command can run is controlled by the host_check_timeout option. |
initial_state: | By default Nagios will assume that all hosts are in UP states when it starts. You can override the initial state for a host by using this directive. Valid options are: o = UP, d = DOWN, and u = UNREACHABLE. |
max_check_attempts: | This directive is used to define the number of times that Nagios will retry the host check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the host check. Note: If you do not want to check the status of the host, you must still set this to a minimum value of 1. To bypass the host check, just leave the check_command option blank. |
check_interval: | This directive is used to define the number of "time units" between regularly scheduled checks of the host. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation. |
retry_interval: | This directive is used to define the number of "time units" to wait before scheduling a re-check of the hosts. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation. |
active_checks_enabled *: | This directive is used to determine whether or not active checks (either regularly scheduled or on-demand) of this host are enabled. Values: 0 = disable active host checks, 1 = enable active host checks (default). |
passive_checks_enabled *: | This directive is used to determine whether or not passive checks are enabled for this host. Values: 0 = disable passive host checks, 1 = enable passive host checks (default). |
check_period: | This directive is used to specify the short name of the time period during which active checks of this host can be made. |
obsess_over_host / obsess *: | This directive determines whether or not checks for the host will be "obsessed" over using the ochp_command. |
check_freshness *: | This directive is used to determine whether or not freshness checks are enabled for this host. Values: 0 = disable freshness checks, 1 = enable freshness checks (default). |
freshness_threshold: | This directive is used to specify the freshness threshold (in seconds) for this host. If you set this directive to a value of 0, Nagios will determine a freshness threshold to use automatically. |
event_handler: | This directive is used to specify the short name of the command that should be run whenever a change in the state of the host is detected (i.e. whenever it goes down or recovers). Read the documentation on event handlers for a more detailed explanation of how to write scripts for handling events. The maximum amount of time that the event handler command can run is controlled by the event_handler_timeout option. |
event_handler_enabled *: | This directive is used to determine whether or not the event handler for this host is enabled. Values: 0 = disable host event handler, 1 = enable host event handler. |
low_flap_threshold: | This directive is used to specify the low state change threshold used in flap detection for this host. More information on flap detection can be found here. If you set this directive to a value of 0, the program-wide value specified by the low_host_flap_threshold directive will be used. |
high_flap_threshold: | This directive is used to specify the high state change threshold used in flap detection for this host. More information on flap detection can be found here. If you set this directive to a value of 0, the program-wide value specified by the high_host_flap_threshold directive will be used. |
flap_detection_enabled *: | This directive is used to determine whether or not flap detection is enabled for this host. More information on flap detection can be found here. Values: 0 = disable host flap detection, 1 = enable host flap detection. |
flap_detection_options: | This directive is used to determine what host states the flap detection logic will use for this host. Valid options are a combination of one or more of the following: o = UP states, d = DOWN states, u = UNREACHABLE states. |
process_perf_data *: | This directive is used to determine whether or not the processing of performance data is enabled for this host. Values: 0 = disable performance data processing, 1 = enable performance data processing. |
retain_status_information: | This directive is used to determine whether or not status-related information about the host is retained across program restarts. This is only useful if you have enabled state retention using the retain_state_information directive. Value: 0 = disable status information retention, 1 = enable status information retention. |
retain_nonstatus_information: | This directive is used to determine whether or not non-status information about the host is retained across program restarts. This is only useful if you have enabled state retention using the retain_state_information directive. Value: 0 = disable non-status information retention, 1 = enable non-status information retention. |
contacts: | This is a list of the short names of the contacts that should be notified whenever there are problems (or recoveries) with this host. Multiple contacts should be separated by commas. Useful if you want notifications to go to just a few people and don't want to configure contact groups. You must specify at least one contact or contact group in each host definition. |
contact_groups: | This is a list of the short names of the contact groups that should be notified whenever there are problems (or recoveries) with this host. Multiple contact groups should be separated by commas. You must specify at least one contact or contact group in each host definition. |
notification_interval: | This directive is used to define the number of "time units" to wait before re-notifying a contact that this host is still down or unreachable. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this host - only one problem notification will be sent out. |
first_notification_delay: | This directive is used to define the number of "time units" to wait before sending out the first problem notification when this host enters a non-UP state. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will start sending out notifications immediately. |
notification_period: | This directive is used to specify the short name of the time period during which notifications of events for this host can be sent out to contacts. If a host goes down, becomes unreachable, or recovers during a time which is not covered by the time period, no notifications will be sent out. |
notification_options: | This directive is used to determine when notifications for the host should be sent out. Valid options are a combination of one or more of the following: d = send notifications on a DOWN state, u = send notifications on an UNREACHABLE state, r = send notifications on recoveries (OK state), f = send notifications when the host starts and stops flapping, and s = send notifications when scheduled downtime starts and ends. If you specify n (none) as an option, no host notifications will be sent out. If you do not specify any notification options, Nagios will assume that you want notifications to be sent out for all possible states. Example: If you specify d,r in this field, notifications will only be sent out when the host goes DOWN and when it recovers from a DOWN state. |
notifications_enabled *: | This directive is used to determine whether or not notifications for this host are enabled. Values: 0 = disable host notifications, 1 = enable host notifications. |
stalking_options: | This directive determines which host states "stalking" is enabled for. Valid options are a combination of one or more of the following: o = stalk on UP states, d = stalk on DOWN states, and u = stalk on UNREACHABLE states. More information on state stalking can be found here. |
notes: | This directive is used to define an optional string of notes pertaining to the host. If you specify a note here, you will see it in the extended information CGI (when you are viewing information about the specified host). |
notes_url: | This variable is used to define an optional URL that can be used to provide more information about the host. If you specify a URL, you will see a red folder icon in the CGIs (when you are viewing host information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/). This can be very useful if you want to make detailed information on the host, emergency contact methods, etc. available to other support staff. |
action_url: | This directive is used to define an optional URL that can be used to provide more actions to be performed on the host. If you specify a URL, you will see a red "splat" icon in the CGIs (when you are viewing host information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/). |
icon_image: | This variable is used to define the name of a GIF, PNG, or JPG image that should be associated with this host. This image will be displayed in the various places in the CGIs. The image will look best if it is 40×40 pixels in size. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos). |
icon_image_alt: | This variable is used to define an optional string that is used in the ALT tag of the image specified by the <icon_image> argument. |
vrml_image: | This variable is used to define the name of a GIF, PNG, or JPG image that should be associated with this host. This image will be used as the texture map for the specified host in the statuswrl CGI. Unlike the image you use for the <icon_image> variable, this one should probably not have any transparency. If it does, the host object will look a bit weird. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos). |
statusmap_image: | This variable is used to define the name of an image that should be associated with this host in the statusmap CGI. You can specify a JPEG, PNG, and GIF image if you want, although I would strongly suggest using a GD2 format image, as other image formats will result in a lot of wasted CPU time when the statusmap image is generated. GD2 images can be created from PNG images by using the pngtogd2 utility supplied with Thomas Boutell's gd library. The GD2 images should be created in uncompressed format in order to minimize CPU load when the statusmap CGI is generating the network map image. The image will look best if it is 40×40 pixels in size. You can leave this option blank if you are not using the statusmap CGI. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos). |
2d_coords: | This variable is used to define coordinates to use when drawing the host in the statusmap CGI. Coordinates should be given in positive integers, as they correspond to physical pixels in the generated image. The origin for drawing (0,0) is in the upper left hand corner of the image and extends in the positive x direction (to the right) along the top of the image and in the positive y direction (down) along the left hand side of the image. For reference, the size of the icons drawn is usually about 40×40 pixels (text takes a little extra space). The coordinates you specify here are for the upper left hand corner of the host icon that is drawn. Note: Don't worry about what the maximum x and y coordinates that you can use are. The CGI will automatically calculate the maximum dimensions of the image it creates based on the largest x and y coordinates you specify. |
3d_coords: | This variable is used to define coordinates to use when drawing the host in the statuswrl CGI. Coordinates can be positive or negative real numbers. The origin for drawing is (0.0,0.0,0.0). For reference, the size of the host cubes drawn is 0.5 units on each side (text takes a little more space). The coordinates you specify here are used as the center of the host cube. |
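To show how several of the directives described above work together, here is a sketch of a parent/child pair of hosts. All host names and addresses are hypothetical, and the “check-host-alive” command, “24x7” time period and “admins” contact group are assumed to be defined elsewhere.

```cfg
# Hypothetical hosts; names and addresses are placeholders.
define host{
    host_name               edge-router01
    alias                   Edge router 01
    address                 192.0.2.1
    check_command           check-host-alive    ; assumed command object
    max_check_attempts      3
    check_period            24x7                ; assumed time period object
    contact_groups          admins              ; assumed contact group object
    notification_interval   60
    notification_period     24x7
    notification_options    d,u,r
}

define host{
    host_name               app01
    alias                   Application server 01
    address                 192.0.2.20
    parents                 edge-router01       ; app01 is reached through the router
    check_command           check-host-alive
    max_check_attempts      3
    check_interval          5                   ; minutes with the default interval_length of 60
    retry_interval          1
    check_period            24x7
    contact_groups          admins
    notification_interval   0                   ; only one problem notification is sent
    notification_period     24x7
    notification_options    d,r                 ; no "u": no notification while UNREACHABLE
}
```

With “parents” set, a failure of the router lets Nagios Core mark “app01” as UNREACHABLE instead of DOWN, and because “u” is left out of its notification_options, no notification is sent for that state.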
hostgroup
Description
Sometimes it is really handy to combine “host” objects into a larger group, so that you can maintain them as one object. With the “hostgroup” object, Nagios Core makes it possible to create one object that includes several “host” objects or other “hostgroup” objects.
A good example is grouping all nodes by platform.
For example in this hierarchy:
All_Hosts
- - Servers
- - - - MS_Windows
- - - - Linux
- - - - UX
- - - - BSD
- - Network
- - - - Cisco
- - - - HP ProCurve
- - - - H3C
- - - - Ruby
- - Another
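Part of this hierarchy could be sketched with nested “hostgroup” objects as follows. The group names follow the tree above, while the member hosts (“web01”, “db01”, “dc01”) are hypothetical and assumed to be defined as “host” objects elsewhere.

```cfg
# Leaf groups list their hosts directly (hypothetical host names).
define hostgroup{
    hostgroup_name      linux
    alias               Linux servers
    members             web01,db01
}

define hostgroup{
    hostgroup_name      ms_windows
    alias               MS Windows servers
    members             dc01
}

# Higher levels only reference other host groups.
define hostgroup{
    hostgroup_name      servers
    alias               All servers
    hostgroup_members   linux,ms_windows
}

define hostgroup{
    hostgroup_name      all_hosts
    alias               All hosts
    hostgroup_members   servers
}
```

In this sketch “all_hosts” contains every host of the “linux” and “ms_windows” groups without listing any of them by name.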
Another use case is to group all nodes by location, so that you can create several “hostgroups” that include other “hostgroups” or “hosts”.
For example in this hierarchy:
Customer (will include all “hostgroups” on “Region” level)
- - Region (will include all “hostgroups” on “Country” level)
- - - - Country (will include all “hostgroups” on “Town” level)
- - - - - - Town (will include all “hostgroups” on “Street” level)
- - - - - - - - Street (will include all “hostgroups” on “Building” level)
- - - - - - - - - - Building (will include all “hostgroups” on “Room” level)
- - - - - - - - - - - - Room (will include all “hostgroups” on “Rack” level)
- - - - - - - - - - - - - - Rack (will include all “hosts” mounted in this “Rack”)
- - - - - - - - - - - - - - - - Host
Location Customization
If you prefer to store your customized configuration in your own configuration file, you can define the path to that file.
Nagios Core then needs to know where to look for the customized configuration file.
To tell it, update the main Nagios Core configuration file, “nagios.cfg”.
It is possible to specify:
# Direct path to your customized configuration file
cfg_file=/<path>/<to>/<your>/<config>/<file>
# Path to the directory to search for configuration files
cfg_dir=/<path>/<to>/<your>/<config>/<dir>
Official documentation
In most of my documents I avoid copying the official documentation. On the other hand, at this point it is really handy, as I do not want to reinvent the wheel.
Description:
A host group definition is used to group one or more hosts together for simplifying configuration with object tricks or display purposes in the CGIs.
Definition Format:
define hostgroup{
    hostgroup_name      hostgroup_name      # Mandatory parameter
    alias               alias               # Mandatory parameter
    members             hosts
    hostgroup_members   hostgroups
    notes               note_string
    notes_url           url
    action_url          url
}
Directive Descriptions:
hostgroup_name: | This directive is used to define a short name used to identify the host group. |
alias: | This directive is used to define a longer name or description used to identify the host group. It is provided in order to allow you to more easily identify a particular host group. |
members: | This is a list of the short names of hosts that should be included in this group. Multiple host names should be separated by commas. This directive may be used as an alternative to (or in addition to) the hostgroups directive in host definitions. |
hostgroup_members: | This optional directive can be used to include hosts from other "sub" host groups in this host group. Specify a comma-delimited list of short names of other host groups whose members should be included in this group. |
notes: | This directive is used to define an optional string of notes pertaining to the host group. If you specify a note here, you will see it in the extended information CGI (when you are viewing information about the specified host group). |
notes_url: | This variable is used to define an optional URL that can be used to provide more information about the host group. If you specify a URL, you will see a red folder icon in the CGIs (when you are viewing hostgroup information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/). This can be very useful if you want to make detailed information on the host group, emergency contact methods, etc. available to other support staff. |
action_url: | This directive is used to define an optional URL that can be used to provide more actions to be performed on the host group. If you specify a URL, you will see a red "splat" icon in the CGIs (when you are viewing hostgroup information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/). |
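As the description of “members” mentions, membership can also be declared from the host side with the “hostgroups” directive. A small sketch, assuming a hypothetical rack-level group “rack_a01” and a hypothetical host “db02”; the command, time period and contact group are assumed to exist.

```cfg
# Hypothetical rack-level host group without a members list.
define hostgroup{
    hostgroup_name      rack_a01
    alias               Rack A01
}

# The host joins the group itself through the hostgroups directive.
define host{
    host_name               db02
    alias                   Database server 02
    address                 192.0.2.30
    hostgroups              rack_a01            ; alternative to listing db02 under members
    check_command           check-host-alive    ; assumed command object
    max_check_attempts      3
    check_period            24x7                ; assumed time period object
    contact_groups          admins              ; assumed contact group object
    notification_interval   60
    notification_period     24x7
}
```

This keeps all information about a node in one place, which can be convenient when each host lives in its own configuration file.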
hostdependency
Description
When Nagios Core monitors a larger infrastructure, root cause analysis is required whenever a failure is detected. This reduces the load on the support team and lets it focus on the main issues.
To configure topology-based correlation between “host” objects, you can use the “hostdependency” object of Nagios Core.
Nagios Core will then correlate events based on the configured relations. Instead of raising alarms for every “host” behind the affected node, Nagios Core can suppress checks and notifications for the dependent “host” objects while the master “host” is in a failure state. Note that the “Unreachable” state itself is derived from the “parents” directive of the “host” object rather than from “hostdependency”, so “host” objects should also be configured not to send notifications when they have “Unreachable” status.
Location Customization
If you prefer to store your customized configuration in your own configuration file, you can define the path to that file.
Nagios Core then needs to know where to look for the customized configuration file.
To tell it, update the main Nagios Core configuration file, “nagios.cfg”.
It is possible to specify:
# Direct path to your customized configuration file
cfg_file=/<path>/<to>/<your>/<config>/<file>
# Path to the directory to search for configuration files
cfg_dir=/<path>/<to>/<your>/<config>/<dir>
Official documentation
In most of my documents I avoid copying the official documentation. On the other hand, at this point it is really handy, as I do not want to reinvent the wheel.
Description:
Host dependencies are an advanced feature of Nagios that allow you to suppress notifications for hosts based on the status of one or more other hosts. Host dependencies are optional and are mainly targeted at advanced users who have complicated monitoring setups. More information on how host dependencies work (read this!) can be found here.
Definition Format:
define hostdependency{
    dependent_host_name             host_name           # Mandatory parameter
    dependent_hostgroup_name        hostgroup_name
    host_name                       host_name           # Mandatory parameter
    hostgroup_name                  hostgroup_name
    inherits_parent                 [0/1]
    execution_failure_criteria      [o,d,u,p,n]
    notification_failure_criteria   [o,d,u,p,n]
    dependency_period               timeperiod_name
}
Directive Descriptions:
dependent_host_name: | This directive is used to identify the short name(s) of the dependent host(s). Multiple hosts should be separated by commas. |
dependent_hostgroup_name: | This directive is used to identify the short name(s) of the dependent hostgroup(s). Multiple hostgroups should be separated by commas. The dependent_hostgroup_name may be used instead of, or in addition to, the dependent_host_name directive. |
host_name: | This directive is used to identify the short name(s) of the host(s) that is being depended upon (also referred to as the master host). Multiple hosts should be separated by commas. |
hostgroup_name: | This directive is used to identify the short name(s) of the hostgroup(s) that is being depended upon (also referred to as the master host). Multiple hostgroups should be separated by commas. The hostgroup_name may be used instead of, or in addition to, the host_name directive. |
inherits_parent: | This directive indicates whether or not the dependency inherits dependencies of the host that is being depended upon (also referred to as the master host). In other words, if the master host is dependent upon other hosts and any one of those dependencies fail, this dependency will also fail. |
execution_failure_criteria: | This directive is used to specify the criteria that determine when the dependent host should not be actively checked. If the master host is in one of the failure states we specify, the dependent host will not be actively checked. Valid options are a combination of one or more of the following (multiple options are separated with commas): o = fail on an UP state, d = fail on a DOWN state, u = fail on an UNREACHABLE state, and p = fail on a pending state (e.g. the host has not yet been checked). If you specify n (none) as an option, the execution dependency will never fail and the dependent host will always be actively checked (if other conditions allow for it to be). Example: If you specify u,d in this field, the dependent host will not be actively checked if the master host is in either an UNREACHABLE or DOWN state. |
notification_failure_criteria: | This directive is used to define the criteria that determine when notifications for the dependent host should not be sent out. If the master host is in one of the failure states we specify, notifications for the dependent host will not be sent to contacts. Valid options are a combination of one or more of the following: o = fail on an UP state, d = fail on a DOWN state, u = fail on an UNREACHABLE state, and p = fail on a pending state (e.g. the host has not yet been checked). If you specify n (none) as an option, the notification dependency will never fail and notifications for the dependent host will always be sent out. Example: If you specify d in this field, the notifications for the dependent host will not be sent out if the master host is in a DOWN state. |
dependency_period: | This directive is used to specify the short name of the time period during which this dependency is valid. If this directive is not specified, the dependency is considered to be valid during all times. |
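A minimal sketch of a “hostdependency” object built from the directives above. The hosts “core-switch01” (master) and “app02” (dependent) are hypothetical and assumed to be defined as “host” objects, and the “24x7” time period is assumed to exist.

```cfg
# Hypothetical dependency: app02 depends on core-switch01.
define hostdependency{
    host_name                       core-switch01   ; master host that is depended upon
    dependent_host_name             app02           ; host whose notifications are suppressed
    inherits_parent                 1               ; also fail if a dependency of the master fails
    execution_failure_criteria      n               ; keep actively checking app02 in any case
    notification_failure_criteria   d,u             ; no notifications while the master is DOWN or UNREACHABLE
    dependency_period               24x7            ; assumed time period object
}
```

With these values “app02” is still checked while the switch is failing, so its status stays current, but the support team only receives the alarm for the switch.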
hostescalation
Description
The main idea of “hostescalaton” object at Nagios Core is to configure automated escalation process for detected issues.
For example it can be used for automated escalation of the issues based on the hierarchy of delivery model.
- 1st detection of the issue ,send notification to 1st level support team
- 10th detection of the issue, send notification to 2nd level support team
- 20th detection of the issue, send notification to 3th level support team
- 30th detection of the issue, send notification to Escalation Manager
Anyway, please do not use it in this way, for the following reasons:
- If the 1st level team has already started the investigation, they have probably already contacted the relevant vendors.
- After some time the same alarm will be sent to the 2nd and later on to the 3rd level support team.
- The 2nd and 3rd level teams will need to start the whole investigation from scratch, instead of continuing to work on the case with the information that the 1st level support team has already collected.
On the other hand, the “hostescalation” object gives us the possibility to automate the fixing of some issues. In this case we can run a script that tries to fix the issue, instead of sending a notification at the 1st detection of the issue. If the issue is still present during the next polling period, we can send a notification to the responsible team.
In this way it is possible to automate the handling of some host-related issues. For example, if we are running a virtual server, we can try to start it before we send the “Host Down” alarm.
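The "try to fix first, notify later" idea could be sketched with two escalations as shown here. This is only a sketch under assumptions: the contact group "autofix" is hypothetical and would need a contact whose host notification command runs the repair script instead of sending a message; host1.xyz.org and the "admins" group are taken from the example configuration in this document.

```cfg
# Notification 1: goes only to the hypothetical "autofix" contact group,
# whose notification command is assumed to run the repair script.
define hostescalation{
    host_name               host1.xyz.org
    contact_groups          autofix             ; Hypothetical contact group
    first_notification      1
    last_notification       1
    notification_interval   5                   ; Next notification one interval later (5 min here)
}

# Notification 2 and later: the issue is still present, inform the responsible team.
define hostescalation{
    host_name               host1.xyz.org
    contact_groups          admins
    first_notification      2
    last_notification       0                   ; 0 = keep using this escalation forever
    notification_interval   0                   ; Send this notification only once
}
```

Nagios Core also offers event handlers (the "event_handler" directive of the host object) for running a script on a state change, so it is worth comparing both approaches before choosing one.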
Location Customization
If you prefer to store your customized configuration in your own configuration file, you can define the path to that file.
In this case Nagios Core needs to know where to search for the customized configuration file.
To do so, update the main Nagios Core configuration file, “nagios.cfg”. It is possible to specify:
# Direct path to your customized configuration file
cfg_file=/<path>/<to>/<your>/<config>/<file>
# Path to the directory where to search for the config files
cfg_dir=/<path>/<to>/<your>/<config>/<dir>
Official documentation
In most of my documents I avoid copying the official documentation. On the other hand, I think at this point it is really handy, as I will not reinvent the wheel.
Description:
Host escalations are completely optional and are used to escalate notifications for a particular host. More information on how notification escalations work can be found in the official Nagios Core documentation on notification escalations.
Definition Format:
define hostescalation{
    host_name               host_name           ; Mandatory parameter
    hostgroup_name          hostgroup_name
    contacts                contacts            ; Mandatory parameter
    contact_groups          contactgroup_name   ; Mandatory parameter
    first_notification      #                   ; Mandatory parameter
    last_notification       #                   ; Mandatory parameter
    notification_interval   #                   ; Mandatory parameter
    escalation_period       timeperiod_name
    escalation_options      [d,u,r]
}
Directive Descriptions:
host_name: | This directive is used to identify the short name of the host that the escalation should apply to. |
hostgroup_name: | This directive is used to identify the short name(s) of the hostgroup(s) that the escalation should apply to. Multiple hostgroups should be separated by commas. If this is used, the escalation will apply to all hosts that are members of the specified hostgroup(s). |
first_notification: | This directive is a number that identifies the first notification for which this escalation is effective. For instance, if you set this value to 3, this escalation will only be used if the host is down or unreachable long enough for a third notification to go out. |
last_notification: | This directive is a number that identifies the last notification for which this escalation is effective. For instance, if you set this value to 5, this escalation will not be used if more than five notifications are sent out for the host. Setting this value to 0 means to keep using this escalation entry forever (no matter how many notifications go out). |
contacts: | This is a list of the short names of the contacts that should be notified whenever there are problems (or recoveries) with this host. Multiple contacts should be separated by commas. Useful if you want notifications to go to just a few people and don't want to configure contact groups. You must specify at least one contact or contact group in each host escalation definition. |
contact_groups: | This directive is used to identify the short name of the contact group that should be notified when the host notification is escalated. Multiple contact groups should be separated by commas. You must specify at least one contact or contact group in each host escalation definition. |
notification_interval: | This directive is used to determine the interval at which notifications should be made while this escalation is valid. If you specify a value of 0 for the interval, Nagios will send the first notification when this escalation definition is valid, but will then prevent any more problem notifications from being sent out for the host. Notifications are only sent out when the host recovers. This is useful if you want to stop having notifications sent out after a certain amount of time. Note: If multiple escalation entries for a host overlap for one or more notification ranges, the smallest notification interval from all escalation entries is used. |
escalation_period: | This directive is used to specify the short name of the time period during which this escalation is valid. If this directive is not specified, the escalation is considered to be valid during all times. |
escalation_options: | This directive is used to define the criteria that determine when this host escalation is used. The escalation is used only if the host is in one of the states specified in this directive. If this directive is not specified in a host escalation, the escalation is considered to be valid during all host states. Valid options are a combination of one or more of the following: r = escalate on an UP (recovery) state, d = escalate on a DOWN state, and u = escalate on an UNREACHABLE state. Example: If you specify d in this field, the escalation will only be used if the host is in a DOWN state. |
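As an illustration of how these directives fit together, here is a minimal sketch of a single escalation. The host group Region_EMEA and the 24x7 time period come from the example in this document; the contact group "emea-level2" is hypothetical. It also assumes that the hosts re-notify, meaning a non-zero notification_interval on the host, because with a host that notifies only once the third notification would never be reached.

```cfg
# Sketch: notifications 3 to 5 for all hosts in Region_EMEA go to a second team
define hostescalation{
    hostgroup_name          Region_EMEA
    contact_groups          emea-level2         ; Hypothetical contact group
    first_notification      3                   ; Escalation starts with the 3rd notification
    last_notification       5                   ; ... and is no longer used after the 5th
    notification_interval   30                  ; Re-notify every 30 intervals while the escalation is valid
    escalation_period       24x7                ; Valid at all times
    escalation_options      d,u                 ; Only for DOWN and UNREACHABLE states
}
```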
Example
Host Correlation
In this example we will configure several hosts located in different datacenters across several regions. We will use one main datacenter for services shared across all regions, where our Nagios Core server will also be located.
In addition, each region and city will have one datacenter that interconnects the local branches and provides specific services used only at the local level.
It is only a light overview to present the configuration possibilities. For better understanding please see the map.
Map
Routers shown with two addresses are listed as upstream-side / downstream-side address.

Customer - XYZ
    NagiosCore (10.0.0.101)
    router-xyz (10.0.0.1 <- HSRP IP)
        router-xyz1 (10.0.0.2)
        router-xyz2 (10.0.0.3)

    Region APAC
        router-apac-xyz1 (10.0.0.12 / 10.1.0.2)
        router-apac-xyz2 (10.0.0.13 / 10.1.0.3)
        Japan-Tokyo
            router-TO-xyz1 (10.1.0.12 / 10.1.1.2)
            router-TO-xyz2 (10.1.0.13 / 10.1.1.3)
            router-TO (10.1.1.1 HSRP)
            Host1 (10.1.1.101)
            Host2 (10.1.1.102)
        Malaysia-Kuala_Lumpur
            router-KL-xyz1 (10.1.0.22 / 10.1.2.2)
            router-KL-xyz2 (10.1.0.23 / 10.1.2.3)
            router-KL (10.1.2.1 HSRP)
            Host3 (10.1.2.101)
            Host4 (10.1.2.102)

    Region EMEA
        router-emea-xyz1 (10.0.0.22 / 10.2.0.2)
        router-emea-xyz2 (10.0.0.23 / 10.2.0.3)
        UK-London
            router-LO-xyz1 (10.2.0.12 / 10.2.1.2)
            router-LO-xyz2 (10.2.0.13 / 10.2.1.3)
            router-LO (10.2.1.1 HSRP)
            Host5 (10.2.1.101)
            Host6 (10.2.1.102)
        France-Paris
            router-PA-xyz1 (10.2.0.22 / 10.2.2.2)
            router-PA-xyz2 (10.2.0.23 / 10.2.2.3)
            router-PA (10.2.2.1 HSRP)
            Host7 (10.2.2.101)
            Host8 (10.2.2.102)

    Region AMS
        router-ams-xyz1 (10.0.0.32 / 10.3.0.2)
        router-ams-xyz2 (10.0.0.33 / 10.3.0.3)
        USA-New_York
            router-NY-xyz1 (10.3.0.12 / 10.3.1.2)
            router-NY-xyz2 (10.3.0.13 / 10.3.1.3)
            router-NY (10.3.1.1 HSRP)
            Host9 (10.3.1.101)
            Host10 (10.3.1.102)
        Canada-Toronto
            router-TR-xyz1 (10.3.0.22 / 10.3.2.2)
            router-TR-xyz2 (10.3.0.23 / 10.3.2.3)
            router-TR (10.3.2.1 HSRP)
            Host11 (10.3.2.101)
            Host12 (10.3.2.102)
Configuration
# Template that we will use for the shared host configuration
define host{
    name                            xyz-host            ; Template name
    check_period                    24x7                ; Monitoring 24/7
    check_interval                  5                   ; Polling interval 5 min
    retry_interval                  1
    max_check_attempts              3                   ; Poll the device 3 times before changing to the hard state
    check_command                   check-host-alive    ; Monitoring script
    notification_interval           0                   ; Send notification only once
    notification_options            d,f                 ; When the device is DOWN or FLAPPING
    contact_groups                  admins              ; Who will be contacted
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    notification_period             24x7
    register                        0                   ; Template only, do not register it as a Nagios Core object
}

######################## Routers
# Configuration of "host" objects for our routers

## Main Customer XYZ DC
define host{
    host_name   router-xyz.xyz.org      ; Router name
    alias       router-xyz
    parents     localhost               ; Parent "host"
    address     10.0.0.1                ; IP
    use         xyz-host                ; Template to be used
}
define host{
    host_name   router-xyz1.xyz.org     ; Router name
    alias       router-xyz1
    parents     router-xyz.xyz.org      ; Parent "host"
    address     10.0.0.2                ; IP
    use         xyz-host                ; Template to be used
}
define host{
    host_name   router-xyz2.xyz.org     ; Router name
    alias       router-xyz2
    parents     router-xyz.xyz.org      ; Parent "host"
    address     10.0.0.3                ; IP
    use         xyz-host                ; Template to be used
}

#### APAC Main DC
define host{
    host_name   router-APAC-xyz1.xyz.org
    alias       router-APAC-xyz1
    parents     router-xyz1.xyz.org,router-xyz2.xyz.org
    address     10.0.0.12
    use         xyz-host
}
define host{
    host_name   router-APAC-xyz2.xyz.org
    alias       router-APAC-xyz2
    parents     router-xyz1.xyz.org,router-xyz2.xyz.org
    address     10.0.0.13
    use         xyz-host
}

###### Japan-Tokyo
define host{
    host_name   router-TO-xyz1.xyz.org
    alias       router-TO-xyz1
    parents     router-APAC-xyz1.xyz.org,router-APAC-xyz2.xyz.org
    address     10.1.0.12
    use         xyz-host
}
define host{
    host_name   router-TO-xyz2.xyz.org
    alias       router-TO-xyz2
    parents     router-APAC-xyz1.xyz.org,router-APAC-xyz2.xyz.org
    address     10.1.0.13
    use         xyz-host
}
define host{
    host_name   router-TO.xyz.org
    alias       router-TO
    parents     router-TO-xyz1.xyz.org,router-TO-xyz2.xyz.org
    address     10.1.1.1
    use         xyz-host
}

###### Malaysia-Kuala_Lumpur
define host{
    host_name   router-KL-xyz1.xyz.org
    alias       router-KL-xyz1
    parents     router-APAC-xyz1.xyz.org,router-APAC-xyz2.xyz.org
    address     10.1.0.22
    use         xyz-host
}
define host{
    host_name   router-KL-xyz2.xyz.org
    alias       router-KL-xyz2
    parents     router-APAC-xyz1.xyz.org,router-APAC-xyz2.xyz.org
    address     10.1.0.23
    use         xyz-host
}
define host{
    host_name   router-KL.xyz.org
    alias       router-KL
    parents     router-KL-xyz1.xyz.org,router-KL-xyz2.xyz.org
    address     10.1.2.1
    use         xyz-host
}

#### EMEA Main DC
define host{
    host_name   router-EMEA-xyz1.xyz.org
    alias       router-EMEA-xyz1
    parents     router-xyz1.xyz.org,router-xyz2.xyz.org
    address     10.0.0.22
    use         xyz-host
}
define host{
    host_name   router-EMEA-xyz2.xyz.org
    alias       router-EMEA-xyz2
    parents     router-xyz1.xyz.org,router-xyz2.xyz.org
    address     10.0.0.23
    use         xyz-host
}

###### UK-London
define host{
    host_name   router-LO-xyz1.xyz.org
    alias       router-LO-xyz1
    parents     router-EMEA-xyz1.xyz.org,router-EMEA-xyz2.xyz.org
    address     10.2.0.12
    use         xyz-host
}
define host{
    host_name   router-LO-xyz2.xyz.org
    alias       router-LO-xyz2
    parents     router-EMEA-xyz1.xyz.org,router-EMEA-xyz2.xyz.org
    address     10.2.0.13
    use         xyz-host
}
define host{
    host_name   router-LO.xyz.org
    alias       router-LO
    parents     router-LO-xyz1.xyz.org,router-LO-xyz2.xyz.org
    address     10.2.1.1
    use         xyz-host
}

###### France-Paris
define host{
    host_name   router-PA-xyz1.xyz.org
    alias       router-PA-xyz1
    parents     router-EMEA-xyz1.xyz.org,router-EMEA-xyz2.xyz.org
    address     10.2.0.22
    use         xyz-host
}
define host{
    host_name   router-PA-xyz2.xyz.org
    alias       router-PA-xyz2
    parents     router-EMEA-xyz1.xyz.org,router-EMEA-xyz2.xyz.org
    address     10.2.0.23
    use         xyz-host
}
define host{
    host_name   router-PA.xyz.org
    alias       router-PA
    parents     router-PA-xyz1.xyz.org,router-PA-xyz2.xyz.org
    address     10.2.2.1
    use         xyz-host
}

#### AMS Main DC
define host{
    host_name   router-AMS-xyz1.xyz.org
    alias       router-AMS-xyz1
    parents     router-xyz1.xyz.org,router-xyz2.xyz.org
    address     10.0.0.32
    use         xyz-host
}
define host{
    host_name   router-AMS-xyz2.xyz.org
    alias       router-AMS-xyz2
    parents     router-xyz1.xyz.org,router-xyz2.xyz.org
    address     10.0.0.33
    use         xyz-host
}

###### USA-New_York
define host{
    host_name   router-NY-xyz1.xyz.org
    alias       router-NY-xyz1
    parents     router-AMS-xyz1.xyz.org,router-AMS-xyz2.xyz.org
    address     10.3.0.12
    use         xyz-host
}
define host{
    host_name   router-NY-xyz2.xyz.org
    alias       router-NY-xyz2
    parents     router-AMS-xyz1.xyz.org,router-AMS-xyz2.xyz.org
    address     10.3.0.13
    use         xyz-host
}
define host{
    host_name   router-NY.xyz.org
    alias       router-NY
    parents     router-NY-xyz1.xyz.org,router-NY-xyz2.xyz.org
    address     10.3.1.1
    use         xyz-host
}

###### Canada-Toronto
define host{
    host_name   router-TR-xyz1.xyz.org
    alias       router-TR-xyz1
    parents     router-AMS-xyz1.xyz.org,router-AMS-xyz2.xyz.org
    address     10.3.0.22
    use         xyz-host
}
define host{
    host_name   router-TR-xyz2.xyz.org
    alias       router-TR-xyz2
    parents     router-AMS-xyz1.xyz.org,router-AMS-xyz2.xyz.org
    address     10.3.0.23
    use         xyz-host
}
define host{
    host_name   router-TR.xyz.org
    alias       router-TR
    parents     router-TR-xyz1.xyz.org,router-TR-xyz2.xyz.org
    address     10.3.2.1
    use         xyz-host
}

######################## HOSTS
define host{
    host_name   host1.xyz.org
    alias       host1
    parents     router-TO.xyz.org
    address     10.1.1.101
    use         xyz-host
}
define host{
    host_name   host2.xyz.org
    alias       host2
    address     10.1.1.102
    parents     router-TO.xyz.org
    use         xyz-host
}
define host{
    host_name   host3.xyz.org
    alias       host3
    address     10.1.2.101
    parents     router-KL.xyz.org
    use         xyz-host
}
define host{
    host_name   host4.xyz.org
    alias       host4
    address     10.1.2.102
    parents     router-KL.xyz.org
    use         xyz-host
}
define host{
    host_name   host5.xyz.org
    alias       host5
    address     10.2.1.101
    parents     router-LO.xyz.org
    use         xyz-host
}
define host{
    host_name   host6.xyz.org
    alias       host6
    address     10.2.1.102
    parents     router-LO.xyz.org
    use         xyz-host
}
define host{
    host_name   host7.xyz.org
    alias       host7
    address     10.2.2.101
    parents     router-PA.xyz.org
    use         xyz-host
}
define host{
    host_name   host8.xyz.org
    alias       host8
    address     10.2.2.102
    parents     router-PA.xyz.org
    use         xyz-host
}
define host{
    host_name   host9.xyz.org
    alias       host9
    address     10.3.1.101
    parents     router-NY.xyz.org
    use         xyz-host
}
define host{
    host_name   host10.xyz.org
    alias       host10
    address     10.3.1.102
    parents     router-NY.xyz.org
    use         xyz-host
}
define host{
    host_name   host11.xyz.org
    alias       host11
    address     10.3.2.101
    parents     router-TR.xyz.org
    use         xyz-host
}
define host{
    host_name   host12.xyz.org
    alias       host12
    address     10.3.2.102
    parents     router-TR.xyz.org
    use         xyz-host
}

######################## HostGroup
define hostgroup{
    hostgroup_name      Customer_XYZ                                                ; Name of the host group
    alias               Customer_XYZ                                                ; Alias
    members             router-xyz.xyz.org,router-xyz1.xyz.org,router-xyz2.xyz.org  ; Host members of the host group
    hostgroup_members   Region_APAC,Region_EMEA,Region_AMS                          ; Host group members of the host group
}

## APAC
define hostgroup{
    hostgroup_name      Region_APAC
    alias               Region_APAC
    members             router-APAC-xyz1.xyz.org,router-APAC-xyz2.xyz.org
    hostgroup_members   Japan-Tokyo,Malaysia-Kuala_Lumpur
}

#### Japan-Tokyo
define hostgroup{
    hostgroup_name      Japan-Tokyo
    alias               Japan-Tokyo
    members             host1.xyz.org,host2.xyz.org,router-TO-xyz1.xyz.org,router-TO-xyz2.xyz.org
}

#### Malaysia-Kuala_Lumpur
define hostgroup{
    hostgroup_name      Malaysia-Kuala_Lumpur
    alias               Malaysia-Kuala_Lumpur
    members             host3.xyz.org,host4.xyz.org,router-KL-xyz1.xyz.org,router-KL-xyz2.xyz.org
}

## EMEA
define hostgroup{
    hostgroup_name      Region_EMEA
    alias               Region_EMEA
    members             router-EMEA-xyz1.xyz.org,router-EMEA-xyz2.xyz.org
    hostgroup_members   UK-London,France-Paris
}

#### UK-London
define hostgroup{
    hostgroup_name      UK-London
    alias               UK-London
    members             host5.xyz.org,host6.xyz.org,router-LO-xyz1.xyz.org,router-LO-xyz2.xyz.org
}

#### France-Paris
define hostgroup{
    hostgroup_name      France-Paris
    alias               France-Paris
    members             host7.xyz.org,host8.xyz.org,router-PA-xyz1.xyz.org,router-PA-xyz2.xyz.org
}

## AMS
define hostgroup{
    hostgroup_name      Region_AMS
    alias               Region_AMS
    members             router-AMS-xyz1.xyz.org,router-AMS-xyz2.xyz.org
    hostgroup_members   USA-New_York,Canada-Toronto
}

#### USA-New_York
define hostgroup{
    hostgroup_name      USA-New_York
    alias               USA-New_York
    members             host9.xyz.org,host10.xyz.org,router-NY-xyz1.xyz.org,router-NY-xyz2.xyz.org
}

#### Canada-Toronto
define hostgroup{
    hostgroup_name      Canada-Toronto
    alias               Canada-Toronto
    members             host11.xyz.org,host12.xyz.org,router-TR-xyz1.xyz.org,router-TR-xyz2.xyz.org
}
Now you can reload the Nagios Core configuration:
[root@NagiosCore ~]# /etc/init.d/nagios reload
After the reload, the result can be checked in the Nagios Core web interface in one of these views:
- Map
- Host Groups
URL's
Nagios Core 3 “host” documentation: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#host
Nagios Core 4 “host” documentation: http://nagios.sourceforge.net/docs/nagioscore/4/en/objectdefinitions.html#host
Nagios Core 3 “hostgroup” documentation: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#hostgroup
Nagios Core 4 “hostgroup” documentation: http://nagios.sourceforge.net/docs/nagioscore/4/en/objectdefinitions.html#hostgroup
Nagios Core 3 “hostdependency” documentation: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#hostdependency
Nagios Core 4 “hostdependency” documentation: http://nagios.sourceforge.net/docs/nagioscore/4/en/objectdefinitions.html#hostdependency
Nagios Core 3 “hostescalation” documentation: http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#hostescalation
Nagios Core 4 “hostescalation” documentation: http://nagios.sourceforge.net/docs/nagioscore/4/en/objectdefinitions.html#hostescalation