The SysAdmin Network

No more hiding in the server room

We currently have Nagios 2.9 installed, and it has been running nicely for a few years. I want to migrate it off that old Linux server onto a new one. I've gotten Nagios 3.1.2 installed and running fine. Before I migrate all 240+ of our devices over to the new install, I'm curious as to how other SysAdmins are configuring it.

I've heard of:
- Each host in its own config file with its corresponding services, so you can copy and then edit a file to easily add a new device, with all like devices grouped in separate folders.
- All like hosts in a single config file, with or without all services in that same config file.
- All Hosts and services in one file. Then dependencies laid out in a separate file (our current config)

How are the Nagios config files arranged in your system? I'd like to get differing ideas to make ours as efficient as it can be.

Replies to This Discussion

Ah, Nagios configuration. A topic near to my heart. I was actually whining about it the other day.

Anyway, I'm monitoring somewhere around 100 hosts between two Nagios servers. I've got two physical sites (well, technically four, but two critical sites), and each site has its own Nagios install. Each Nagios install monitors all of the local servers plus the remote Nagios server, plus all of the network connections. This way I'll be alerted if either of the Nagios machines goes down, any of the network connections goes down, or (obviously) any of the "normal" servers has issues.

As for individual server configuration management, I'd recommend you put your entire /usr/local/nagios/etc directory into a subversion repository (along with libexec, too). I have to admit that I haven't done this yet, but it's in the works. I want to be able to track changes to my config over time, and subversion is a great way to do that.

I create a hierarchy of subdirectories under etc/objects. Here's how mine looks:

[root@web etc]# pwd
/usr/local/nagios/etc
[root@web etc]# tree -d
.
`-- objects
    |-- commands
    |-- computers
    |   |-- linux
    |   `-- windows
    |-- misc
    `-- network
        |-- firewalls
        |-- links
        |-- routers
        `-- switches

11 directories
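
For a tree like this to be read at all, the main config file has to point at it. A minimal sketch, assuming the stock source-install path shown above; cfg_dir makes Nagios load every file ending in .cfg in that directory and, recursively, in its subdirectories:

```
# nagios.cfg (main configuration file)
# Load every *.cfg file under etc/objects, including all subdirectories.
# Any cfg_file= lines that point at files moved into this tree should be
# removed, or those objects would be defined twice.
cfg_dir=/usr/local/nagios/etc/objects
```

After moving files around, running the nagios binary with -v against nagios.cfg checks the whole configuration before a restart.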


Generally speaking, I absolutely abuse Nagios 3's multiple inheritance.

I have a prototype for both of the "major" types: computers and network. There's a "computers.cfg" and a "network.cfg", each living in the directory of the same name, and the only content of each file is a host declaration:

define host {
        use                     generic-host
        name                    computers
        check_command           check_ping!100.0,20%!500.0,60%
        notification_options    d,u,r,f,s
        register                0
        max_check_attempts      10
        notification_interval   60
        contact_groups          it-admins
}

That's a good general default for my install, and since all of these values can be overridden by more specific declarations later, hey, no worries.
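
The thread never shows the "linux-host" prototype that builds on this, so here is only a rough sketch of how such a chain could look. The dmz-host template, the dmz-admins contact group, the host name, the address, and the override value are all made up for illustration:

```
# linux-host.cfg (hypothetical file name)
define host {
        name                    linux-host      ; template name used later in this post
        use                     computers       ; inherit every setting from the parent
        max_check_attempts      5               ; hypothetical override of the parent's 10
        register                0               ; template only, never a real host
}

# A second, made-up template that only changes who gets notified.
define host {
        name                    dmz-host
        contact_groups          dmz-admins      ; hypothetical contact group
        register                0
}

# A real host pulling from both templates. dmz-host is listed first,
# so its contact_groups wins over the one inherited via linux-host.
define host {
        use                     dmz-host,linux-host
        host_name               web01.dmz.example
        alias                   web01.dmz.example
        address                 192.0.2.10
}
```

When several templates are listed in "use", the first one that defines a given directive takes precedence, which is what makes the ordering above matter.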

Right now, there is a set of "group" config files and "service" config files in those directories as well. I'm going to be moving them into their own subdirectories eventually, so they're easier to manage and less cluttered, but the general organizational theme is that every group check should be self-contained.

In other words, if I've got a web group, then the web-group.cfg file contains the hostgroup declaration as well as any service declarations needed to make sure that those checks happen. Since many groups undoubtedly share the same checks, and you don't want to be changing 6 million pieces of code every time one of the low-level commands changes, the services can inherit their settings from a very "general" related service check further down the config line.

So for instance, my web servers need web checks, obviously. So do my application servers. So do my firewalls. And my load balancers. I don't want to maintain a full-fledged set of service declarations for each one of those (and I bet you don't either), but at the same time, all of those devices are administered by very different people in many cases. So in your group config, create a service declaration similar to this:

define service{
        use                     generic-http
        hostgroup_name          firewall-http
        contact_groups          firewall-admins
}

This allows you to have ultra-local service declarations in with your hostgroup declarations, and you have very fine-grained control over any variables that need to be changed. Notice that I didn't specify the check_command, because the one specified in generic-http is perfectly acceptable. If the firewall web servers had some specialized requirement, I would specify the check_command in the service declaration, and then, directly below the service declaration, I would define the command it uses. These are ultra-local, discrete units that can be administered much more efficiently than by searching through commands.cfg to find the right check command.
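
A rough sketch of both halves of that idea follows. The generic-http definition is a guess at what such a template might contain (the thread does not show it), generic-service and check_http are assumed to be the ones from the stock sample config, and the check_firewall_https command with its port is purely hypothetical:

```
# Shared template, kept with the other general service checks.
define service {
        name                    generic-http
        use                     generic-service         ; assumed stock template
        service_description     HTTP
        check_command           check_http              ; stock command from the sample commands.cfg
        register                0
}

# Firewall group file: the specialized variant and its command, side by side.
define service {
        use                     generic-http
        hostgroup_name          firewall-http
        contact_groups          firewall-admins
        check_command           check_firewall_https    ; hypothetical local command
}

define command {
        command_name            check_firewall_https
        # -I: address to check, -S: use SSL, -p: port (8443 is an assumption)
        command_line            $USER1$/check_http -I $HOSTADDRESS$ -S -p 8443
}
```

Keeping the command next to the one service that uses it means the firewall team never has to touch the shared commands.cfg.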

Because we've got the services compartmentalized like this, we can use Unix permissions to manage users' abilities to edit these files. Only want your firewall team to be able to change firewall Nagios rules? Very difficult if you've only got 5 files. Simple if it's set up like this.

Individual hosts do, in fact, get their own configuration file, and the filename is FQDN.cfg. This eliminates any ambiguity that might come into play with less specific names, and it allows me to set up exactly the checks I need, because a host can belong to multiple hostgroups. Got a web server that also serves files?

define host{
        use                     linux-host
        host_name               fs-web.internal.domain
        alias                   fs-web.internal.domain
        address                 10.95.1.22
        hostgroups              http-servers, file-servers
}

*BAM* Every check I need is in place, because the host uses the "linux-host" prototype, which specifies check by ping (remember the check_command ping specified in computers.cfg?), automatically adds a check for snmpd, and any other requirements. http-servers does all the web related stuff, and file-servers makes sure all the NAS stuff is available. All in 7 lines.

The initial configuration still takes a lot of time, both to get the hierarchy set up and to "wrap your head around" the way it works, but the amount of time I've saved by being able to create short files which do lots of stuff is just amazing. I can't even tell you how much time, because I've stopped thinking about it.

I honestly can't believe that some books and howtos tell you to keep things in the original files.

I hope this has helped. :-)

--Matt

Great layout, Matt. Thanks for sharing the details!

Thanks, Jeff. And you're welcome :-) I'm always an advocate of being lazy the smart way ;-)

Very interesting way of doing things, Matt.

As we've recently grown to over 110 hosts and a good 800-900 checks, I've taken some time to restructure things today. One of the main changes is moving hosts into their own config files and out of huge generic files that are thousands of lines long!

Next step - attack the services!

On that point - how have you structured your service / check commands? I'm intending to create a "services" or "commands" folder and create cfg files for each type of check, i.e. generic Windows ones will be in window_generic.cfg and MS Exchange will all be in one file separate from everything else.

Has anyone used service groups yet? I was thinking of looking into these as an easy way to group services together, but I haven't played with them yet.

Hi Graycat,

Thanks for looking over the layout.

As for dealing with services en masse, I create hostgroups to lump the hosts together, then I apply that hostgroup to the service, rather than a long list of server names.

For instance, *ALL* of my Linux hosts get ssh, diskspace, snmp, and several other services checked. If I've got 50 linux servers, I could do it like this:


define service{
        service_description     SSH Service Check
        check_command           check_ssh
        host_name               linux01, linux02, linux03, ... linux50
}

define service{
        service_description     SNMP Service Check
        check_command           check_snmp
        host_name               linux01, linux02, linux03, ... linux50
}

etc

but that's insane.

Instead, I do it like this:

define hostgroup{
        hostgroup_name          linux-servers
}

Then, the host objects look like this:

define host{
        use                     generic-host
        host_name               linux01
        address                 192.168.0.10
        hostgroups              linux-servers
}

Then my actual services are configured like this:

define service{
        service_description     SSH service check
        check_command           check_ssh
        hostgroup_name          linux-servers
}


And voila, every host that is set with "hostgroups linux-servers" automatically has check_ssh run against it.

Now, whenever you need to add a new server, you just make sure to apply the right hostgroups to it, and the checks start happening automatically.

What I ended up doing was creating an OS group (such as linux-servers) that assigns all of the "standard" checks (like ssh, disk space, date, snmp, etc), and additional groups for my "special" services.

For instance, there's a hostgroup for fileservers that automatically checks NFS and Samba (since all of my fileservers have those services). I have a postgres hostgroup that checks postgres, and a mysql group that checks, you guessed it, mysql.

A linux host that runs postgres might look like this

define host{
        use                     generic-host
        host_name               dbServer01
        address                 192.168.0.35
        hostgroups              linux-servers, postgres
}

All of the linux checks and postgres checks are applied instantly.
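
To make the "special" group side concrete, here is a minimal sketch of what a self-contained postgres group file might hold. The file name, alias, and service description are assumptions, generic-service is assumed to be the stock sample template, and check_pgsql is the plugin shipped with the standard Nagios plugins; database name and credentials are left out:

```
# postgres.cfg (hypothetical file name): everything the group needs in one place
define hostgroup {
        hostgroup_name          postgres
        alias                   PostgreSQL servers
}

define service {
        use                     generic-service         ; assumed stock template
        hostgroup_name          postgres
        service_description     PostgreSQL
        check_command           check_pgsql
}

define command {
        command_name            check_pgsql
        # -H: host to connect to; connection options omitted in this sketch
        command_line            $USER1$/check_pgsql -H $HOSTADDRESS$
}
```

Any host that lists "postgres" in its hostgroups directive, like dbServer01 above, would pick up this check without further changes.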

When configuring Nagios, an hour of prep and design time gains you thousands of hours in the long run of maintenance.

I should have also answered your question about service groups, but I left it out, sorry.

Service groups used to be strictly for display purposes, but you can do some other things with them in the 3.0 branch. You'll want to read over the next couple of pages:

http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#servi...
http://nagios.sourceforge.net/docs/3_0/objecttricks.html

I don't use them, but I could see how they would be useful for triggering inter-system dependencies. For instance, if you had a system which processed files automatically, but the files were delivered via email, you could create an "email" servicegroup, and a "fileprocessor" service group, and make the fileprocessor group dependent on the email group:

define servicedependency{
        servicegroup_name               email
        dependent_servicegroup_name     fileprocessor
        etc etc
}
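
Filled out a little further, a setup along those lines might look like the sketch below. This is untested guesswork rather than a working config: the host mail01, the SMTP service, the aliases, and the failure criteria are all assumptions, and the services that would belong to the fileprocessor group are omitted:

```
define servicegroup {
        servicegroup_name       email
        alias                   Email services
}

define servicegroup {
        servicegroup_name       fileprocessor
        alias                   File processing services
}

# A service joins a group through its "servicegroups" directive.
define service {
        use                     generic-service         ; assumed stock template
        host_name               mail01                  ; hypothetical host
        service_description     SMTP
        check_command           check_smtp
        servicegroups           email
}

define servicedependency {
        servicegroup_name               email
        dependent_servicegroup_name     fileprocessor
        execution_failure_criteria      n               ; never skip the dependent checks
        notification_failure_criteria   w,u,c           ; no alerts while email is WARNING, UNKNOWN or CRITICAL
}
```

With criteria like these, the file processing checks keep running, but their notifications are held back whenever a service in the email group is already failing.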

Like I said, I haven't done this, since most of my systems are relatively non-interdependent. But it's possible.

I stole a good bit of my writing here, modified a few small things, and made a blog entry:

http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/

I decided to keep to the Nagios standard. I have all my hosts in a single config (hosts.cfg, with comments) and all services in a separate config (services.cfg, with comments). The config for my additional items lives in /usr/local/nagios/etc/objects, while /usr/local/nagios/etc/ holds cgi.cfg, nagios.cfg, resource.cfg, and nrpe.cfg; this is also the location of my passwd.users file.

I saw this today, and it reminded me of your question.

I have just checked to see who is doing the presentation and what do you know? Matt gets everywhere!