Friday, June 14, 2013

I'm sorry Nagios, but it's time to say goodbye.

Often when one starts the awkward breakup proceedings, the conversation goes something like this:

It's not you, it's me. I've changed, and we're just not compatible anymore.

Truer words were never spoken.

I've been using Nagios for a long time... all the way back to when it was called NetSaint. In those days, setting up a new server was a Big Deal. You had to do all sorts of things, not least physically unboxing and setting up the machine.

In those days, Nagios worked well. You probably had a bunch of hostgroups defined and some nice templates... Create a host definition for web17c, add it to the webservers hostgroup, restart Nagios and you were good to go! At athenahealth I monitored 200-odd boxes this way and had very little to complain about.
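For anyone who never lived through that workflow, here is a rough sketch of what "create a host definition" amounted to. It is a small Python helper that renders a Nagios host object as text. The helper name `render_host`, the address, and the `generic-host` template name are assumptions for illustration only, not my actual config from those days.

```python
def render_host(host_name, address, hostgroup, template="generic-host"):
    """Return the text of a Nagios host object definition.

    The template name and address passed in are illustrative assumptions.
    """
    lines = [
        "define host {",
        f"    use         {template}",    # inherit defaults from a host template
        f"    host_name   {host_name}",
        f"    address     {address}",
        f"    hostgroups  {hostgroup}",   # join an existing hostgroup
        "}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: the web17c box in the webservers hostgroup.
print(render_host("web17c", "10.0.0.17", "webservers"))
```

The output would be saved into the Nagios config directory, followed by a restart. Every host is written down by name ahead of time, which is fine when a new box shows up once in a while.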

It's a different world today. At my current job we are entirely cloud-based (AWS, more on that in another post!). We scale our resources up and down throughout the day, on the order of +/- 50 nodes over the course of an 8-hour workday.

This puts us in a position where we need to monitor hosts we don't know about, running services we may not be aware of.

I've tried a lot of things to get this working well. My next couple of posts are going to be about the ways we tried to make it work, and what we ended up doing instead.

