After some discussion it came down to this:
- We want to know when a critical system/service is unavailable.
- We want graphs to easily analyze trends.
- We don't want to have to restart anything when instances are added or removed.
Sensu is a popular modern monitoring router (their term!). It doesn't try to be a Swiss Army knife: it takes input from scripts and performs actions based on that input.
For a long-time Nagios user, Sensu is a bit disconcerting. There's no enormous "all clear" dashboard with hundreds of green dots. You know when something is wrong, but you have no real way to confirm that everything is right.
We chewed on that for a while and eventually decided we were OK with it.
One thing I really liked about Sensu was the way clients were added and removed. I won't re-write their documentation, but it comes down to this:
- When sensu-client starts, the node self-registers.
- A simple API call takes a node out of the clients list.
Want to stop a node? Just make sure that your stop script makes the right API call. When it starts back up it will put itself back in.
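Here is a rough sketch of what that stop-script call could look like. I'm assuming the classic Sensu API layout here (a DELETE on /clients/<name>, served on port 4567), so check the docs for your version before trusting it. The function name and the "web-01" client name are made up for illustration.

```python
import urllib.parse
import urllib.request


def build_deregister_request(client_name, api_base="http://localhost:4567"):
    """Build (but do not send) a DELETE request for /clients/<client_name>.

    Assumption: the Sensu API listens at api_base and removes a client
    on DELETE /clients/<name>. Adjust both for your own setup.
    """
    # Escape the client name so odd characters cannot break the URL path.
    safe_name = urllib.parse.quote(client_name, safe="")
    url = f"{api_base.rstrip('/')}/clients/{safe_name}"
    return urllib.request.Request(url, method="DELETE")


# Hypothetical node name; a real stop script would use the node's own name.
request = build_deregister_request("web-01")
```

To actually fire it from the stop script you would pass the request to urllib.request.urlopen(request, timeout=5) and deal with any error it raises. I kept building and sending separate so the URL can be checked without a live Sensu server.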
The other thing I liked is that it uses Nagios-style checks (the result is determined by the plugin's exit code), so we were able to drop it in place pretty easily and just call the plugins that were already installed.
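For anyone who hasn't written one, a Nagios-style check is just a script that prints one line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch of a disk check in that style. The thresholds, message format and function names are my own illustration, not something taken from an existing plugin.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to an (exit_code, message) pair."""
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def main(path="/"):
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself failed, which is different from a bad result.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate_disk(100.0 * usage.used / usage.total)
    print(message)
    return code
```

A real plugin file would finish with sys.exit(main()) so the exit code reaches whatever runs the check. Because the exit code is the whole contract, the same script works whether Nagios or Sensu is the one calling it.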
So that gets monitoring out of the way for now... on to graphing!