Why Sensu is a monitoring router – some cool handlers

I have just finished a couple of handlers that fit really well into the Sensu routing model.

The Execution handler

The execution handler automatically runs tasks triggered by alerts, for example restarting a service or enabling debug mode. By using the great tool mcollective, the handler can execute tasks on other servers. The example I show here restarts an apache service if the web application doesn’t respond, but there are a lot of other nice things that could be done to automate the handling of unexpected events in the system. Our conclusion is that many events can be handled in an automatic or semi-automatic way. By routing that handling through Sensu you are alerted when something happens, you keep the history of the event, and actions can be triggered on instances other than the one the alert was raised on in the first place.

This is an example of a check that verifies that the foo web site is responding. If the site doesn’t work and an alert is triggered, the execute handler runs the task(s) defined in execute. In this case, all servers with the role Foowebserver will restart the apache2 service.

"foo_website": {
      "handlers": [
        "rfdefault",
        "execute"
      ],
      "command": "/opt/sensu/plugins/http/check-http.rb -h www.foo.com -p / -P 80",
      "execute": [
        {
          "scope": "CLASS",
          "class": "role.Foowebserver",
          "application": "service",
          "execute_cmd": "apache2 restart"
        }
      ]
}

With this configuration, the handler runs the following mcollective command on the sensu-server, which restarts apache2 on the matching servers:

mco service -C role.Foowebserver apache2 restart
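To make the mapping from the check definition to that command line concrete, here is a minimal Ruby sketch. It is not the handler's actual code: the helper name build_mco_command and the SCOPE_FLAGS table are made up for illustration, and only the CLASS scope shown above is covered, since the other scopes are not described here.

```ruby
# Hypothetical helper: turns one entry of a check's "execute" list into
# the argument vector for the mco command line.
# Only the scope shown in the post is mapped; anything else is rejected.
SCOPE_FLAGS = { "CLASS" => "-C" }.freeze

def build_mco_command(entry)
  flag = SCOPE_FLAGS.fetch(entry["scope"]) do
    raise ArgumentError, "unsupported scope: #{entry['scope'].inspect}"
  end
  # "execute_cmd" holds the arguments for the mco application, e.g. "apache2 restart".
  ["mco", entry["application"], flag, entry["class"], *entry["execute_cmd"].split]
end

entry = {
  "scope"       => "CLASS",
  "class"       => "role.Foowebserver",
  "application" => "service",
  "execute_cmd" => "apache2 restart"
}

command = build_mco_command(entry)
puts command.join(" ")
# prints: mco service -C role.Foowebserver apache2 restart
```

Building an argument array rather than a single string means a handler could pass it to system(*command) without going through a shell, which avoids quoting problems if a class or service name ever contains unusual characters.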

The execute handler and mcollective can be used for a lot of great stuff, like restarting services, shutting down or starting servers, enabling or disabling debug mode for applications, or gathering more data from the servers. The only thing needed is to extend mcollective with more agents. One of the great advantages of mcollective is that it is agent based, so it feels safer than just executing remote ssh commands.

The graphite notify handler

The graphite notify handler is quite simple: it just sends a 1 when an event occurs and a 0 when it’s resolved. That makes it easy to get statistics on how often an error occurs and on which machines.

http://graphite.recfut.com/render/?width=1548&height=786&_salt=1353966035.28&from=17%3A00_20121126&until=18%3A00_20121126&target=keepLastValue(r13b.sensu.events.adm3_recfut_net.chef_client_fatal_error)&hideLegend=true&hideGrid=false&lineWidth=4&width=400&height=280

In this case we had one event, starting at 17:10, that lasted for 20 minutes.

The only thing needed is to add some graphite config and then add the graphite_notify handler:

  "graphite_notify":{
    "host":"graphite.foo.com",
    "prefix":"sensu.events"
  }
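
As a rough illustration of the 1/0 idea, the Ruby sketch below builds the line that would be sent to Graphite using its plaintext protocol (metric path, value, timestamp). It is a sketch under assumptions, not the handler's real source: the helper names, the host name web1.foo.com and the timestamp are made up, and the metric layout (prefix, then host, then check name, with dots replaced by underscores) is inferred from the graph URL above.

```ruby
require "socket"

# Hypothetical helper: builds one line of Graphite's plaintext protocol
# ("<metric path> <value> <timestamp>\n") from a Sensu event hash.
def graphite_event_line(event, prefix, timestamp)
  # Dots separate path levels in Graphite, so replace them inside names.
  host  = event["client"]["name"].tr(".", "_")
  check = event["check"]["name"].tr(".", "_")
  # Assumption: the event's "action" is "resolve" once the check recovers;
  # every other action is reported as 1.
  value = event["action"] == "resolve" ? 0 : 1
  "#{prefix}.#{host}.#{check} #{value} #{timestamp}\n"
end

# Hypothetical sender (defined but not called here): writes the line to
# carbon's plaintext listener, which uses port 2003 by default.
def send_to_graphite(line, host, port = 2003)
  TCPSocket.open(host, port) { |socket| socket.write(line) }
end

event = {
  "action" => "create",
  "client" => { "name" => "web1.foo.com" },   # made-up client name
  "check"  => { "name" => "foo_website" }
}

print graphite_event_line(event, "sensu.events", 1353949800)
# prints: sensu.events.web1_foo_com.foo_website 1 1353949800
```

In a real handler the timestamp would come from Time.now.to_i and the host and prefix from the graphite_notify settings shown above.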