If you are interested in monitoring and have successfully set up a system (whether home-grown or commercial off-the-shelf) for your own use, there comes a moment when you go from monitoring only the systems you care about to monitoring systems that other people care about. Monitoring for yourself is all about having the best data for the least effort. Monitoring for others? That's when your job becomes a game of "what just happened" whack-a-mole.
In your role as a monitoring-for-others engineer, you find yourself answering certain questions repeatedly. Over the course of my 20 years as a monitoring specialist, I began to jokingly call them "The Four Questions," modeled after the four questions that are asked during the Passover meal, or Seder. But once I made the joke, I realized there are other connections, lessons, and insights that the Seder has to offer both IT generally and monitoring specifically.
Back to the actual four questions: I have learned through hard experience that monitoring solutions which are prepared to address these questions tend to succeed, while those which can't answer them often fail.
In this talk, I will describe the four common questions and their answers. I'll explain why you need to structure your monitoring solution in specific ways in order to answer the questions appropriately, and then how to go about doing it. Along the way, I'll share a bit of wisdom gleaned from Torah, Passover, and the sages.
The Four Questions (Every Monitoring Engineer gets asked)
In my sordid career, I have been an actor, bug exterminator and wild-animal remover (nothing crazy like pumas or wildebeests, just skunks and raccoons), electrician, carpenter, stage-combat instructor, American Sign Language interpreter, and Sunday school teacher.
Oh, and I work with computers.
Since 1989 (when you got a free copy of Windows 286 on twelve 5¼" floppies when you bought a copy of Excel 1.0) I have worked as a classroom instructor, courseware designer, desktop support tech, server support engineer, and software distribution expert. Then about 16 years ago I got involved with systems monitoring. I've worked with a wide range of tools: Tivoli, Nagios, Patrol, Zenoss, OpenView, SiteScope, and of course SolarWinds. I've designed solutions for companies ranging from the extremely modest (~10 systems) to the ludicrous (250,000 systems in 5,000 locations). During that time, I've had the chance to learn about monitoring all types of systems: routers, switches, load balancers, and SAN fabric, as well as Windows, Linux, and Unix servers running on physical and virtual platforms, and even that magical environment known as "the cloud".