The Fletcher Project: Root Cause Analysis

Friday, 21 November 2008

Root Cause Analysis

To help me kill some time at an airport (which seems to be my second job these days), let me reach into my wardrobe of soap-box issues and pick something out. Ah, root cause analysis, here we go.

In my opinion, proper root cause analysis is the most important part of any operational support process.

Having a professional, predictable response and the skills to restore service quickly are critical - but you have to ensure that your support processes don't stop there. If they do, then you're simply doomed to let history repeat itself, and this means more downtime, more reactive firefighting, and less satisfied customers.

Good root cause analysis takes into account the entire event - systemwide conditions, the teams response, the available data, the policies applied - not just the technical issue which triggered the fault, and looks for ways to reduce the likelihood of recurrence.

Doing root cause analysis properly can be expensive, because you don't need to get to the bottom of why it happened this time, it's why it keeps happening, and why the system was susceptible to the issue in the first place that you need to uncover to really add future value. Think of the time spent on it as an investment in availability, freeing up your team to work more strategically (as well as enjoy their jobs more), and happier users (which oddly seems to make happier engineers).

But what you learn by doing this isn't really worth the time you spend on it without the organizational discipline to follow up with real changes. If you're truly tracing issues back to their root, you'd be surprised how many are the result of a chain of events that could stretch right back to the earliest phases in projects. This needs commitment.

If you make money out of responding to problems then you'll probably want to ignore my advice. There is a whole industry of IT suppliers whose core business lives here, and while it's an admirable pursuit, don't take the habit with you when you join an internal team!

The Fletcher Project

Friday, 21 November 2008

Root Cause Analysis

No comments:

Labels

Blog Archive

Favorite Posts

Other Stuff

Other People