Yesterday I posted a little something about SLAs and I’m always happier with things when I can wrap them up with a handful of guidelines. Not always possible in the complicated world we live in, but here goes anyway:
- Discover the things that are meaningful to the business. I risk stating the obvious, but there is always a temptation to approach this ‘backwards’: starting with what can be measured rather than what is significant (and then working out how to measure it). You don’t want to end up with a bunch of metrics that are easy to count but don’t describe the system behaviour the business actually cares about.
- Strike a balance between stability and change. Unless delivering new projects doesn’t matter to you, be careful not to base all your KPIs on availability/stability metrics – or if you do, at least be aware that doing so can drive reluctance to push changes through the system.
- Make appropriate interpretations for each product or system. In most organisations, different systems – or parts of each system – are subject to different uptime, capacity and latency demands. Even a basic measure like performance should be specific to each product: for a website that might be a number of page impressions; for an analytics system, the time to render after a data set is updated.
- Include time as a dimension. Most businesses – particularly on the web – have a number of 24x7 products, but there are also a lot of systems that only get used during business hours or at certain intervals (e.g. payroll is usually a monthly thing).
- Disregard #1. Kind of. Now that you’ve got this far, you need to consider feasibility, because signing up to unachievable SLAs doesn’t help anyone. Look at the devices and services that underpin the business functionality you are measuring. A tree of dependencies – a composite service in an SOA, for example – can only be as available as its least reliable member, and in practice does worse, because the components’ failure probabilities compound.
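To make the time dimension concrete, here’s a rough sketch of how the same availability target translates into very different downtime budgets depending on the hours the SLA actually covers. The figures (a 99.9% target, ~730 hours in a 24x7 month, ~176 business hours) are assumptions for illustration, not recommendations:

```python
# Illustration with assumed figures: how much downtime a 99.9%
# target permits, depending on the hours the SLA covers.
def downtime_budget_minutes(availability, covered_hours_per_month):
    """Minutes of permitted downtime within the covered window."""
    return covered_hours_per_month * 60 * (1 - availability)

# 24x7 coverage (~730 hours/month) vs business hours only (9-5 weekdays, ~176 hours)
print(downtime_budget_minutes(0.999, 730))   # roughly 44 minutes
print(downtime_budget_minutes(0.999, 176))   # roughly 11 minutes
```

The same “three nines” means very different things to the people running the system, which is why scoping the covered hours belongs in the SLA itself.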
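And to see why a dependency tree underperforms even its weakest member, a quick back-of-the-envelope calculation. Assuming serial dependencies (the composite is down whenever any one component is down, and failures are independent), the availabilities multiply; the 99.9%/99.5%/99.0% figures below are made up for the example:

```python
# Sketch, assuming serial dependencies with independent failures:
# the composite is up only when every component is up.
def composite_availability(availabilities):
    """Availability of a service that depends on every component in series."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# Three components with SLAs of 99.9%, 99.5% and 99.0% uptime:
deps = [0.999, 0.995, 0.990]
print(round(composite_availability(deps) * 100, 2))  # 98.41 – below the weakest link
```

So before committing to an SLA for the composite, work out what its underlying components can realistically deliver together.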
Rules of thumb – apply in conjunction with local knowledge!