Wednesday, 17 March 2010

Cloud Congress Done Quick

[well, my bit at least]

Yesterday I spoke at this year’s Cloud Computing Conference covering what cloud computing really is all about (once you peel away the hype) and how to break the back of the adoption problem – what do you put out there and how do you get started? It was the first time I’d ever done a talk that had absolutely no diagrams or pictures whatsoever in the presentation and, considering the plainness of the slide deck, it didn’t go too badly at all.

Excluding the mundane introduction, here’s the reader’s digest version of my key points:

What's in a cloud?

As a relatively new and trendy technology, cloud computing is open to a lot of debate, and even the 'correct' interpretation changes as the technology matures. We've had web applications for a long time now, so I'm not comfortable with SaaS being thrown into the cloud bucket. I like to define it in the context of how it changes how you deliver software (by abstracting away the complexities, layout, and connectivity of infrastructure from your developers) and how it impacts the way delivery costs are calculated (by exchanging CAPEX-heavy up-front acquisition for metered billing on a usage basis).

My test (does your definition of cloud computing rely on the observations of your end users?) is simply to ask yourself "am I saying I am a cloud company just because my users access my product over the web?" If so, then perhaps you are not one. Nothing wrong with that, but nothing new either.

What your FD sees

My next section expanded on the CAPEX vs OPEX shift that cloud platforms enable. If you have got your product design right then - on the web anyway - more usage should equate to more revenue and, with a cloud platform, more usage equates to more cost. See how that works? The cost base grows in line with revenues, therefore smoothing out that lumpy accounting and tricky budgeting activity that is characteristic of large hardware drops throughout a site's lifetime. You're also able to bring new things online relatively quickly (eliminating the ordering/delivery lead time/racking and stacking stages) and you can afford to try out slightly more speculative business cases - if I'm on the fence about a particular feature then I'm much more likely to give it a try if I know I can pack it up quickly and cheaply later if it turns out for the worst.

Most cloud platforms have pretty good metering systems in place and that allows you a much more granular view of what part of your system is directly responsible for what parts of your cost base. From a business case perspective my advice here is to include all costs and not forget about time as a factor. It sounds a little obvious but, for example, when looking at Amazon's S3 as a storage platform for an analytical data set, your total cost is going to include transferring data in and out as well as the cost per GB of storing it. Time matters too - I've seen the odd business case for a cloud platform fail to stack up because, over a 3 year period, more total cash is paid out on the cloud platform than would be spent buying the hardware up front, once that large CAPEX bill is amortized over the same period.
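To put some numbers behind that, here is a minimal Python sketch of the comparison. Every figure and function name in it is invented for illustration (these are not real Amazon prices), and amounts are held in whole pence to keep the arithmetic exact.

```python
def metered_monthly_cost(stored_gb, gb_in, gb_out,
                         pence_per_gb_stored, pence_per_gb_in, pence_per_gb_out):
    """Monthly bill for a metered storage platform: storage plus transfer both ways."""
    return (stored_gb * pence_per_gb_stored
            + gb_in * pence_per_gb_in
            + gb_out * pence_per_gb_out)


def cumulative_metered(monthly_cost, months):
    """Total cash paid to the cloud provider after a number of months."""
    return monthly_cost * months


def cumulative_owned(capex, monthly_running_cost, months):
    """Total cash paid when buying hardware up front and then running it."""
    return capex + monthly_running_cost * months


def first_month_metered_costs_more(monthly_cost, capex, monthly_running_cost, horizon_months):
    """First month in which cumulative metered spend exceeds owned spend, or None."""
    for month in range(1, horizon_months + 1):
        metered = cumulative_metered(monthly_cost, month)
        owned = cumulative_owned(capex, monthly_running_cost, month)
        if metered > owned:
            return month
    return None


# Hypothetical inputs, all in pence: 2,000 GB stored, 500 GB in, 1,500 GB out per month.
monthly = metered_monthly_cost(2000, 500, 1500, 15, 10, 15)
three_year_metered = cumulative_metered(monthly, 36)
three_year_owned = cumulative_owned(1_200_000, 20_000, 36)
crossover = first_month_metered_costs_more(monthly, 1_200_000, 20_000, 36)
```

With these made-up inputs the metered option is the cheaper one for well over two years and only becomes the more expensive choice late in year three, which is exactly the kind of thing a one-year view would miss.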

Cloud capabilities

I also touched on some of the less obvious uses for cloud computing because so much emphasis is given to migrating existing systems into the cloud and I don't think enough time is given to considering what additional things that aren't being done now could be brought on because of the inherent properties of cloud platforms. These include private content delivery networks (because the larger cloud players tend to have a good reach), enabling offshore or outsourced development without opening your perimeters to external organizations who may have weaker security policies, neutral territory for integrations or joint ventures, and large scale load testing (because where else will you get hundreds of high spec load generators external to your network and connected over realistically-latent lines).

Development and testing environments are a good way to dip a toe in because, if you're doing it right, you will already have nicely sanitized data (which gets you clear of most of the oft-cited security concerns) and you won't be expecting production-sized load. It's also the best way to get a good feel for the cloud suitability of your production system without making any user impacting changes.

Architecting for the cloud

Many people - mostly those selling cloud integration tools or who charge by the hour - will tell you about how they can help you move your systems into the cloud. Don't kid yourself on. If you're using a half-way decent definition of a cloud platform then there is a lot more to it than is commonly appreciated. There are a number of good design patterns that I believe organizations should start adopting today and not just because they prepare systems for cloud runtime in the future, but also because they're pretty good ideas on their own merits.

It all starts with my favorite - decoupling and creating clear, distinct boundaries between functionality in a system and abstracting the specifics of the implementation which delivers said functionality behind a well defined interface. When you present specific uses of data to a network in this way then, subject to a bunch of common sense rules, you are able to host that individual part of your overall system another way - on different servers, in another data center, or hey, in the cloud! As long as it's reachable by its consumers - which brings us nicely onto messaging and using state sparingly - you buy yourself the flexibility to move things quite dynamically. A service registry is also highly desirable if you a) have composites made up of many services and b) want to be able to move them and scale up/down dynamically.
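As a rough illustration of the registry idea, here is a toy in-memory version in Python. The class, the service name, and the URLs are all hypothetical; a real registry would also need health checks, persistence, and some thought about concurrency.

```python
import itertools


class ServiceRegistry:
    """Toy registry: consumers ask for a service by name, never by location."""

    def __init__(self):
        self._endpoints = {}  # service name -> list of endpoint addresses
        self._counters = {}   # service name -> running counter for round-robin

    def register(self, name, endpoint):
        """Add an endpoint for a service (scale up, or bring up a new home)."""
        self._endpoints.setdefault(name, []).append(endpoint)
        self._counters.setdefault(name, itertools.count())

    def deregister(self, name, endpoint):
        """Remove an endpoint for a service (scale down, or retire an old home)."""
        self._endpoints[name].remove(endpoint)

    def resolve(self, name):
        """Return one endpoint for the service, rotating through those registered."""
        endpoints = self._endpoints.get(name)
        if not endpoints:
            raise LookupError(f"no endpoint registered for {name!r}")
        index = next(self._counters[name]) % len(endpoints)
        return endpoints[index]


# Hypothetical example: move a pricing service from a data center to a cloud host.
registry = ServiceRegistry()
registry.register("pricing", "http://dc1.internal.example/pricing")
before_move = registry.resolve("pricing")

registry.register("pricing", "http://cloud.example/pricing")
registry.deregister("pricing", "http://dc1.internal.example/pricing")
after_move = registry.resolve("pricing")
```

The consumer's code is the same before and after the move, because it only ever knows the name "pricing". That is the flexibility the well defined interface buys you.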

All good practices for scalability and availability regardless of your stance on the cloud.

The crystal ball

You're not allowed to speak at an event and not give some predictions. I think there is an old charter or something somewhere. So mine were: barriers are coming down, and this will continue with technologies such as private link and private clouds; as with all trendy concepts, the waters will be muddy for a while as the word 'cloud' is appended to everything we've already got in order to sell it to us again; and within 5 years I expect to see hybrids (cloud-type platforms in use to some degree) in almost every organization.

Overall a good conference - some top panelists and speakers, and I met some great folks there. Thanks to all the guys and gals at SixDegrees for putting together a worthwhile and fun event.

You can find the slides here.

Monday, 8 March 2010

Just a little more on SLAs…

Yesterday I posted a little something about SLAs and I’m always happier with things when I can wrap them up with a handful of guidelines.  Not always possible in the complicated world we live in, but here goes anyway:

  1. Discover the things that are meaningful for the business.  I risk stating the obvious but there is always a temptation to approach this ‘backwards’ by starting off with what can be measured rather than what is significant (and then working out how to measure it).  You don’t want to end up with a bunch of metrics that are easy to count but don’t describe desired system performance.
  2. Strike a balance between persistence and change.  Unless doing lots of projects isn’t important to you, be careful not to base all your KPIs on availability/stability metrics – or if you do, at least be aware of how that can drive reluctance to push changes through the system.
  3. Make appropriate interpretations for each product or system.  In most organisations different systems, or parts of each system, are subject to different uptime, capacity, latency etc demands.  And assuming you pick some basics like performance they should be specific to each product; for a website that might be a number of page impressions, and for an analytic system that might be a time to render when a data set is updated.
  4. Include time as a dimension.  Most businesses – particularly on the web – have a number of 24x7 products, but there are also a lot of systems that only get used during business hours or at certain intervals (e.g. payroll is usually a monthly thing).
  5. Disregard #1.  Kind of.  Now that you’ve gotten this far, you will need to consider some feasibility, because signing up to unachievable SLAs doesn’t help anyone.  Have a look at what devices and services underpin the business functionality you are measuring.  Trees of dependencies, composites in SOA for example, tend to live up to the least strict SLA rather than the aggregate of the set.

Rules of thumb – apply in conjunction with local knowledge!
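For guideline 5, a quick bit of arithmetic shows why dependency trees matter. This Python sketch uses a deliberately simple model that is my assumption, not a measurement: every dependency is required, and failures are independent of each other. The three availability figures are invented.

```python
import math


def composite_availability(dependency_availabilities):
    """Availability of a composite that needs every dependency to be up.

    Assumes failures are independent, so the availabilities multiply.
    """
    return math.prod(dependency_availabilities)


def weakest_dependency(dependency_availabilities):
    """The least strict availability in the set: a ceiling for the composite."""
    return min(dependency_availabilities)


# Hypothetical composite built on three services.
dependencies = [0.999, 0.995, 0.99]
composite = composite_availability(dependencies)
ceiling = weakest_dependency(dependencies)
```

Under those assumptions the composite can never beat its weakest dependency, and with several dependencies in the chain it ends up a little below it, so check the whole tree before signing up to a number.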

Sunday, 7 March 2010

More Meaningful SLAs

Establishing internal service levels with the rest of the business is a difficult process - there are so many variables that can be measured and, as we all know, you change what you measure by measuring it. For example, if you express your SLA exclusively in terms of system uptime, then you improve all the activities around keeping your system available. The flipside of this is that you often discourage the activities around effecting change in the system - after all, any releases or upgrades or new features always carry some risk to availability and that's what they're measured on...

The place to start is to work out what's important to the organisation. Performance and availability are critical to us (a latency sensitive transactional platform with variable usage patterns) but so is change (a content driven web application correlated with events in the real world). We decided that performance, availability, change, and support response were the key metrics for us - nothing unique so far, and next we had to make an interpretation of each of these that was relevant to our various systems.

A basic principle here is recognising that it isn't just the raw numbers that should be appropriate to each individual product, but what is being measured too. Throwing an overall value at the problem (for example 99% availability across the board) makes the job of putting together your SLA easier, but is it a true reflection of your infrastructure? Whenever I've seen this coarse-grained approach used it has always led to less than acceptable uptime for the most critical applications and wasted investment propping up others that are realistically less important.

Another way to make sure your SLAs really closely match business need is to introduce the dimension of time. In many systems and many organisations demand - and the cost of downtime - varies over time. For example, how many accounts and payroll systems are used around the clock? If you can trade off to 'best endeavors' over weekends and evenings then you shouldn't have too much trouble meeting a five nines commitment during business hours between Monday and Friday.

For our website we have a flat availability target (such is the nature of a 24x7 site) and we interpreted performance as a latency metric for price publishing and order placement. For reporting systems - which do not experience the same round-the-clock demands - we have different availability targets during business and after hours. Performance in the context of those systems is interpreted as a certain set of daily reports delivered by a fixed time each morning and a message delivery SLA on alerts on certain events. SLAs around change and product delivery are much more complicated and fraught with subjective measures. We've gone with measuring development projects iteration-by-iteration: what got delivered vs. what was committed during that sprint's planning. It's objective and encourages good estimation and strict control of scope creep during a sprint.

Making SLAs commensurate with what the business genuinely demands from a given piece of technology is important. Setting your sights high can seem like a good idea on the surface but, when you consider the frightening magnitude of difference in cost between 99.5% and 99.9% uptime, that couple of points can only ever be described as waste if they are not intimately linked to the organisation's success.
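To make those percentages a bit more tangible, here is a small Python sketch that turns an availability target into the downtime it allows over a measurement window. The function name and the 9-hour business day are my assumptions for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def allowed_downtime_hours(target_percent, window_hours=HOURS_PER_YEAR):
    """Hours of downtime a target permits across the measured window."""
    return window_hours * (100 - target_percent) / 100


# Round-the-clock measurement over a year.
budget_995 = allowed_downtime_hours(99.5)
budget_999 = allowed_downtime_hours(99.9)

# Hypothetical business-hours-only window: 9 hours a day, 5 days a week, 52 weeks.
business_hours = 9 * 5 * 52
budget_999_business = allowed_downtime_hours(99.9, business_hours)
```

Measured around the clock, 99.5% allows roughly 43.8 hours of downtime a year while 99.9% allows roughly 8.8. The business-hours figure shows how much the choice of window changes what the same percentage means.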

Sunday, 28 February 2010

Gates and Nonblocking Delivery Practices

A few weeks ago one of my guys sent me this article from The Agile Executive which is - although not explicitly said - about the collision between ITIL infrastructure governance and agile development methodologies. I use the word 'collision' very deliberately because those guys deserve credit for how well they've managed to knit those (often opposing) worlds together. Which brings me nicely onto gates.

Gates, defined as the "You Shall Not Pass" checkpoints which must be navigated throughout a project, are not inherently bad. They just get misused in the same way that anything we do has the potential to be applied over zealously.


Gates have a place in delivery and fulfill an important function. They serve as a kind of final quality checklist to make sure that all those things you said mattered and had to be done actually have been done. As long as they are few, important, and near the end of the development cycle, you can make this work.

Where this tends to go wrong is when gates are established without clear and measurable clearance criteria defined up front. What you have then is an obstacle with variable and subjective success criteria - and then everyone wonders why a project arrives at the gate and struggles to meet the requirements to move on. An understanding of exactly what's on the checklists a project will face as it goes live and exactly how that will be measured has to be front loaded, so that development teams know exactly how to make a piece of work transition through the lifecycle smoothly and organize resources in advance.

If this isn't established early and you get a project logjam downstream then you often end up having to compromise - and that means compromising on key quality or operational requirements that you believed were important enough to gate - in order to restart the flow. The good old fashioned 'defects fixed later cost more' curve still applies here.

Another common misuse of gates is introducing them into a process primarily for reporting purposes. It is true that a regular set of gates established throughout the end-to-end project process provides a set of convenient handholds against which to benchmark progress (we are at 'customer feedback on specification' or entering 'test run 3' etc) but there are a couple of downsides that go along with that.

Firstly, it tends to encourage over-reliance on artifacts such as documentation and reports rather than working software as a primary work output. Secondly, progress becomes measured by the clearance of gate after gate, step after step, and revisiting a previous step is seen as a step backwards. That lines up with an easy to describe linear view of the world; however, with most complex projects in most organizations things are not that simple and progress often involves degrees of overlap and leapfrogging.

Wednesday, 27 January 2010

5 tips for more strategic budgeting

With the end of the financial year looming as the next major fiscal date in many UK companies' diaries, IT people everywhere will soon be embarking on the wishlist-to-business plan journey that is the budgeting process.

There is already a lot of good advice out there on how to estimate numbers and prioritise what makes the cut, and that can be a brutal process, so here are a few things I keep in mind to make sure I'm not being too short sighted:

1 - Know the cost of a customer. The rest of this list isn't in any particular order, but this is definitely number 1. Most good CIOs and CTOs I meet know what this disk array costs vs. that disk array and what these licenses cost vs. those licenses, but not that many know what a customer costs to acquire and convert or to reactivate if they drift away. The relevance? Knowing that means you can work out exactly how many frustrated customers you can lose because of that bug before it's worth fixing, or that slow server before it's worth upgrading. In this light, things that might otherwise not have made the cut become the no-brainers that they should be.

2 - Don't be influenced by trendy buzzwords. There is always a meme that vendors will be able to exploit because of our inability to apply common sense and logic to a pitch on whatever the currently fashionable thing is. SOA, virtualisation, and cloud computing come to mind whenever I recall ill thought through spending sprees. That's not the same as saying that these - or any of the other candidates in the same category - aren't good technologies with valid applications, just that any investment in them should be handled with the same pessimism and judgement of real business value as money spent anywhere else. Just adding 'cloud' onto the name of a product doesn't change its fundamental ROI, so don't let these slip by you unchecked.

3 - Know the cost of scale. When sizing up a new project, give growth plans some weighting rather than just initial costs. Depending on the degree of speculation involved in your plan, taking cheaper entry options can sometimes bite you when you reach scalability limits.

4 - Wider consultation. A year is a long time and a budgeting exercise inevitably involves dusting off your crystal ball and trying to foresee everything that everyone is going to want in the coming 12 months. Good luck with that and, although it seems a little obvious, spending a bit more time with each area of the business trying to coax their plans and ideas out of them will at least expose the bigger ones up front.

5 - Increase priority on core things. Another one that sounds obvious but - through its limited practice - clearly isn't. In tougher economic times it is particularly important to give the most crucial things the most attention, and it can be a mistake to favor too many might-lead-somewhere-might-not initiatives over those core basics upon which the business depends today. I think it is critical to set aside time and money for innovation and to take a punt on those less-certain ventures that our instincts tell us have that commercial je ne sais quoi but it needs to be proportionate to the main line. This is important when deciding what to drop in order to bring your wishlist and the available cash resources closer together - big but critical projects can be easy targets because they pull back more cost in a single swipe than a number of smaller but less meaningful items will.
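Going back to tip 1, the sum involved is simple enough to sketch in a few lines of Python. The function names and all of the figures are hypothetical, and the model assumes the only cost of a lost customer is what it takes to acquire or reactivate a replacement.

```python
import math


def customers_lost_to_justify(fix_cost, cost_per_customer):
    """Smallest number of lost customers at which the fix pays for itself."""
    return math.ceil(fix_cost / cost_per_customer)


def worth_fixing(fix_cost, cost_per_customer, customers_expected_to_leave):
    """True when replacing the customers you expect to lose costs at least as much as the fix."""
    return customers_expected_to_leave * cost_per_customer >= fix_cost


# Hypothetical: a fix costing 12,000 and a customer costing 150 to win back.
break_even = customers_lost_to_justify(12_000, 150)
fix_if_100_leave = worth_fixing(12_000, 150, 100)
fix_if_50_leave = worth_fixing(12_000, 150, 50)
```

On those invented numbers the fix becomes a no-brainer once you expect to lose 80 customers or more to the problem.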

Wednesday, 6 January 2010

The hidden wisdom of Kevin McCloud

I was lucky enough to get a copy of the Grand Designs Handbook for Christmas - it's a compendium of building projects from the Channel 4 show (the book, not Christmas). The first few chapters have a lot of lessons learned and good practices picked up from 9 seasons' worth of experience with, well, variously successful building projects.

Reading some of the advice, there is a lot that can be applied directly to software projects. There is a section on 'how to behave with your team' which is aimed at helping people keep their building sites productive - but the advice translates well to technical projects. Here's a couple of my favorites along with my join-the-dots:

"Don't give verbal instructions on site unless they've been agreed with your consultants and they're backed up in writing. If you just issue verbal instructions, you risk confusion and possible extra costs."

This is a top one for me - especially in the agile world where many interpret 'agile' as 'make it up as you go along' and then end up disappointed. I think maximizing interaction between delivery team and customer team is a good thing especially when both sides recognise the boundary between clarification and change in in-flight work.

"Do take time to prepare detailed written briefs setting out all your requirements for all your consultants. A good brief is an essential part of the design process."

One of the simplest formulas that govern all project delivery (hardhat or mousepad) is that the quality of what you get out will never exceed the quality of what you put in - yet this seems to escape so many of us so often. What you invest in clear instruction and regular contact with developers is a key variable in how happy you'll be with the results.

"Do allow your experts the freedom to do their jobs. Stand back from the project and enjoy your role as client; making decisions and hanging around, generally being a good egg and imbuing the world with optimism and excitement."

I don't exactly advocate stakeholders 'standing back' from a project, and I don't think that's what Kevin means either. Don't micromanage your delivery team and, as a customer, be sure to perceive the difference between decisions you should make (and don't let the team wait for those) and decisions the team should make (and let them use their skills and experience to make them).

"Don't listen to other people outside the professional team. You'll only end up getting very confused."

External feedback and extra ideas are good, but remember that an engineer knows his business best and don't fall into the trap of listening to someone else simply because you like their answers better. I consider myself relatively smart but even I'll say all sorts of wacky things when I don't have any skin in the game...

There are a couple of other sections with close analogies to technical projects - how to be a perfect customer and how to put a good brief together - which I think are 2 other critical ingredients to any successful engineering project (construction or software) but they'll keep for another post.

Tuesday, 22 December 2009

Standups vs Team Meetings

When teams first start to pick up Scrum, there is a tendency to let stand ups replace regular team meetings. The risks here are that your stand up will elongate and get off topic because it is the only chance you get to talk about issues as a group, yet your team cohesion will still suffer because no matter how long you can string out a stand up in the morning it won't be long enough to table all the things you need to get through as a team.

A stand up is a project-oriented meeting, all about running the day to day work of the team, has a tightly fixed agenda, is time bound, and specifically focused on what's required for today.

A team meeting is a team-oriented meeting, all about maintaining the team, continuous improvement, future requirements, and focussed on bigger picture topics.

They are definitely two very different meetings with two very different purposes, and both of them are necessary for a team to run smooth projects and keep productivity and job satisfaction high.