The things I should have learned from years of putting together people and technology to get successful product online. We’ll probably talk about strategy, distributed systems, agile development, webscale computing and of course how to manage those most complicated of all machines – the human being – in our quest to expose the most business value in the least expensive way.
Wednesday, 31 March 2010
The Paradox of Freemium
Firstly, for those who may have spent the last few years in a cave, I broadly define freemium as giving away something of substantial value along with an associated upgrade/expansion path which returns benefit to the organisation.
This is already commonplace. So many of the everyday tools I use have an element of freemium to their business model; google apps, zen agile, bitbucket, linkedin, mockingbird, a number of podcasts (which lead to paid-for books or other products), and of course almost every iPhone app I see these days has a fairly comprehensive ‘lite’ version.
For a long time we’ve given away tightly limited trial versions, exposing a very small subset of the functionality (or usage time) in order to coax users into a purchase to unlock the rest, and freemium is almost a reversal of this. Give away quite a rich set of functionality with no expectations, and you’ll pick up a group of buyers for that additional stuff on top.
Making this happen comes down to product design; you have to make sure that you hold back something(s) compelling enough to encourage purchases while making sure that your base product is functional and complete enough to be of significant utility on its own. There are so many ways to slice this, and sometimes the differences don’t even need to be rendered – you might simply impose several distinct licensing terms (free for academic or home use for example).
In one of my all-time favourite Ted Talks, Dan Pink says:
"The mid 1990s, Microsoft started an encyclopedia called Encarta. They had deployed all the right incentives. All the right incentives. They paid professionals to write and edit thousands of articles. Well compensated managers oversaw the whole thing to make sure it came in on budget and on time. A few years later another encyclopedia got started. Different model, right? Do it for fun. No one gets paid a cent, or a Euro or a Yen. Do it because you like to do it.
Now if you had, just 10 years ago, if you had gone to an economist, anywhere, And said, "Hey, I've got these two different models for creating an encyclopedia. If they went head to head, who would win?" 10 years ago you could not have found a single sober economist anywhere on planet Earth, who would have predicted the Wikipedia model."
I think those same sober economists would probably have said much the same thing about freemium, and probably much more recently. Getting customers to pay for your product by first giving most of it away wasn’t common sense – and maybe it still isn’t – but you certainly can’t ignore the evidence that it works.
Friday, 26 March 2010
Encouraging Profitable Usage
I was particularly interested in the stats near the end - in a crackdown on gold farming they managed to drop CPU utilisation 30% through a 2% drop in players (banned accounts who were violating these terms). To paraphrase that; 2% of the player base [revenues] were creating 30% of the load [costs] and that my friends, is what I call unprofitable usage of a product.
When you read that last sentence it makes perfect sense, however so few of us on the web think this way. How often have you looked up your top workload creators (page impressions, transactions) and your top revenue creators (purchasers, subscribers) and compared the two? How sure are you that your costliest customers - that you are probably supporting a collection of CPUs to service - are not marginal or worse in terms of value to your organisation? With sites more data rich than ever (read: more back end work per view) and screen-scraping a more widespread way to acquire content it's worth working to discourage this behaviour or at least make conscious decisions to allow it.
This thinking should be taken all the way back to product design - on the web we tend to emphasise users/visitor/subscribers without really thinking through what constitutes profitable usage.
It's not just about nailing people who are out to exploit you either - as a long time data provider my biggest issues have usually been well-meaning customers with lazy integrations. If you're going to build a client on top of a web API it's quick and easy to poll the bejesus out of it round the clock and then react to data in the responses rather than, for example, get a schedule of 'interesting' data and dynamically set up threads pick up smaller changes to discreet items on separate and appropriate timers. Take trades for example; why hammer a Hang Seng Index Options feed after 4:15pm? Or how about our feed - do you need 5 prices a second for this weekend's football fixtures on Tuesday or would hourly pre-match odds and squad changes be more than sufficient that far out? A slightly trickier initial integration but a better and cheaper product (for both parties) thereafter.
And finally I've always found it more effective to encourage good behaviour than discourage bad behaviour. Sometimes this can be as simple as an SLA which guarantees a superb latency up to a certain number of requests and creeps out in bands beyond that. Other times this can be purely commercial - why not extend better pricing to customers who genuinely cost you less to service or a tiered revenue share to those who contribute more to the partnership?
Monday, 22 March 2010
Reporting & Agile
This view is usually 100% in contradiction with what I hear from the same organisation's development practice - who will usually tell me that visibility has never been greater and that they have an unprecedented degree of transparency and openness with their business partners.
Can they both be right?
When we, as engineers, make statements like the above we're usually referring to the public nature of our SCRUM/XP/etc artifacts - product backlogs and burndown charts are open to anyone in real time, requirements, estimates, and plans tend to be kept in readily accessible web tools, and of course most events in the process (such as daily standups and each iteration's demo) are public forums. What else do you need to have the most detailed, accurate, up to date view of every piece of work anytime you want it?
This is all true, but where we tend to go wrong is in underestimating the underlying complexity of something that inherently makes sense to us. There is actually a significant degree of interpretation - based on technical knowledge built up over years - necessary to answer a simple question such as "how are things going?" through agile artifacts alone; we just tend to be pretty good at doing it in our heads as we go.
Here's a simple example to underline my point. O2 has a pretty nifty self-service portal, and I can use it at any time of the day or night to see what my current mobile phone bill is and how far through my monthly quota of data and text messages I am. It works well for me because they have taken some underlying data about my account, put it through some pre-processing to provide some plain-English conclusions, and presented it on the web conveniently formatted so that I can interpret its meaning. Instead, imagine if they did transparency and self-service as we sometimes do in software delivery and just told me that all the data is there - the same artifacts that they use internally are open to me - and I must draw my own conclusions from some raw material. I guess I would install toad, connect up to their DB, select my rows from the billing table, join that with my rows from the metering table, and dump them into a view that showed me a current bill value along with my unused quota of texts etc. A few of us out there would probably quite like this alternative but, nonetheless, we have to agree that it is unlikely to be a level of service that the majority of O2's customers are going to be happy with let alone know how to use.
So what can we do differently?
Try to aggregate and simplify a bit for your stakeholders; provide just a little more interpretation and presentation on top of the solid data you already keep and it will be much easier for non-techies to make sense of. As long as you are not misrepresenting the underlying facts (easy to do unintentionally whenever summarising data for consumption by others) you'll probably find that your business partners will refer to that as the 'transparency' you've been trying to achieve and they'll thank you dearly for it.
Wednesday, 17 March 2010
Cloud Congress Done Quick
[well, my bit at least]
Yesterday I spoke at this year’s Cloud Computing Conference covering what cloud computing really is all about (once you peel away the hype) and how to break the back of the adoption problem – what do you put out there and how do you get started? It was the first time I’d ever done a talk that had absolutely no diagrams or pictures whatsoever in the presentation and, considering the plainness of the slide deck, it didn’t go too badly at all.
Excluding the mundane introduction, here’s the reader’s digest version of my key points:
What's in a cloud?
As a relativity new and trendy technology cloud computing is open to a lot of debate and even the 'correct' interpretation changes as the technology matures. We've had web applications for a long time now, so I'm not comfortable with SaaS being thrown into the cloud bucket. I like to define it in the context of how it changes how you deliver software (by abstracting away the complexities, layout, and connectivity of infrastructure from you developers) and how it impacts the way delivery costs are calculated (by exchanging metered billing on a usage basis for CAPEX-heavy up front acquisition).
My test (does your definition of cloud computing rely on the observations of your end users?) is simply to ask yourself "am I saying I am a cloud company just because my users access my product over the web?" If so, then perhaps you need to consider that maybe you might not be. Nothing wrong with that, but nothing new either.
What your FD sees
My next section expanded on the CAPEX vs OPEX shift that cloud platforms enable. If you have got your product design right then - on the web anyway - more usage should equate to more revenue and, with a cloud platform, more usage equates to more cost. See how that works? The cost base grows in line with revenues, therefore smoothing out that lumpy accounting and tricky budgeting activity that is characteristic of large hardware drops throughout a site's lifetime. You've also able to bring new things online relatively quickly (eliminating the ordering/delivery lead time/racking and stacking stages) and you can afford to try out slightly more speculative business cases - if I'm on the fence about a particular feature then I'm much more likely to give it a try if I know I can pack it up quickly and cheaply later if it's turns out for the worst.
Most cloud platforms have pretty good metering systems in place and that allows you a much more granular view of what part of your system is directly responsible for what parts of your cost base. From a business case perspective my advice here is to include all costs and not forget about time as a factor. Sounds a little obvious but when, for example, looking at Amazon's S3 as a storage platform for an analytical data set your total cost is going to include the transferring in and out of data as well as the cost per GB of storing it. Time matters too - I've seen the odd business case for a cloud platform fail to stack up because over a 3 year period more total cash is paid out than total cash spend once the large CAPEX bill is amortized over the same period.
I also touched on some of the less obvious uses for cloud computing because so much emphasis is given to migrating existing systems into the cloud and I don't think enough time is given to considering what additional things that aren't being done now could be brought on because of the inherent properties of cloud platforms. These include private content delivery networks (because the larger cloud players tend to have a good reach), enabling offshore or outsourced development without opening your perimeters to external organizations who may have weaker security policies, neutral territory for integrations or joint ventures, and large scale load testing (because where else will you get hundreds of high spec load generators external to your network and connected over realistically-latent lines).
Development and testing environments are a good way to dip a toe in because, if you're doing it right, you will already have nicely sanitized data (which gets you clear of most of the oft-cited security concerns) and you won't be expecting production-sized load. It's also the best way to get a good feel for the cloud suitability of your production system without making any user impacting changes.
Architecting for the cloud
Many people - mostly those selling cloud integration tools or who charge by the hour - will tell you about how they can help you move your systems into the cloud. Don't kid yourself on. If you're using a half-way decent definition of a cloud platform then there is a lot more to it than is commonly appreciated. There are a number of good design patterns that I believe organizations should start adopting today and not just because they prepare systems for cloud runtime in the future, but also because they're pretty good ideas on their own merits.
It all starts with my favorite - decoupling and creating clear, distinct boundaries between functionality in a system and abstracting the specifics of the implementation which delivers said functionality behind a well defined interface. When you present specific uses of data to a network in this way then, subject to a bunch of common sense rules, you are able to host that individual part of your overall system another way - on different servers, in another data center, or hey, in the cloud! As long as it's reachable by it's consumers - which brings us nicely onto messaging and using state sparingly - you buy yourself the flexibility to move things quite dynamically. A service registry is also highly desirable if you a) have composites made up of many services and b) want to be able to move them and scale up/down dynamically.
All good practices for scalability and availability regardless of your stance on the cloud.
The crystal ball
You're not allowed to speak at an event and not give some predictions. I think there is an old charter or something somewhere. So mine were; barriers are coming down and this will continue with technologies such as private link and private clouds, as with all trendy concepts the waters will be muddy for a while as the word 'cloud' is appended to everything we've already got in order to sell it to us again, and within 5 years I expect to see hybrids (cloud-type platforms in use to some degree) in almost every organization.
Overall a good conference - some top panelists and speakers, and I met some great folks there. Thanks to all the guys and gals at SixDegrees for putting together a worthwhile and fun event.
You can find the slides here.
Monday, 8 March 2010
Just a little more on SLAs…
Yesterday I posted a little something about SLAs and I’m always happier with things when I can wrap them up with a handful of guidelines. Not always possible in the complicated world we live in, but here goes anyway:
- Discover the things that are meaningful for the business. I risk stating the obvious but there is always a temptation to approach this ‘backwards’ by starting off with what can be measured rather than what is significant (and then working out how to measure it). You don’t want to end up with a bunch of metrics that are easy to count but don’t describe desired system performance.
- Strike a balance between persistence and change. Unless doing lots of projects isn’t important to you, be careful not to base all your KPIs on availability/stability metrics – or if you do, at least be aware of how that can drive reluctance to push changes through the system.
- Make appropriate interpretations for each product or system. In most organisations different systems, or parts of each system, are subject to different uptime, capacity, latency etc demands. And assuming you pick some basics like performance they should be specific to each product; for a website that might be a number of page impressions, and for an analytic system that might be a time to render when a data set is updated.
- Include time as a dimension. Most businesses – particularly on the web – have a number of 24x7 products, but there are also a lot of systems that only get used during business hours or at certain intervals (e.g. payroll is usually a monthly thing).
- Disregard #1. Kind of. Now that you’ve gotten this far, you will need to consider some feasibility, because signing up to unachievable SLAs doesn’t help anyone. Have a look at what devices and services underpin the business functionality you are measuring. Trees of dependencies, composites in SOA for example, tend to live up to the least strict SLA rather than the aggregate of the set.
Rules of thumb – apply in conjunction with local knowledge!
Sunday, 7 March 2010
More Meaningful SLAs
The place to start is to work out what's important to the organisation. Performance and availability are critical to us (a latency sensitive transactional platform with variable usage patterns) but so is change (a content driven web application correlated with events in the real world). We decided that performance, availability, change, and support response were the key metrics for us - nothing unique so far, and next we had to make an interpretation of each of these that was relevant to our various systems.
A basic principle here is recognising that it isn't just the raw numbers that should be appropriate to each individual product, but what is being measured too. Throwing an overall value at the problem (for example 99% availability across the board) makes the job of putting together your SLA easier, but is it a true reflection of your infrastructure? Whenever I've seen this coarse-grained approach used it has always led to less than acceptable uptime for the most critical applications and wasted investment propping up others that are realistically less important.
Another way to make sure your SLAs really closely matches business need is to introduce the dimension of time. In many systems and many organisations demand - and the cost of downtime - varies over time. For example, how many accounts and payroll systems are used around the clock? If you can trade off to 'best endeavors' over weekends and evenings then you shouldn't have too much trouble meeting a five nines commitment during business hours between Monday and Friday.
For our website we have a flat availability target (such is the nature of a 24x7 site) and performance we interpreted in a latency metric for price publishing and order placement. For reporting systems - which do not experience the same round-the-clock demands - we have different availability targets during business and after hours. Performance in the context of those systems is interpreted as a certain set of daily reports delivered by a fixed time each morning and a message delivery SLA on alerts on certain events. SLA's around change and product delivery are much more complicated and fraught with subjective measures. We've gone with measuring development projects iteration-by-iteration; what got delivered vs. what was committed during that sprint's planning. It's objective and encourages good estimation and strict control of scope creep during a sprint.
Making SLAs commensurate with what the business genuinely demands from a given piece of technology is important. Setting your sights high can seem like a good idea on the surface but, when you consider the frightening magnitude of difference in cost between 99.5% and 99.9% uptime, that couple of points can only ever be described as waste if they are not intimately linked to the organisations success.