Friday 16 January 2015

The power and the pitfalls of test and learn

My core philosophy on product is that product/market fit is a journey: it is a function of discovery and learning over time, and it is impractical to expect to make an ideal set of decisions up front (i.e. in advance of any product development and real feedback from real customers).

We now live in a world where the fast eat the slow, and I am going to argue that learning is speed:

At Hotwire we’ve dramatically improved our business performance in the last year by focusing on learning as a primary goal.  It’s a first-order metric, alongside the ‘hard’ KPIs most businesses are familiar with (revenue, room nights, activations, etc.) but - as any good teacher will tell you - measuring learning is extremely hard to do.  Fortunately for us, there’s another feature which is both easy to observe and highly correlated with learning: experimentation.

Whenever you see a high rate of improvement (in nature and in science) you will almost always see a high rate of experimentation.  There’s a great Thomas Edison quote which goes something like: “None of my inventions came by accident. I see a worthwhile need to be met and I make trial after trial until it comes.”  Most of our early learning as human beings is heavily experiment driven; it is our interactions with the world that teach us how to behave effectively within it.  But my favorite story about the value of pursuing learning as a primary goal is the Kremer Prize:

In 1959 Henry Kremer, an industrialist and patron of early aviation, established a prize of £50,000 for the first human-powered aircraft to achieve a controlled flight.  Hundreds of attempts were made over nearly 20 years with no success, and most of them came from well-informed experts: aviation companies, universities and the like.  The prize went unclaimed until 1977, when Paul MacCready won it with his Gossamer Condor.  The secret to his success wasn’t being smarter than those who had previously attempted flights, or knowing something they didn’t; it was that his approach to the problem was fundamentally different.

Most teams spent months designing and building their craft, then took it out to a field or an airstrip to try it out, crashed, swept all the pieces up and returned to the hangar for another few months of rebuilding.  MacCready focused not on designing the one airframe that would work, but on a cheap construction which was easy to assemble and disassemble by hand in the field.  Using this he was able to try out more designs in a single day than all the preceding teams combined had managed in the entire previous year.

His formula for success had two main features: variation and repetition.  You can see this in nature too; natural selection tries out variations of organisms over and over again (optimizing for the ultimate KPI - life itself!) as those organisms improve their suitability to their environments.  While our use of it here is not strictly mathematical, Fisher’s theorem is a useful formalization of this (and the other examples we’ve discussed).  It goes something like this: “The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”  Or, more simply:

The capability to try out more hypotheses at any given time is highly correlated with faster improvement.
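
For readers who want the formal version, a common textbook statement of Fisher’s fundamental theorem (my notation here, not part of the quote above) relates the per-generation change in mean fitness to the variance in fitness:

```latex
% One common discrete-generation form of Fisher's fundamental theorem.
% \bar{w} is the population's mean fitness; \sigma^2_A(w) is the additive
% genetic variance in fitness. More variance in play means faster gains.
\[
  \Delta\bar{w} \;=\; \frac{\sigma^{2}_{A}(w)}{\bar{w}}
\]
```

Substitute ‘hypotheses under test’ for genetic variance and ‘product/market fit’ for fitness and you have the argument of this post in one line.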

So learning is speed, and the number of experiments is a useful approximation of learning.  But before you simply count all your A/B tests, there are some subtleties to success here:

The first thing I like to watch out for is confirmation bias.  As you embrace test and learn and make experiments cheaper to run, you lower the bar for organizational participation.  This is unquestionably another benefit - harnessing the innovation of a larger slice of your org - but not everyone is as academically disciplined as you’d hope, and there is an underlying human tendency to like our own ideas and see them in a less critical light than the ideas of others.  A while ago I was fortunate enough to spend some time with Alan Kay, and he told me that science is the process that stops people from falling in love with their own ideas.  This is more than just a principle: if you come up with a hypothesis which you’re trying to prove instead of disprove, you will tend to discount contradictory evidence (i.e. proof points that suggest customers do not like it) and waste a lot of time trying marginally different manifestations of the same core idea in the desperate hope that you can somehow make them love it.  You lose speed, not gain it, this way.
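
To make that concrete, here is a minimal sketch of the kind of disciplined readout I mean - a plain two-sided significance test for a two-variant conversion experiment.  The function name and all the numbers are mine and purely illustrative, not Hotwire data:

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-test for an A/B experiment.

    The null hypothesis is that variant B performs the SAME as control A;
    we look for evidence to reject it in either direction, rather than
    hunting for confirmation that our idea 'won'.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value

# Illustrative numbers only: 10,000 visitors per arm.
p = two_sided_p_value(conv_a=520, n_a=10_000, conv_b=555, n_b=10_000)
print(f"p = {p:.3f}")  # if p is large, the honest call is 'no effect', not
                       # 'rerun it slightly tweaked until it wins'
```

The two-sided part is the discipline: you give the data an equal chance to tell you the idea hurt, helped, or did nothing, instead of torturing it until it confirms what you already believe.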

A fun way to look at this is to examine the difference between scientific thinking and religious thinking.  Jack Cohen is another wonderful human being, with a vast amount to teach the world, with whom I had the pleasure of spending a little time.  He argues that in religious thinking, all that matters is how hard you believe, especially in the presence of contradictory evidence (it’s there to test your belief).  In scientific thinking, all that matters is how hard you doubt, especially in the presence of confirmatory evidence (easy answers are there to trick you into overgeneralizing your observations).

People will naturally come up with ideas that they like, but good product hygiene is about coming up with ideas that your customers like.  Rigor in testing is the ultimate arbiter that can make that distinction clear to you, if you let it.  If you’re not open to being wrong - in fact expecting it, and actively seeking to make it so - then you cannot, by definition, learn.

Another big-picture mistake is confusing iteration with learning.  It’s a common problem in the whole ‘transition to agile’ world: break up a big-up-front plan into a number of predetermined phases, label them ‘iterations’ and then cash in your huge cheque as a profound agile coach.  Starting with the same inflexible ideas and delivering them across a higher number of software releases has some benefits in terms of quality and system risk, but it does nothing to improve your product/market fit.  You determined the level of product/market fit at the beginning of the project, and you have not improved that fit over the whole, say, year you were working on it - whether that’s one single release at the end of the 12 months or a whole series of incremental drops every two weeks.  Learning is about regularly shipping something customers can touch, then watching their interactions with that thing, looking for where it enriches and where it detracts, and only then deciding on the final scope of the next iteration.  The point is that each iteration should reflect the learnings from the previous iteration and improve upon it with respect to the customer experience, and therefore cannot be rigidly determined in advance of that feedback.

To be inclusive and stimulate the organizational pathways for innovation, a pretty broad filter is useful in the beginning.  All ideas are valid, and anything can be reduced to a testable hypothesis.  But, as you execute on a test-and-learn backlog, it becomes important to select ideas which are both coherent with the existing product and carry a higher probability of resulting in an improved experience for the customer.  Think back to our Kremer Prize example: while MacCready focused on the ability to explore a large number of variants, he was not randomly experimenting with geometry in the hope that he would get lucky and find something that would fly.  He was an engineer who ran an aerospace company, and he had a detailed understanding of all the mechanics involved in lift and control.  In user experience terms that means figuring out what customers are sensitive to, and using those things as inspiration points for ideas.  On the internet these sensitivities are rooted in behavioral science (decision theory, nudge theory etc) and are often things like social signals (how many others have bought the same item or are currently viewing this page), urgency messaging (this is a popular item, or this deal expires soon), and recommendation (if you like x you might like y).  Discovering what these sensitivities are for your particular product can get you more ideas aligned to the physics of your particular business.

When talking about ways to increase the likelihood of ‘winning’ tests I always like to reiterate the value of ‘losers’ too.  A losing test is essentially an idea someone had for a new product or interaction which, had the experiment not disproved the hypothesis, would have become a project consuming valuable product development resources for no (or negative) return.  What did you just save?  How much customer attrition did you avoid by staying away from that unpopular option?
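
As a back-of-envelope illustration (every number here is hypothetical, chosen only to show the shape of the calculation):

```python
# Rough value of 'losing' tests: build cost avoided per idea killed by data.
# All figures are hypothetical placeholders.
test_cost_weeks = 2         # cost of running the cheap experiment
full_build_weeks = 26       # what the idea would have cost as a real project
losing_tests_per_year = 30  # ideas disproved before anyone built them

weeks_saved = losing_tests_per_year * (full_build_weeks - test_cost_weeks)
print(f"~{weeks_saved} engineer-weeks/year freed up for ideas that might win")
```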

There is much more to doing this well - a whole series of posts wouldn’t do it justice - but I like to focus on the quality of the thinking first.

So does learning == speed?  In the special case of how rapidly you can get to the right product/market fit and grow, it certainly has for us.  Is the number of experiments a good heuristic for organizational learning?  If you chart the improvement in our real business performance against the number of concurrent tests over time, you see very similar growth curves, just as Fisher’s theorem predicted...
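
If you want to sanity-check the same claim against your own numbers, the eyeball test reduces to a one-line correlation.  A minimal sketch (the two series below are invented placeholders, not our data):

```python
import statistics

# Placeholder weekly series: concurrent experiments running, and the
# business KPI for the same weeks (invented numbers for illustration).
concurrent_tests = [2, 3, 3, 5, 6, 8, 9, 12, 14, 15]
weekly_kpi       = [100, 104, 103, 110, 118, 127, 133, 150, 161, 168]

# Pearson correlation (Python 3.10+); curves that grow together give r near 1.
r = statistics.correlation(concurrent_tests, weekly_kpi)
print(f"Pearson r = {r:.2f}")
```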