Wednesday 25 June 2008

Of Multicultural Offices

Have a look at this interview with Horacio Falcao from INSEAD on cross-cultural negotiation.

I found the points about overestimating and underestimating proximity in relationships particularly interesting - Horacio says that we often make too many assumptions when dealing with people from similar backgrounds and nationalities, and this can end up costing us.  When dealing with those we perceive to be obviously different we take extra care to ensure we explicitly state everything up front, a valuable practice we can take for granted when we strongly relate to the other party from the start.

Saturday 21 June 2008

The World's Biggest Marketing Deadline

I am never a fan of deadline-oriented architecture, and sometimes you get a whopper; this is most certainly my biggest one to date.


Well, it was get our Euro 2008 features out in time or mow the world's largest lawn. Marketing, eh?

Wednesday 18 June 2008

CAP

A couple of months ago I wrote a little about the architectural concepts ACID and BASE, two descriptions of two very different kinds of system.  In a company like ours, the business (and its pseudo-techie product managers) fails to recognise that certain combinations of ACID and BASE properties are mutually exclusive, and wants the benefits of both concurrently.  This is a pretty vast comprehension chasm to cross without a good tool to help us explain the tradeoffs - enter Eric Brewer's CAP theorem.

CAP stands for Consistency, Availability and tolerance to network Partitions, and works a little like the great software triangle (scope, cost, time) in that you may only have 2 of the 3 properties in any given implementation.  Note that we talk about an implementation here because it is perfectly valid, and in many cases quite sensible, to build different features within a single system to different CAP tradeoffs.

Consider a system with high availability requirements.  From this starting point you may choose to design in strong consistency (the data is always the same from any perspective), but you will not be able to distribute the system across any network boundary.  Your other choice would be network tolerance (it will run nicely geographically separated), but you will have to accept a window of inconsistency in both normal and failure modes.  If you can do away with the availability requirement then you might build something partition-tolerant and consistent, but it will have to refuse requests during any network event in order to guarantee consistency.
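
To make that concrete, here is a minimal sketch of the two partition-tolerant choices - a CP-style store that refuses writes it cannot replicate synchronously, and an AP-style store that accepts every write locally and catches the replica up later.  The classes and names are hypothetical illustrations, not any real product.

    # A toy key-value store with one remote replica (everything here is a
    # hypothetical illustration of the tradeoff, not a real system).
    import queue


    class ReplicaUnreachable(Exception):
        """Raised when the remote replica cannot be reached (a network partition)."""


    class Replica:
        """A stand-in for a remote copy of the data."""

        def __init__(self):
            self.data = {}
            self.reachable = True

        def write(self, key, value):
            if not self.reachable:
                raise ReplicaUnreachable(key)
            self.data[key] = value


    class CPStore:
        """Consistency + partition tolerance: refuse writes we cannot replicate."""

        def __init__(self, replica):
            self.local = {}
            self.replica = replica

        def put(self, key, value):
            self.replica.write(key, value)   # synchronous - blocks on the network
            self.local[key] = value          # commit locally only once both copies agree


    class APStore:
        """Availability + partition tolerance: always accept the write, replicate later."""

        def __init__(self, replica):
            self.local = {}
            self.replica = replica
            self.backlog = queue.Queue()     # writes waiting to be replicated

        def put(self, key, value):
            self.local[key] = value          # commit locally straight away
            self.backlog.put((key, value))   # the replica catches up when it can

        def flush(self):
            """Drain the backlog - the window of inconsistency closes here."""
            while not self.backlog.empty():
                key, value = self.backlog.get()
                try:
                    self.replica.write(key, value)
                except ReplicaUnreachable:
                    self.backlog.put((key, value))   # still partitioned, try later
                    break


    if __name__ == "__main__":
        replica = Replica()
        replica.reachable = False            # simulate a network partition

        ap = APStore(replica)
        ap.put("price", 1.01)                # succeeds; the replica is stale for now

        cp = CPStore(replica)
        try:
            cp.put("price", 1.01)
        except ReplicaUnreachable:
            print("CP store sacrificed availability to stay consistent")

The interesting line is the synchronous replica write in the CP store - that call is the exact point where availability is traded for consistency.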

Trying to keep a widely distributed data set highly available and 100% consistent at any given moment will bring you up against certain laws of physics.  Good luck with that.

Saturday 14 June 2008

S**t Happens

I talk a lot about failure, how to build for it and recover from it.  Of all the things that will happen to your system during its lifetime, failure of some sort is one of the few inevitable events.

A lot can go wrong with computers, but surely their best-known weakness has to be their fundamental incompatibility with water.

Focusing on building systems that survive individual node failure is an excellent discipline, but as you can see from that clip, you can't count on your datacenter to always be there.  That means distributing your system across servers in the same location will protect you from a number of (the most common) failure scenarios, but if it's really, really important that you are always up then it needs to be in more than one place.
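
As a small illustration of what "more than one place" means from the client's point of view, here's a minimal sketch that tries a primary site and falls back to a geographically separate one.  The URLs are placeholders I've made up, not real endpoints.

    # A hypothetical client-side failover sketch - try each site in turn and only
    # give up if every location is down.
    import urllib.request
    from urllib.error import URLError

    SITES = [
        "https://primary.example.com/status",    # main datacenter
        "https://secondary.example.com/status",  # geographically separate fallback
    ]


    def fetch_status(timeout=2.0):
        """Return the first response we can get, favouring the primary site."""
        last_error = None
        for url in SITES:
            try:
                with urllib.request.urlopen(url, timeout=timeout) as response:
                    return response.read()
            except URLError as error:
                last_error = error               # this site, or the path to it, is gone
        raise RuntimeError("all sites unreachable") from last_error


    if __name__ == "__main__":
        try:
            print(fetch_status())
        except RuntimeError as error:
            print(error)

In practice this kind of failover tends to live in DNS or the load balancing layer rather than in application code, but the principle is the same: more than one answer to the question of where the system runs.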

Think electricity.  Think connectivity.  Think geography.

Thursday 12 June 2008

Rocket Powered Horse Trials

Pictured here in flight at our secret testing centre in Royal Ascot. I'm sure Bert would back this - he's always been a fan of new tech.

Stopped time - part 5

Monday 9 June 2008

Process Improvement

I'm hearing a lot of talk about continuous improvement these days.  I'm all for it, but there are 2 really common shortcomings in most people's implementations:

  1. It isn't only about adding steps/gates/processes.  Sometimes a process can be improved by removing a step, or perhaps the organisation can be better served by abandoning the process altogether.
  2. Improve your process improvement process.  It's part of your organisation just like any other process, and as such, should be subject to a bit of continuous improvement.

Making sure you do a bit of number 2 gives you some controls to ensure enough of number 1 happens.

Friday 6 June 2008

Rack Mount 1, Technician 0

Failure is coming to get you, but we're getting better at predicting the scenarios and coding for them.  We think a lot about servers dying, losing network connectivity, power cuts, and how to respond to critical bugs.  These things are essentially unexpected technical events, but there is a whole other category at play in real life - human error.
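
As an aside on the "coding for them" part, a lot of that work boils down to patterns like the one below - a minimal retry-with-backoff sketch around a flaky call.  The names here are made up purely for illustration.

    # A hypothetical retry helper for transient failures (dead server, dropped
    # connection, brief power wobble) - purely illustrative.
    import time


    class ServiceUnavailable(Exception):
        """Whatever your client raises when the far end has gone away."""


    def with_retries(operation, attempts=3, delay=0.5):
        """Run an operation, retrying transient failures with a growing pause."""
        for attempt in range(1, attempts + 1):
            try:
                return operation()
            except ServiceUnavailable:
                if attempt == attempts:
                    raise                        # out of patience - surface the failure
                time.sleep(delay * attempt)      # back off a little more each time


    if __name__ == "__main__":
        outcomes = iter([ServiceUnavailable, ServiceUnavailable, "ok"])

        def flaky_operation():
            outcome = next(outcomes)
            if outcome is ServiceUnavailable:
                raise ServiceUnavailable()
            return outcome

        print(with_retries(flaky_operation))     # succeeds on the third attempt

None of that helps when the failure is a human one, of course.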

Imagine this server has your data on it...

What will your customers see while that gets put back together?  How are you going to get the data back?

Despite the comedy value of that clip, this is exactly the sort of thing that happens in real life - people make mistakes.  But even when this kind of maintenance is less clownishly executed, it still needs to happen - and you need to decide what effect you're going to let planned maintenance events have on your revenue stream.

Wednesday 4 June 2008

The Rules About Rules

What we do as technical teams needs to have some rules.  It's necessary for any group of people who need to work together to have a common frame of reference that gives them an idea of how to interact with one another and what to expect from each other.

But let's not forget that, in many cases, that's all rules are - a framework to get started with.  There are the odd few that are a little more material, for example things that deal with regulatory compliance or safety, but generally (in our industry) they're in the minority and they're obvious with a little experience.

It can be easy to get hung up on the wrong stuff with rules, and the key is to always think about the why.  Why did we make that rule in the first place?  What was the reason - the principle - behind the rule?  If you know why, then you can weigh the rule against the benefit of the action you want to take.  For example, we [used to - we're braver now] have a rule that we don't make changes on a Friday.  The principles behind this one are risk management and practicality; the weekends are the busiest times for our trading exchange and we have skeleton coverage on Saturday and Sunday.  Now let's say we've got a really nifty feature that we're sure will give us a reasonable revenue uplift over the weekend, but we only just finished it on Friday.  So do we simply forgo the benefits because of change control?  The right thing to do is consider the system impact of the release and, if it doesn't touch any core components, why not do it?  After all, controlling the risks was what we were after in the first place.

My "rule on rules" is to be rigid about the principles behind rules but flexible on the rules themselves, and remember, you shouldn't be in charge of enforcing rules if you don't know when it's best not to!

Tuesday 3 June 2008

SAI 25

A while back the SAI 25 was published on Silicon Alley Insider and we're sitting at number 4.  There is a reasonably scientific approach to how these companies are measured and, based on that, I can see how we're climbing the list.  Growth, margin and market share are all strengths of ours.  Kick ass.