Saturday 14 June 2008

S**t Happens

I talk a lot about failure: how to build for it and how to recover from it. Of all the things that will happen to your system during its lifetime, failure of some sort is one of the few inevitable events.

A lot can go wrong with computers, but surely their best-known weakness has to be their fundamental incompatibility with water.

Focusing on building systems that survive individual node failure is an excellent discipline, but as you can see from that clip, you can't count on your datacenter to always be there. Distributing your system across servers in the same location will protect you from the most common failure scenarios, but if it's really, really important that you are always up, then your system needs to be in more than one place.

Think electricity.  Think connectivity.  Think geography.
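
One way to make those three questions concrete is to write down, for every node, where it sits, what powers it and how it reaches the network, and then look for anything they all have in common. The following is only a minimal sketch of that idea: the `Node` fields, the `shared_failure_domains` helper and the example values are all made up for illustration, and a real inventory would have far more dimensions than these three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # All field names and values here are illustrative assumptions.
    name: str
    site: str        # geography: which building or city the node is in
    power_feed: str  # electricity: which supply the node depends on
    uplink: str      # connectivity: which network provider it uses


def shared_failure_domains(nodes):
    """Return the attributes for which every node has the same value.

    Anything in the result is a single point of failure for the whole set.
    """
    shared = []
    for attr in ("site", "power_feed", "uplink"):
        values = {getattr(node, attr) for node in nodes}
        if len(values) <= 1:
            shared.append(attr)
    return shared


nodes = [
    Node("web-1", "london", "grid-a", "isp-x"),
    Node("web-2", "london", "grid-a", "isp-y"),
]

print(shared_failure_domains(nodes))  # prints ['site', 'power_feed']
```

Here the two servers survive the loss of either network provider, but a flood or a power cut at the one site takes out both of them.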

