Saturday, 30 August 2008

There's something about Darwin

Biological computing isn't very well defined in the collective mind, and it means different things to different people. To me, it is more powerful as an analogy, a way of thinking about a system's overall behavior, than as any specific piece of technology.

I'm going to argue that there are two fundamental models upon which systems can be designed and built: mechanical and biological. Each starts from a different mindset, and each exhibits different basic capabilities that constrain what sort of things can be achieved within a given effort. I'm also going to postulate (triple word score!) that a system exhibiting biological characteristics is much more suitable to web-scale computing than its mechanical counterpart.

Before we get into the meat of it, let's do some simple word association:

Forgetting completely about software for a minute (we'll deal with the analogies later), what comes to mind when you think about a mechanical system? Probably things like gears, cogs meshed together, reliability, tight integration, close supervision, a single contiguous chain, unthinking automation, dependency, predictability, and control.

Again leaving software out of the picture, what comes to mind when you think about a biological system? Probably things like change, iteration, flexibility, loose relationships between units, unpredictability, variance, the hive over the individual, response to environment, awareness, and cooperation.

The mechanical analogy is how we've all traditionally built computer systems. It kind of makes sense: we've been building mechanisms for a very long time, and the logical, mathematical basis of computer science lends itself to this kind of thinking - representing in software a chain of cogs, each coupled to the next, powered by and dependent upon its neighbor. Look inside an old watch. Now take something small and seemingly insignificant out. Still working? Thought not.

A mechanical type of system has a number of 'moving parts', all of which must be functioning for any of them to function. The system is very easy to observe and very easy to predict - you'll pretty much be able to guarantee what state it will be in under any given circumstances. Information flows through this 'production line' from one step to the next, operated on with monotonous exactitude. You'll have to keep a close eye on it though, because no step in the chain is ever aware of the others, and therefore can't decide whether it's safe to pass its output on to the next step (will it be lost?) or whether it can trust the input of the previous step (is it corrupt?). Luckily it's pretty easy to observe, but unluckily it will more than likely need manual intervention to 'realign the teeth' when a gear goes bad. Scaling is tough too: you can always use bigger cogs, but everything will only ever go as fast as the slowest wheel.
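The production-line behavior above can be sketched in a few lines. This is a toy illustration, not a real pipeline: the stage names and the "bad gear" trigger are invented, but it shows the two mechanical properties described - a fault in any one stage halts the whole chain, and nothing downstream of the fault gets processed.

```python
# A toy 'mechanical' pipeline: each stage is a cog that depends entirely on
# the one before it. One bad gear stops the whole watch. Stage names are
# invented for illustration.

def parse(item):
    return item.strip()

def transform(item):
    if item == "bad-gear":
        raise RuntimeError("a gear went bad - the whole chain stops")
    return item.upper()

def store(item, sink):
    sink.append(item)

def run_chain(items):
    # Information flows through the 'production line' one step at a time;
    # no stage knows whether its neighbors are healthy, so a single fault
    # anywhere brings everything to a halt.
    sink = []
    for item in items:
        store(transform(parse(item)), sink)
    return sink

print(run_chain(["  one ", "two"]))        # ['ONE', 'TWO']
# run_chain(["one", "bad-gear", "three"])  # raises: nothing after the fault runs
```

Note also that throughput here is exactly the "slowest wheel": every item waits for the stage ahead of it, so the whole chain runs at the pace of its most sluggish cog.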

Recently we've recognized some patterns of behavior that occur in the natural world that would be of benefit in running a large-scale computer system, and at the cost of increased complexity, we can replicate these in software just as we replicated a chain of hardwired moving parts.

The biological analogy is a way to build systems that are aware of their environments and their fellows, possess a degree of self-governance, and can respond 'intelligently' to changing circumstances. Look at a line of ants. Block their path, take some away, move their food source. Still working? You bet.

There are a couple of equally valid analogies to draw with the natural world. The first is to observe how a single organism works at the cellular level. What matters is that the whole animal functions competitively; individual cells are irrelevant, but they all work together to keep the 'system' alive. They have a way to replicate themselves, kill off corruption, heal 'faults', and modify their role as requirements change. They don't need to be externally monitored or controlled by a central point - they each carry a little bit of that responsibility, distributed amongst their number. Now imagine that animal is your system and the cells are servers, nodes, units of functionality - whatever it is that makes up your system.

The other way to think about biological computing is to study social insects, cooperative hunters, and hives/swarms. You end up in pretty much the same place: a group of individuals which make up a whole, with the whole being more valuable than the individuals. I think which one you prefer depends on where you went to school (or how much Animal Planet you've watched). It can be easier to think about bees making conscious (instinctive?) choices, communicating and reaching a consensus, than to try to divine meaning in the more mysterious chemical reactions that govern the cellular world.

A biological type of system is made up of a collection of disposable nodes, the sum of which is the whole system. Nothing is more important than the survival of the system, and these 'cells' must change purpose and even 'die' to keep the system going. There is no central government and there are no master nodes; all the 'cells' are peers, and the administrative tasks in the system are distributed amongst them. The system is capable of making simple decisions for itself, responding to environmental stimuli like load, failure conditions, and available upgrades. Many of these decisions are reached by consensus, using protocols which create a 'hive mind' that pulls together the opinion of every relevant individual and answers questions such as: Is a feature down? Where am I? Am I the right version? What are my neighbors doing? Who will service the next request? Is this safe to eat?
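One of those hive-mind questions - "is this feature down?" - can be sketched as a simple majority vote among peers. The peer names and the voting rule here are invented for illustration; real systems answer this with proper gossip and consensus protocols (Paxos and friends), but the shape of the idea is the same: no central monitor, just an aggregate of what the individuals observed.

```python
# A toy 'hive mind' vote: each peer reports what it saw, and the system's
# answer is whatever a strict majority of peers observed. Peer names are
# invented; real systems use gossip/consensus protocols for this.

from collections import Counter

def feature_is_down(observations):
    """observations maps peer name -> 'up' or 'down'.
    Returns True if a strict majority of peers saw the feature down."""
    votes = Counter(observations.values())
    return votes["down"] > len(observations) / 2

peers = {
    "node-a": "up",
    "node-b": "down",
    "node-c": "down",
}
print(feature_is_down(peers))  # True - two of three peers saw it down
```

The point of the majority rule is that one confused individual (a node with a broken network link, say) can't declare a healthy feature dead on its own.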

Biological systems are more difficult to observe, and their behavior becomes harder to predict the further into the future you look. Their decisions are only as good as the data they have and the rules you give them to assess it with - so you can end up worse off than if humans had manually configured everything. A cellular system is composed of a number of small, totally independent pieces of functionality, and the upsides to this are scalability and partial failure. Scale comes through being able to arbitrarily add more 'cells' to your 'organism' as it needs to get bigger, and bottlenecks are easier to solve as every individual component is able to operate as fast as its legs will carry it (these designs are usually asynchronous). Partial failure means that even if all the nodes that make up a certain feature are down, your system as a whole will still work, sans that one bit of functionality.

Self-healing is when a system is aware of what it should look like (as a gecko I should have a tail), is able to recognize when it does not (ouch, I don't anymore), and takes some corrective action (better grow it back). This is itself a double-edged sword: imagine you're intentionally taking something down because of a security flaw; you have to give yourself some way to prevent it springing back up somewhere else. The 'split brain' problem becomes even more significant. Usually in a split brain you face potential inconsistency, with data being written in more than one place. With a system designed to repair itself, you might just end up with two complete copies working independently - that may not be all bad (depending on how your system works), but the ability to kill off the duplication once connectivity returns is something that needs to be addressed.
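The gecko's tail can be sketched as a reconcile loop: the system holds a picture of what it should look like, compares it to what it actually observes, and emits corrective actions. The role names and counts below are invented for illustration - in practice this logic lives inside orchestrators and supervision frameworks - but note that the same comparison that grows a missing tail is also what culls the duplicates left over after a split brain heals.

```python
# A toy self-healing reconciler: compare desired state to observed state
# and produce corrective actions. Role names and counts are invented.

def reconcile(desired, observed):
    """desired and observed map role -> node count.
    Returns the actions needed to make observed match desired."""
    actions = []
    for role, want in desired.items():
        have = observed.get(role, 0)
        if have < want:
            # ouch, I don't have a tail anymore - better grow it back
            actions.append(("grow", role, want - have))
        elif have > want:
            # two copies survived a split brain - kill off the duplication
            actions.append(("kill", role, have - want))
    return actions

desired  = {"web": 3, "worker": 2}
observed = {"web": 1, "worker": 4}
print(reconcile(desired, observed))
# [('grow', 'web', 2), ('kill', 'worker', 2)]
```

The security-flaw caveat from above shows up here too: taking something down on purpose means editing the desired state, not just the observed one, or the loop will dutifully resurrect it.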

So now the previously promised postulation (I am good at Scrabble). We know we can build systems whose behavior is easy to predict; they are easy to observe, but they are rigid and require a lot of manual oversight and intervention. The mechanical way. We know we can build systems that are flexible, automatically resilient to environmental change, and inherently scalable, but they are difficult to accurately predict and can run away with themselves if not given the right information from which to make deductions about state. The biological way. In a web-scale environment where availability, scalability, and the capability to ship frequently are such vital attributes of any product, it seems to me that we benefit more from thinking in a loosely coupled, compartmentalized, organic way than in an interlocked, highly dependent, production-line way.
