Thursday 15 May 2008

A Better Way to Say 'State'

I wrote a little bit about state in this post and recently came up with a much simpler description.  Here goes:

Tracking state is necessary when a part of my application needs to make a decision based on your previous activity.  That previous activity could be a trail of things you did along the way (collected items in your basket) or a more binary prerequisite test you'll either pass or fail (logged in or not).
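To make that concrete, here's a minimal sketch of both kinds of state in one place. The names (`Session`, `add_to_basket`, `checkout`) are invented for illustration and aren't from any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """What the application remembers about one visitor between requests."""
    logged_in: bool = False                     # binary prerequisite: pass or fail
    basket: list = field(default_factory=list)  # trail of previous activity


def add_to_basket(session, item):
    # Record something the visitor did along the way.
    session.basket.append(item)


def checkout(session):
    # Every branch here is a decision based on previous activity.
    if not session.logged_in:
        return "please log in"
    if not session.basket:
        return "basket is empty"
    return f"ordering {len(session.basket)} item(s)"
```

Without the `Session` object, `checkout` would have nothing to base its decision on, which is really all "tracking state" means here.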

Something closely related to state is session, and I came across Jeff Atwood's explanation of it while I was trawling around for something totally unrelated.  I think it's one of the best plain-English explanations I've read on the topic and, since he explains it better than I do, I'd encourage you to take a look.

So that's all pretty simple but, as I always say, the basics are what everything's built on - and where this starts to get interesting is when you're building distributed systems.  It's easy to partition stateless functionality, but what do you do when you need to track state or keep persistent session information?  Well, that's easy: you share your state from a centralised place.  That'll see you through for a little while, but what about when you need to scale that horizontally?
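As a rough sketch of the "share it from a centralised place" approach: the application nodes stay stateless and every one of them reads and writes session data through a single shared store. `CentralStore` and `Worker` are hypothetical names, and the in-memory dictionary is only a stand-in for whatever shared service (a database, a cache) would really sit there.

```python
class CentralStore:
    """Hypothetical stand-in for one shared state service used by all nodes."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Unknown sessions start with an empty basket.
        return self._sessions.get(session_id, {"basket": []})

    def save(self, session_id, state):
        self._sessions[session_id] = state


class Worker:
    """A stateless application node: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_item(self, session_id, item):
        state = self.store.load(session_id)
        state["basket"].append(item)
        self.store.save(session_id, state)
        return state["basket"]


store = CentralStore()
workers = [Worker("a", store), Worker("b", store)]

# Two different nodes handle the same visitor and still agree on the basket.
workers[0].add_item("visitor-1", "book")
basket = workers[1].add_item("visitor-1", "pen")
```

Any node can serve any request because none of them owns the state, but every request now goes through the one store, and that's the piece that becomes hard to scale horizontally.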

There are a lot of systems already doing this (AFS, DNS, LDAP, NFS), but there are no standard solutions for distributed state management; these systems all implement their own unique consistency and conflict resolution methods.  We're now seeing a lot of web-scale businesses hitting these scalability walls - requirements are forcing the production of customised infrastructure services like Google's Bigtable and Chubby, and Amazon's S3.  We're balancing on the edge of this wall ourselves and, given what a difficult but rewarding challenge it is, I feel fear and anticipation in equal measure!
