Friday 30 May 2008

Do Some Good

One of our engineers posted this link on our internal forum and made a spirited appeal to our charitable sides in an attempt to stir up some volunteers.  I like to help good causes (we all have our favorites) and I would always rather donate my time - help out in a practical way - than simply give money away.  I get more of a sense of satisfaction doing that and I don't feel like I've 'bought' my conscience off!

I’ve helped some charities in a similar capacity in the past and I have to say I found it curiously rewarding in a way I wasn’t quite expecting.  But even if you're just not that charitable, alongside the many worthwhile philanthropic reasons to help out like this there are also some perfectly good selfish motivations - or maybe symbiotic might be a better word:

The thing you have to remember about charities is they are almost always terminally short of resources – most significantly cash and people – yet they still have the same IT challenges that a lot of small/medium businesses have. That means you’ve got to work with constraints you won’t be used to because you can’t just buy hardware, you can’t just use something commercial or licensed and you [often] can’t use expensive network connections or hosting. This means you have to be really creative with what you put together and you also have to exercise your end-to-end solution muscles because, chances are, you won’t be able to assemble a reasonable team either – it’ll be all down to you.  This will be a totally different environment for you to learn to be effective in because most of us are in the fortunate position of having a budget that’s appropriate to the problems we're trying to solve.  Sometimes this 'plenty' can make you lazy – necessity is, after all, the mother of invention.

I guess the summary is you'll get exposure to a very different size and type of problem to the one you're used to working with every day and, because of the unique constraints, you'll really get to exercise your problem solving skills.  So give it a go, you’ll probably find it quite refreshing and might even learn something too.

Wednesday 28 May 2008

Definition of Scalable

We talk a lot about scalability but what is it that we really mean when we refer to a system or service as scalable?

"A service is said to be scalable if an increase in system resources results in a proportional increase in performance."

In webscale computing, increased performance typically means serving more units of work (pages, TPS) but it can also mean larger units of work (bigger datasets, queries with many where-clauses).

The main reason I like this definition of scalability is it separates the scaled from the scalable.  I've seen plenty of really ugly systems go big - but vertically and at ludicrous capital costs.  Yes, you managed to scale that thing but that doesn't automatically earn you the right to refer to it as scalable.

To me, scalability is an economic thing as much as it is a technical thing.  You have to build wide and grow complex IP across a commodity platform - but if you can't maintain (reduce!!!) marginal cost while you're at it then you still haven't earned the right to call your system truly scalable.
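
As a rough illustration of the definition above, here is a minimal Python sketch that compares throughput growth with resource growth and works out the marginal cost of the extra capacity. The function names and all of the figures are hypothetical, made up purely for the example.

```python
def scaling_efficiency(base_resources, base_throughput, new_resources, new_throughput):
    """Throughput growth divided by resource growth; 1.0 means proportional scaling."""
    resource_growth = new_resources / base_resources
    throughput_growth = new_throughput / base_throughput
    return throughput_growth / resource_growth


def marginal_cost(base_cost, base_throughput, new_cost, new_throughput):
    """Extra cost paid for each extra unit of work served."""
    return (new_cost - base_cost) / (new_throughput - base_throughput)


# Hypothetical figures: going from 4 servers to 8 in two different systems.
wide = scaling_efficiency(4, 1000, 8, 1900)   # close to proportional
tall = scaling_efficiency(4, 1000, 8, 1300)   # diminishing returns

# Hypothetical costs: spend doubles from 40,000 to 80,000 for 900 extra TPS.
extra_unit_cost = marginal_cost(40_000, 1000, 80_000, 1900)
average_unit_cost_before = 40_000 / 1000

print(f"wide: {wide:.2f}, tall: {tall:.2f}")
print(f"marginal cost per TPS: {extra_unit_cost:.2f} vs {average_unit_cost_before:.2f} before")
```

With these made-up numbers even the "wide" system is paying more for each new unit of work than it paid on average before, which is the economic half of the argument: near-proportional throughput is not enough if marginal cost is creeping up.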

Monday 26 May 2008

Introducing The Transylvania JUG

Networking is important and the technical community is a valuable source of experience - a way to expand the knowledge available to you to levels beyond your own team.  If you can strike the right balance between business confidentiality and sharing implementation lessons then the benefits are significant.

Outside of Bucharest there are very few events in the IT community in Romania; that's why I am especially pleased to be able to help stimulate this sort of activity by launching the Transylvania Java Users Group for the Java development community in Romania.  Full credit needs to go to Gabriel Pop (one of our Java developers) for organizing the group and Csaba Szabo (one of our UI developers) for designing the group logo:


The first meeting was on Wednesday 21 May at 7pm in Betfair's Romanian office.  The topic chosen for the inaugural meeting was the SpringSource application platform - about 20 local IT professionals attended the session which was followed up by a healthy discussion and - of course - the Champions League final piped into our big video conferencing TV!

It generated enough future interest to become a regular monthly event, so if you're interested in attending or presenting at upcoming sessions, please email me or Gabi.

Friday 23 May 2008

Tips for Successful Offshoring

Over the last few weeks I've posted about how our offshore shenanigans have worked out and about how sustainable the location is shaping up to be. I thought it'd be worth also putting something out about some of the little operational lessons we learnt along the way...

For many organisations like us, offshoring is no longer about the cheapest delivery or adjusting the capital/labor ratio - it is now about the best quality and retaining capacity (although cost effectiveness is very much a priority). Companies that are successful at this are winning because they are not treating their offshore resources as cheap disposable assets - they are investing in the people, infrastructure and facilities. The economics involved often mean ROI on this is significantly better than the same investment spent at home. Given this change in landscape how do you make sure you're still getting the best out of offshoring?

The first thing to do when considering offshoring is to decide what exactly you are going to offshore. Your options typically fall somewhere between product specialization and disaggregation, or in plain English having an offshore center wholly own the entire production chain (SDLC) vs. providing one or more steps (such as development, testing or support) as a service.

We went down the product specialization route. Setting up our offshore engineering center to be capable of end-to-end delivery for a chosen set of our products via agile/SCRUM was the right decision for us; so not everything here will be relevant to everyone!

Here is what made a difference for us:

  1. Set yourself up to be as independent as possible. A 2 hour time difference doesn't sound like much, but when you think about it 9am in the UK is 11am in Romania. You can pretty much lose half a day if you create organizational dependencies in either direction. If there are certain areas you can't achieve total independence in then work one day ahead (plan properly) in the lagging time zone.
  2. Invest in good communication. Personal interaction matters, so make frequent visits both ways and use tools like video conferencing to keep relationships alive in between. There are also a few easy, obvious things that really make a big difference on a daily basis, like phones that use internal extension numbers, voicemail and integrated calendars between offices. Sounds simple but you'd be surprised how many people rely solely on email and don't integrate basic planning tools.
  3. Collocate whenever possible. When you can't collocate make sure you outline the key events in your SDLC (new project kick offs, sprint planning and demos in agile) to your product guys and get their commitment that they'll turn up, in person, to these as a minimum. Traveling a lot gets costly so if necessary coax them over with company apartments to alleviate their hotel budget woes - it's worth the expense.
  4. Get good local advice. An obvious one for professional services like legal and accounting but don't forget about some experience on the ground to help judge cultural impacts and advise on local 'norms' for everything from pay to holidays.
  5. Treat the people the same way as you do the rest of the organisation. Obviously there will be differences in areas like employment contracts and other things you might need to do to keep parity with the local market but these are expected in a mature global organisation. As long as everyone has the same opportunities for career development and can participate in all the same company-wide initiatives that will help teams bond internationally.
  6. Be prepared to fight head office. If you're responsible for any remote office you'll always be subject to various well meaning head office types coming up with great new 'process improvements' for the whole organisation. Great if you can adopt them but if they don't quite suit the local environment it can be a nightmare fending off the unified procedure merchants. When the right thing to do is have something locally tailored you have to have that fight if you want the best results from those teams.
  7. Cultural adaptation is a 2 way street. A lot of emphasis is put on how the big, bad HQ must come to grips with, and embrace, the unique flavor of a newly acquired territory; but the reality is a blending has to take place. Yes, the greater organisation is absorbing a new culture but that new culture is joining a greater whole and as such needs to expect some changes too.

Clearly there is a whole lot more to it than this but these are the big things we picked up that worked well in our model. Drop me a line if you've picked up something valuable in similar circumstances.

Wednesday 21 May 2008

Shameless Promotion No 6

A few days ago we dragged some giant balloons around London to promote our trading exchange market on the London mayoral election.  It was quite a unique promotion - it even made the BBC =)

Tuesday 20 May 2008

Isolating Failure

A few weeks ago we launched our Sportsbook, our flagship risk-taking product, into the Italian market.  It's a very strategic product both because of how it fits into our international expansion plans and because of how it's built.  We intended for this system to get big fast so we built it wide - distributed, message based and API driven.

Just days after launch the one thing we can always guarantee will happen happened - part of the system failed.  Our bet placement engine hung [bad news] but the rest of the system continued to work [good news] so while our customers couldn't place any bets, they could hit the site, view the markets, register, login, deposit, manage risk... everything else basically.  This kind of failure scenario would see a lot of more monolithic web systems firing out 404s or 500s in the blink of an eye.

These are exactly the benefits you look for when you build decoupled functionality and minimize dependencies, consciously seeking to isolate features from each other.  The next step for us is automatic detection and repair - working to minimize human intervention.
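
To make the idea concrete, here is a minimal in-process Python sketch of feature isolation. It is not how our Sportsbook is built (that is distributed and message based); the feature names and the `render_page` helper are hypothetical stand-ins, and a raised error stands in for the hung engine, which in a real system would need a timeout to detect.

```python
def render_page(features):
    """Call each feature independently so one failure cannot take down the rest."""
    page = {}
    for name, handler in features.items():
        try:
            page[name] = handler()
        except Exception as exc:
            # Contain the failure: mark only this feature as unavailable.
            page[name] = f"unavailable ({exc})"
    return page


def view_markets():
    # Hypothetical healthy feature.
    return "markets listed"


def place_bet():
    # Hypothetical failing feature, standing in for the hung engine.
    raise TimeoutError("bet placement engine not responding")


features = {"markets": view_markets, "bet_placement": place_bet}
page = render_page(features)
print(page)
```

The point of the design is that the failure is contained at the boundary of the feature that failed, so the caller still gets a usable answer for everything else.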

Thursday 15 May 2008

A Better Way to Say 'State'

I wrote a little bit about state in this post and recently came up with a much simpler description.  Here goes:

Tracking state is necessary when a part of my application needs to make a decision based on your previous activity.  That previous activity could be a trail of things you did along the way (collected items in your basket) or a more binary prerequisite test you'll either pass or fail (logged in or not).
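
A tiny Python sketch of those two kinds of state, assuming a hypothetical `Session` object invented for illustration: the basket is the trail of previous activity and the login flag is the binary prerequisite.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-user state the application keeps between requests."""
    logged_in: bool = False                      # binary prerequisite
    basket: list = field(default_factory=list)   # trail of previous activity


def add_to_basket(session, item):
    """Record one more thing the user did along the way."""
    session.basket.append(item)


def can_checkout(session):
    """A decision that depends on previous activity, so it needs state."""
    return session.logged_in and len(session.basket) > 0


session = Session()
add_to_basket(session, "book")
before_login = can_checkout(session)   # basket has items, but not logged in
session.logged_in = True
after_login = can_checkout(session)
```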

Something closely related to state is session, and I picked this up from Jeff Atwood while I was trawling around for something totally unrelated.  I think it's one of the best plain-English explanations I've read on the topic and, since he explains it better than me, I'd encourage you to take a look.

So that's all pretty simple but, as I always say, the basics are what everything's built on - and where this starts to get interesting is when you're building distributed systems.  It's easy to partition stateless functionality but what do you do when you need to track state or keep persistent session information?  Well that's easy, you share your state from a centralised place - that'll see you through for a little while but what about when you need to scale that horizontally?

There are a lot of systems already doing this (AFS, DNS, LDAP, NFS) but there are no standard solutions for distributed state management; these systems all implement their own unique consistency and conflict resolution methods.  We're now seeing a lot of webscale businesses hitting these scalability walls - requirements are forcing the production of customized infrastructure services like Google's Bigtable + Chubby and Amazon's S3.  We're balancing on the edge of this wall ourselves and given what a difficult but rewarding challenge it is, I feel fear and anticipation in equal measure!

Monday 12 May 2008

Regulators - Let's Be Friends

Working in online gambling, something I am often frustrated by is regulatory requirements.  I'm pretty lucky - our compliance department focuses on reaching win-win agreements with the various regulatory bodies that govern our products, meaning we still get to build our business and they still ensure we're looking after our customers and treating them fairly.

That doesn't sound too bad, so why am I frustrated?  Basically, a lot of regulatory requirements place restrictions on the configurations and architectures we can use.  As a businessman I completely understand the necessity, but as an engineer that really grinds - we're no longer selecting the best technical solution to our business problem, we're selecting the best solution we're allowed...

Examples of some of these restrictions are stipulation of where our data centers may reside (more correctly, where certain business processes are executed, which in turn implies DC location to some extent), what services are allowed to listen on certain networks and even extra steps to our SDLC such as code reviews and external auditor testing.

Regulators exist to enforce the legislation that our business is subject to; their role in this governance is to ensure the integrity of our systems (fairness to customers), protect those at risk and, let's be honest, ensure the proper receipt of taxes and levies.  All things Betfair is absolutely committed to.  As an organisation we've always had a strong moral stance on this stuff and I totally encourage raising the bar on these standards - that only strengthens the competitive advantage we get from our investment in these areas.

The only thing I want to see done differently is a stronger focus on what and a looser focus on how.  I can appreciate that the simplest way to make sure people measure up to these standards is a degree of influence over the technical solution, but easier compliance isn't the only effect of this control.  Telling us where to put our servers or what data we're allowed to store can also result in a material increase in the cost of doing business or force us to own a system that's significantly more difficult to scale.  Requiring software patches to be reviewed before they can ship adds an expensive step to the SDLC that might, one day, result in a longer delay before we're able to close a security vulnerability - doesn't that defeat the purpose?

Why can't we operate a trust based system?  Require us to demonstrate certain levels of organisational and technical controls while giving us freedom to choose how we achieve it.  Treat companies which regularly exceed these requirements with a lighter touch - that will allow regulators to invest more in auditing and advising the organisations that need the most help.  I'd prefer a more results-based system that rewards higher internal standards and offers greater support for those struggling to make the grade.  That would let us treat compliance as an important input to our product development rather than a difficult external constraint.

That may not be perfect either but there has to be a better way...

Sunday 11 May 2008

Shameless Promotion No 4

Our Italian Sportsbook is live now (whew!) and the marketing leading up to Euro 2008 has started.  You can see one of our clips here.

Thursday 8 May 2008

Nice Threads

Last month I saw this post on codinghorror.  I wanted to wait and see what sort of discussion it kicked off among developers before I picked it up - and it looks like it's got a fair trail of comments now.

Firstly, let me just say I think it is good (and long overdue) to see software engineers getting this interested in hardware.  Abstract all you want but the systems we build all eventually run on hardware.  A fundamental understanding of IO, of how computers store and retrieve data, perform logical operations and access memory will make the difference between good software and great software.

Secondly, I'm not sure I agree with the statement "dual-core CPUs protect you from badly written software" but I think the right sentiment is there if you add a "certain failure conditions" qualifier.  There is a central topic this post skirts around without really nailing, and that is how vital resource management in a system is.

What we're talking about is good threading behavior.  I was hoping to see a lot more discussion about limiting the lifetime of threads, dynamically managing thread count according to capacity, creating affinity (making threads sticky to certain cores) and assumptions about environment (how much of a CPU/core can you safely say is yours?).  This kind of discussion usually leaks into good memory and disk management - which is excellent - these are the 2 other key physical constraints your systems are bound by.
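
As one small example of managing thread count according to capacity, here is a minimal Python sketch using the standard library's `ThreadPoolExecutor`. The `share_of_machine` parameter and both helper names are assumptions made for illustration; the share is simply a stated guess at how much of the box you can safely call yours.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(share_of_machine=0.5):
    """Size the pool from the cores present and the share we assume is ours."""
    cores = os.cpu_count() or 1   # cpu_count() can return None
    return max(1, int(cores * share_of_machine))


def run_bounded(tasks, share_of_machine=0.5):
    """Run callables on a bounded pool; threads are released when the block exits."""
    with ThreadPoolExecutor(max_workers=worker_count(share_of_machine)) as pool:
        return list(pool.map(lambda task: task(), tasks))


results = run_bounded([lambda: 1 + 1, lambda: 2 * 3])
print(worker_count(), results)
```

The sketch only covers thread count and lifetime. Affinity is platform specific (Python exposes `os.sched_setaffinity` on some Unix systems, for instance) so it is left out here.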

As a final point I think it's worth noting the difference between a core and a CPU.  Most people will tell you there is none, but there is one key difference - pins.  In most architectures a core is essentially a CPU in that all the right components are present and correct (CU, ALU, FPU, registers etc), but a core always shares pins (the little copper legs that connect the CPU to the motherboard) with all other cores on that die.  As a CPU your pins are how you get data on and off the die - so anytime you reach out to memory, non-integrated L2 cache, disk or network etc you do so in contention with all the other cores resident on the same die.  Subtle but critical when the bus becomes a bottleneck, and maybe you'll need to think more about affinity when your application gets to that scale.

Monday 5 May 2008

Eachan's 5 Simple Rules for Fiscal Happiness

As you gradually make the move from engineer to engineering management you inevitably take on more administrative duties and eventually this path will yield up responsibility for a budget. This can be a scary thing for techies, but the good news is the world of accounting is no different to anything else we do - you can peel away all the layers of complexity we're so fond of creating and reduce it to a mere handful of things that really matter.

Here are my 5 simple rules for responsibly managing any budget of any size:

  1. Teach your staff about ROI. Get them in the habit of putting things forward to you in a cost/benefit format; it's good practice for them and having that 1st level filter in place will save you time.
  2. Every £1 spent must return more than £1 in value. Sometimes this is obvious and sometimes it can be a little intangible; try imagining yourself explaining it to your boss and if that feels OK it'll probably pass.
  3. [once past number 2] There must be no better use of that £1. If you decide, for example, it's worth spending £2K on attending a conference then have a quick think before committing that cash. You've determined you're prepared to part with £2K so what else could you do with that money, and do any of those other ideas return more business value?
  4. Always manage to the bottom line. There are a whole lot of ways to look at how cash resources are being utilised - headcount, line categories, fixed and variable costs etc but at the end of the day the only material thing that matters is you spend less than you earn.
  5. Don't go over budget. Sounds simple but it's amazing how many people do this all the time because they think it's OK if it's for a good reason. It's never OK; cashflow is the heart of every business. If you have a brilliant idea but not the available budget to do it then put together a business case and ask for more.
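
Rules 2 and 3 boil down to a bit of arithmetic, sketched below in Python. The proposals and their expected returns are invented for the example; in practice the expected return is the hard part to estimate, and the code does nothing to help with that.

```python
def roi(cost, expected_return):
    """Return on investment as a fraction of the money spent."""
    return (expected_return - cost) / cost


def best_use(proposals):
    """Rule 2: drop anything that returns no more than it costs.
    Rule 3: rank what is left so the best use of the money comes first."""
    worthwhile = [p for p in proposals if p["expected_return"] > p["cost"]]
    return sorted(
        worthwhile,
        key=lambda p: roi(p["cost"], p["expected_return"]),
        reverse=True,
    )


# Hypothetical ways to spend the same £2K.
proposals = [
    {"name": "conference", "cost": 2000, "expected_return": 2500},
    {"name": "training course", "cost": 2000, "expected_return": 3200},
    {"name": "new gadget", "cost": 2000, "expected_return": 1500},
]

ranked = best_use(proposals)
print([p["name"] for p in ranked])
```

Here the conference passes rule 2 but loses to the training course on rule 3, and the gadget never makes it past rule 2.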

Managing a budget doesn't have to be that difficult. With a few simple rules and a bit of common sense you'll still have plenty of time left over to see some engineering get done!