Sunday 28 September 2008

Parallel vs Distributed

The difference between parallel computing and distributed computing is another important piece of theory to keep in mind when designing a system. The concepts are significantly different, but far from mutually exclusive - for example, you can run a number of parallel computing tasks on different nodes inside a distributed system.

The confusion, if it exists, arises from what the parallel and distributed concepts share in common - the division of a problem into multiple smaller units of work that can be independently solved with a degree of autonomy.

So what makes distributed distributed and parallel parallel? Both involve doing smaller units of processing on multiple separate CPUs, thus contributing to a larger overall job. The key difference is where those CPUs reside (and note that we'll treat "CPU" and "core" as synonymous for our purposes today). Simple answer:

Parallel is work divided amongst CPUs within a single host.

Distributed is work divided amongst CPUs in separate hosts.

How you break down work so that parts of it can be done concurrently, whether parallel or distributed, is largely governed by a single constraint - data dependency. Way back in the day, Gene Amdahl, a systems architect at IBM, came up with a set of guidelines for assessing the degree to which this can be achieved, and a way to estimate the maximum benefit it will deliver. That simple rule bears his name today: Amdahl's Law.
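
For reference, here is the usual textbook statement of the rule (my summary, not from the original post):

```latex
% Amdahl's Law: P is the fraction of the job that can be done concurrently,
% N is the number of CPUs, and S(N) is the best overall speedup available.
S(N) = \frac{1}{(1 - P) + \frac{P}{N}}
% As N grows, S(N) approaches 1/(1 - P): the serial fraction caps the win.
```

So even with unlimited CPUs, a job that is 90% parallelizable tops out at a 10x speedup - the data-dependent 10% is the ceiling.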

The key design considerations around parallel or distributed processing are in how you tackle this data dependency. In parallel computing, you need to use synchronization and blocking techniques to manage the access to common memory by the various threads you've split your problem up amongst. Solving the same issue with distributed computing simplifies your memory/thread management within each host, but you put the complexity back into state tracking, cluster management, and data storage.
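
To make the parallel case concrete, here's a minimal sketch: two threads on one host summing halves of an array, with a synchronized method guarding the shared total. All the names are mine and purely illustrative - a real workload would divide work per core rather than hardcoding two threads:

```java
// Illustrative sketch of parallel work within a single host: two threads
// contribute partial sums, and access to the shared total is synchronized.
public class ParallelSum {
    private long total = 0;

    // Synchronized so concurrent workers don't lose updates to the shared total.
    private synchronized void add(long partial) {
        total += partial;
    }

    public long sum(long[] data) throws InterruptedException {
        int mid = data.length / 2;
        Thread left = new Thread(() -> {
            long s = 0;
            for (int i = 0; i < mid; i++) s += data[i];
            add(s);
        });
        Thread right = new Thread(() -> {
            long s = 0;
            for (int i = mid; i < data.length; i++) s += data[i];
            add(s);
        });
        left.start();
        right.start();
        left.join();   // block until both workers have contributed
        right.join();
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(new ParallelSum().sum(data));
    }
}
```

The join() calls are the blocking technique mentioned above - the main thread can't trust the total until every worker has checked in. In the distributed version, that same "collect and reassemble" step happens over the network instead.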

It's arguably fair to say that, as a rule, parallel computing is more performant and distributed computing is more scalable. When crunching through a lot of work via many threads in one box, everything is done at silicon speeds, your only physical throttle being memory bandwidth and the pins between cores. The downside here is a hard limit on the amount of work you can do concurrently, which pretty much maps to the number of cores you can fit into your system - and scaling that up gets pricey. Doing the same work in a distributed system faces only theoretical constraints on how much work can be done concurrently - the question being how scalable your network and cluster management are - and it's usually cheap to add more systems and hence cores. The downside here is latency, as messages need to traverse networks many times slower than internal system buses, and of course you need a process to collect and reassemble results from all your nodes before you can confidently write your answers down to disk.

Like most technology, there are problems to which one is more suitable than the other, and also like most technology, there are many times when it is simply a matter of taste. Some of us are from big box school and feel more comfortable managing threads and memory space within a vast, single environment. Some of us are from cloud school, at rest amongst a dynamic mesh of cheap, disposable nodes, investing ourselves in the communications fabric between them.

Tuesday 23 September 2008

Don't be a developer, be an Engineer

I do part-time lecturing engagements at local universities whenever I can make a break for it from my day job - dodging and weaving through a kind of webscale Logan's Run while being hunted down by product owners and various other stakeholders, each individually as determined and unyielding as a T-800 on steroids, yet all the more terrifying for their emergent pack behavior. It's nice to have friends.

Over the last year I've made quite a few such regularly scheduled escapes, and as a result I've noticed a disturbing trend. It can be summed up in one simple sentence: people are being taught to be developers, not engineers. Let me tell you a little more about that, and why I think it's a bad thing.

Developers vs Engineers...

Firstly, we need to appreciate the difference between a developer and an engineer. Again, it's simple to summarize: a developer writes code, an engineer solves problems. It helps to define a developer in relation to an engineer, because I think "developer" is a subset of "engineer", so:

Engineers work as part of the business, helping to define the problem, helping to flesh out the specification, and bringing the practicalities and constraints to the table. They are creative, coming up with a number of technical solutions which solve the business problem, yet commercially savvy enough to evaluate the candidate solutions against the organization's goals to determine the best one. Engineers also appreciate that when it comes time to descend upon the keyboard in anger, writing the actual code is one little part of the whole job; they need to keep themselves and the rest of the team on track, report on progress, manage stakeholders such as operations, external vendors, and IT support staff, and plan for the release, version control, maintenance, and quality of the system they're building. On top of all this, good engineers are also constantly looking at their tools and their environments, working out how to make their own lives easier by improving the build systems, repositories, and automation that support their work.

Developers take a tightly specified piece of functionality and express said functionality in whatever programming language they're proficient in. You'll recall that we mentioned that as one small part of being an engineer.

What's wrong with this...

Is this necessarily a bad thing? It doesn't have to be, but here is why I think it is: we're not increasing the intelligence of our environments at the same speed as we're reducing our appreciation of the complexity they mask. Let me expand a little on what I mean by this. By the time you drag together a new class in your favorite Java IDE, a massive amount of stuff just got done for you - even assuming you haven't included any non-essential resources from the toolkit. Machines will interpret your instructions so that they're executable by a given CPU. They'll manage the memory allocation and reclamation. They'll keep track of state and they'll write data for you. They'll take care of IPC and RPC and even hook you up to a database if you want. They'll put things onto the network stack and pull them off, and when worse comes to worst they'll give you a whole bunch of debugging information to ponder. Easy. But this kind of ready-to-roll stack doesn't just occur naturally in the wild; people made these machine-helpers. Those people were engineers, most certainly not developers. So, until machines can make us continually more advanced machine-helpers, we'll need a healthy population of those people to keep us going. Sooner or later everything we do is gates, voltage, magnetism, and wavelengths of light.

If a long-term gloomy vision of the future doesn't do it for you, here is something quite selfish to get upset about. Being a .NET or Java developer is pretty much guaranteed to get you a job somewhere pretty quickly after graduating - but what next? While you're immediately productive from day one, a lack of appreciation of the science behind what you do (how computers store and retrieve data, how memory is referenced and logical operations are performed) will hold you back. You'll always be up against a steeper learning curve than your compsci-based compatriots, forced into learning about new things as a delta from what you know (an implementation like .NET), rather than as their own unique systems - and that's before we even mention the wider appreciation of technology as a whole required to make a decent architect (or even make good implementation decisions). That's a serious competitive disadvantage in the workplace.

How did we get here...

Well, I'm not entirely sure we're "here" yet, but I tell you what, we're not far off. This is a trend that's getting stronger, and I think there are 2 key causes:

Firstly, people value immediate employability over long-term careers. We're in competitive times, and now more than ever people view their classmates as competition for the next big step - turning that degree into gainful employment. Courseware is changing to accommodate this. We also live in times where people change jobs every few years - and total career changes are becoming very common, happening later and later in people's careers. This means there is a lot less incentive for students to consider the longer-term, bigger picture when planning their studies; after all, they might only be in software for a couple of years and then become inspired by writing, or groundskeeping, or zoology. So are students and professors right or wrong here? Perhaps individually they're doing the right thing, but that doesn't mean the combined effects of their decisions aren't going to hurt our trade.

Secondly, there is an imbalance of power between academic and commercial concerns. We are seeing unprecedented levels of cooperation between the academic world and the commercial world - which I totally, wholeheartedly endorse, by the way. There are joint research projects, internship programmes, technical community workshops and speakers, and a whole bevy of activities which help keep technology relevant and give willing students the ability to see how the science (key word there) they study can be applied to solve real-life business problems. Sounds awesome, so why is it going wrong? To put it bluntly, I think academia is bending over for companies. Organizations are obviously only interested in participating in (or sponsoring) the aforementioned events if there is something in it for them - I know, I do it (responsibly!) myself. The most immediately obvious benefits are recruitment (hey, we have to work to get employees these days) and the ability to influence study programmes to favor the needs of the company in question. Here is where I think the academic side of these partnerships has a duty to push back - a balance needs to be maintained between what companies need today and making engineering, as a profession, sustainable. If we load all our papers and study plans to suit next quarter's recruitment targets, where will we be in 10 years? I can see that this is the path we're starting down.

Can we save ourselves...

Sure we can - all is not lost. We might not be able to do much about the social pressures that form inputs into people's study plans, but we can at least help them appreciate the value that sound logical thinking, reasoning, and research/presentation skills bring to any career choice.

The single biggest thing we can do today is get the message to tomorrow's engineers - focus on becoming an engineer; being a developer will come naturally after that. You might have to study a bit harder, and look more strategically for your first few roles, but believe me - you are in the strongest possible position after you and your non-engineer peers have had 5 years to average out in the real world.

Faculties need to take action too - don't forget what you're for. It's always nice to have companies interested in working with you, but you have a duty to maintain that balance. It isn't in the best interests of any organization to do long-term damage to the core capabilities of the engineering industry, so you have to make them understand that's what they're heading for. If they are even slightly responsible technical citizens (and I can't think of any I know not to be) they'll appreciate this, know that it is in their own long-term best interests, and support it. Don't sell papers.

Rise up, good engineers, for we are becoming few!

Friday 19 September 2008

Business Value of Back End Redux

A few posts back I wrote about the business value of good architecture, but I didn't mention at the time what I consider to be the single most important benefit...

The greatest advantage your business will ever have from a rock solid platform is time spent closer to customers.

If you have your availability, security, scalability, maintenance, technical usability, and quality nailed, then you can afford to spend more time right up against your customers - building the features they want, understanding how they use your technology and what they'd like to see in the system. This is the stuff that matters to them; this is why they'll give you their time, attention, and hard-earned currency.

When you are able to apply your best minds and your greatest investments to things that work directly for customers, that's you sweating your platform. The creative, innovative, unique things you can get out the door that your customers directly interact with will bring you material, easy-to-measure reward. And guess what - you're only in the fortunate position to be comfortable focusing here because you got that platform business sorted out early...

You won't usually gain business by being more secure, but you will definitely lose it by violating your users' trust. You won't usually get more customers by being more available, but you will definitely turn them away by being down. You won't usually gain business by having a scalable file system, but you will definitely lose it by failing to save and retrieve critical information in busy times.

Customers don't value a good platform explicitly, they just expect it to fundamentally be there. Want proof? Just watch them leave in droves if you don't have it sorted.

Tuesday 16 September 2008

Internal Candidates

I think it is important that existing staff get opportunities to apply for any new roles you're looking to fill. This can help you to help them with their career plans, will aid retention, and someone internal will hit a new job with an existing network and familiarity with your organization. Quite aside from all this, a lot of countries have legislation requiring you to provide equal opportunities to internal and external candidates!

As well as your usual interview process, there are a couple of additional things I've found it pays to look into. Firstly, make sure you know why they want the role - it's important to filter out those who might be clutching at the role for progression for progression's sake. That's not a bad thing to ask external candidates too. You should also check what the internal applicant's succession plan is for their current role in your organization - sometimes you're no better off just moving a hole around the company.

You should also be paying attention to any patterns that seem to be emerging. Does it look like a disproportionately large number of people from a certain department, role, or manager are always applying for other things in the company? If so, you might have something unsustainable going on that you should look into.

Ultimately, internal candidates are not always the best candidates, and sometimes the right thing to do is bring in new thinking, but it is important they get a fair chance.

Friday 12 September 2008

Heroism Hides the Truth

A while back, my trusty compatriot Dan Creswell posted this, which I consider to be pretty good advice - but it's an even more powerful message when you think about it in terms of bad habits to avoid forming.

You have to avoid these bad habits because they become vicious cycles that are difficult to break. Consider Dan's point on heroics:

"Some managers encourage heroic behavior when they focus too strongly on can-do attitudes. By elevating can-do attitudes above accurate and sometimes gloomy status reporting, such project managers undercut their ability to take corrective action. They don’t even know they need to take corrective action until the damage is done. As Tom DeMarco says, can-do attitudes escalate minor setbacks into true disasters."

My biggest issue with this is that you've hidden the true cost of a piece of work from the business. How can you possibly make valid, prudent decisions that are in the best interests of the business if you don't have a true picture of what things cost? Cost is one of the most basic, and critical, inputs to how a business is run. Even though these heroics are often embarked upon with the best of intentions, hiding the actual cost of a feature actually does the business a huge disservice.

Once you're in this pattern, how do you reset expectations? There is no point at all in talking about the triangle (scope, time, cost) and other real-life constraints if you artificially modify the values through this kind of behavior. You also lose credibility with the rest of the organization - because working this way is unsustainable, you'll eventually find yourself trying to get things back on the rails, at which point you'll be met with doubt; and fair enough too - you've been doing it so far, right?

It is hard to break out of these cycles and get your team back to a better work/life balance, so, much like smoking, my advice is that the best way to quit is never to start in the first place!

Wednesday 10 September 2008

Bootcamp Eases the Migration

I'm taking the plunge and going fully Mac-native, and I have to say, Boot Camp is really helping make the transition doable - I wonder if this was part of the strategy in the first place?

I like the hardware. It looks nice, feels nice, and you're assured of a fairly good build quality. I like the robustness of the platform, and the no-brainer compatibility (anything Apple just works with anything Apple, in my experience), although the significantly smaller software library is a drawback, albeit a steadily shrinking one.

I'm not new to Macs - I've pretty much always had a PC, a Mac, and an experimental-frankenstein-exotic-flavors-of-Linux machine (usually my current PC minus 1) running concurrently. I've just never actually done anything of much significance with the Mac - it's been pretty much iTunes and web browsing territory for me.

Now that I've decided to switch my use of Mac and Windows so that I'm using the Apple as my primary machine, I have to work out how I'm going to be able to do any work. My problem isn't knowing my way around OS X; it's all the little applications, tools, and utilities for the things I do every day that are the issue. The best way to get to grips with something is to do it, but sometimes the stuff I have to do won't wait until I find out what the Mac version of EA or Visio is, or feel my way around a new IDE. In these circumstances, it's really handy to have the backup of being able to reboot into Windows, get the urgent task done in the familiar environment, and then go back to the Mac - without lugging 2 notebooks around. For example, this post was brought to you by MarsEdit, and I was formerly a Windows Live Writer man...

Actually, an 'equivalency' site would be an awesome idea - something like a "this on Mac is like that on Windows" guide to make it easier for more people to make the switch. If anyone knows of such a thing out there, drop me a link.

Sunday 7 September 2008

Would you do it in real life?

Caution: may contain traces of rant.

Guess what: the internet world is just like the real world. The dotcom bust was a harsh teacher; before it, we believed anything would make money if it was done on the web, and we learned that this was not so. We learned that even when you run a business online you still need a solid business plan, you still need to understand your market, know your customers, and control your costs. In other words, we learned that you still need to run a proper business.

I am starting to wonder if that lesson stuck.  We received the message from an investment point of view, but how has that translated operationally?

Additional customer-facing channels for traditional operations are a high-growth area of the web. Banks, utilities, transport, retail, and a whole bunch more - all these things started almost exclusively face to face, some had a postal 'interface' with their customers, they added call centers as telephony matured, and now the web is the next way in. Most companies did pretty well when adding call centers to their repertoire (heavily accented outsourced operators you can barely understand excluded), but in the jump online, a lot of them have been much less successful in my experience.

My reasoning is this: it looks to me like many organizations decided, for some (presumably well-researched) reason, that their web booking/ordering/support/purchasing process should be different to their stores and call centers. Why? I'm the same guy; I've been to your branches for years, called your contact center, filled in your forms, and met your staff. I like the internet - it's the most convenient channel for me - but why can't I do the same things the same way? Why isn't my experience of your operation consistent?

An example from yesterday.

I booked an overnight ferry trip through the web. As part of said booking process, I was offered the option of booking dinner at one of the onboard restaurants, and receiving a small discount for paying for both ticket and meal together online. The Scotsman in me was totally sucked in by this and I went for it, but alas, it was not to be. I was offered a choice of 4 restaurants on the boat, and each time I selected one, I was informed by the site that it was fully booked. Not being fed? Yikes! Nonetheless I girded up my loins and prepared myself for a hungry crossing. As I wandered around the ship that evening (no wifi at sea...), I was greeted by the welcome sight of many empty restaurants. I was pretty happy about that at the time, so off I went for a meal. Saved from the brink of starvation [OK, people that know me, I admit this would have taken years], I began to wonder why I couldn't book online and lay rightful claim to my 10% off. A brief investigation was in order, and I swiftly conducted one...

Anyone who works in the industry knows that failures happen - mistakes, technical faults, and data errors - and these are things I am much more than averagely forgiving about, in the hope that the great circle of BGP karma will ensure my users extend me the same forgiveness. But this was not the case; what I found was quite simply 2 easily-avoidable counts of a process that works well in real life but wasn't translated to the web particularly well.

Count 1 (the root cause): apparently it is not possible to make advance restaurant bookings less than 24 hours before the ferry departure time, as they cannot get the message to the boat in time, and I was booking for that very evening. Cool - but why make me go through each individual restaurant one by one? Why mislead me that they're fully booked, when the real reason is simply not enough notice? In fact, why offer that step at all? If I was being talked through my booking by contact center staff, would they offer to book a restaurant, then just tell me it's not possible when I accept? No, so why do it online?

Count 2: when I called up, the helpful lady in the contact center said I should have called them during the online booking process and they could have clarified the reason (less than 24 hours' notice) at the time. Hang on - use the call center to check up on the truth of the website? When I'm booking over the phone, should I be expected to use the website to check up on the validity of what I'm being told by the operator? It sounds a little absurd when put that way around, so why engineer the system so that it's necessary this way?

OK, machines are dumb - I'm cool with that - but you can do better than this. It's not difficult to teach your system simple rules about these restrictions (if departure time < 24 hours from current time, then do not show the restaurant options page), or at least set the right expectations (display a 'not enough notice, please book in person on the ferry' message instead of a 'restaurant fully booked' message). It's just good customer service.
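
To show how little code that rule would actually take, here's a minimal sketch - the class, method, and message names are all hypothetical, mine rather than any real booking system's:

```java
// Hypothetical sketch of the 24-hour notice rule described above.
import java.time.Duration;
import java.time.LocalDateTime;

public class RestaurantBookingRule {
    // The restriction from the post: bookings need at least 24 hours' notice
    // so the message can reach the boat in time.
    private static final Duration MIN_NOTICE = Duration.ofHours(24);

    // True only when there is enough notice to relay the booking to the boat.
    static boolean advanceBookingAvailable(LocalDateTime departure, LocalDateTime now) {
        return Duration.between(now, departure).compareTo(MIN_NOTICE) >= 0;
    }

    public static void main(String[] args) {
        LocalDateTime departure = LocalDateTime.parse("2008-09-06T20:00");
        LocalDateTime now = LocalDateTime.parse("2008-09-06T09:30");

        if (advanceBookingAvailable(departure, now)) {
            System.out.println("Show restaurant options page");
        } else {
            // Set the right expectation instead of a misleading 'fully booked'.
            System.out.println("Not enough notice - please book in person on the ferry");
        }
    }
}
```

One guard clause, evaluated before the restaurant step is ever shown, and the customer never has to discover the real rule by trial and error.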

So if you're considering taking a traditional process online and you're thinking about modifying it slightly for the web, first ask yourself if you'd do it that way in real life. If it wouldn't make sense when done face to face or over the phone, then it probably isn't a smashing idea online either - don't put your customers through it!

Remember, if it's a dumb idea in real life, it's a dumb idea online.

Wednesday 3 September 2008

The Law of Conservation of Complexity in the Business

Last month, I wrote about conservation of complexity and how it applies to the way we design systems. It's a factor in the organization too: for a company to achieve a certain result, there is a minimum amount of effort that needs to be expended by someone. Just like with technology, you can make it a whole lot harder than this minimum with excess bureaucracy and orthogonal activities, but there is a certain amount of trouble people must go to. If someone (or some department) does less, then someone else must do more, or the result will not occur.

A big picture example of this is how adopting agile software delivery, and sticking to the triangle, changes marketing and communications.

One of the fundamental tradeoffs you might have made if you're using agile is exchanging [perceived*] certainty about what will happen in the future for the flexibility to make it whatever you need it to be as you move forward.

This means it's much harder to make promises about exactly when exactly which features will be available. You can fix scope and take delivery whenever it's done, or you can fix a shipping date and take whatever is finished by then - but fixing both at once is a rare luxury, the exclusive playground of those with deep pockets, easy problems, and a lightweight attachment to reality.

This is greatly upsetting to marketing departments, because their role is to get the word out (ideally in advance), raise awareness, and generally get people onto the site. When they can't have what they consider to be some pretty basic information - like an exact date when everything will be fully online with all bugs banished to the ether - it's hard for them to ensure cash is rolling in from day 1. They have to be smarter, they have to be more creative in how they get the message out, and they have to work more closely with engineering.

I agree that could be easier (read: complexity for marketing could be reduced), but guess what? It's hard (read: more complex) for me to predict exact shipping dates, ferociously defend scope, and cope with all unforeseen technology and human resource issues. It has to be hard for someone, and to me that's simply the law of conservation of complexity at work in the organization.

* that one's for you, Ewan.