I'm into availability at the moment, so much so that I'm leading an organization-wide change initiative that targets cultural, technical and process habits (busting established ones, forming new ones) in order to deliver a more consistent experience. Our goal is to always have something to offer our customers, regardless of any maintenance we're performing or failures we're experiencing.
This is how I discovered The Great Divide, the namesake of this post: the gap between our infrastructure and our software, because of which our product is available to our users far less often than it could be.
Here is how it works:
We've got pairs of firewalls that can fail over while maintaining session state. We've got tiers of load balancers that can reroute traffic around failed network devices. We've got clustered databases that can move active systems between nodes in a couple of minutes.
But guess what else we've got?
We've got applications that lose session information if sequence numbers aren't contiguous. We've got applications that can't match users to their activity if their traffic suddenly comes from another IP address. We've got applications that depend so heavily on their databases that they die within a few seconds of losing the connection.
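To make the IP-address case concrete, here is a minimal Python sketch, not taken from any real system, contrasting two hypothetical ways an application might associate requests with a user. The function names (`login_by_ip`, `login_by_token` and so on) and the in-memory dictionaries are assumptions for illustration only; a real application would keep sessions in storage that itself survives failover.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory session stores, for illustration only.
ip_sessions: Dict[str, str] = {}
token_sessions: Dict[str, str] = {}

# Fragile: the user's identity is keyed on the client's source IP address.
def login_by_ip(client_ip: str, user: str) -> None:
    ip_sessions[client_ip] = user

def user_by_ip(client_ip: str) -> Optional[str]:
    return ip_sessions.get(client_ip)

# More resilient: identity is keyed on a random token the client sends back
# with every request (for example in a cookie), so it no longer matters
# which address the traffic arrives from.
def login_by_token(user: str) -> str:
    token = secrets.token_urlsafe(16)
    token_sessions[token] = user
    return token

def user_by_token(token: str) -> Optional[str]:
    return token_sessions.get(token)

if __name__ == "__main__":
    login_by_ip("10.0.0.5", "alice")
    token = login_by_token("alice")

    # Assumption: a network failover now makes alice's traffic appear to
    # come from a different address (e.g. via a different NAT device).
    print(user_by_ip("10.0.0.9"))  # None: the session is "lost"
    print(user_by_token(token))    # alice: the session survives
```

The infrastructure behaves identically in both cases; only the software decides whether the rerouted user is still logged in.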
You don't get any partial credit for product uptime: your customers will not award you a bonus point if your site is down but your servers are up. If they can't log in, they can't log in; if they can't place orders, they can't place orders. They're quite a binary bunch.
For us, product = infrastructure + software + the operational know-how to run it. We need to stop worrying about server and network availability and start worrying about product availability, because, guess what, that's what our customers are measuring us on.
Close that gap and let your customers see the benefit of those cool devices.