The SysAdmin Network

No more hiding in the server room

Robert Chipperfield

Clustering and replication: the data loss vs. downtime tradeoff

Yesterday I was speaking to a potential customer who was interested in how email archiving could be integrated with their Exchange replication technology. Which got me thinking...

How do you trade off potential data loss against extended downtime in a replicated / clustered environment?

Simplifying hugely, there are two ways you can handle this: you can replicate data synchronously, and not report that an action is complete until all replicas are consistent, or you can do it asynchronously, and let the client know it's complete as soon as your active copy is complete.

Synchronous replication is safest, because you guarantee that all copies are always consistent, and contractually, if you've told a client (think here: remote mail server, email client, but it generalises) that you've done something, you know it's hit the disk everywhere, and so things would have to go pretty badly wrong for that data to be lost. But with safety comes a performance hit: you need to accept the data, store it locally, push it over to the other copy, maybe over a slow WAN link to a DR site, wait for that server to acknowledge it, and only then can you allow the request to complete.

Asynchronous replication can be much faster: you tell clients everything's OK as soon as you've committed it locally, then periodically ship those actions over to the other copies, which become "eventually consistent". That might happen once a day, once an hour, or once every few seconds or minutes. Obviously the more often you do it, the less data you stand to lose if your primary server goes up in literal or metaphorical flames. Whatever you do, though, there's a window of vulnerability that exists before it gets copied over.
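To make the difference concrete, here's a minimal sketch of the two modes in Python. It's a toy in-memory model, not how Exchange or any real product does it: the `Primary` and `Replica` classes and the `ship()` and `at_risk()` methods are names made up for illustration, and the network is assumed never to fail.

```python
class Replica:
    """A secondary copy that simply records whatever it is sent."""

    def __init__(self):
        self.log = []

    def apply(self, entry):
        self.log.append(entry)
        return True  # stands in for the acknowledgement sent back


class Primary:
    """Toy primary server (hypothetical, for illustration only)."""

    def __init__(self, replicas, synchronous):
        self.log = []
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # committed locally but not yet shipped

    def write(self, entry):
        self.log.append(entry)  # commit locally first
        if self.synchronous:
            # Only report success once every replica has acknowledged.
            for replica in self.replicas:
                replica.apply(entry)
        else:
            # Report success straight away; ship the entry later.
            self.pending.append(entry)
        return "ok"

    def ship(self):
        """Periodic job that pushes queued entries to the replicas."""
        for entry in self.pending:
            for replica in self.replicas:
                replica.apply(entry)
        self.pending.clear()

    def at_risk(self):
        """Entries that would be lost if the primary died right now."""
        return list(self.pending)
```

In the asynchronous case, whatever `at_risk()` returns between two calls to `ship()` is the window of vulnerability; in the synchronous case it's always empty, and the price is that every `write()` waits on every replica.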

And here's the problem: at what point do you decide to fail over to your DR site or server? If you fail over automatically, there's the risk that a temporary glitch causes the failover and loses a few minutes' data that hadn't yet been replicated*. But if you're cautious and require manual intervention to initiate the failover, you extend the amount of down-time users see.
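A rough way to put numbers on that trade-off is sketched below. Everything here is an assumption made for the sake of the example: the `evaluate` function, the `Outcome` fields, the idea that failover itself takes no time, and the figures in the usage note. It only shows the shape of the problem.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    failed_over: bool
    downtime_s: float     # how long users were without service
    lost_writes_s: float  # seconds' worth of unreplicated writes abandoned


def evaluate(outage_s, detect_after_s, replication_lag_s):
    """Outcome of a policy that fails over once the primary has been
    unreachable for detect_after_s seconds.

    Simplifying assumptions: the failover itself is instantaneous, and
    failing over abandons everything not yet replicated
    (replication_lag_s seconds' worth of writes).
    """
    if outage_s <= detect_after_s:
        # The glitch cleared before the trigger fired: users waited it
        # out and nothing was lost.
        return Outcome(False, outage_s, 0.0)
    # The trigger fired: service resumes on the DR copy, minus the
    # writes that never made it across.
    return Outcome(True, detect_after_s, replication_lag_s)
```

Suppose automatic failover triggers after 30 seconds, a human takes 15 minutes (900 seconds) to decide, and replication runs two minutes behind. For a 60-second glitch, `evaluate(60, 30, 120)` fails over and throws away two minutes of writes, while `evaluate(60, 900, 120)` loses nothing. For an hour-long outage, `evaluate(3600, 30, 120)` has users back in 30 seconds, while `evaluate(3600, 900, 120)` leaves them waiting 15 minutes for the same data loss.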

So I'm curious - how do you handle this trade-off? Not just email, but other systems as well...

* Again, this simplifies somewhat: if your replication technology is intelligent, you might be able to merge the two sets of data back again after the fact. I believe Windows DFS does this, for example, but that's not always possible, especially if the changes are conflicting.


Tags: DR, failover, replication

Comment by Dave Hall on January 27, 2010 at 12:41pm
I've wasted countless hours trying to come up with the perfect solution to this. :)

The best system I came across was Apache CouchDB, which followed the "eventual consistency" model you described. This meant that the system felt snappy to the end-user, but had the security of being able to replicate asynchronously to other nodes. The update interval was pretty much instant for small clusters, making the risk of data loss small.

The way the system was set up, however, meant that you could use a quite sensitive automatic failover and merge the changes back together automatically. Each record was identified by a UUID, so nodes could operate independently in a true multi-master system.

It all sounded awesome, but then I ran into the brick wall that everyone seems to have run into: it's impossible to enforce unique constraints in a distributed multi-master system. There are about a million blog entries on the subject, some for Google Bigtable, some for CouchDB, but all on the same topic. The best you can ever get is 'pretty certain', which is always a little worrying.
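The unique-constraint problem can be shown with a small sketch. This is not CouchDB's API: the `Node` class and the `merge` and `duplicates` helpers are hypothetical, and the "replication" is just a dictionary union. The point is only that each node's uniqueness check can see its own data and nothing else.

```python
import uuid


class Node:
    """Toy multi-master node holding documents keyed by UUID."""

    def __init__(self):
        self.docs = {}  # document id -> document

    def insert(self, email):
        # This check only sees documents already on this node.
        if any(doc["email"] == email for doc in self.docs.values()):
            raise ValueError("duplicate email")
        doc_id = str(uuid.uuid4())
        self.docs[doc_id] = {"email": email}
        return doc_id


def merge(a, b):
    """Replicate both ways. UUID keys never collide, so this always works."""
    merged = {**a.docs, **b.docs}
    a.docs = dict(merged)
    b.docs = dict(merged)


def duplicates(node, field):
    """Find values of `field` that appear in more than one document."""
    seen = {}
    for doc_id, doc in node.docs.items():
        seen.setdefault(doc[field], []).append(doc_id)
    return {value: ids for value, ids in seen.items() if len(ids) > 1}


# Two nodes accept the same address while out of contact with each other.
node_a, node_b = Node(), Node()
node_a.insert("rob@example.com")
node_b.insert("rob@example.com")  # passes: node_b has never seen it
merge(node_a, node_b)
conflicts = duplicates(node_a, "email")  # the clash only shows up now
```

Both inserts were reported as successful, so by the time `duplicates` spots the clash, the only options are to clean up after the fact.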

Which is why pretty much everyone is still operating on a happy mix of MySQL replication and hope.

Comment by Dave Rovai on February 2, 2010 at 2:07pm
Instead of clustering, have you looked at the Stratus ftServer family (www.stratus.com) of completely fault-tolerant hardware (memory, processors, NICs, hard disks, fans, power supplies, etc.)? All transactions are run through both systems simultaneously, and if any hardware starts to fail, the failing hardware bows out of the transactions gracefully. They also provide a 24-hour support service that monitors your hardware and overnights replacements for parts that start showing signs of failure. If you don't want the support services, you can also look at the product line from NEC.

Comment by Robert Chipperfield on February 2, 2010 at 2:50pm
Hi Dave,

Thanks for the comment - basically that's the "synchronous" flavour of replication I was talking about above, where you make sure everything hits both / all nodes before accepting the transaction (in the operations sense, rather than necessarily the database sense) as committed.

In the case of Stratus and the like, you're avoiding (most of) the cost of synchronous replication by keeping the redundant components close. Which is great for speed, but of course doesn't help if your data centre has a liquid ingress event :-).

Dave Bermingham tweeted a couple of interesting links on the way some technologies handle it: Exchange in CCR mode, and DataKeeper.

Cheers,
Rob


© 2012   Created by Elizabeth Ayer and Michael Francis.
