Friday, February 6, 2009

9 Ways to Face the Perils of Cloud Computing

The cloud *might* go up in smoke.
Are you sitting on it? Be smart.


I like to think I'm pretty cautious when it comes to safeguarding my stuff, but I admit I was taken by surprise when the spiffy, cluefully designed social bookmarking system ma.gnolia underwent a pretty catastrophic FAIL event a little more than a week ago.

Meaning: The service went down. And: the data (about half a terabyte of it) was lost. And: there was a backup, but it went down the drain, too.

What happen? Did somebody set us up the bomb? Actually, no. The file system got corrupted, which in turn corrupted the database backup. Ma.gnolia founder Larry Halff has admitted his backup system wasn't robust enough.

Larry must have had a hell of a week. This event is of course bad enough for him and for ma.gnolia, but it is also a nightmare scenario for the heavy users who had thousands of links, annotations and ratings in there.

On a personal level, I had been collecting bookmarks there for a social software course that I'm developing. While I hadn't stored a zillion links in ma.gnolia, it's still a disappointment. But really, I should have known better than to assume that the services I rely on were properly backed up and just couldn't melt down or disappear. As Christopher Null writes, "you can't trust an online service any more than you can trust your hard drive not to crash. Sure, the vast majority of the time everything will be fine, but eventually all technology products fail, and even the best safeguards are often imperfect."

From a user standpoint, there are two things to think about regarding incidents like these. The first, obvious one is how to recover from such a loss; the second is how to guard against future occurrences. Let's look at each one in turn.

A. The service died on me! What now!?

It turns out that there are a couple of ways to at least partially recover when your content vanishes.
  1. Google and other search engines keep cached copies of publicly accessible web pages. If you're lucky, the Googlebot will have crawled and saved a copy of the pages holding your content. Identify keywords that appeared on your pages or in their URLs, then do a site-restricted search (using the site: operator) and click the "Cached" links in the search results. (I found a few of my lost bookmarks that way.)
  2. The Internet Archive crawls the Web and makes what it finds available a few months later through its Wayback Machine. You may find some of your older stuff there, though in my experience the archive is quite incomplete (for good cause, if you consider the size of the Web!).
  3. Some of the aggregators out there might have picked up your content if it was available as a feed. Here's an example: The excellent Microcontent News weblog had a short life, and the site has been all but dead for several years. However, Bloglines still has a copy of the feed, as it was left in 2003. (Unfortunately, this one was a teaser feed, with only the first sentence of each post; had it been a full feed, the whole content would be there.) Over at HubLog, Alf Eaton has explained how to retrieve a feed's historical items using the Google Reader API.
  4. Some sites republish feeds for fun and/or profit. Even if your original feed is no longer accessible, you may find traces of your content elsewhere. To find it, take a clue from Obi-Wan: Use the Search, Luke.
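To tie the recovery tips together, here's a small sketch that builds the lookup URLs for a lost page. The URL patterns for Google's cache and the Wayback Machine are assumptions based on how these services are commonly queried, and the ma.gnolia address below is just an illustration; double-check the patterns before relying on them.

```python
# Sketch: build lookup URLs for recovering a page that vanished from its host.
# The query patterns below are assumptions, not guaranteed stable APIs.
from urllib.parse import quote

def recovery_urls(lost_url: str) -> dict:
    """Return cache/archive lookup URLs for a vanished page."""
    q = quote(lost_url, safe="")  # percent-encode the whole URL
    return {
        # Google's cached copy of the page (if the Googlebot saved one)
        "google_cache": f"https://webcache.googleusercontent.com/search?q=cache:{q}",
        # List of all Wayback Machine snapshots of the page
        "wayback": f"https://web.archive.org/web/*/{lost_url}",
    }

# Example (hypothetical bookmark page):
for name, url in recovery_urls("http://ma.gnolia.com/people/example").items():
    print(name, url)
```

Paste the resulting URLs into your browser and work through whatever snapshots turn up.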
B. How do I protect against a future meltdown?

Here are five ways (plus a bonus tip) to minimize the impact of a service going down:
  1. Use several services redundantly. For instance, the social bookmarking service Diigo has a built-in feature that lets you post to multiple services at once. If one goes down, you still have your links in other places.
  2. Save local copies to your hard drive. For pictures and videos, you usually start from a drive, so this is a non-issue. But for blogs and social bookmarks, the "original" copy is very often stored with the service you're using. If it offers full exports, use them. If not, local backup is less practical -- but have a look at #3.
  3. Produce full feeds of your content and subscribe to them in local or Web aggregators. I'd have to do more research to see which ones keep items the longest, but I'm pretty sure that some local aggregators can be set to never throw anything away. (If you know of some, please comment!)
  4. Use an on-demand archiving service like (the amazing) Webcitation.org, which will keep a copy of any page you submit.
  5. Make things public as much as you can. Public information naturally tends to get crawled and replicated. The public bookmarks on ma.gnolia have seen a better rate of recovery than the private ones.
  6. (extra tip! thanks to Dave in the comments) With some services, you can email your updates to yourself. Just store those emails and you've got a safety copy.
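In the spirit of tips #2 and #3, here's a minimal sketch of a do-it-yourself feed backup: parse your own feed and merge its items into a local JSON file. It assumes a plain RSS 2.0 document with `<item><title/><link/><description/>` elements; the file path is a placeholder, and a real setup would also fetch the feed over HTTP on a schedule.

```python
# Sketch: keep a local, append-only backup of your own feed items.
# Assumes plain RSS 2.0; namespaced feeds (Atom, etc.) need extra handling.
import json
import xml.etree.ElementTree as ET

def extract_items(rss_text: str) -> list:
    """Pull title/link/description out of an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]

def backup_items(rss_text: str, path: str) -> int:
    """Merge new items into a JSON file, keyed by link; return total count."""
    try:
        with open(path) as f:
            saved = {it["link"]: it for it in json.load(f)}
    except FileNotFoundError:
        saved = {}  # first run: start an empty backup
    for it in extract_items(rss_text):
        saved[it["link"]] = it  # re-runs update rather than duplicate
    with open(path, "w") as f:
        json.dump(list(saved.values()), f, indent=2)
    return len(saved)
```

Run `backup_items` against a fresh copy of your feed every so often (a cron job would do) and the JSON file accumulates everything the feed ever carried.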
Now, to be honest, this is the kind of advice that I read but rarely apply. Getting hit close to home will hopefully make me take a good look at all the services I'm using and make sure I'm following at least some of my own advice, lest I get burned again!

4 comments:

  1. I'm a little obsessive about backup. My main blog is on WordPress, along with a couple of private ones, and I set them to email me a weekly backup (posts and comments). In addition, once a month I use FTP and my external hard drive to back up everything (including all the WP files). I kick the backup off, then can do other stuff while it hums along.

    However, even as my use of delicious grows, I hadn't thought of backing those up, so your post is timely.

    Backup's like exercise and healthy eating: ideas whose time never quite seems to be right.

  2. Very good stuff. Thank you for writing and sharing.

    Troy
    www.nibipedia.com

  3. Thanks Dave! I've added the email-yourself strategy in my post.

  4. See the data independence Checklist ;)
    http://lists.w3.org/Archives/Public/www-archive/2008Nov/0010
