T-Mobile and Microsoft announced that a Microsoft subsidiary had suffered a “data-service disruption” that wiped out all Sidekick users’ contacts, calendar entries, to-do lists, and photos. In the joint statement, Microsoft/Danger and T-Mobile said its teams were working “around the clock in hopes of discovering some way to recover this information.” However, it noted that the likelihood of doing so “is extremely low.” – From NewsFactor.com (Oct 12, 2009)

Google Search and Google News performance slowed to a crawl, while an outage seemed to spread from Gmail to Google Maps and Google Reader. – From ComputerWorld (May 4, 2009)

It’s hard to believe in this day and age that we should hear of data recovery being an issue, isn’t it? Even the government has explicit input into this worrisome problem. Yet in the past six months we’ve seen two major Cloud Computing corporate faux pas. More correctly, what we are talking about here is Business Continuity or, drilling down one level, Workforce Continuity.

One of my colleagues, who shall remain nameless, was aghast at these news releases and asked whether they would negatively affect the push we are seeing in industry toward consumption-based delivery of IT services. Specifically, if Citrix technology were associated with one such disaster as a part of the Citrix Service Provider program, would we end up with a “black eye” and thus a negative brand implication?

Ironically, when I was working on our CSP TCO/ROI calculator, the question came up about Disaster Recovery and whether or not service providers offer it as a part of their subscription/hosting business.

The next logical question is ‘Do service providers also provide some form of disaster recovery for themselves?’ It’s one thing to back up data for the end customer, but what if a service provider’s whole farm goes down? Well… this is really a great question, but as we’ve seen from the recent press, it may be a matter of big fish vs. small fish. For example, smaller hosting/service providers can and do back up their data using services from larger enterprises, such as Amazon’s S3. Why? The costs are relatively low and the processes relatively easy to use.

Also, because storage arrays are relatively inexpensive and technologies such as automated failover are available, many smaller-scale service providers opt to run their own backup and recovery systems on premises.

So one might ask, what about the big guys (Google, Amazon, Microsoft)? Who provides their data recovery systems? Well… based on the performance recorded in the press over the past few months, that appears to be a very good question. There is speculation that, because large Cloud Computing companies use (very) low-cost equipment (servers and storage arrays), duplicating real-time data for instantaneous recovery is just a part of their operations. But is it really?

One of the challenges with scale is that you have to have enough compute power and storage not only to service the masses, but also to provide continuity (and backup) in the event of a catastrophic failure. Will negative press such as that from Google and Microsoft’s “Danger” (what a name for a DR company!) keep businesses from using service providers for their mission-critical data? Anecdotally I’ve got to say no… at least at the SMB level, because the data shows an increase in off-premises IT services. But maybe Google and Microsoft need to take a closer look at how they handle these types of services, especially for the large enterprises.

I’ve got a question for you. When was the last time you actually tested your Business Continuity system?  I mean, really tested a failure to see if your processes meet your users’ expectations?  Don’t get caught in the news answering the question like these guys did!