Perhaps the most surprising element of the story is that Danger, the company behind Sidekick, is a subsidiary of Microsoft, and the lost content was hosted at a Microsoft data center. Microsoft has invested heavily in its data center infrastructure to support its growing involvement in cloud applications, including a number of planned enterprise-oriented products. The Inquirer reported that the problem was caused by an upgrade gone wrong, carried out without the necessary back-ups being created. Microsoft said that the data center involved was one it gained when it acquired Danger, and that the facility had not been updated to support Microsoft's latest technology.
Of course, Microsoft is not alone in having suffered problems: in recent weeks Google has suffered several Gmail outages, although in those cases users did not lose data but were unable to access services for some time -- less severe than data loss, but still hugely inconvenient. In both cases, the affected services were targeted at consumers rather than being premium enterprise services. But if Google and Microsoft have had problems with their infrastructure, then the message for businesses should be clear: examine all of the possible scenarios very carefully before entrusting mission-critical data to the cloud.
Clearly, either of the two scenarios experienced by Google and Microsoft would be catastrophic for a corporate. The loss of data could cost thousands in recovery efforts and might impact the ongoing business to such an extent that existing contracts are lost and new deals missed. And downtime of mission-critical applications could also have high costs -- it was reported that Google had itself suffered, because it uses Gmail for internal purposes.
The key lesson appears to be that before trusting important applications and data to a partner, whoever this may be, it is vital to ensure that there is a "plan B" for when an issue occurs. Partners are vital in enabling cloud services and the flexibility they bring, and SLAs provide a high level of comfort, but ultimately the responsibility for having measures in place to deliver business continuity in the event of a failure remains with the corporate IT department.
More enterprise issues
Moving back to the enterprise, Workday, a SaaS provider of human resources and financial applications, saw its services rendered unavailable for 15 hours -- more than one working day, and enough to cause chaos for businesses relying on the vendor. While Workday was believed to have communicated effectively with customers about its issues, the loss of access to key financial applications may well have caused real problems, and it was noted that the September 2009 outage came close to the end of the quarter, when finance departments are at their busiest.
Information Week raised an interesting issue: it noted that while "it's easy to like Workday; it's the little guy trying to make a difference", the same level of tolerance may not have been exhibited if it had been an issue with "Oracle, SAP, Salesforce.com, or heaven forbid, Google". And as more and more businesses move to the cloud, tolerance of outages is likely to plummet, as the impact is magnified across a wider user base.
For a little light reading, Network World has compiled "a short history of cloud computing outages".