MICROSOFT AND AMAZON DATA CENTRES STRUCK BY LIGHTNING

August 10, 2011

On Sunday, a lightning strike in Dublin knocked out two data centres belonging to Microsoft and Amazon. Even the backup systems were disabled, resulting in roughly twelve hours of downtime.

Amazon reported that the lightning struck a transformer, setting off a series of problems for the company: first a fire, then an explosion, and finally a total loss of power. That was not all. Amazon's Elastic Block Store (EBS) and Elastic Compute Cloud (EC2) services were seriously affected. Microsoft, the software giant, faced similar problems, as its BPOS service also went down.

Amazon’s Service Health Dashboard said the bolt was severe enough to partly disable the phase control system that synchronizes the backup generators. It stated that even twelve hours after the issue occurred, full access had still not been restored. The connectivity investigation began at around 03:00 GMT.

According to eWEEK Europe UK, Microsoft reported that when lightning struck Amazon’s data centre, it too experienced connectivity issues caused by a widespread power outage. The issues affected its customers using BPOS in Europe. In Microsoft's case, however, service was restored about seven hours after the problem occurred. Microsoft faced a similar BPOS problem last year, when the outages were worldwide. For the time being, Microsoft is working hard to move customers to the recently launched Office 365.

A week earlier, an article posted on the Daily Telegraph website said that the software giant’s data centre in Dublin includes a “comprehensive system of secondary electricity sources”. It also gave assurance that in the event of a “major catastrophe”, services would switch seamlessly to Amsterdam. There was no information on whether this system was used, but it appears that it was not.

Although 75% of the EC2 instances had been recovered by 03:00 GMT, Amazon indicated that restoring the remaining EC2 instances and EBS volumes would require manual intervention. After the problem occurred, it said the issue would take 24-48 hours to resolve. It continued, “In some cases EC2 instances or EBS servers lost power before writes to their volumes were completely consistent.” For those cases, Amazon promised to provide customers with a recovery snapshot instead of restoring the volume directly. This would let them validate the volume's health before returning it to service.

Websites affected by the disruption included the Edinburgh Book Festival and the Telegraph’s puzzles page. SLA (service-level agreement) terms are usually kept private, but in this case it can reasonably be estimated that, assuming no further downtime in the current year, Microsoft's uptime is 99.92%, while Amazon's stands at 99.86%. Both would therefore be liable to pay penalties to their customers, on the assumption that most of them hold an SLA of 99.99%.
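
The uptime percentages follow from the reported outage lengths of roughly seven hours for Microsoft and twelve hours for Amazon. A minimal Python sketch of that arithmetic is shown here. It assumes a 365-day year and no other downtime during the year, and the function names are hypothetical, chosen only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year


def annual_uptime_percent(downtime_hours: float) -> float:
    """Uptime over one year, in percent, assuming no other downtime."""
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Minutes of downtime per year permitted by a given SLA percentage."""
    return HOURS_PER_YEAR * 60 * (1.0 - sla_percent / 100.0)


# Outage lengths taken from the article: about 7 hours (Microsoft), 12 hours (Amazon).
print(f"Microsoft: {annual_uptime_percent(7):.2f}%")   # 99.92%
print(f"Amazon:    {annual_uptime_percent(12):.2f}%")  # 99.86%

# The 99.99% SLA is the article's assumption, not a published term.
print(f"99.99% SLA allows {allowed_downtime_minutes(99.99):.2f} minutes per year")
```

Under these assumptions a 99.99% SLA permits under an hour of downtime per year, so an outage of either length would exceed it.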