The Skype mystery: Why blame the August Windows updates?

Skype has finally come out with an explanation for last week's 36-hour network outage: The August Windows updates caused a huge number of Skype clients to reboot, exposing a bug in the company's peer-to-peer network.

What I don't get, though, is why this didn't happen in July. Microsoft puts out these updates every month, so why did the crash happen now?

Like me, Internet Storm Center handler John Bambenek doesn't think Skype is doing a very good job of explaining what happened, so I asked John what questions to put to Skype. His questions and Skype's answers are below.

Warning: if you're hoping for a straight answer on any of this, you're going to be disappointed. These answers come from Jennifer Caukin, a Skype spokeswoman. To her credit, she warned me up front that nobody in the US could answer questions in any detail today. Maybe by tomorrow we'll get some real answers.

Q -- Why did it take a full 24 hours after patching and rebooting for the outage to occur?

A:   The disruption was triggered by a massive restart of our users' computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches via Windows Update. The high number of restarts affected Skype's network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.  The 36 hours required to get the network back up was due to the time needed to get the proper number of available peer-to-peer network resources up and running.

OK, I don't think she quite got this question. Maybe Skype can explain why the outage didn't start on Tuesday or Wednesday, when Microsoft's patches were released.

Q -- With the reboots distributed across many time zones, how did they end up buckling your capacity? Why didn't it happen last month too (and in months prior)?

A:   Normally Skype's peer-to-peer network has an inbuilt ability to self-heal; however, the day's traffic patterns combined with the large number of reboots revealed a previously unseen fault in the network resource allocation algorithm Skype uses.  Consequently, the peer-to-peer network's self-healing function didn't work quickly enough. Regrettably, and as a result of this disruption, Skype was unavailable to the majority of its users for approximately two days.

Q -- How do you know it wasn't a DoS?

A: The issue has now been identified explicitly within Skype. We can confirm categorically that no malicious activities were attributed or that our users' security was not, at any point, at risk.

Q -- Has Microsoft been contacted, and what is their take on the situation?

A:  Yes they have been contacted. 

Microsoft told me that they didn't do anything different with their updates in August (they've blogged about the issue here). So why did this release kick off the problem? Nobody is saying.

Q -- What are the details of the bug you fixed? Was it the result of something added recently?

A: The "abnormality" occurred in Skype software.  To clarify: Skype's peer-to-peer core was not properly tuned to cope with the load and core size changes that occurred on 16th August. The reboots resulting from software patching merely served as a catalyst. This combination of factors created a situation where the self-healing needed outside intervention by our engineers.

Q -- What are your plans to avoid similar capacity problems?

A:  This disruption was unprecedented in terms of its impact and scope. We would like to point out that very few technologies or communications networks today are guaranteed to operate without interruptions. We are very proud that over the four years of its operation, Skype has provided a technically resilient communications tool to millions of people worldwide. Skype has now identified and already introduced a number of improvements to its software to ensure that our users will not be similarly affected in the unlikely possibility of this combination of events recurring.

More comment on the thinness of Skype's explanation can be found here and here. --Robert McMillan

To comment on this article and other CSO content, visit our Facebook page or our Twitter stream.