Windows Genuine Advantage Outage
Sep. 10, 2007

A failure of the Windows Genuine Advantage (WGA) validation servers caused approximately 12,000 Windows customers to be told incorrectly that they were running a non-genuine copy of Windows, and the failed validation downgraded key Windows Vista features, such as the Aero graphical interface, for those customers. The outage displayed some operational naivete on Microsoft's part, and Microsoft compounded the problem by refusing to call the failure an outage or to acknowledge that customers who lost access to genuine-only features of Vista experienced any reduced functionality. Together, the failure and the response call into question Microsoft's broader ambitions to offer software as a service for critical business functions.

What Happened?

Microsoft originally instituted the WGA program in Sept. 2004 as a voluntary pilot program to cut down on software piracy. It requires users to demonstrate that they have a genuine copy of Windows before they can download certain software from Microsoft Web sites. The program is now worldwide and mandatory and has been expanded in scope to cover most downloads (except patches for security vulnerabilities, which are still available without validation). WGA can also control access to certain Windows Vista features, such as the Aero user interface.

Users observed incorrect validation results from the afternoon of Aug. 24, 2007, until the morning of Aug. 25. Affected customers immediately had the "genuine-only" features disabled or reduced in functionality, including Aero, Windows ReadyBoost, Windows Defender (which continued to scan for all threats but warned only of severe threats it found), and Windows Update (which offered only "optional" updates). Users also had to put up with a persistent message on their desktop that read: "This copy of Windows is not genuine." The message and loss of features continued until a successful validation was performed, so once the problems with the validation service had been resolved, customers had to revalidate and reboot their computers to restore the genuine Windows experience.

According to a blog posting by the Senior Product Manager for Genuine Windows, Alex Kochis, two problems had occurred: preproduction (untested) code was accidentally deployed to production servers, and even after the update was rolled back, the service's operators did not notice that the preproduction code was still affecting the validation service. Although Kochis acknowledged the problem, he would not call it an outage.

Bigger Implications

All services will suffer failures, but what distinguishes a good service is its reaction to failure. Microsoft's reaction to the validation problem raises questions about the company's transparency and its operational expertise.

Specifically, the Genuine Advantage blog stated that "no one went into reduced functionality mode as a result." This is an odd claim to make, given that the loss of features such as Aero and ReadyBoost is characteristic of a reduced functionality mode caused by a WGA validation failure, according to Microsoft's own documentation. Arguing over the meaning of "outage" and "reduced functionality" makes it look like the company is more interested in playing word games than it is in accepting responsibility for the problem.

Furthermore, the failure raises questions about Microsoft's operational expertise:

  • How did untested preproduction code get onto customer-facing production servers?
  • Why did the service's operators fail to shut it down once problems were reported, since a shutdown would have suspended all validation checks with no noticeable impact on customers? In fact, a full "outage" would have had less customer impact than what actually occurred.
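The second question turns on a fail-open design: if the service gives no answer, the client leaves the customer's features alone, whereas a service that is running but answering wrongly triggers the downgrade. The following Python sketch illustrates only that distinction. Every name in it (Verdict, ServiceUnavailable, features_enabled, and the three stand-in servers) is hypothetical and does not reflect Microsoft's actual WGA code or protocol.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    """Possible answers from a (hypothetical) validation service."""
    GENUINE = "genuine"
    NON_GENUINE = "non-genuine"


class ServiceUnavailable(Exception):
    """Raised when the (hypothetical) validation service cannot be reached."""


def features_enabled(check: Callable[[], Verdict]) -> bool:
    """Return True if genuine-only features should stay switched on."""
    try:
        verdict = check()
    except ServiceUnavailable:
        # Fail open: no answer from the service means no change for the customer.
        return True
    return verdict is Verdict.GENUINE


def healthy_server() -> Verdict:
    # Normal operation: a genuine copy validates correctly.
    return Verdict.GENUINE


def faulty_server() -> Verdict:
    # Models the incident: the service is up but answers incorrectly.
    return Verdict.NON_GENUINE


def shut_down_server() -> Verdict:
    # Models a deliberate shutdown: validation checks are suspended.
    raise ServiceUnavailable("validation service offline")


if __name__ == "__main__":
    for server in (healthy_server, faulty_server, shut_down_server):
        print(server.__name__, features_enabled(server))
```

Under these assumptions, only the faulty server costs the customer any features, which is why a full shutdown would have been the less harmful failure.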

The outage may lead to some changes in the Windows validation process, although Microsoft has little incentive to alter the service as long as customers are still buying Windows. However, the lack of transparency and operational expertise calls into question Microsoft's ability to run critical software as a service. Coming at a time when Microsoft is trying to position its Windows Live Web services as a more reliable offering than that delivered by competitors like Google, the loss of confidence is likely to be the most lasting outcome.

Availability and Resources

Microsoft's Genuine Software Web site is at www.microsoft.com/genuine.

The WGA Blog is at blogs.msdn.com/wga.

Reduced functionality in Vista caused by "non-genuine" validation failure is described at support.microsoft.com/kb/925582.