| A Bug's Life |
| Jun. 30, 2003 |
|
Bugs are an unfortunate fact of life in software development, and Microsoft, like all large software companies, uses a well-defined, companywide process for tracking and fixing them. But even with rigorous design and code reviews, Microsoft must balance its desire to fix every bug with the reality that even the smallest change can lead to additional, potentially more severe bugs. Customers involved in prerelease programs can improve their chances of seeing their reported bugs fixed by understanding this process and submitting detailed, reproducible bug reports early in the product development cycle.

How Microsoft Handles Bugs

A bug, simply put, is a flaw in a software program or hardware component. Although the term is often attributed to Admiral Grace Hopper and the appearance of a moth in the Harvard Mark II, it was used in print to refer to faults in electrical systems as far back as 1896. Tracking bugs is a standard part of almost every software development methodology. At Microsoft, a bug goes through several distinct steps.

Found. Bugs can be found in many ways: by a testing team during the normal test process, by other members of the product team during ad-hoc use of the product, by beta testers, or, in the worst case, by users after the product ships.

Entered into bug-tracking system. Once a bug is reported, it is entered into an internally developed database application (known as "RAID"). RAID provides a way for bugs to be recorded and assigned to developers, and provides useful metrics, such as the rate at which bugs are being found compared with the rate at which they are being fixed. RAID is so critical that it is one of the few pieces of the software development process to be standardized throughout Microsoft: all product groups, from OSs and device drivers to games and online services, use the same database system, terminology, and methodology to track bugs. In RAID, each bug is automatically assigned a unique bug number.
Part of entering the bug is describing, in as much detail as possible, the steps necessary to reproduce the bug.

Prioritized. Once entered, a bug goes through an initial triage process, usually led by the program management team, during which it is sorted and evaluated. The most important part of the triage process is assigning a severity level and a priority level to the bug. These two factors play a large role in determining whether the bug will be fixed.

Severity is an attempt to quantify, on a 1 to 5 scale, the impact of the bug on the user. Bugs that crash the product or otherwise result in the loss of user data are the most severe and are thus assigned a severity of 1. Less severe bugs are those that don’t crash the product but also have no obvious workaround. The least severe bugs are really suggestions for future product improvements or new feature requests.

Priority also uses a 1 to 5 scale and measures how important the bug is, usually based on the likelihood of it being encountered. High-priority bugs (assigned a priority rating of 1) are found in mainline customer scenarios and therefore are likely to be encountered by a large number of users. Lower-priority bugs are those found in more obscure areas of the product or those that require a very complex set of circumstances in order to appear. For example, "Word crashes when I try to print" would be a severity 1 bug (because the product is crashing) and a priority 1 bug (because almost everybody prints).

Assigned and resolved. Once the initial severity and priority have been set, a bug is assigned to a developer. The developer will examine the bug, try to determine what is causing the behavior, and resolve the bug in one of the following ways:

Verified. Once a developer has resolved the bug, it is assigned to the testing team for verification. If the bug was fixed, the testing team will verify that the product now works correctly. If the bug was resolved in some other fashion, the testing team has a chance to disagree with the resolution and reassign the bug to the developer. For example, if a bug was resolved "Not Reproducible," the tester might be able to supply additional information.

Closed. When the tester finally agrees with the resolution, the bug is closed. However, even closed bugs can reappear later in the product cycle, a situation called a "regression." Regressions are given special attention because they indicate that a bug was more complex than initially thought or that an area of the product is somewhat fragile. Sometimes the test team will develop specific tests that look for regressions in previously fixed bugs.

Why Bugs Persist

Although it is ideal to fix every bug in every product, such perfection is impossible to achieve for several reasons.

First, not all bugs can be found. Although rigorous planning coupled with design and code reviews can reduce the number and significance of bugs that end up in the product, they can’t prevent them entirely. The number of combinations of hardware and software, coupled with the ever-growing set of features included in software products, is simply too large to guarantee that all possible test cases have been covered.

Second, not every bug that is found can be fixed. It is a fact of software engineering that every change made to an existing line of code runs the risk of breaking another line and introducing new bugs. Early in the product cycle, developers tend to fix every bug that is found because there is plenty of time to address the new bugs that may be caused by the fix.
Later on, however, teams must make calculated trade-offs to determine whether the risk to customers of shipping with the bug is greater than the risk of introducing new bugs, and whether the product schedule needs to be adjusted to accommodate additional testing. At some point, control over which bugs get fixed moves from the individual developer to a set of team leaders (often called the "war room") with experience in making these trade-offs. Eventually, only the most serious "show-stopper" bugs are fixed.

What Customers Can Do

In the end, Microsoft, like all software companies, decides which bugs to fix based on how many customers are affected and how serious the effects are. In doing so, it must accept the fact that attempting to fix each and every bug that is found will not result in a more reliable product, but only one that is late to market. In short, as the title of a famous essay on software engineering by Frederick Brooks, formerly of IBM, indicates, there is "No Silver Bullet."

Given that fact, there are steps customers can take to influence the process. First and foremost, be as detailed and as accurate as possible when submitting bug reports to the company. Bug reports that contain incomplete information are more likely to be resolved "Not Reproducible" than ones that include detailed information. Second, report bugs early. Bugs that are reported early in the development cycle are more likely to be fixed than those reported later in the cycle because of the need to lock down the product and avoid churning the source code and accidentally introducing even more bugs. While show-stopper bugs will always be fixed, customers reporting less dire defects late in the cycle will likely have to wait for a service pack in order to see their bugs addressed.

Resources

For information on how Windows Error Reporting is helping Microsoft gather information on bugs, see "Windows Error Reporting Tracks Down Bugs" on page 3 of the Jul. 2003 issue of Update.

Many books have been written about software engineering, but none has had the influence and longevity of Frederick Brooks Jr.’s The Mythical Man-Month: Essays on Software Engineering, ISBN 0201835959, first published in 1975. Readers looking for a more lighthearted and Microsoft-specific view of software development should read Dynamics of Software Development, ISBN 1556158238, by Jim McCarthy.

For more information on Microsoft’s development practices, see "Bill Gates Reviews Focus Product Development" on page 29 of the Apr. 2003 Update and "Program Managers Drive Product Design" on page 20 of the Jul. 2002 Update. |
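
As a rough illustration of the steps described under "How Microsoft Handles Bugs," the sketch below models a bug record moving from entry through triage, assignment, resolution, verification, and closure. It is not RAID's actual schema or behavior: the class, field, and state names are hypothetical, the repro steps in the usage lines are invented, and the only resolutions used are the two the article names ("Fixed" and "Not Reproducible").

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count
from typing import Optional


class State(Enum):
    """Hypothetical state names loosely following the article's steps."""
    ENTERED = "entered"
    TRIAGED = "triaged"
    ASSIGNED = "assigned"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Allowed moves between states (an assumption based on the article's narrative).
TRANSITIONS = {
    State.ENTERED: {State.TRIAGED},
    State.TRIAGED: {State.ASSIGNED},
    State.ASSIGNED: {State.RESOLVED},
    State.RESOLVED: {State.CLOSED, State.ASSIGNED},  # tester agrees, or sends it back
    State.CLOSED: {State.ASSIGNED},                  # a regression reopens the bug
}

_next_number = count(1)  # stands in for automatic unique bug numbers


@dataclass
class Bug:
    title: str
    repro_steps: str
    number: int = field(default_factory=lambda: next(_next_number))
    severity: Optional[int] = None   # 1 = crash or data loss ... 5 = suggestion
    priority: Optional[int] = None   # 1 = mainline scenario ... 5 = obscure
    resolution: Optional[str] = None
    state: State = State.ENTERED
    regressions: int = 0

    def _move(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state

    def triage(self, severity: int, priority: int) -> None:
        # Validate first so a bad rating leaves the bug untouched.
        for value in (severity, priority):
            if value not in range(1, 6):
                raise ValueError("severity and priority use a 1 to 5 scale")
        self._move(State.TRIAGED)
        self.severity, self.priority = severity, priority

    def assign(self) -> None:
        reopened = self.state is State.CLOSED
        self._move(State.ASSIGNED)
        self.resolution = None
        if reopened:
            self.regressions += 1  # count each time a closed bug comes back

    def resolve(self, resolution: str) -> None:
        self._move(State.RESOLVED)
        self.resolution = resolution

    def close(self) -> None:
        self._move(State.CLOSED)


# Usage: the article's printing example, with invented repro steps.
bug = Bug("Word crashes when I try to print", "Open any document, choose Print")
bug.triage(severity=1, priority=1)
bug.assign()
bug.resolve("Not Reproducible")
bug.assign()            # tester disagrees and reassigns the bug to the developer
bug.resolve("Fixed")
bug.close()             # tester agrees with the resolution
print(bug.number, bug.state.name, bug.resolution)
```

Keeping the allowed transitions in one table is a design choice made for this sketch; it makes the "tester disagrees" and "regression" paths explicit rather than implying that any real tracking system is organized this way.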