Server to Aid Voice Response Applications
Dec. 8, 2003
Windows Speech Server, a server application for Interactive Voice Response (IVR) applications, has entered its second and final beta and is expected to ship in 2004. IVR applications are typically accessible by telephone and allow users to input information, such as an account number, either by speaking or by pressing numbers on the telephone keypad. The application then responds with prerecorded prompts, computer-generated speech, or a combination of the two. Microsoft hopes to leverage its strengths in developer tools to make Windows Server and ASP.NET the preferred platform for building such applications.

Not General Purpose Recognition

Unlike the general-purpose speech recognition system included in Office, IVR applications do not need to correctly understand arbitrary speech. Instead, they need only differentiate among a finite set of responses, such as a list of cities to which an airline flies, or the digits of an account number. (For an example of the difficulties of general-purpose speech recognition, see the sidebar "Hearing Voices" on page 17 of the June 2001 Update.)

Limiting the set of possible inputs greatly increases the likelihood of recognizing speech successfully. It avoids, for instance, the well-known scenario in which the phrase "recognize speech" is interpreted as "wreck a nice beach." It also allows IVR applications to work without the extensive user training required by general-purpose speech recognition systems.

In addition to traditional phone-based IVR applications, Speech Server allows developers to create applications that can be voice-controlled via a Web browser. A mobile sales professional could use a Pocket PC, for example, to access an internal sales tracking application, and then use voice commands to retrieve the specific sales data for a given customer.
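The benefit of limiting the set of possible inputs can be illustrated with a small sketch. It is not how Speech Server recognizes speech: it works on text that some recognizer has already produced and uses plain string similarity from Python's standard library. The city list, the function name recognize, and the 0.6 cutoff are all assumptions made for the illustration.

```python
from difflib import get_close_matches

# Hypothetical finite grammar: the cities an airline flies to.
CITIES = ["Seattle", "Boston", "Chicago", "Denver", "Austin"]


def recognize(transcript, grammar, cutoff=0.6):
    """Return the grammar entry closest to a noisy transcript, or None.

    Illustrative only: this compares text strings, whereas a real IVR
    recognizer scores audio against the grammar.
    """
    # Compare case-insensitively, but return the entry as the grammar spells it.
    lowered = {entry.lower(): entry for entry in grammar}
    matches = get_close_matches(transcript.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


print(recognize("bostin", CITIES))
print(recognize("wreck a nice beach", CITIES))
```

With a closed list, a slightly garbled input such as "bostin" still resolves to Boston, while an input that resembles nothing in the list is rejected, and the application can then ask the caller to repeat the response.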
Pass the SALT Please

Speech Server and its associated tools allow developers to create Web pages that use the Speech Application Language Tags (SALT), a set of XML tags, such as <prompt> and <listen>, that specify how a user can interact with the application via voice. For example, a developer can use SALT to define the set of voice prompts produced by the application (such as "please state your order number"), the set of voice inputs that it accepts as responses, and how tones from the telephone keypad should be interpreted.

SALT tags are processed either on the server or on the client, depending upon how the application is accessed. When a user accesses the application via a telephone, for example, all SALT processing, speech recognition, and text-to-speech functions are performed on the server. When the application is accessed via Internet Explorer on a PC, however, a special add-in to the browser performs these functions. (For an illustration of this architecture, see "Speech Application Architecture".)

Tools Simplify Development

Microsoft is using the same strategy to simplify IVR development that it has used for mobile application development: provide a consistent set of developer tools that allow developers to build many different types of applications without having to learn a new set of tools for each. Along with the Speech Server, Microsoft is making available a Speech Application SDK (SASDK). The SASDK includes the following:

ASP.NET Speech Controls automatically generate the appropriate mixture of HTML and SALT so that the application can be used by both voice-only and multi-modal clients. For example, a DataGridNavigator component allows users to navigate through a set of data either by clicking on buttons or by saying commands such as "Next" and "Back."

Specialized voice application tools help developers create and manage the set of prerecorded prompts and define the grammar (the set of expected voice inputs) used by the application.
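To give a sense of what the generated SALT markup might look like, the sketch below uses Python's standard XML library to build a small fragment around the <prompt> and <listen> tags named above. It is illustrative only and is not output from the ASP.NET Speech Controls. The namespace URI, the nested grammar element, the dtmf element for keypad tones, the element ids, and the grammar file names are assumptions, not details taken from this article.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for SALT elements; treat it as a placeholder.
SALT_NS = "http://www.saltforum.org/2002/SALT"
ET.register_namespace("salt", SALT_NS)


def salt(tag):
    """Return the namespace-qualified name for a SALT tag."""
    return f"{{{SALT_NS}}}{tag}"


def build_order_dialog():
    """Build a hypothetical page body that asks for an order number."""
    body = ET.Element("body")

    # The voice prompt the application plays to the caller.
    prompt = ET.SubElement(body, salt("prompt"), {"id": "askOrder"})
    prompt.text = "Please state your order number."

    # The spoken responses the application accepts (grammar file is hypothetical).
    listen = ET.SubElement(body, salt("listen"), {"id": "recoOrder"})
    ET.SubElement(listen, salt("grammar"), {"src": "order_number.grxml"})

    # How keypad tones are interpreted (element and file name are assumptions).
    dtmf = ET.SubElement(body, salt("dtmf"), {"id": "keyOrder"})
    ET.SubElement(dtmf, salt("grammar"), {"src": "order_digits.grxml"})
    return body


markup = ET.tostring(build_order_dialog(), encoding="unicode")
print(markup)
```

The same fragment could be interpreted on the server for a telephone caller or by the browser add-in for a PC user, which is the split described in the SALT discussion above.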
Resources

The home page for Speech Server is www.microsoft.com/speech. Details on the Speech SDK and SALT can be found at msdn.microsoft.com/library/en-us/dnanchor/html/netspeechanchor.asp.