| SharePoint, SQL Server Anchor Enterprise Search |
| Feb. 21, 2005 |
As an increasing amount of corporate information is stored in digital form, organizations need tools to find data on intranets and let customers find data on their Web sites, regardless of where data are stored. Microsoft's approach toward this market opportunity, known as enterprise search, has been to develop a single core search technology that the company's product groups then customize for their own audiences. As a consequence, the company faces intense competition in enterprise search with a fairly consistent, quickly evolving search technology, but also with some gaps in its enterprise search product line. The company's recommended solution for most types of enterprise search is SharePoint Portal Server (SPS) 2003; SQL Server and Windows Server can also be used for some subsets of enterprise search. However, SPS is costly and complex for smaller businesses, SQL Server requires all information to be stored within the database, and the search functionality built into Windows Server (Microsoft's recommended solution for enabling search on simpler Web sites) is based on older technology.

What Is Enterprise Search?

Enterprise search tools help organizations (including small and mid-size businesses, not just enterprises) find information in text documents and other unstructured data stored in a variety of sources. In general, enterprise search tools can be used to search internal data sources (such as intranets, messaging systems, file shares, databases, and local hard drives) and Web sites (public or controlled-access), but many solutions are focused on only one of the two scenarios. Enterprise search is similar to Internet search, which is supported by the public Google Web site, MSN Search, and other providers.
In both types of search, material must be aggregated from a wide variety of locations, users must have an interface to create queries and review results, and results should be arranged in some kind of order (e.g., hits that are most relevant to the query appear at the top of the list).

One critical difference between enterprise and Internet search tools is the scope of material aggregated. Internet search tools attempt to catalog as many public sources as possible but make little effort to catalog material stored behind firewalls, in databases, or on controlled-access Web sites. Enterprise search tools don't need to catalog as many sources of data but must aggregate more types of data and provide access control, so that sensitive information is unavailable to unauthorized viewers. In addition, algorithms that work well on Internet search may not work as well for enterprise search. For instance, Google's PageRank system, which rates relevance partly based on how many other Web pages link to a particular page, is not as effective in a corporate environment where hyperlinking among resources is less common. As a consequence, enterprise search tools use algorithms different from those of Internet search tools. In addition, enterprise search tools generally let the data owner exercise some programmable control over the results, such as the following:
Today, dozens of enterprise search tools are available, from free open-source tools that perform a single function (usually public Web site search) to costly installations from companies such as Autonomy and Verity. IBM and Oracle offer intranet search as part of larger solutions incorporating data storage, and a growing number of companies are offering software (Fast Search and Transfer, MondoSoft) or hosted services (Atomz) specifically for Web site search. Finally, although Google is best known for Internet search, it also offers search appliances ranging in price from US$5,000 to US$250,000 or more that can be used for both intranet and Web site search.

Microsoft Search: One Team, Many Products

Microsoft does not offer a single product or solution devoted exclusively to enterprise search. Rather, the company has incorporated search functionality in multiple business products, several of which can be used for enterprise search.

For the last several years, a team known as Microsoft Search (MS Search) has been responsible for developing basic search functionality for the company. When a product group wants to add full-text search capabilities to its product, it incorporates the latest available version of MS Search technology, then tailors it for the product's needs. (Microsoft Research contributes expertise and research to Microsoft's enterprise search efforts as well, particularly by improving the algorithms used to determine relevance, but it is not directly involved in product planning or strategy.) Today, MS Search is part of the Office Server Group headed by Vice President Kurt DelBene; this group also includes SharePoint Portal Server (SPS), which is the Microsoft business product with the most recent version of MS Search technology. (For details on the common functions provided by MS Search, see the sidebar "Important Microsoft Search Technologies".)
Products that have incorporated MS Search technology include Site Server 3.0, Windows Indexing Service (first included as part of Internet Information Services [IIS], Microsoft's Web server, and later built into Windows Server and XP), Exchange, SQL Server, Office, SPS, and, most recently (in Dec. 2004), MSN Desktop Search. Many of these products share common features, such as the ability to index the same file types (through a technology called IFilters), and a common SQL-like query language (used to translate users' requests into queries that the search engine understands). At the same time, because each product incorporates a different version of the MS Search technology, is intended for different uses, and has its own update schedule, search-related capabilities vary from product to product. Examples of these differences include the following:
(For a chart showing currently available Microsoft products with MS Search technology, see "Search in Microsoft Products".) These disparities and a lack of clear, up-to-date documentation about search in Microsoft products can make it difficult for organizations to decide which Microsoft technology is the best choice on which to build an enterprise search solution.

SPS, SQL Server Lead the Way

Microsoft positions SPS 2003 as its preferred enterprise search solution. SQL Server can be used in instances where all the information to be searched is stored within SQL Server, and its full-text search capabilities will be significantly improved in the next release.

SPS 2003: Recommended Solution

As of Jan. 2005, Microsoft recommends SPS 2003 for enterprise search, including search of internal resources and search of an organization's public-facing (including controlled-access) Web sites. SPS 2003 incorporates the most recent version of Microsoft's search technology available in any business product, meaning that it indexes the most types of data, supports the most languages, uses the most advanced ranking algorithms, and has the most sophisticated tools for administrators to manually influence search results.

However, SPS is costly for small businesses, particularly as a solution for Web site searches. SPS 2003 costs about US$4,000 per server in most volume programs, but customers must also buy SQL Server, which starts at about US$4,000 per processor. For search of internal corporate resources, the organization must buy Client Access Licenses (CALs) at a base price of US$70 per employee. Public Web site search is even more expensive, as a US$30,000 External Connector must be purchased for each SPS server. In addition, customizing SPS's search functionality or enabling it to search certain types of information (such as material stored in a corporate application) requires fairly intensive development.
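The licensing figures above can be combined into a rough estimate. The following is a minimal sketch, not a pricing tool: the function names are hypothetical, the prices are the approximate figures quoted above, and it assumes that a public Web site deployment needs the External Connector but no CALs (the text does not say whether other licenses also apply). Hardware, Windows Server, and development costs are excluded.

```python
# Approximate US prices quoted in the article (early 2005); treat as illustrative.
SPS_PER_SERVER = 4_000               # SPS 2003, per server
SQL_PER_PROCESSOR = 4_000            # SQL Server, starting price per processor
CAL_PER_EMPLOYEE = 70                # Client Access License, base price
EXTERNAL_CONNECTOR_PER_SERVER = 30_000  # required per SPS server for public sites


def intranet_search_cost(servers: int, sql_processors: int, employees: int) -> int:
    """Estimate license cost for searching internal corporate resources."""
    return (servers * SPS_PER_SERVER
            + sql_processors * SQL_PER_PROCESSOR
            + employees * CAL_PER_EMPLOYEE)


def public_site_search_cost(servers: int, sql_processors: int) -> int:
    """Estimate license cost for public Web site search.

    Assumption: the External Connector is bought instead of CALs.
    """
    return (servers * SPS_PER_SERVER
            + sql_processors * SQL_PER_PROCESSOR
            + servers * EXTERNAL_CONNECTOR_PER_SERVER)


if __name__ == "__main__":
    # One SPS server, one SQL Server processor license, 100 employees.
    print(intranet_search_cost(1, 1, 100))   # 15000
    print(public_site_search_cost(1, 1))     # 38000
```

Under these assumptions, a single-server intranet deployment for 100 employees comes to about US$15,000 in licenses, while the same single server used for public Web site search comes to about US$38,000, which illustrates why the External Connector dominates the cost for a small site.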
Smaller businesses that merely want to enable search on a simple Web site can turn either to the Windows Indexing Service in IIS, which uses older and less sophisticated algorithms than SPS and requires a skilled developer to customize, or to alternatives such as the Google Mini appliance (US$5,000) or hosted services offered by companies such as Atomz.

Combination with Content Management Server (CMS) likely. Microsoft has an opportunity to address these issues with the next release of SPS, which is expected to appear about the same time as the next version of Office (code-named Office 12), probably in 2006. This next version of SPS will probably be merged with CMS, Microsoft's product for creating, managing, and maintaining complex Web sites, given that the CMS product group was recently moved into the SPS team. (Today, CMS can take advantage of SPS's search functionality with the assistance of a free Connector released in early 2004.) CMS comes in a Standard Edition for smaller businesses that's significantly cheaper than SPS: only US$7,000 with no CALs or External Connector required. In addition, although the current version of CMS is fairly complex to program (it requires programmers with knowledge of ASP.NET to create the initial site templates), Microsoft says the next version will have more out-of-the-box capabilities to make it useful without custom programming.

What about Windows SharePoint Services (WSS)? Although Microsoft does offer full-text search within WSS, a service that comes with Windows Server 2003 and is used by SPS, this search capability is limited to material stored on a single WSS site and is not suitable for intranet search. In theory, a smaller business could use WSS to build a searchable Web site, as long as all the information the business wants on the site can be stored in WSS team site pages on a single server.
Cost might be an issue, though, as full-text search in WSS requires companies to buy SQL Server rather than using Microsoft SQL Desktop Engine (MSDE), the desktop version of SQL Server that comes bundled with the product.

SQL Server 2005: Significant Improvements

SQL Server was one of the first Microsoft products to support full-text search, beginning with SQL Server 7 in 1998. Any application that uses SQL Server as a back-end data store can tap into its full-text search capabilities. However, unlike SPS, SQL Server can only index data that's stored within the database itself; it cannot crawl external sources. Additionally, while SPS provides a built-in search interface on every portal page, a developer using SQL Server as the basis for search must design a user interface and find a way to translate user entries into well-formed SQL queries. So, although SQL Server can be used for some subsets of enterprise search, SPS remains Microsoft's general-purpose enterprise search product. Examples of such subsets include search of Web-based information (e.g., Web pages, items in a product catalog) stored in SQL Server databases and search of material in a content management system.

SQL Server 2005, expected in summer 2005, will be updated to Microsoft's most recent search technology and will receive several enhancements as a result, including the following:
These improvements will find their way into other Microsoft products, including the next versions of SPS and Exchange. Microsoft has also changed the way SQL Server 2005 gathers and indexes data, leading to significant performance improvements: the company claims to have cut the time necessary to build a full-text index by more than an order of magnitude. In addition, each instance of SQL Server will have its own dedicated instance of the full-text search service, greatly improving manageability. Previously, the search service on a machine was shared among SQL Server, Windows, and other applications, meaning that updates to the service (such as new service packs) to support one application could alter the behavior of SQL Server's full-text search.

Other Products

The recent burst of search-related activity at MSN has confused the picture further: MSN Desktop Search is currently the company's most advanced and useful product for end-users to search local and networked information. Meanwhile, Microsoft has revealed no concrete plans to improve the search capabilities in two of its most popular products, Office and Windows. Nonetheless, based on the current state of their search technology, update history, and roadmaps, it's reasonable to speculate as follows:

MSN. In Dec. 2004, MSN released a free Desktop Search tool, based on underlying technology developed by MS Search, that lets users find material from a wide variety of data sources (including e-mail messages and documents) stored locally and on networked drives. However, this tool is not suitable for centralized enterprise search for several reasons: most notably, administrators cannot control how often the tool indexes sources on file shares. MSN has no stated plans to turn Desktop Search into an enterprise product. In all likelihood, instead of revamping MSN Desktop Search for enterprise search, Microsoft will improve search in Office 12 and Exchange 12 to offer functionality similar to that of MSN Desktop Search.
Office. Microsoft has not officially announced plans for updating search in Office 12. However, given that SPS is part of the Office Server group, any updated search features in Office 12 will probably be coordinated with search in the next version of SPS. For example, Office today creates a local index for information stored on the user's PC; Office 12 might provide an integrated interface to search both this local index and SPS's centralized index (for networked resources) at the same time.

Indexing Service and Windows. Microsoft suggests that the Indexing Service is still appropriate for enabling search on Web sites where content is exclusively stored within IIS, and for enabling the search of content in a single Windows file share. However, because the Windows Indexing Service is based on an older version of Microsoft's search technology, and because no updates have been announced for the next version of Windows Server (code-named R2 and due in late 2005), companies wanting to build any more extensive enterprise search solution should consider alternatives.

The MS Search team says that Microsoft is committed to improving search in "Longhorn," but no further details are known. The company might improve the Indexing Service in the Longhorn client to offer better search of local files, then update the search function within WSS in Longhorn Server (expected in late 2006 or 2007) to better support enterprise search. Finally, WinFS, a new file system originally planned for Longhorn but now expected in 2007 or later, will enable improved search of material stored in the file system. For example, it will let users or applications apply custom properties to files, such as the subject matter of a document or the geographic coordinates of a digital photo. Although not sufficient as a solution for enterprise search, which must aggregate data stored in many sources (not just the file system), WinFS will certainly be a factor in the evolution of enterprise search.
Resources

SPS's search functions are described in a downloadable white paper at www.microsoft.com/sharepoint/server/techinfo/administration/search.asp.

Likely integration between CMS and SPS was covered in "New Content Management Pricing, Strategy" on page 18 of the July 2004 Update.

The CMS 2002 Connector for SharePoint Technologies is available for download at www.microsoft.com/cmserver/default.aspx?url=/cmserver/downloads/sharepointconnector.

SQL Server 2005's improvements to full-text search are described at msdn.microsoft.com/SQL/2005/2005Articles/default.aspx?pull=/library/en-us/dnsql90/html/sql2005ftsearch.asp.

A useful blog tracking full-text search in SQL 2005 and other Microsoft products is at spaces.msn.com/members/jtkane/.

A command-line tool for administering full-text search in Exchange 2003 can be downloaded from www.microsoft.com/downloads/details.aspx?familyid=46fd5644-bd0d-4cfa-95f8-64ba34bde6a7&displaylang=en.

A broad overview of MS Search technologies is at eu.microsoft.com/technet/prodtechnol/sppt/sharepoint/evaluate/featfunc/mssearch.mspx.

MSN Desktop Search is described in "MSN Launches Search Tools" on page 22 of the Jan. 2005 Update.

The delay of WinFS is discussed in "Longhorn Components on Windows Roadmap" on page 3 of the Oct. 2004 Update. |