SharePoint 2007 Improves Enterprise Search
Feb. 19, 2007

SharePoint Server 2007 offers improved enterprise search capabilities, which enable search of resources on a network, such as intranet sites, file servers, and business applications. These capabilities and a new low-cost edition for search will help SharePoint Server compete with more expensive solutions from enterprise search specialists, such as Autonomy, as well as low-cost enterprise-search providers, such as Google and IBM/Yahoo. Lack of desktop search integration and a confusing array of licensing choices are sore points.

Microsoft's Approach to Enterprise Search

Enterprise search tools help organizations (including small and midsize businesses, not just enterprises) find information in unstructured data sources, such as text documents and messaging systems, and in structured data sources, such as transactional databases and business applications.

Although these tools often employ a user interface similar to Internet search tools (such as Google, Yahoo Search, and Microsoft's Live Search), enterprise search involves some critical differences. Enterprise search tools deal with smaller volumes of data but must be able to understand more types of data (e.g., information in databases, or file formats that are uncommon on the Internet but that may be widely used within a particular organization), and must provide access control to ensure that information restricted to certain groups or management levels is not seen by all searchers. In addition, enterprise search tools rank results in a different way and give data owners some programmable control over those rankings.
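
The access control requirement can be illustrated with a minimal sketch of result trimming. This is an illustration only, not SharePoint's implementation: the function name trim_results, the allowed_groups field, and the group names are all hypothetical.

```python
def trim_results(results, user_groups):
    """Keep only results whose allowed groups overlap the searcher's groups.

    Both the result structure and the group names are hypothetical.
    """
    return [r for r in results if r["allowed_groups"] & user_groups]


# Hypothetical index entries, each tagged with the groups allowed to see it.
results = [
    {"title": "Cafeteria menu", "allowed_groups": {"all-staff"}},
    {"title": "Salary bands", "allowed_groups": {"hr", "executives"}},
]

# A searcher who belongs to "all-staff" and "engineering" never sees the HR document.
visible = [r["title"] for r in trim_results(results, {"all-staff", "engineering"})]
print(visible)
```

The point of the sketch is that the filter runs against the searcher's identity on every query, so a restricted document never appears in a result list for someone who could not open it.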

SharePoint Still Central to Enterprise Search

Microsoft has long offered search capabilities in many of its desktop and server products, including the Windows client and server OSs, individual Office applications, and Exchange and SQL Server. Organizations could approximate some enterprise search features by using these products in combination (e.g., Exchange and Outlook to search e-mails, SQL Server to find information in corporate applications or database-driven Web sites).

However, SharePoint Portal Server (the precursor to SharePoint Server 2007) was Microsoft's first true enterprise search product, able to crawl a wide variety of enterprise data sources and make them searchable through a single user interface. (For background on Microsoft's search offerings, see "SharePoint, SQL Server Anchor Enterprise Search" on page 3 of the Mar. 2005 Update.)

SharePoint Server 2007 is built on top of Windows SharePoint Services (WSS) 3.0, the latest version of Microsoft's free Windows Server component for Web-based collaboration, and many of the improvements discussed in this article are available in WSS 3.0. In addition, both products offer the same end-user experience and administrative interface, which was not the case in past versions. However, as with past versions, WSS 3.0 can be used to search only a single WSS server and is therefore not an enterprise search tool.

Aiming for the Middle

Microsoft's goal is for SharePoint Server to occupy a middle position between the two categories of enterprise search tools:

  • Expensive solutions from search specialists such as Autonomy or Fast, or from traditional enterprise vendors (such as IBM, Oracle, or SAP), which often bundle search into broader data storage or portal solutions
  • Inexpensive enterprise search products for smaller businesses, such as Google's appliances or IBM's Omnifind Yahoo Edition, which was introduced in Dec. 2006 and is available in a free edition.

To compete with high-end solutions, Microsoft will offer many of their features at a significantly lower price. In particular, enterprises that already have large investments in Microsoft software can probably add enterprise search for very little extra money: the most expensive part of the SharePoint license is typically the per-user Client Access Licenses (CALs), which are already included in the Core CAL purchased by many enterprises.

To compete with low-end solutions, Microsoft is also introducing a lower-priced, search-specific version of the product, SharePoint Server 2007 for Search, that requires no CALs. Although SharePoint Server 2007 for Search is more expensive than some low-end enterprise-search competitors, it offers more features than these products, particularly more fine-grained administrative control over search results.

What's New in SharePoint Server 2007?

Enterprise search in SharePoint Server 2007 benefits from architectural changes. Most notably, in past versions, each SharePoint portal site maintained its own search index. This meant that several servers could be crawling the same data sources simultaneously, which took up valuable computing resources on both the servers and the machines being crawled. This decentralized architecture also made management difficult. With SharePoint Server 2007, a single centralized search index is used for all searches. (For details, see the illustration "SharePoint 2007 Enterprise Search Architecture".)

SharePoint Server 2007 also includes many improvements to specific components that should add up to more relevant search results for users. In addition, SharePoint Server 2007 boasts a new user interface that should help users find relevant results more quickly, adds new capabilities for finding information about people, and can more easily index data stored in corporate business applications.

Easier Tailoring for Administrators

SharePoint Server 2007 has new features that can help administrators tailor search results more effectively to improve relevance. These features are all available through a Web-based administrator interface, reducing the need for custom code.

Search scopes. As in earlier versions, administrators can define named search scopes (e.g., "product specifications") that limit searches to particular subsets of content. However, earlier versions limited each resource to being part of only one scope, and administrators had to manually select each result that would appear within a particular search scope.

With SharePoint Server 2007, administrators can define search scopes based on generic file properties, such as file location, content type, or author. Administrators can also create search scopes based on multiple rules, such as "all product specifications created after Jan. 2006."
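
A rule-based scope of this kind can be thought of as a set of property tests that a resource must pass. The following minimal sketch assumes that model; the Document class, the make_scope helper, and the sample data are hypothetical, and in the product itself scopes are defined through the administrator interface rather than in code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class Document:
    # Hypothetical stand-in for an indexed resource and its properties.
    url: str
    content_type: str
    author: str
    created: date


# A rule is any true/false test over a document's properties.
Rule = Callable[[Document], bool]


def make_scope(rules: List[Rule]) -> Rule:
    """Combine rules so a document is in scope only if it satisfies all of them."""
    return lambda doc: all(rule(doc) for rule in rules)


# "All product specifications created after Jan. 2006"
product_specs_since_2006 = make_scope([
    lambda d: d.content_type == "product specification",
    lambda d: d.created > date(2006, 1, 31),
])

docs = [
    Document("http://intranet/specs/widget.doc", "product specification", "alice", date(2006, 5, 2)),
    Document("http://intranet/specs/old.doc", "product specification", "bob", date(2005, 11, 9)),
    Document("http://intranet/memo.doc", "memo", "alice", date(2006, 6, 1)),
]

in_scope = [d.url for d in docs if product_specs_since_2006(d)]
print(in_scope)
```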

Authoritative pages and demoted sites. With SharePoint Server 2007, administrators can also designate authoritative pages, which will be weighted more heavily for relevance whenever a user enters an appropriate query. For example, a product team could designate its intranet home page as an authoritative page, ensuring that users searching on any appropriate term (e.g., their product's name) would see that page high up in search results. Conversely, administrators can also designate demoted sites that have lower relevance rankings.

This is similar to the Best Bets feature offered in earlier versions but not as blunt: users don't have to enter the precise query to get the designated top result, as they do for a Best Bet. (The Best Bets feature is still available in SharePoint Server 2007.)

Reports. SharePoint Server 2007 can provide detailed reports of user queries and results, including top queries, click-through rates, and queries with zero results. Administrators can use these reports to diagnose problems, such as a frequent query going unanswered, and then influence search results accordingly.
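
The three report types named above can be derived from a simple query log, as the following sketch suggests. The log format and the function name are hypothetical and are not the product's reporting schema.

```python
from collections import Counter

# Hypothetical query log: (query text, number of results, whether a result was clicked)
query_log = [
    ("vacation policy", 12, True),
    ("vacation policy", 12, False),
    ("expense form", 0, False),
    ("expense form", 0, False),
    ("org chart", 3, True),
]

# Top queries: how often each query was issued, most frequent first.
top_queries = Counter(q for q, _, _ in query_log).most_common()

# Queries with zero results: candidates for new content or a Best Bet.
zero_result_queries = sorted({q for q, n, _ in query_log if n == 0})


def click_through_rate(log, query):
    """Fraction of a query's occurrences in which the user clicked a result."""
    clicks = [clicked for q, _, clicked in log if q == query]
    return sum(clicks) / len(clicks) if clicks else 0.0


print(top_queries)
print(zero_result_queries)
print(click_through_rate(query_log, "vacation policy"))
```

A frequent query that also appears in the zero-results list, such as "expense form" in this sample, is the kind of problem the reports are meant to surface.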

Improved Ranking and Indexing

Relevance of search results in SharePoint Server 2007 has been improved by better relevance ranking algorithms and index propagation.

Specifically, relevance ranking considers many factors that were previously ignored, such as the following:

  • The number of clicks it takes for a user to travel from a result to an administrator-designated authoritative page
  • Information in the address of each result—for instance, shorter addresses are usually more relevant than longer ones
  • Content within a resource for resources whose names seem generic or irrelevant (e.g., "document1.doc")
  • File type—for instance, by default, SharePoint Server 2007 considers HTML pages more relevant than PowerPoint presentations, which are more relevant than Word documents.

Organizations can fine-tune the relative weight of each of these factors, as well as other properties (e.g., author, date created). However, this requires developers to write code, and may have unintended effects on search results. Therefore, organizations should first attempt to use the broader tools available through SharePoint Server 2007's administrator interface, such as authoritative pages and search scopes, before resorting to property weighting.
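
To show how factors like these might combine, the following sketch computes a single score from a text-match value, address depth, file type, and click distance. The weights, the formula, and the function names are assumptions made for illustration; the article does not give the product's actual formula or defaults, only the ordering of HTML above PowerPoint above Word.

```python
from urllib.parse import urlparse

# Hypothetical weights that preserve the ordering HTML > PowerPoint > Word.
FILE_TYPE_WEIGHT = {"html": 1.0, "ppt": 0.8, "doc": 0.6}


def url_depth(url: str) -> int:
    """Number of path segments, used here as a rough proxy for address length."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def score(text_match: float, url: str, file_type: str, click_distance: int) -> float:
    """Scale a base text-match score by the static factors (assumed formula)."""
    depth_factor = 1.0 / (1 + url_depth(url))            # shorter addresses score higher
    distance_factor = 1.0 / (1 + click_distance)         # fewer clicks from an authoritative page scores higher
    type_factor = FILE_TYPE_WEIGHT.get(file_type, 0.5)   # unlisted types get an assumed lower weight
    return text_match * (0.4 + 0.2 * depth_factor + 0.2 * distance_factor + 0.2 * type_factor)


# Same text match, different static properties.
home = score(1.0, "http://intranet/product/", "html", click_distance=0)
deep = score(1.0, "http://intranet/product/archive/2005/notes.doc", "doc", click_distance=3)
print(home > deep)
```

Changing any one weight shifts the ranking of every result that shares that property, which is why the administrator-level tools are the safer first step.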

The SharePoint Server 2007 indexing service also improves relevance by continuously updating querying servers while rebuilding its index, leading to more up-to-date search results. With previous versions, the indexes on querying servers were updated only when the central indexing service had completed a rebuild, meaning that search results were not always up-to-date.

Search Center Interface

SharePoint Server 2007 boasts a new interface for conducting searches, as well as enhancements to the results page.

Search Center's query interface resembles Web search sites such as Google or Live Search, with a single box for entering search queries and several administrator-customizable tabs for tailoring those queries (available in the full version of SharePoint Server 2007 only). Like past versions, however, SharePoint Server 2007 does not support some common features for composing search queries, such as Boolean operators (e.g., "AND," "OR," "NOT"), wild-card searches (e.g., "*ing" runs a query for all words that end in "ing"), or proximity searches (e.g., "find all documents in which 'Microsoft' appears within 10 words of 'cell phones'").

The Search Center results page incorporates many new features that are customary on Web search sites but that were lacking in earlier versions, such as hit highlighting (in which the search term is made boldface in results summaries) and query correction for mistyped entries (in which a "Do you mean..." prompt on the search results page might include alternate spellings of one or more search terms). From a results page, users can subscribe to an RSS feed or sign up for e-mail alerts for a particular query, which will help them track when results are updated.

Organizations can continue to use search Web Parts created for SharePoint Portal Server 2003 to access SharePoint Server 2007's search function. However, Microsoft warns that these older Web Parts will not work with versions of SharePoint Server beyond this release.

People Search

The full version of SharePoint Server 2007 has new features that can help users find other people in their organization—for example, the sales manager for a particular product.

As with SharePoint Portal Server 2003, the indexing service in SharePoint Server 2007 crawls user profiles—required data about each SharePoint user that is entered by administrators (or imported from Active Directory) and can be modified by end users. User profiles can include information such as areas of expertise, job titles, group memberships, and reporting relationships.

With SharePoint Server 2007, administrators can also configure the indexing service to scan Active Directory (or any LDAP directory, such as group lists on SharePoint Server 2007), even if this information has not been incorporated into user profiles. In addition, People Search results can now be arranged by social distance—the number of personal contacts between the searcher and a particular result.
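
Social distance can be sketched as a shortest-path search over a graph of contacts. The example below interprets the distance as the number of links in the shortest chain between two people, which is an assumption, and the contact data and function names are hypothetical.

```python
from collections import deque


def social_distance(contacts, start, target):
    """Return the number of contact links between start and target, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for other in contacts.get(person, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


def rank_by_distance(contacts, searcher, people):
    """Order people from socially closest to farthest; unconnected people go last."""
    def key(person):
        d = social_distance(contacts, searcher, person)
        return float("inf") if d is None else d
    return sorted(people, key=key)


# Hypothetical contact graph: ana knows ben, ben knows cho, cho knows dev.
contacts = {
    "ana": ["ben"],
    "ben": ["ana", "cho"],
    "cho": ["ben", "dev"],
    "dev": ["cho"],
}

ranked = rank_by_distance(contacts, "ana", ["dev", "eve", "ben", "cho"])
print(ranked)
```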

To make People Search more readily available, the default Search Center user interface has a new People tab, in addition to the tabs for standard and advanced searches.

Previously, SharePoint Server 2007 was slated to include the Knowledge Network, which scanned incoming and outgoing e-mail and used this data to identify a person's areas of expertise. However, this feature will not be included in the product; instead, it will be available only as a free, unsupported add-on.

Line-of-Business Applications

SharePoint Server 2007 introduces a Business Data Catalog (BDC) feature that allows enterprises to connect SharePoint Server 2007 to business applications, such as enterprise resource planning (ERP) and customer relationship management (CRM) systems.

Once connected through the BDC, SharePoint Server 2007 will index structured data in these applications, making it searchable. This data can be treated the same way as any other data in the index—for example, CRM data could be incorporated into a particular search scope. (The BDC also supports other data integration tasks, in addition to search. For example, it can continuously import data from an application into SharePoint lists to make the data Web-accessible.)

However, BDC integration still requires some development work and is not something that SharePoint Server 2007 administrators can typically do. BDC developers must be familiar with the business applications, as well as with the XML-based language used to program BDC connections to the applications. Moreover, the BDC can connect to applications only through a Web services interface or Microsoft's ADO.NET data access API. Applications that support neither method must still be indexed by using custom IFilters, which SharePoint uses to index various data types, together with protocol handlers, which retrieve material over different types of data connections, such as the Hypertext Transfer Protocol (HTTP) or the File Transfer Protocol (FTP).

Drawbacks Remain

Although SharePoint Server 2007 is a better enterprise search tool than its predecessor, two areas could still cause confusion for customers: the lack of integration with Microsoft's other search solutions, particularly for desktop search, and a confusing array of licensing choices.

No integration with other searches. Most enterprise search products offer a unified interface for users to search corporate resources and information stored on their own PCs. In addition, some tools—including low-cost tools from Google and Yahoo—include Internet search in the same interface.

In contrast, Microsoft customers must use the Search Center (or a SharePoint Server 2007 Search Web Part) to search networked resources, the built-in search pane in Windows Vista or the Windows Desktop Search tool (WDS) for Windows XP to search local resources, and a separate site such as Google.com or Live Search to search the Internet.

Microsoft had planned to release a unified client for SharePoint Server 2007, desktop, and Web search (using Live Search) at about the same time as SharePoint Server 2007, but this project has been postponed indefinitely. Until this issue is resolved, these multiple search interfaces could cause confusion among customers and hamper adoption of Microsoft's search solutions. (For more information about the postponed unified search interface, see "Enterprise Search Offerings Postponed" on page 19 of the Feb. 2007 Update.)

Licensing choices. Although Microsoft is trying to position SharePoint Server 2007 as its enterprise search tool, the product comes in three editions with different capabilities, and some features require each user accessing them to have a new, higher-priced SharePoint Server 2007 Enterprise Client Access License. (For details and prices, see the chart "Search Features in SharePoint Editions".)

Although this segmentation helps Microsoft attract different sets of customers with different budgets, it could also alienate customers who discover an advertised feature isn't in the edition they planned to buy.

Resources

Microsoft's SharePoint Server 2007 home page is www.microsoft.com/sharepoint.

A series of technical articles on enterprise search in SharePoint Server 2007, including more architectural details and information on programmatically influencing search results, is available on MSDN at msdn2.microsoft.com/library/ms497338.aspx.

More technical articles about SharePoint Server 2007, including articles covering deployment of SharePoint Server 2007 in a server farm, can be found at www.microsoft.com/technet/prodtechnol/office/sharepoint/default.mspx.

The BDC is detailed in an MSDN article at msdn2.microsoft.com/library/ms563661.aspx.

The SharePoint team maintains a blog at blogs.msdn.com/sharepoint.