| Analysis Services Matures in SQL Server 2005 |
| Jan. 24, 2005 |
Compared with its forerunners, SQL Server 2005 Analysis Services is a more fully realized platform for corporate developers and ISVs building online analytic processing (OLAP) and data mining applications. OLAP and data mining are important techniques for analyzing a business to make its processes more efficient and exploit its market more effectively. A more general data model, support for thin clients, increased scalability, and a development environment based on Visual Studio will further strengthen SQL Server’s position in the OLAP market and could help drive upgrades to SQL Server 2005. As with previous versions, however, Analysis Services is primarily a platform for application development—business users will need additional products, such as Excel, to navigate Analysis Services data. What Is Analysis Services? SQL Server Analysis Services is a development and management platform for creating OLAP and data mining applications, which help business users analyze historical business data and spot correlations and patterns in those data (analyzing quarterly sales results and forecasting revenues, for instance). These applications typically consist of client tools for viewing and filtering data that interact with an OLAP server—a database management technology that serves as the engine for updating, querying, and managing the OLAP infrastructure. The most important component of the OLAP infrastructure is the OLAP "cube," a central database containing historical business data summaries that are typically derived from data warehouses. (For more information on OLAP, data mining, and associated terms, see the sidebar "Where Do OLAP, Data Mining Fit?". For an overview of cubes and related concepts, see the illustration "What Is a Cube?".) Analysis Services coupled with Integration Services and Reporting Services make up Microsoft’s business intelligence (BI) platform. (Integration Services supplies tools for extracting data from operational databases, cleaning them to remove anomalies, and loading data warehouses; Reporting Services is used for creating and distributing reports.) This platform has become popular with businesses looking to gather, summarize, and analyze business data in-house, and also with ISVs, who use the platform to build BI capabilities into their own products. For example, the Microsoft Business Solutions group builds SQL Server-based BI capabilities into products such as Great Plains and Navision; external software makers, such as business analysis software maker OutlookSoft, also provide analysis capabilities based on the SQL Server BI platform. Microsoft’s BI platform is modular and data-source independent; companies can use the platform’s components independently or collectively, and with data sources other than SQL Server. For example, a company could use Analysis Services to create cubes from data stored in an Oracle data warehouse. (For an overview of Microsoft's BI platform, see the illustration "Business Intelligence in SQL Server 2005" on page 14 of the Dec. 2004 Update.) Why Microsoft Cares Microsoft entered the OLAP market with SQL Server 7.0 in late 1998. Analysis Services (then called OLAP Services) gave developers infrastructure for creating cubes, and programming interfaces to build client applications that could access and view those cubes. In SQL Server 2000, Microsoft further fleshed out the core platform and client APIs and added capabilities to support data mining. Microsoft’s entry into the OLAP market proved a boon for the company—it created new sales opportunities for SQL Server, helped get Microsoft’s foot in the door of the enterprise database management market previously dominated by Oracle and IBM, and helped spur sales of products like Office (specifically, Excel for viewing cubes). A More Flexible Development Platform Compared to its predecessors, Analysis Services 2005 gives corporate developers and ISVs more flexibility in modeling and working with business data sources and Analysis Services objects (such as cubes). In addition, the platform introduces several new programming constructs and algorithms that will help developers’ deliver significantly more intelligence to business users, with an accompanying reduction in custom or third-party code. The goals of the update are ambitious—Analysis Services 2005 aims to redress many of the compromises of traditional OLAP platforms (previous versions of Analysis Services among them) by doing the following:
Data Model Generalized A new data modeling tool, called Data Source Views (DSVs), is a general mechanism for aggregating, correlating, and viewing multiple data sources and their relationships. DSVs provide a logical, unified view of database objects (such as tables and their relationships) that Analysis Services uses to build and populate cubes. Rather than working directly with those objects, developers construct a logical "view" of one or more underlying data sources and use that view as the basis for defining cubes. DSVs can work with many types of data sources, including database systems such as SQL Server and Oracle, Web services, and files. DSVs provide several advantages over the more rigid data model of previous Analysis Services releases. First, they allow developers to create cubes from multiple underlying sources; in previous versions, cubes could be constructed from a single data source only. For example, a developer in a large company could find he needed to incorporate tables from data warehouses maintained by separate organizations or subsidiaries. To do so in Analysis Services 2005, a DSV could bridge those multiple sources with a single, logical representation of the tables of interest (and their relationships) from the various sources. (DSVs are created and manipulated with tools in Analysis Services development environment, described below.) Once they create a DSV, developers use it to construct cubes, much as they would construct cubes directly from a single, physical data source in past versions. In addition, DSVs insulate developers from the underlying raw data sources; for example, developers can manipulate and extend the DSV by assigning friendly names to tables or creating new columns based on calculated values (none of which would be possible in previous Analysis Services versions without altering the underlying data source). Cube and Dimension Models Extended Analysis Services 2005 is less restrictive in the way cubes (and their constituent dimensions, hierarchies, and attributes) are defined, enabling them to represent information and relationships in underlying business data stores in ways that were difficult or impossible in previous versions. This general approach makes cubes more useful for business analysis and easier for developers to create and maintain. In addition, Microsoft hopes to position the cube as a general mechanism for multidimensional modeling of business data (Microsoft refers to this as the Unified Dimensional Model), and thus the repository of choice for other business applications—such as reporting applications—that have traditionally targeted the data warehouse. For example, previous versions of Analysis Services dimensions were structured around the notion of hierarchies—dimensions contained a single hierarchy, of which all attributes were required to be members. In contrast, dimensions in Analysis Services 2005 are oriented around their attributes. Dimensions can contain any number of attributes, which can be arranged in multiple hierarchies, and different hierarchies can contain the same attributes. For example, the same time dimension could contain a calendar year hierarchy with attributes for month, calendar quarter, and calendar year; and a fiscal year hierarchy with attributes for fiscal quarter and fiscal year. (In fact, such time-related hierarchies are so common that Analysis Services 2005’s wizards can create them automatically.) To accommodate separate but related hierarchies (or attributes that did not fit in the hierarchy) in Analysis Services 2000 involved creating separate dimensions, which increased the cube’s complexity and size, made it harder for business users to navigate, and slowed its processing and query response times. Similarly, Analysis Services 2005 cubes can contain multiple fact tables (i.e., the table containing the data that the cube’s measures are derived from); previous versions of Analysis Services restricted cubes to a single fact table. Thus, instead of creating and maintaining a large number of cubes, each containing a single set of measures indexed by the same (or similar) dimensions, developers in Analysis Services 2005 can construct fewer cubes containing multiple sets of measures. The flexibility of Analysis Services 2005 makes possible the creation of very large cubes encompassing multiple fact tables and dimensions with large numbers of attributes and hierarchies. Such cubes could become cumbersome for applications accessing them and difficult for business users to understand and navigate. To shield business users from the complexity of large cubes, developers can specify "perspectives"—logical subsets of the objects and data (such as measures, dimensions, and hierarchies) in a cube. For example, a developer could create regional perspectives on a cube summarizing a company’s worldwide sales such that analysts in the United Kingdom would see sales summaries only for companies in Europe, without being distracted by other regional sales summaries. Cubes Get Closer to Real-Time To address data latency, Analysis Services 2005 introduces a feature called "proactive caching," which allows administrators to configure when and how frequently cube data (for example, measures, aggregations, and dimensions) are refreshed from their underlying data source (a data mart or data warehouse, for instance). In previous versions of Analysis Services, administrators were required to make architectural decisions regarding how and where cube data were physically stored and processed. For example, if the fastest possible response to client queries was the goal, administrators could opt to have cubes preprocessed and stored on the Analysis Services server. However, this configuration trades query performance for data currency—data are only as current as the most recent time the cube was processed. Conversely, administrators could opt to place more of the query processing load on the cube’s underlying data source (e.g., on the data warehouse server), which reduced latency but often increased query response time. Caching aims to alleviate the limitations inherent in this compromise. Queries are handled by the Analysis Services server, while cube data is regularly refreshed without administrator intervention from underlying stores, thus reducing latency. Better Business Intelligence, Less Work Several new programming constructs in Analysis Services 2005 will reduce the amount of custom code developers must write to include sophisticated business intelligence capabilities in applications. Specifically, Analysis Services 2005 introduces the following constructs: Key performance indicators (KPIs). KPIs track specific business metrics against predetermined goals. For example, a large, multinational retailer might track and measure actual revenues against revenue targets by region, or at a more granular level, by individual store. Analysis Services 2005 allows developers to construct KPIs based on cubes; for example, the metric tracked by the KPI could be based directly on a cube measure (such as quarterly sales in a region) or could be a calculation based on a measure (quarterly profits for the same region, for instance). For each KPI, developers also define target values for the metric, status of the metric (which is used to provide a qualitative indicator of a metric’s current value), and a trend for the metric. The status and trend of a metric will typically be calculated values. For example, the status of a quarterly sales metric will vary over time depending on the current value of the metric relative to its target and also how close the current date is to the end of the quarter. Analysis Services 2005 provides no end-user tools or clients for viewing KPIs. However, the platform components for constructing, organizing, and storing KPIs and client APIs for accessing them could save considerable time for developers building business monitoring or dashboard applications. New data mining algorithms. Analysis Services 2000 shipped with two relatively simple data mining algorithms. A clustering algorithm identified groupings within existing data. For example, based on sales data, a clustering algorithm could help an outdoor equipment retailer determine that Denver area residents purchase ski and mountain bike equipment with greater frequency than residents of other metropolitan areas. Such information could help the retailer target promotions or advertising accordingly. A more complex decision trees algorithm was used to build predictive models. Analysis Services 2005 adds five new data mining algorithms that will help BI developers create more sophisticated applications without having to integrate (often costly) third-party algorithms or create their own. Among the new algorithms, an algorithm called "Naïve Bayes" provides a relatively fast mechanism for predictive modeling. Developers could use the algorithm to quickly determine which customers are most likely to respond to a mail insert as a function of their demographic profile (e.g., age, gender, and location). In addition, a new time-series algorithm uses historical data to uncover time-dependent trends in data, which can be used in forecasting applications—for example, to estimate future sales growth. Thinner Clients Analysis Services 2005 eliminates the PivotTable Service, a client-side programming interface that Analysis Services 2000 applications used to access and manipulate data on Analysis Services servers via extensions to Microsoft's OLE DB data access API. The PivotTable service provided client-side (or mid-tier) calculations and caching. With its elimination, all calculations and caching are pushed to the server, allowing for lighter-weight clients or mid-tier applications. In recent years, businesses have tended toward lighter-weight, browser-based clients for several reasons. Thick clients often require complicated software and configurations on users’ desktops and thus engender higher administrative overhead (such as keeping clients up-to-date and patched) compared with browser-based clients. Furthermore, thick clients often require active connections to back-end servers or databases, which can limit server scalability as the number of clients grows (e.g., by limiting the number of clients a given server can support). Consequently, Analysis Services 2005 should prove more scalable than its predecessors in at least one important respect—an individual Analysis Services server or mid-tier server should support many more clients than previously. Standards Support Analysis Services 2005 implements protocols and data access APIs that support cross-platform access to Analysis Services applications and data. In 2001, Microsoft (and Hyperion) released the XML for Analysis (XMLA) specification, which defined a Web services interface to allow applications to access OLAP and data mining services over the Internet. The specification was followed by an SDK that developers used to create Web-based applications for Analysis Services; such an application could be hosted on a company’s intranet to permit users to view cube data in a Web browser, for instance. However, the XMLA SDK required installation and maintenance of separate components and was effectively a shim—Analysis Services 2000 still used the PivotTable Service and the OLE DB data provider to communicate with Analysis Services servers. In Analysis Services 2005, XMLA becomes the native communications protocol; all communications between clients (or mid-tier servers) and the Analysis Services server are Simple Object Access Protocol (SOAP) messages that conform to the XMLA specification. This architecture offers flexibility and cross-platform compatibility that did not exist in previous versions—developers can create Analysis Services clients or Web-based applications on any platform that supports the SOAP protocol (such as the Apache Web server running on the Linux OS). Of course, Analysis Services 2005 also toes the .NET framework line. It ships an API called ADO MD.NET, which developers use to create managed Web applications or clients for Analysis Services. Such applications inherit the benefits built into the .NET Framework (such as improved security and reliability), and other benefits specific to Analysis Services. For example, the Analysis Services API can compress and encrypt XMLA instructions before sending them to the server. (For an illustration of the Analysis Services 2005 client components arranged in a three-tier, thin-client architecture, see "Analysis Services 2005 Client Architecture".) Integrated Development Environment, New Wizardry Analysis Services 2005 comes with a new integrated development environment, called the BI Development Studio. Based on Visual Studio 2005, the BI Development Studio replaces Analysis Manager, which developers used to create and configure Analysis Services 2000 objects, such as cubes. (The BI Development Studio also provides tools and utilities for building Integration Services and Reporting Services projects.) The BI Development Studio makes use of the familiar project paradigm in Visual Studio and inherits capabilities such as source-code control (which is built into Visual Studio). These features could prove particularly interesting to corporate developers and ISVs working on larger Analysis Service projects, allowing them to better organize and collaborate on Analysis Services projects. In addition, the BI Development Studio incorporates and updates a number of wizards from previous versions of Analysis Services. (For an illustration of the Analysis Services development environment, see "A New Development Environment for Analysis Services".) Offline Project Development Supported Analysis Services 2005 supports offline development via integration with Visual Studio, making it much easier for large teams of developers to create Analysis Services applications. Developing applications with Analysis Services 2000 required an active connection to an Analysis Services server (for example, a production Analysis Services server or a replica of a production server in a test environment). In effect, developers were working online, directly manipulating objects in an Analysis Server database (cubes and dimensions, for example) as they worked. Although an online development mode is workable for small teams, it becomes increasingly difficult to manage as the size and complexity of projects increase. In addition, all Analysis Services 2000 objects and their metadata were stored in the database; thus, enforcing standard software development procedures, such as source-code control, during the development of those objects became a major challenge. In contrast, the BI Development Studio used by Analysis Services 2005 is a full-fledged project development environment, thanks to its integration with Visual Studio. For example, the metadata for key objects (such as cubes and DSVs) for Analysis Services projects are stored as XML files. These objects can be modified and updated without a live connection to an Analysis Services server and can be managed via source-code control. The Visual Studio project paradigm simplifies the development environment and makes it more consistent among groups of developers. For instance, a large development team using Analysis Services 2000 might have required that multiple replicas of a production Analysis Services database be maintained and synchronized. Analysis Services 2005 allows for more modular development: for example, a single developer (or group of developers) could be tasked with designing and maintaining DSVs. Those DSVs could be centrally managed in a source-code control system and made available to other developers, who could use them as the basis for a variety of Analysis Services projects or applications. Streamlined Cube Creation Previous versions of Analysis Services provided wizards for common tasks, such as creating cubes, defining facts and dimensions, and configuring data mining models. Analysis Services 2005 brings these wizards forward, rolls them into the BI Development Studio, and updates them to simplify familiar tasks. Perhaps most significantly, a new cube-creation wizard streamlines the often difficult and cumbersome process of defining cubes and their constituent measures and dimensions, and allows cubes to be built from any number of data sources, regardless of the schema of those sources. Cube construction in Analysis Services 2000 was limited in several respects. For example, the previous cube wizard required that source data were arranged in a specific structure (referred to as a "star schema," a common way of arranging data in data warehouses and data marts). In addition, as described above, cubes could only be based on a single fact table from a single data source. The Analysis Services 2005 cube wizard builds cubes from DSVs, which presents the wizard with a logical schema representation that can span any number of data sources. (Data sources can include, of course, data warehouses among many other sources, even production databases, such as a manufacturer’s inventory database.) To create cubes, the new wizard analyzes the objects and relationships encapsulated by the DSV. For example, the cube wizard will analyze the schema and suggest likely fact tables. Additionally, it will examine relationships between a candidate fact table and related tables and suggest candidate dimensions. Similarly, by examining relationships among columns in the tables used to create dimensions, the wizard will suggest likely hierarchies. Considerations Analysis Services 2005 is a more flexible platform for OLAP and data mining application development than its predecessors and should help development teams create more sophisticated applications with less custom code. However, prospective customers should consider the following: New concepts, working model to learn. Developers familiar with Analysis Services 2000 should devote some time to becoming familiar with the new architecture in Analysis Services 2005 and the new development environment and wizards introduced with the BI Development Studio. For example, the more general structure of dimensions in Analysis Services 2005 could cause some confusion among OLAP purists accustomed to the rigid hierarchical cubes and dimensions in previous versions. Furthermore, the flexibility of constructs such as DSVs could lead to problems if they are not carefully employed. For example, DSVs introduce the possibility of creating cubes directly from data in operational data stores (such as a database that records sales transactions), which could be tempting for companies wanting to analyze operations in near real-time. However, architects will still need to balance the risks and complexity inherent in such scenarios against the need for real-time analysis. For example, providing OLAP analysis capabilities against operational systems introduces the risk of interfering with the main function of those systems (quickly recording transaction details) and ignores the problem of data cleansing that data migration tools, such as Integration Services, ordinarily perform. New client architecture could slow query responses. The client architecture has changed markedly in Analysis Services 2005. In previous versions, a significant amount of cube or data mining processing would occur on a user’s local machine. Analysis Services 2005 opts for a thin-client architecture in which all processing and caching takes place on the server. Although the new architecture is more flexible and open (for example, it is amenable to cross-platform client access to Analysis Services objects and data), and likely more scalable than previous Analysis Services architectures, developers and users could find that response times for certain queries actually increase in Analysis Services 2005. For example, simple queries (or queries whose results could previously be delivered from a user’s cache) may actually run more slowly on Analysis Services 2005. On the other hand, computationally intense queries (such as those aggregating and sorting cube data along a large dimension) will likely benefit from the processing power of the server. No end-user tools. Finally, as is the case with previous iterations, Analysis Services 2005 is primarily a platform for development. Companies will need additional products, such as Excel or third-party applications from Microsoft partners such as Panorama Software, for business users to view and analyze cube data, and they will need to develop or purchase other applications to enable workers to view KPIs or work with data mining models. Although the company has not revealed plans to do so, such capabilities could be built into future releases of products such as Excel to take advantage of new Analysis Services capabilities (KPIs, for instance). Finally, it is not yet clear how Microsoft will make use of technology acquired in its Mar. 2004 purchase of ActiveViews, which had been working on client software that allows business users to construct reports from SQL Server databases. Microsoft has indicated that the ActiveViews technology will be available in the third beta of SQL Server 2005 in the form of a feature called Report Builder, and it will likely give business users some ability to create reports from Analysis Services cubes. Availability and Resources Microsoft released the second beta of SQL Server 2005 in July 2004 and plans to release the product in the third quarter of 2005. (A third beta will be available in the first quarter of 2005.) Analysis Services will be included with the Standard, Enterprise, and Developer editions of SQL Server 2005 at no additional charge. The SQL Server 2005 site is www.microsoft.com/sql/2005. Information about BI and data warehousing capabilities in SQL Server is at www.microsoft.com/sql/evaluation/bi/default.asp. Integration Services in SQL Server 2005 is described in "Data Integration Improved in SQL Server 2005" on page 13 of the Dec. 2004 Update. SQL Server Reporting Services is described in "SQL Reporting Services Launches" on page 3 of the Mar. 2004 Update. The Microsoft acquisition of ActiveViews is discussed in "Self-Service Reporting Technology Acquired" on page 21 of the June 2004 Update. New features for developers in SQL Server 2005 are outlined in "SQL Server Gets .NET, XML" on page 5 of the Jan. 2004 Update. |