inset
SQL Server 2005 Partitions Large Databases
Sep. 19, 2005

A new mechanism for partitioning databases in SQL Server 2005, Microsoft's database management and business intelligence (BI) platform, will improve the performance and manageability of large databases, such as data warehouses. Improving SQL Server's ability to manage large databases could help Microsoft wrest market share from IBM and Oracle, which dominate the market among large enterprises for the most demanding database applications. However, recent benchmarks indicate SQL Server 2005 still trails Oracle and IBM in raw database performance.

SQL Server's Enterprise Push

SQL Server is the dominant database management system in small and midsize businesses and an important money maker for Microsoft—the product generates more than US$1 billion in revenue for Microsoft's Servers and Tools division and consistently posts double-digit annual revenue growth. This success, however, creates a conundrum for the company—how to continue to grow SQL Server revenues when the product has achieved dominance in its target market.

Microsoft has employed several strategies to keep SQL Server sales growing, including the following:

  • Bundling ancillary features, such as services and tools to support the creation of BI applications, with the core SQL Server database engine
  • Building SQL Server dependencies into other products, such as Great Plains, SharePoint Portal Server (SPS), and Systems Management Server (SMS)
  • Positioning the product as a suitable database engine for large enterprises.

Although Microsoft has had success with the first two strategies, the company has had a tougher time placing SQL Server in large enterprises. In small and midsize companies, purchase decisions are often driven by cost and ease-of-use considerations, making SQL Server a logical choice. But in large companies, where cost considerations often take a back seat to raw performance and scale, SQL Server continues to run a distant third behind Oracle and IBM. (In 2004, according to IDC, IBM and Oracle collectively owned about 70% of the US$15 billion worldwide database market, compared with Microsoft's 13% share of the market.)

Consequently, SQL Server's ability to meet the scalability and performance demands of the enterprise database market, and chip away at the lead enjoyed in this market by IBM and Oracle, plays a critical part in the product's growth prospects.

Although SQL Server 2005 has not yet been released, performance benchmarks posted by the Transaction Processing Performance Council (TPC) show that the product outperforms its predecessor on both the TPC-H benchmark, which simulates queries against a large decision support database (such as a data warehouse), and the TPC-C benchmark, which simulates transactions in an online transaction processing (OLTP) database, such as an order processing database. However, SQL Server 2005 has yet to post benchmarks that rival, in terms of raw performance, those of IBM's and Oracle's latest database releases. Microsoft will undoubtedly push to close this gap as SQL Server 2005's ship date approaches.

(For a summary of recently published SQL Server 2005 performance benchmarks, see the illustration "SQL Server 2005 TPC-H Benchmarks".)

Partitioning Improves Performance and Scalability

One key scalability improvement in SQL Server 2005 is better support for database partitioning. Database partitioning is a method for scaling databases by splitting them into partitions that reside on different storage locations on the same server (local partitioning) or on separate servers (distributed partitioning).

Partitioning improves the performance and manageability of large databases in several ways. First, locally partitioned databases improve query performance by spreading work across multiple disk systems and taking advantage of parallel processing in multiprocessor machines. For example, a common query in data warehouses and other databases used for analysis involves data aggregation (summing regional sales results for a year, for instance). In a properly designed partitioned database on a multiprocessor machine, much of the workload of aggregating data can be done in parallel—e.g., multiple processors, each working on a separate partition, can simultaneously execute aggregations, which are subsequently combined into a single result. Similarly, distributed partitioning can improve performance by spreading the load across multiple servers.

Partitioning can also lessen the impact of database maintenance operations, such as archiving data or performing database consistency checks, because those operations can be limited to a specific partition of the database and will not block queries being handled by other partitions.

SQL Server 2005 and previous versions of the product support horizontal partitioning, which splits tables at row boundaries. For example, a horizontal partitioning scheme might split a large membership table (e.g., containing information such as last and first names, phone numbers, home and work addresses, age, and gender in columns) into 26 smaller tables, with each of the smaller tables containing the rows for members whose last names start with the same letter.

Partitioned Tables for Local Partitioning

SQL Server 2005 includes a new feature for local partitioning (called partitioned tables) that brings Microsoft one step closer to feature parity with IBM and Oracle. With partitioned tables, database administrators and architects split large database tables into smaller partitions, but all partitions remain within one database in the form of smaller base tables.

(For a graphical overview of how partitioned tables work in SQL 2005, see the illustration "Table Partitioning Concepts in SQL Server 2005".)

SQL Server 2005 stores partitioned tables and their definitions as single logical entities in the database, and developers, administrators, and other users can query, update, and modify data in partitioned tables as they do any other database table. Thus SQL Server 2005 masks the complexity of a partitioned table's physical implementation from database users—they are not required to understand how data are distributed among the partitioned table's constituent base tables.

Managing Partitioned Tables

The new feature is the evolution of local partitioned views, a feature in SQL Server 2000. (Local partitioned views will continue to be supported in SQL Server 2005.) Although the two features offer developers, administrators, and other database users many of the same advantages, partitioned tables are significantly easier to create and manage than their SQL Server 2000 predecessors.

For example, SQL Server 2005 administrators can specify the number of partitions in a partitioned table, and how data are distributed among those partitions; once specified, SQL Server 2005 creates the base tables automatically and can subsequently direct queries and updates at the appropriate underlying base tables. In contrast, SQL Server 2000 administrators create base tables separately and must specify for each base table a set of constraints that dictate what data are allowed (or disallowed) in those tables.

Furthermore, SQL Server 2005 administrators can direct maintenance activities (such as rebuilding indexes or performing consistency checks) against a partitioned table rather than working directly with the partitioned table's base tables. With SQL Server 2000, administrative activities are performed against the individual base tables.

New Commands Target Data Warehouse Tasks

SQL Server 2005 provides several new commands and command options to manage partitions after the partitioned table has been created. For example, SQL Server 2005 includes command options that allow administrators to split a single partition into two partitions, merge partitions, and add new data to (or delete data from) a partitioned table by switching partitions into and out of an existing partitioned table. Such capabilities seem specifically designed to address common, often cumbersome, maintenance operations in data warehouses or other large databases used for analysis.

For example, companies often maintain a "sliding window" of data in data warehouses, such as retaining the most recent year's worth of sales data online and archiving older records. A monthly maintenance procedure might involve adding all sales data recorded in the most recent month and archiving and deleting the oldest month's data from the data warehouse. Assuming a carefully designed partitioning scheme (12 partitions, each containing one month's worth of records in the above example), SQL Server 2005's partitioned tables simplify and speed such operations—administrators could simply create a new table, load the table with records from the most recent month, and then insert that table as a new partition while removing the partition containing the oldest month's worth of data.

No Change for Distributed Partitioning

Partitioned tables are strictly for local partitioning—breaking up a database on a single server to reduce the impact of maintenance operations and improve query performance. For distributed partitioning, which spreads load across multiple servers, SQL Server 2005 still supports distributed partitioned views, introduced in SQL Server 2000. To use this feature, administrators create independent base tables on multiple servers. They then create a view that spans the separate tables. The view shields users and developers from the implementation details of the base tables (such as the names of those tables and how table data are distributed among them), effectively providing an alias for the underlying base tables.

However, SQL Server 2005 has not simplified management of distributed partitioned views. Administrators must still carry out maintenance on the individual tables making up the partitioned view, rather than performing a single operation on the view. To change the partitioning of a view (for example, to go from two to three servers), administrators must create new tables, manually move data, and update the definition of the view spanning the distributed databases, among other operations. Furthermore, distributed partitioned views generally yield performance gains for only certain types of applications, such as applications whose queries, for the most part, do not require data from more than one server making up the partitioned view.

Consequently, organizations will likely avoid distributed partitioning with SQL 2005 except where performance gains clearly exceed the costs of added administrative overhead. For many organizations, and many applications, ease of maintenance coupled with the availability of lower cost 64-bit and multiprocessor servers will likely make partitioned tables a more attractive and cost-effective solution than distributed partitioned views.

Considerations

Although table partitioning in SQL Server 2005 affords several improvements in the management of large database tables compared with SQL Server 2000's partitioned views, it also introduces new concepts and commands that architects, administrators, and developers will need to master.

In addition, customers should consider the following:

No automated migration. Because they are fundamentally different concepts, SQL Server 2005 provides no utilities to migrate tables partitioned with partitioned views to SQL Server 2005's partitioned tables. However, customers can use the new table partitioning commands to switch base tables in a SQL Server 2000 local partitioned view into SQL Server 2005 partitions.

Careful design required. Although partitioning can improve the performance and manageability of large tables, partitioning schemes must be carefully designed to realize performance and scalability gains. Developers and architects must consider how data is accessed by users and applications and the types and frequency of administrative operations. A large table for which maintenance activities have begun to noticeably interfere with users' queries could be a candidate for partitioning. Likewise, if database queries can be correlated with easily identified subsets or ranges of data, query performance may benefit from partitioning. (For instance, a sales fact table in a large data warehouse, where most queries are specific to fiscal months, may lend itself to partitioning.) However, if such data access patterns cannot be identified, partitioning could result in a net degradation of query performance rather than an improvement.

Finally, administrators and architects must weigh the promise of performance and scalability gains against the added administrative and maintenance complexity inherent in partitioned databases—partitioned database tables are more complex than single, monolithic tables.

Old news for Oracle and IBM. Oracle and IBM have supported table partitioning for several iterations of their database management systems. Thus, adding support for partitioned tables brings SQL Server closer to parity with its main rivals' features for large database management; it does not, however, afford Microsoft any competitive advantage from a strictly functional standpoint.

Availability and Resources

Microsoft intends to launch SQL Server 2005 on Nov. 7, 2005. The sixth and final SQL Server 2005 community technical preview (CTP) was released in Sept. 2005. (CTPs are interim product releases between major milestones, such as beta releases; however, CTPs do not undergo the same level of testing as beta releases.) Partitioned tables will be supported in the SQL Server 2005 Enterprise and Developer Editions only.

The SQL Server 2005 Web site is www.microsoft.com/sql/2005/.

The most recent SQL Server 2005 CTP can be downloaded at www.microsoft.com/sql/2005/productinfo/ctp.mspx.

A detailed white paper describing SQL Server 2005 partitioned tables is at msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sql2k5partition.asp.

The IDC Web site is www.idc.com/.

Recent SQL Server 2005 performance benchmarks, as measured by the Transaction Processing Performance Council (TPC), are at www.tpc.org/.

Analysis Services in SQL Server 2005 is described in "Analysis Services Matures in SQL Server 2005" on page 3 of the Feb. 2005 Update.

Integration Services in SQL Server 2005 is described in "Data Integration Improved in SQL Server 2005" on page 13 of the Dec. 2004 Update.

Updates to Reporting Services in SQL Server 2005 are outlined in "Reporting Services Updated for SQL Server 2005" on page 15 of the July 2005 Update.

New features for developers in SQL Server 2005 are outlined in "SQL Server Gets .NET, XML" on page 5 of the Jan. 2004 Update.