Windows 2000 DataCenter Targeting High-End Unix Servers
May 22, 2000

In late summer, selected manufacturers will begin shipping a new generation of Intel-based servers running Microsoft’s new Windows 2000 DataCenter Edition. The first generation of these systems will not be running Intel’s upcoming 64-bit Itanium processors, but rather 32-bit Intel Xeon processors. Nevertheless, these systems will be able to scale far beyond the CPU and memory limits of today’s eight-way systems. At the same time, these hardware manufacturers and Microsoft will partner to provide levels of testing, certification, and support never before available on Windows NT–based systems. To be successful, both parties must overcome cultural and technical obstacles, and it may take years before the effort pays off.

A Demanding and Growing Market

While Microsoft has enjoyed great success in the market for smaller distributed servers with Windows NT, it has been less successful in penetrating the market for large, centralized, mission-critical servers at the enterprise level. Neither Windows 2000 Server/Advanced Server nor its NT 4.0–based counterparts have been able to meet this market’s long list of stringent technical requirements.

Without a compelling offering for the high-end, Microsoft cannot fully participate in a lucrative and rapidly growing segment. In addition, this product gap may even impact Microsoft’s medium-sized server segment. Despite the many benefits of Microsoft’s existing server platform, organizations may hesitate to invest in a platform that might not scale to meet future throughput and reliability requirements. Microsoft and its hardware partners hope to turn this situation around with Windows 2000 DataCenter Edition and a new generation of server hardware.

Requirements for Enterprise-Class Servers

Enterprise servers are not simply scaled-up PC-based servers; they are more like mainframes. To compete against the class of high-end Unix-based machines typified by the Sun Enterprise 10000 ("StarFire") running Solaris, Microsoft must address many complex issues.

Enterprise-class server systems must meet the following requirements:

  • Run indefinitely without crashing due to hardware, operating system, or application problems
  • Run for months without the need for a scheduled restart
  • Ensure that no single component failure brings down any other part of the system
  • Scale up as workloads grow without having to add more computers or having to rebuild or reinstall operating system and application software
  • Run multiple applications without any risk of one application crashing another
  • Support high availability or nonstop availability of applications, allowing multiple instances of an application to run on separate cluster nodes with provisions for automatic, rapid, and transparent fail-over
  • Include facilities for remote monitoring and management
  • Have a single point of vendor support, with complete installation and configuration services, as well as guarantees of 7x24 support, parts availability, and onsite services

While Windows NT Enterprise Edition and Windows 2000 Advanced Server systems have made inroads in addressing some of these needs—for example, by supporting up to eight CPUs, two-node clustering, and 3GB of application memory—no Microsoft-based system offered today meets all these requirements. However, the first version of DataCenter will significantly advance the current Microsoft server platform in the above respects.

What’s Fueling the Demand for Enterprise-Class Servers

Several trends are driving growth in the market for large-scale enterprise-class systems—making this segment increasingly important to Microsoft’s future.

Server consolidation. Organizations have learned that spreading loads across multiple systems imposes a high total cost of ownership (TCO) for several reasons. At the hardware and network level, all other things being equal, larger servers are actually less costly on a per-client basis than multiple smaller servers. This is especially true as higher-bandwidth WAN connections drop in price, enabling centralization of servers. Considering administrators’ time, the additional complexity of managing multiple systems (especially when dispersed geographically), and the cost of various additional software licenses for each separate server, the economies of bigger systems become apparent. To reduce TCO, most large IT organizations are moving toward fewer, larger file servers, web servers, mail servers, database servers, and application servers.

E-commerce. As firms rapidly embrace electronic commerce between both customers and other businesses, the systems supporting these mission-critical services must support rapid growth, high performance demands, and high availability.

Data warehousing/data mining. The increasing use of Online Analytical Processing (OLAP) tools requires servers of sufficient power to rapidly process huge databases without interrupting the online transaction systems creating the data.

Enterprise Resource Planning (ERP). The growth of ERP systems such as Baan, Oracle Financials, PeopleSoft, and SAP has put mainframe-like demands on servers. Any interruption in the availability of these services can cripple a company.

Application service providers. As advances in network bandwidth enable migration of processing from the client to the server, and firms begin providing applications for a fee, servers must be able to scale up to accommodate very large workloads without degrading or interrupting service levels.

In aggregate, these trends increase the urgency for Microsoft and its hardware partners to offer a viable solution for the enterprise-class server market.

How DataCenter Differs from Advanced Server

Windows 2000 DataCenter Edition shares the same code base as other versions of Windows 2000 and will offer a superset of the other Windows 2000 server products, features, and APIs. The big changes are in reliability, scalability, availability, performance, and in the way it will be sold.

New Features

System administrators will not immediately notice much different about DataCenter. Nearly all the changes are in the kernel space, with the only user interface changes being those necessary to manage the low-level changes.

Large memory support. DataCenter will increase memory support from 4GB to 64GB. Even though the current generation of IA-32 processors is nominally 32-bit, these processors have had a 36-bit physical address bus since the Pentium Pro days. Through the use of a technique termed Physical Address Extensions (PAE), the extra 4 bits can be used to address up to 64GB of memory. DataCenter itself can automatically exploit this additional memory, but each application process is still limited to 3GB unless the application is modified to utilize the new Address Windowing Extensions (AWE) APIs. In a manner similar to the old DOS Expanded Memory Specification, AWE lets an application map windows of its own address space onto physical memory beyond that limit, up to 64GB. With enough physical memory, database and OLAP applications could load and process very large datasets and transactional loads without paging to disk.
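
The address arithmetic and the windowing idea described above can be illustrated with a short sketch. This is a toy model only: the names `addressable_gib` and `WindowedMemory` are hypothetical, and the code neither calls nor mimics the real AWE API. It simply shows that four extra address bits multiply the reachable memory by 16, and that a small set of window slots can be re-pointed at a much larger pool of pages without copying data.

```python
GIB = 2 ** 30  # bytes per gigabyte (binary)


def addressable_gib(address_bits: int) -> int:
    """Gigabytes reachable with the given number of physical address bits."""
    return 2 ** address_bits // GIB


class WindowedMemory:
    """Toy model of windowing: a few window slots onto many physical pages.

    Hypothetical illustration only; it does not use the real AWE API.
    """

    def __init__(self, physical_pages: int, window_slots: int, page_size: int = 4096):
        self.pages = [bytearray(page_size) for _ in range(physical_pages)]
        self.window = [None] * window_slots  # slot -> physical page index

    def map(self, slot: int, page: int) -> None:
        """Point a window slot at a physical page (a remap, not a copy)."""
        if not 0 <= page < len(self.pages):
            raise IndexError("no such physical page")
        self.window[slot] = page

    def write(self, slot: int, offset: int, value: int) -> None:
        self.pages[self._resolve(slot)][offset] = value

    def read(self, slot: int, offset: int) -> int:
        return self.pages[self._resolve(slot)][offset]

    def _resolve(self, slot: int) -> int:
        page = self.window[slot]
        if page is None:
            raise RuntimeError("window slot is not mapped")
        return page


if __name__ == "__main__":
    print(addressable_gib(32), addressable_gib(36))  # 4 64
    mem = WindowedMemory(physical_pages=16, window_slots=2)
    mem.map(0, 9)
    mem.write(0, 0, 42)
    mem.map(0, 3)  # slide the first window to a different page
    mem.map(1, 9)  # page 9 is still intact and reachable via another slot
    print(mem.read(1, 0))  # 42
```

The design point is that the application sees only a fixed number of slots at any moment, which is why software must be modified to manage the mapping itself.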

Four-way clustering. DataCenter increases the number of clusterable nodes to four, up from the two-node clusters supported by Windows 2000 Advanced Server. Other than a requirement that the storage be Fibre Channel–based, the technology, cluster APIs, and management interface are the same as in Advanced Server. The cluster technology improves on the Microsoft Cluster Server component of Windows NT 4.0 Enterprise Edition (formerly known as Wolfpack), but it is still an "availability" solution. All four nodes can be active and can be running applications, but there is no support for balancing an application’s workload across multiple nodes. Microsoft will require applications certified for DataCenter to be cluster-aware and to use the Windows 2000 cluster APIs. Any existing applications that already use the Microsoft cluster APIs should run fine on DataCenter’s four-way clusters.

32-processor support. DataCenter supports up to 32 processors, compared with Advanced Server’s support for eight CPUs. The Intel ProFusion hardware architecture used on today’s eight-way systems cannot scale efficiently to this many CPUs. DataCenter OEMs are resolving this problem by utilizing crossbar matrix switch designs that directly connect CPUs to main memory, solving the problems of bus contention while still providing symmetrical CPU and memory access. (See the sidebar "New Generation of Hardware Needed for DataCenter to Scale Beyond Eight CPUs".) This design can scale to even larger numbers of CPUs in the future. If an application has been designed with symmetric multiprocessing (SMP) in mind—such as intelligently using multithreading to allow multiple functions to execute simultaneously—then the application’s performance should scale up to use the added processor support without hitting a hardware-imposed ceiling.

Hardware partitioning support. Mainframes and some large Unix systems have long had the ability to group resources (CPUs, memory, hard disks) into logically discrete "partitions" (Sun terms them "domains"). Each of these partitions runs its own instance of the operating system and has exclusive control of the resources allocated to it. This essentially creates multiple computers running on a single physical machine. Administrators can still shut down and restart the operating system on one partition without affecting the others.

Partitioning is useful for many reasons. Partitions can become nodes of a cluster and communicate directly among themselves at very high speeds without needing any network interconnects. Partitioning allows systems administrators to better exploit the reliability and economies of scale offered by larger systems while having the flexibility to easily shift resources as needs change. Administrators can create test partitions that are identical to production partitions, allowing developers to test their applications in an environment identical to that of the production systems. Partitions provide the ability to create separate environments for applications that could not coexist on the same instance of the operating system. For example, one application may not support a particular release or service pack of DataCenter that another application requires. Partitioning can enable one instance of DataCenter to directly read disk data owned by another partition without using the network. Partitioning can even allow completely different operating systems to run on the same machine.

In its first release, the DataCenter operating system does not create or manage partitions, but it will run fine on hardware partitions created prior to installation of the operating system using the hardware manufacturer’s configuration tools. Any changes to the allocation of system resources require a shutdown of the system, but DataCenter’s Plug and Play subsystem will be able to dynamically detect those changes during startup.

Support for system area networks via Winsock Direct. TCP/IP is an excellent, flexible way to interconnect computers, especially between end-user devices and servers, and also between servers separated by a WAN or servers that only occasionally communicate with each other. However, when two or more servers communicate continuously, even TCP/IP over Gigabit Ethernet adds latency and unnecessary overhead that can degrade overall performance. Examples of situations that could exploit faster server-to-server communications include inter-cluster messaging, database replication (or mirroring), parallel database "clustering," and connections between middle-tier servers (Web servers, transaction servers, etc.) and top-tier database servers.

The need for more efficient performance drove development of system area networks (not to be confused with storage area networks; unfortunately, both concepts have claimed the same acronym, SAN). Although high-availability vendors such as Tandem and Stratus have developed their own proprietary interconnects and the IBM mainframe world has for some time had fast channel connections (such as ESCON), open, fast, low-latency, low-overhead interconnect technologies have only recently become available to the server world. Compaq, Intel, and Microsoft published a specification, termed Virtual Interface Architecture (VI), in 1997 to address this need. The dominant product in this area today is Giganet’s cLAN. Rather than requiring the CPU to handle such chores as addressing, error detection, and flow control, cLAN offloads these functions onto the Giganet adapter hardware, eliminating the need for the TCP, IP, and Ethernet layers and their resulting performance hit.

Rather than develop a new driver architecture to communicate with these network interfaces, Microsoft just extended the Winsock API, calling it Winsock Direct. Giganet or other SAN technology vendors can use the standard Windows 2000 NDIS 5.0 network interface driver model, but instead of communicating with the TCP/IP protocol stack, the SAN driver communicates directly with the Winsock Direct protocol. Higher-level services can thus communicate with other servers using the normal Winsock API without requiring alteration. To the higher-level services, Winsock Direct emulates TCP/IP Sockets, even though the communications completely bypass TCP/IP and talk directly to the SAN adapter.
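
The practical consequence of keeping the sockets API unchanged can be shown by analogy. The sketch below does not use Winsock Direct or any SAN hardware; it is a generic Python illustration, with hypothetical function names, of application code written purely against the sockets interface. Here the transport underneath happens to be a local socket pair rather than TCP/IP, and the application functions neither know nor care.

```python
import socket
import threading


def request_reply(conn: socket.socket, payload: bytes) -> bytes:
    """Client side: send a payload and return the peer's reply."""
    conn.sendall(payload)
    return conn.recv(4096)


def echo_once(conn: socket.socket) -> None:
    """Server side: read one message and echo it back in upper case."""
    data = conn.recv(4096)
    conn.sendall(data.upper())


if __name__ == "__main__":
    # A connected pair of local sockets stands in for "some other transport".
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()
    print(request_reply(client, b"ping"))  # b'PING'
    worker.join()
    client.close()
    server.close()
```

Because `request_reply` and `echo_once` touch only `sendall` and `recv`, swapping the transport requires no change to them, which is the property the article attributes to higher-level services running over Winsock Direct.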

Process control. DataCenter will add new control capabilities that allow administrators to regulate the resources that each application process can use. Administrators will be able to control memory usage, processor affinity (forcing an application to execute on one or more specific processors), process priority, and other limits to ensure that applications do not overconsume resources and destabilize the system.
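
Processor affinity on Windows is conventionally expressed as a bit mask with one bit per processor. The sketch below shows only that encoding; the function names are hypothetical, the 32-processor default is taken from the limit described earlier in this article, and the code does not call DataCenter's process control facility or any Windows API.

```python
def affinity_mask(cpus, cpu_count=32):
    """Build a bit mask with one bit set for each allowed processor index."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < cpu_count:
            raise ValueError(f"processor index {cpu} is out of range")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """Return the sorted processor indexes whose bits are set in the mask."""
    return [i for i in range(mask.bit_length()) if mask >> i & 1]


if __name__ == "__main__":
    print(affinity_mask([0, 1]))        # 3
    print(hex(affinity_mask([0, 31])))  # 0x80000001
    print(cpus_in_mask(0b1010))         # [1, 3]
```

An administrator confining an application to two processors would, in this representation, be assigning it a mask with exactly two bits set.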

OEM Involvement

For Windows 2000 to be as reliable as the big Unix systems, Microsoft could not simply sell DataCenter as a shrink-wrap product. The majority of Windows NT and Windows 2000 reliability problems are a function of Microsoft’s attempt to support a huge list of devices, enhancements, service packs, and configuration options. While these choices offer flexibility and drive down hardware and software costs, it is impossible to test all possible permutations to ensure that everything will work together properly. Microsoft wisely decided that the only way DataCenter will be sold is through a tightly controlled OEM program. Even Select or Open Licensing customers must still purchase DataCenter through a bundled hardware/software offering.

At the time of this article, the following hardware manufacturers plan to offer DataCenter: Amdahl, Compaq, Dell, EMC/Data General, Fujitsu, Hewlett-Packard, Hitachi, IBM, NEC, Stratus, and Unisys. Each will offer DataCenter as part of one or more certified platforms, and will be required to

  • Be a Microsoft Authorized Support Center (ASC) and provide a single point-of-contact to support both hardware and software. These ASCs will be tightly integrated with Microsoft’s Product Support Services (PSS) and Quick Fix Engineering (QFE) teams. Each OEM must offer a 7x24 support program, with an option for onsite support. Microsoft has not yet announced what effect a Premier support agreement will have on DataCenter’s support options.
  • Certify any cluster offerings in a four-node configuration.
  • Offer a service-level agreement for 99.9% (or better) hardware and operating system availability. OEMs will not be required to sell clustered systems, but they must provide a system that meets the 99.9% availability requirement (9 hours or less unplanned downtime per year).
  • Certify their entire fully populated system with the Microsoft Hardware Compatibility Lab, including a 14-day stress test. Any application or utility that installs a kernel-level component must be included as part of this certification. This includes antivirus products, backup tools, management software, and applications such as Exchange 2000 and SQL Server 2000. Once a system passes the 14-day test, OEMs will be permitted to use shorter tests to certify alternative utility products and smaller configurations.
  • Recertify the system with a new 7-day stress test if anything in the software or hardware changes from the originally certified configuration.
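
The 99.9% availability figure in the requirements above can be converted into a yearly downtime budget with simple arithmetic. The sketch below is illustrative only; the function name is hypothetical and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def max_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year permitted at the given availability level."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    print(f"{max_downtime_hours(99.9):.2f}")   # 8.76
    print(f"{max_downtime_hours(99.99):.2f}")  # 0.88
```

At 99.9% the budget is about 8.76 hours, which matches the article's rounded figure of 9 hours or less per year.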

In addition to the testing, certification, and support programs, many manufacturers will offer DataCenter on an entirely new class of hardware.

What’s Still Not There

The combination of DataCenter’s new features, a new generation of hardware, and tighter OEM control will go a long way to making Windows 2000 competitive with enterprise-class Unix systems. However, in its first release DataCenter will still lack some of the capabilities found in systems such as the Sun E10000.

Full 64-bit support. Until Microsoft releases a 64-bit version of DataCenter and OEMs certify it on the forthcoming Intel Itanium systems, the 32-bit systems will not have comparable performance and scalability to the 64-bit RISC offerings from Compaq, HP, IBM, Sun, and others. Some manufacturers are designing their first generation of DataCenter hardware with IA-64 in mind and will offer board-level upgrades to Itanium processors at a future date.

Dynamic partitioning. For hardware partitioning to really meet its potential, administrators need to be able to add or reallocate system resources without taking down the system. While Microsoft plans to add dynamic partitioning to a future release of DataCenter, it may be some time before this becomes a reality. Just as with the Plug-and-Play initiatives, Microsoft and the DataCenter OEMs will have to reach agreement on a common architecture that can accommodate on-the-fly reallocation of system resources. Each partition’s instance of Windows 2000 DataCenter and the OEM’s hardware management system must be able to intercommunicate and gracefully reallocate resources as directed by the system administrator. Because there are no current industry standards for managing partitions, it is unclear at this point how Microsoft will create a scheme that will work with each vendor’s implementation.

Cluster load balancing. While DataCenter clusters will now support more than two nodes, they still provide no native load-balancing mechanisms for running an application across multiple nodes. Microsoft is working on a technology called COM+ Load Balancing (CLB) that will allow true load balancing of COM+ objects. CLB was originally slated to be included with Windows 2000 Advanced Server and DataCenter, but in Sept. 1999 Microsoft decided to remove this feature and build it into a separate product named AppCenter. (See "Microsoft Announces AppCenter Application Server" on page 3 in the Nov. 1999 Update.) AppCenter will not automatically "cluster enable" existing applications—they must be redesigned as COM+ objects before they will be able to "scale out" via clustering. DataCenter also includes another technology, Network Load Balancing (NLB), that can be used to distribute client connections across multiple servers. While useful in some applications, this technology is not true clustering and does not function at the application level. (See "Valence Acquired for Its NT-Based Load Balancing Technology" on page 3 in the Oct. 1998 Update.)

File system snapshot. Many high-end Unix and mainframe storage subsystems provide a "snapshot" feature that allows the state of the file system to be frozen in time while applications continue to run and write data to disk. This feature is enormously useful for performing tape backups and for splitting off databases for testing or for data mining. Until Microsoft builds this feature into the NT file system, the only option is to use third-party products that either function at the application level (for backups) or are managed externally to the operating system at the storage device level.

Storage area network support. A storage area network (SAN) lets multiple computers flexibly connect to one or more back-end disk farms, usually via Fibre Channel connections. Each disk farm can also be divided into partitions, each assigned to a system. Like the hardware partitioning mentioned earlier, SANs give administrators great flexibility to optimize and allocate disk resources, even among disparate types of systems. Unlike competing Unix and mainframe systems, the first release of Windows 2000 DataCenter will not provide the low-level support needed for building true SANs. Microsoft will need to change the way Windows 2000 addresses disk devices and add support for redundant Fibre Channel paths before it will work well in a SAN environment.

Obstacles and Challenges

If Microsoft succeeds with Windows 2000 DataCenter, it will remove the final barrier keeping it out of the core data center. Success should also grow its server applications business, especially SQL Server, since customers will know that they can select those applications and scale them to the same levels as their Unix-based counterparts. However, success is by no means assured. The required changes will not come easily, and it may take Microsoft years of effort before it starts seeing returns. Some serious obstacles stand in its way.

High availability requires more than technology. High availability is a blend of people, process, and technology, and the technology portion is probably the least significant of the three. Microsoft has typically focused on the technology and only rarely gets directly involved in the other two areas. It will have to partner with OEMs and customers as never before to make certain that all three areas get adequately addressed.

OEM enthusiasm and commitment are key. Microsoft will go nowhere with DataCenter if its OEMs do not develop enterprise-class hardware and drive the sales effort. Since most of these OEMs also offer Unix or mainframe systems, many will be reluctant to cannibalize their high-margin flagship products by selling DataCenter-based systems. This concern may be reduced if the next generation of Itanium-based systems can really run Unix with the same performance as their proprietary RISC-based systems. If this is true, then OEMs may be indifferent to the choice of operating system.

Code release procedures will require a new level of discipline. Microsoft is determined not to fragment the code base for the various versions of Windows 2000. While this strategy has many benefits, it poses a special challenge for DataCenter. When reliability, availability, and scalability are the primary goals, Microsoft must introduce changes cautiously. Will it be willing to hold back releases of Server and Advanced Server while it fixes bugs and tests DataCenter? Microsoft is known for feature-creep as it attempts to be all things to all customers. It will have to carefully control this tendency or DataCenter’s reputation could suffer.

Microsoft’s sales and marketing approach must change. For DataCenter to succeed, Microsoft’s sales and marketing approach must change to address three realities. First, sales cycles for the type of systems on which DataCenter will run tend to be very long and often result in the sale of only a few units. This goes against Microsoft’s corporate sales model, which is still focused heavily on desktop-based enterprise agreements. Second, salespeople must learn to partner with OEMs, which will require changes in field sales culture and incentives. Microsoft has always resisted "playing favorites" with its partners, and this will require a very delicate touch. Third, because of the way that three-tier applications are built, the back-end databases often have relatively few connections. Unless Microsoft carefully manages its licensing model, it may find that its corporate sales force has insufficient incentives to push these high-end systems.

Existing customer perceptions must be overcome. Even if Microsoft and its partner OEMs get it just right, perception will lag reality. Windows NT and now Windows 2000 are still perceived as being inferior to Unix for running large-scale mission-critical applications. Most IT managers view them as "departmental" products, not "enterprise" products. Because the stakes are so high, this area is the last place IT managers will want to take risks. Furthermore, Hewlett-Packard, IBM, Sun, and others have worked hard to penetrate the classic mainframe market with their Unix-based servers. Once corporate IT makes an expensive decision, it resists change. Windows 2000 DataCenter will have to prove itself over the long haul before it has any chance of dislodging these systems. It may get its chance with start-ups that are not replacing existing systems, but even those firms may have a hard time convincing their venture capital backers that the risk is worth taking.

Resources

For an overall description of the Windows 2000 DataCenter program and related links, see www.microsoft.com/WINDOWS2000/news/bulletins/windcprog.asp.

For the Windows 2000 DataCenter press release, see www.microsoft.com/presspass/press/2000/Feb00/DatacenterPR.asp.

For an overview of 64-bit Windows, see www.microsoft.com/WINDOWS2000/guide/platform/strategic/64bit.asp.

For more information on the Winsock Direct specification, see www.microsoft.com/DDK/DDKdocs/Win2k/wsdpspec_1h66.htm.

For more information on Intel’s Physical Address Extensions (PAE) architecture, see www.microsoft.com/WINDOWS2000/news/fromms/intelpae.asp.

For more information on Compaq’s licensing of Unisys CMP architecture, see www.unisys.com/news/releases/2000/feb/02156867.asp.

For more information on future support for dynamic partitioning, see www.unisys.com/news/releases/1999/oct/10276798.html.