Updated: December 13, 2024

Microsoft’s Silicon Journey: Why It Matters

by Barry Briggs

Before joining Directions on Microsoft in 2020, Barry worked at Microsoft for 12 years in a variety of roles...

I’ve always been fascinated by hardware, and today innovation is happening faster than ever. Whether it’s tuning existing server chips for the specifics of the public cloud, building new chips optimized for specific workloads, or offloading some of the overhead of public cloud infrastructure to increase performance, custom silicon is becoming increasingly popular with public cloud providers. Over the past few years Microsoft has begun to develop its own custom silicon for Azure. Arguably, it remains behind its chief cloud competitors, AWS and Google Cloud, but the company clearly sees that specialized, proprietary hardware could be a competitive advantage. But is Microsoft doing enough, and how do its chips stack up?

Cobalt: ARM-Based CPU

Following the lead of AWS and others, Microsoft released Cobalt, its Arm-based CPU, into general availability in October 2024 for use in Azure VMs (the Dpsv6 and Epsv6 series, not available in all regions). Cobalt is not the first Arm-based processor in Azure; VMs using Ampere’s Altra processor were introduced in 2022.

Cobalt, a 64-bit processor, is built by the Taiwanese chip foundry TSMC on a 5nm process and supports up to 96 vCPUs and 192GB of RAM. Both Windows and Linux VMs are supported. Cobalt VMs are currently available in 14 Azure regions (out of 64 total), with more planned.
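One practical consequence for customers: x86 binaries won’t run on these Arm VMs without recompilation, so it’s worth having deployment scripts check which architecture they actually landed on. A minimal sketch in Python, using only the standard library:

```python
import platform

# Report the CPU architecture the VM exposes to the guest OS.
# On an Arm-based (e.g., Cobalt Dpsv6/Epsv6) VM this is typically
# "aarch64" (Linux) or "ARM64" (Windows); on a conventional VM
# it is "x86_64" or "AMD64".
arch = platform.machine()
if arch.lower() in ("aarch64", "arm64"):
    print(f"Arm64 host detected ({arch})")
else:
    print(f"Non-Arm host detected ({arch})")
```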

By contrast, the fourth generation of AWS’s Graviton CPU, also based on an Arm design, was released in July 2024. AWS claims it has built more than 2 million Graviton processors, available in 150 instance types across 33 geographic regions. It’s too early to tell whether Microsoft is playing catch-up with AWS or has leapfrogged it.

Maia: Custom AI Processor

In August 2024, Microsoft released additional technical details about its custom AI processor, Maia. It’s impressive: also fabricated by TSMC on a 5nm process, on paper it competes well with others in the space, supporting 64GB of High Bandwidth Memory (HBM), a hardware technology that vertically “stacks” memory dies for higher density and bandwidth than traditional GDDR GPU memory, making it particularly well suited to AI, which needs the bandwidth for high-dimension vector calculations.
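Why the emphasis on bandwidth? When generating text, each token requires streaming essentially all of a model’s weights through the chip, so memory bandwidth, not raw compute, typically sets the ceiling on inference speed. A back-of-envelope sketch (the model size and bandwidth figures are illustrative assumptions, not Maia specifications):

```python
# Back-of-envelope: memory-bound token generation for an LLM.
# All numbers are illustrative assumptions, not vendor specs.

params = 30e9            # assume a 30B-parameter model
bytes_per_param = 2      # FP16/BF16 weights
weight_bytes = params * bytes_per_param   # ~60 GB of weights

hbm_bandwidth = 1.6e12   # assumed HBM bandwidth: 1.6 TB/s

# Each generated token must stream (roughly) all weights from memory,
# so bandwidth alone caps tokens per second.
seconds_per_token = weight_bytes / hbm_bandwidth
print(f"~{seconds_per_token * 1000:.1f} ms/token, "
      f"at most ~{1 / seconds_per_token:.0f} tokens/s per chip")
```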
But Maia has some powerful competition, including Google, whose Trillium (the sixth generation of its Tensor Processing Unit) supports the next generation of HBM. (Google recently announced that 100,000 Trillium chips power the training of its Gemini 2.0 LLM.)

AWS offers two separate processors: Inferentia for AI inferencing (now on its second generation) and Trainium, as the name implies, for AI training. And of course, the overwhelming market leader is NVIDIA, whose A100 GPUs support up to 80GB of HBM.

While Maia has some promising features, the acceptance of such silicon depends less on hardware features and more on the software stack and its compatibility with existing workloads; here NVIDIA’s market-leading CUDA API gives it the advantage.
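The dynamic is visible at the framework level: code written against an abstraction such as PyTorch can retarget to whatever backends the framework supports, while code written directly against CUDA kernels cannot. A minimal sketch using standard PyTorch APIs:

```python
import torch

# Framework-level code is portable across the backends the framework
# supports; code written directly against CUDA kernels is not.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(1024, 1024, device=device)
y = x @ x  # the same line runs on an NVIDIA GPU or falls back to CPU
print(f"matmul ran on: {y.device}")
```

For a new accelerator like Maia to matter outside Microsoft, it has to slot in behind abstractions like this, much as Google’s TPUs did through TensorFlow and JAX.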

Maia has yet to be released into production, and it is likely to be used primarily for internal Microsoft AI workloads, at least initially.

Boost: Offloading the CPU

The first generation of Azure Boost, generally available in November 2023, is an intelligent PCIe card incorporating custom silicon programmed to accelerate network and storage functions. For example, by parsing network packet headers or handling low-level storage protocols such as NVMe in hardware, Boost offloads these functions from the hypervisor and the host OS, improving performance, particularly for workloads that move massive amounts of data, such as machine learning or data analytics.
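To make “parsing network packet headers” concrete: here is the kind of per-packet bookkeeping that, done in software, burns host CPU cycles on every single packet, and that a card like Boost performs in hardware instead. A minimal Python sketch decoding the fixed 20-byte IPv4 header (field layout per RFC 791):

```python
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header (RFC 791 layout)."""
    # '!' = network byte order; B = 1 byte, H = 2 bytes, 4s = 4 raw bytes.
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,  # e.g., 6 = TCP, 17 = UDP
        "src": ".".join(map(str, src)),
        "dst": ".".join(map(str, dst)),
    }

# Hand-built sample header: IPv4, TTL 64, TCP, 10.0.0.1 -> 10.0.0.2
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
print(parse_ipv4_header(sample))
```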

At Ignite 2024, Microsoft announced a Data Processing Unit (DPU) version of the core Boost processor. Industry insiders may recall Microsoft’s acquisition of DPU maker Fungible in 2023; the Boost DPU, as it is called, appears to be the fruit of that deal. The Boost DPU is intended to offload low-level functions such as data compression and encryption from the CPU, thus accelerating application performance.
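The same logic applies to the DPU’s targets: compression and encryption are pure CPU burn when performed in software. A small illustration of the cost being offloaded (timings will vary by machine):

```python
import os
import time
import zlib

# In software, every byte compressed costs host CPU time that a DPU
# would absorb. Time zlib over 64 MB of random (incompressible) data.
data = os.urandom(64 * 1024 * 1024)

start = time.perf_counter()
zlib.compress(data, level=6)
elapsed = time.perf_counter() - start

print(f"compressed 64 MB in {elapsed:.2f}s "
      f"(~{64 / elapsed:.0f} MB/s of pure CPU work)")
```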

Again, however, Microsoft faces competition. AWS’s Nitro, first introduced in 2017 and now on its third major revision, provides hardware-based network and storage acceleration. NVIDIA’s BlueField DPU “platform” is also in its third generation and is offered on Azure.

Azure Integrated HSM: Root of Trust

Finally, at Ignite 2024 Azure CTO Mark Russinovich announced a custom security processor, an onboard Hardware Security Module (HSM) for Azure servers. HSMs securely store cryptographic keys and perform cryptographic operations such as encryption and digital signing.
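For a sense of what those operations look like, here is a software-only sketch of signing and verification using the pyca/cryptography library. The crucial difference with an HSM is that the private key is generated inside the module and never leaves it; the sign operation itself executes on the device:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Software stand-in for what an HSM does in hardware: with a real HSM,
# this key would be generated inside the module and sign() would run
# on the device, so key material never touches host memory.
private_key = ec.generate_private_key(ec.SECP256R1())

message = b"attest this boot measurement"
signature = private_key.sign(message, ec.ECDSA(hashes.SHA256()))

# Verification needs only the public key; raises InvalidSignature on failure.
private_key.public_key().verify(signature, message, ec.ECDSA(hashes.SHA256()))
print("signature verified")
```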

Azure already offers network-attached, third-party HSM appliances, which require a network round trip for every operation; the new device appears to be a PCIe card that Russinovich claims Microsoft will install in every Azure server in 2025, thus eliminating that network latency. The new HSM complies with FIPS 140-3 Level 3 (a set of US standards for cryptographic modules); Level 3 includes requirements for tamper resistance and other strict security features.

In the blog post, Russinovich also refers to Microsoft’s work in quantum-resistant encryption – this is exciting stuff which we’ll cover in a future post.

So Where is Microsoft in Hardware?

While Microsoft has been pouring investment into AI – and the stock price reflects it – it seems clear that it is behind its competitors in custom silicon.

Is that important? Does anyone care?

Well, yes, even if not so much in the short term. AI workloads, and cloud computing generally, are expensive and use prodigious amounts of electricity. If Microsoft (or anyone) can dramatically reduce the power requirements of AI, they will gain a clear advantage.

Even customers with no interest in AI workloads stand to benefit: custom silicon is often more power efficient, enabling more computing capacity in a given rack, which translates into lower operating costs and, generally, lower prices.
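A crude back-of-envelope illustrates the stakes; every number below is an assumption chosen for illustration, not an Azure figure:

```python
# Back-of-envelope: what a 20% power-efficiency gain could be worth.
# All figures are illustrative assumptions.

racks = 10_000            # racks across a hypothetical fleet
kw_per_rack = 15          # assumed average draw per rack (kW)
price_per_kwh = 0.08      # assumed wholesale electricity price ($/kWh)
hours_per_year = 24 * 365

baseline = racks * kw_per_rack * hours_per_year * price_per_kwh
savings = baseline * 0.20  # a 20% efficiency gain from custom silicon
print(f"baseline power bill: ${baseline:,.0f}/yr")
print(f"20% efficiency gain saves: ${savings:,.0f}/yr")
```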

On the other hand, Microsoft faces headwinds. Others are ahead. Its preferred manufacturer, TSMC, is also the preferred vendor for nearly everyone else, most notably Apple, which contracts with TSMC to produce the custom processors in the more than 200 million iPhones it sells every year. How much capacity does TSMC have left after taking care of Apple, NVIDIA, AWS, and the rest? Can Microsoft get its place in line? And can Microsoft reduce the industry’s dependence on NVIDIA’s CUDA? We’ll have to see.

It’s clear that the days when we could assume that every server was an x86 are over, and it’s a welcome change. We’re in the “let a thousand flowers bloom” period and it’s going to be fun to watch.

Think I need an implanted AI chip? Have other questions or comments? Let me know at bbriggs@directionsonmicrosoft.com.