February 5, 2026
Analyst Report
Maia 200 Aims to Lower AI Inference Costs
- The new generation of Microsoft custom AI silicon offers better price/performance for large-scale inference.
- It’s technically competitive with similar offerings from Google and AWS.
- It’s most useful for customers running their own custom models.
Maia 200, Microsoft’s newest generation of custom AI silicon, is being deployed in a handful of Azure regions in the United States, delivering many of the performance benefits of GPUs at a lower cost. Like Google’s Tensor Processing Units (TPUs) and AWS’s Trainium and Inferentia, Maia 200 is a custom-designed chip optimized for the kinds of mathematical operations widely used in AI applications. Although not as fast as the highest-end NVIDIA GPUs, custom AI chips typically offer much better price/performance.
With Maia 200, Microsoft is focusing specifically on inference (the process of using AI models) rather than training (the process of building AI models). That focus should lower costs for organizations that build and run their own models.