Inside Microsoft’s AI Superfactory Plan
Microsoft has started linking its AI datacenters across the U.S., creating what it calls a new kind of connected system — an “AI superfactory.” The first two sites in this network are in Atlanta, where the facility has been operational since October, and Wisconsin, which was introduced publicly this week. Linked by a high-speed private fiber backbone, these facilities are designed to work together on massive AI workloads, splitting and syncing jobs across sites in near real-time.
This connected architecture, which Microsoft refers to as Fairwater, represents a shift from isolated cloud regions to a unified and more task-specific infrastructure model. Rather than juggling millions of smaller workloads, the company says these sites are optimized to run compute-heavy AI jobs using hundreds of thousands of GPUs.
More Fairwater locations are expected to come online in the coming months as Microsoft expands its AI infrastructure footprint. Microsoft executives say this approach marks a fundamental shift in how large-scale AI systems will be built and operated.
“This is about building a distributed network that can act as a virtual supercomputer for tackling the world’s biggest challenges in ways that you just could not do in a single facility,” said Alistair Speirs, Microsoft general manager focusing on Azure infrastructure.
“A traditional datacenter is designed to run millions of separate applications for multiple customers,” he added. “The reason we call this an AI superfactory is it’s running one complex job across millions of pieces of hardware. And it’s not just a single site training an AI model, it’s a network of sites supporting that one job.”
Most cloud datacenters are built as single-story warehouses to support a broad range of applications. Fairwater takes a different approach, using a two-story design to stack more GPU racks in less space. This vertical layout shortens the distance between components, which helps reduce latency and speed up communication between systems.
Inside the racks, Microsoft is using NVIDIA’s GB200 NVL72 systems — pre-configured clusters of 72 GPUs designed for large-scale AI jobs. The company says this setup allows the Fairwater architecture to scale to hundreds of thousands of GPUs across sites. According to Microsoft, the chip and rack design delivers the highest throughput per rack of any cloud platform it currently offers.
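Microsoft has not published the details of its training stack, but the basic pattern of spreading a single training job across many GPUs is well established. The sketch below is a minimal, hypothetical illustration using PyTorch's DistributedDataParallel: each GPU holds a replica of the model, trains on its own slice of the batch, and gradients are synchronized so the whole fleet behaves like one trainer. The model, dimensions, and launch setup here are illustrative assumptions, not Microsoft's configuration.

```python
# Minimal data-parallel training sketch (illustrative only; not Microsoft's stack).
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model standing in for a much larger network.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # Each rank trains on its own shard of the data.
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()          # DDP all-reduces gradients across every GPU here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The same pattern extends, with additional parallelism strategies, from a single rack of 72 GPUs to the hundreds of thousands of GPUs Microsoft describes.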
Keeping that hardware cool is another key difference. Instead of traditional cooling towers or constant water intake, Fairwater facilities rely on advanced closed-loop liquid cooling systems. The company says this setup consumes almost no water during ongoing operation and supports the heat demands of tightly packed AI accelerators. Intelligent networking within the site helps GPUs talk to each other efficiently, while each location is also hardwired into Microsoft’s private fiber network — connecting to other Fairwater sites as part of a larger distributed system.
“Leading in AI isn’t just about adding more GPUs – it’s about building the infrastructure that makes them work together as one system,” said Scott Guthrie, Microsoft executive vice president of Cloud + AI.
“We’ve spent years advancing the architecture, software and networking needed to train the largest models reliably, so our customers can innovate with confidence. Fairwater reflects that end-to-end engineering and is designed to meet growing demand with real-world performance, not just theoretical capacity,” he said.
Tying it all together is a private network built specifically for AI. Microsoft has laid more than 120,000 miles of fiber to connect its Fairwater sites — not for general cloud traffic, but for high-intensity training jobs that depend on speed and tight coordination. The company built a custom protocol to move data between sites with minimal lag, so even facilities separated by hundreds of miles can operate like one machine.
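Microsoft has not documented that cross-site protocol publicly, but the reason distance can be tolerated at all is that wide-area traffic can be kept far smaller than intra-site traffic. The conceptual sketch below, a hypothetical hierarchical aggregation written with NumPy, shows the idea: gradients are first reduced within each site over the fast local fabric, and only one aggregate per site crosses the long-haul fiber.

```python
# Conceptual sketch of hierarchical gradient aggregation (hypothetical;
# Microsoft's actual cross-site protocol is not publicly documented).
import numpy as np

def hierarchical_average(site_gradients):
    """site_gradients: list of sites, each a list of per-GPU gradient arrays."""
    # Step 1: reduce within each site over the fast local fabric.
    site_means = [np.mean(np.stack(gpu_grads), axis=0) for gpu_grads in site_gradients]
    # Step 2: exchange only one aggregate per site over the long-haul link,
    # so wide-area traffic scales with the number of sites, not the number of GPUs.
    return np.mean(np.stack(site_means), axis=0)

# Two sites, each with (say) four GPUs holding small gradient vectors.
rng = np.random.default_rng(0)
atlanta = [rng.standard_normal(8) for _ in range(4)]
wisconsin = [rng.standard_normal(8) for _ in range(4)]
print(hierarchical_average([atlanta, wisconsin]))
```

In practice the aggregation would be overlapped with computation and tuned to the link latency, but the hierarchy is what lets facilities hundreds of miles apart act like one machine.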
As more sites come online, the network is designed to grow with them. Each facility follows the same layout, plugs into the same interconnect, and helps spread the energy load across different regions. The idea is to scale without hitting grid limits — and without having to reinvent the architecture every time.
“To make improvements in the capabilities of the AI, you need to have larger and larger infrastructure to train it,” said Mark Russinovich, CTO, deputy CISO, and technical fellow, Microsoft Azure. “The amount of infrastructure required now to train these models is not just one datacenter, not two, but multiples of that.”
Microsoft is making a big bet — not on bigger chips, but on smarter infrastructure. The company believes the future of AI won’t rely on isolated supercomputers, but on tightly connected sites working together as one. Fairwater is its first attempt to prove that idea at scale. Each datacenter plugs into the next, forming a kind of AI mesh across the country.
Others are moving fast too, but so far, no one has publicly tied their facilities together this way. Whether that turns into a lasting advantage or just one path forward, one thing is clear: the AI arms race is heading deeper into the datacenter — and in that race, architecture could matter just as much as raw compute.
Related Items
Powering Data in the Age of AI: Part 3 – Inside the AI Data Center Rebuild
OpenAI Aims to Dominate the AI Grid With Five New Data Centers
The Great Unbundling: Is the All-in-One Data Platform Dead?
