
Nvidia Blackwell Ultra: Microsoft launches GB300 cluster with 4,608 GPUs

Source: Microsoft

Following Nvidia Hopper and the first Blackwell accelerators, the Blackwell Ultra refresh is now reaching the market. The world's first supercomputing cluster based on Nvidia's GB300 NVL72, comprising 4,608 GPUs, has gone into operation at Microsoft Azure. The racks are tightly packed and rely on liquid cooling.


First supercomputing cluster with Blackwell Ultra

Whenever a new Nvidia generation reaches volume production and arrives at its first major customers, matching partner announcements follow from both sides. As Microsoft announced on the Azure blog, and Nvidia in a post of its own, the world's first supercomputing cluster based on Nvidia's GB300 NVL72, with a total of 4,608 GPUs, has now gone into operation at Microsoft Azure. The newly added computing power will be made available exclusively to OpenAI.


Designed for demanding inferencing

Nvidia announced the Blackwell Ultra mid-cycle refresh at GTC in March and promised it for the second half of the year. Among other things, Blackwell Ultra was developed for the higher demands of inferencing AI reasoning models, which have to ingest and generate several hundred thousand tokens per request at speed. At OpenAI, Blackwell Ultra is to be used for particularly demanding inference workloads, as Nvidia explains on its company blog.


50 percent more HBM3e

To meet these requirements, Nvidia has expanded the memory of Blackwell Ultra to 288 GB of HBM3e per B300 GPU, up from 192 GB of HBM3e on the Blackwell B200 GPU. According to Nvidia, Blackwell Ultra delivers 1.5 times the FP4 inferencing performance of Blackwell: the company cites 15 PetaFLOPS for dense FP4, that is, without sparsity acceleration, with which 30 PetaFLOPS are possible.
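The per-GPU figures above are easy to cross-check; a minimal sketch (all numbers taken from Nvidia's published specs as quoted in this article):

```python
# Cross-check of the Blackwell -> Blackwell Ultra per-GPU figures quoted above.

b200_hbm_gb = 192   # Blackwell B200: HBM3e per GPU
b300_hbm_gb = 288   # Blackwell Ultra B300: HBM3e per GPU

memory_gain = b300_hbm_gb / b200_hbm_gb
print(f"HBM3e capacity gain: {memory_gain:.2f}x")  # 1.50x, i.e. 50 percent more

fp4_dense_pflops = 15                       # per B300 GPU, without sparsity
fp4_sparse_pflops = fp4_dense_pflops * 2    # sparsity acceleration doubles throughput
print(f"FP4 with sparsity: {fp4_sparse_pflops} PetaFLOPS")  # 30
```

The 1.5x memory gain matches the "50 percent more HBM3e" headline, and the dense-to-sparse doubling is the usual relationship between Nvidia's two FP4 throughput figures.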


Tightly packed racks with liquid cooling

The new generation comes in two versions: GB300 NVL72 and HGX B300 NVL16. The GB300 NVL72 is an evolution of the familiar GB200 NVL72 rack, combining 72 Blackwell GPUs with 36 Arm-based Grace CPUs in a tightly packed, liquid-cooled server cabinet. The HGX B300 NVL16 is the 8U chassis variant with air-cooled GPUs and x86 processors, which is easier to integrate into existing data centers and server cabinets not yet fully designed for liquid cooling.

Microsoft Azure now operates 64 GB300 NVL72 racks. Per rack, that means: 72 GPUs, 21 TB HBM3e, 130 TB/s NVLink bandwidth, 36 CPUs, 40 TB LPDDR5X and 800 Gbit/s scale-out bandwidth via Quantum-X800 InfiniBand. Each rack uses 18 compute trays with four GPUs and two CPUs apiece, plus nine NVLink switches. In total, this yields 1.44 ExaFLOPS of FP4 Tensor Core performance per VM. On Azure, the new VMs are called ND GB300 v6, following the ND GB200 v6 with GB200 NVL72 announced in March of this year.
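The cluster totals follow directly from the per-rack figures; a quick sanity check (GPU counts and memory sizes as quoted above, with the per-rack HBM total rounded in the official figures):

```python
# Cluster-level arithmetic for Azure's GB300 NVL72 deployment, from the figures above.

racks = 64
gpus_per_rack = 72
cpus_per_rack = 36
hbm_per_gpu_gb = 288  # B300 GPU

total_gpus = racks * gpus_per_rack
print(f"{total_gpus} GPUs in the cluster")  # 4608, the headline number

hbm_per_rack_tb = gpus_per_rack * hbm_per_gpu_gb / 1000
print(f"{hbm_per_rack_tb:.1f} TB HBM3e per rack")  # ~20.7 TB, quoted rounded as 21 TB

# 18 compute trays with 4 GPUs and 2 CPUs each account for the full rack.
trays = 18
assert trays * 4 == gpus_per_rack
assert trays * 2 == cpus_per_rack
```

The tray layout (18 × 4 GPUs, 18 × 2 CPUs) reproduces the 72-GPU/36-CPU rack exactly, and 64 such racks give the 4,608 GPUs in the headline.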

