What’s the difference between the chips in Nvidia’s Blackwell family, and when do you use them?

February 21, 2025 / Max Weinbach

A few days ago, xAI announced Grok 3 and it’s easily the best LLM I’ve used. Good vibes, great quality, fast token generation, and it happened to be trained on the most compute of any model to date (200,000 H100s). During the stream, Elon Musk said xAI is already working on Grok 4’s data center, which will be based on Blackwell chips. This, combined with Google Cloud announcing they are the first to offer both GB200 and B200 nodes, led to an interesting thought: What’s the difference between B200 and GB200, and when would you use each of them?


Nvidia GB200

GB200 is, as it sounds, the Grace Blackwell 200: 2 Blackwell GPUs connected to 1 Grace CPU over NVLink C2C. A single GB200 superchip has up to 384GB of HBM3e at 16 TB/s of memory bandwidth. The NVLink bandwidth is 3.6TB/s. This is the GPU-to-GPU interconnect (not to be confused with NVLink C2C, which is the link between each GPU and the Grace CPU), and it matters a lot for the roofline arithmetic intensity of multi-chip training. Essentially, it’s the bandwidth between the two Blackwell GPUs on a GB200 and between GB200s in a node.
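To make that roofline point concrete, here’s a rough Python sketch. The bandwidth figures are the ones quoted above; the per-superchip compute number is purely an illustrative assumption, not a quoted spec, so treat the output as the shape of the argument rather than a benchmark.

```python
# Rough roofline sketch for a GB200 superchip (2 Blackwell GPUs + 1 Grace CPU).
# Bandwidths are the figures quoted above; the compute figure is an ASSUMPTION
# for illustration only -- swap in the real spec for your part.

PEAK_FLOPS = 5.0e15   # assumed dense BF16 FLOPS for the 2-GPU superchip (illustrative)
HBM_BW     = 16e12    # 16 TB/s aggregate HBM3e bandwidth (from the text)
NVLINK_BW  = 3.6e12   # 3.6 TB/s aggregate NVLink bandwidth (from the text)

def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOPs per byte moved) needed to stop being
    bandwidth-bound on a given link -- the roofline 'ridge point'."""
    return peak_flops / bandwidth

print(f"HBM ridge point:    {ridge_point(PEAK_FLOPS, HBM_BW):7.0f} FLOPs/byte")
print(f"NVLink ridge point: {ridge_point(PEAK_FLOPS, NVLINK_BW):7.0f} FLOPs/byte")
# The NVLink ridge point is ~4-5x higher than HBM's: work split across chips
# needs that much more math per byte exchanged before the link stops being the
# bottleneck, which is why inter-chip bandwidth matters so much for training.
```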

The GPUs are connected to the CPU with a board-level NVLink C2C interface. The CPU is a Grace CPU with 72 Arm Neoverse V2 cores, up to 480GB of LPDDR5X memory, and 512GB/s of memory bandwidth.

GB200 comes in 3 main versions: the GB200 Grace Blackwell Superchip, GB200 NVL4, and GB200 NVL72. The GB200 Superchip is the base configuration: 2 GPUs connected to 1 CPU. GB200 NVL4 comes with 4 GPUs and 2 CPUs, essentially 2 GB200s on a single board already connected with NVLink, which makes it a good building block at the rack level.

GB200 NVL72, on the other hand, has 72 GPUs and 36 CPUs. This is a full rack/cabinet solution. It connects all 72 GPUs to function as one node, or effectively a single GPU.

TL;DR: GB200 is 2 GPUs to 1 Arm CPU, and it comes in 3 sizes: 2 GPUs to 1 CPU, 4 GPUs to 2 CPUs, and 72 GPUs to 36 CPUs.

There’s a lot that goes into training Transformer models (LLMs), but simply put, the higher the bandwidth between chips in a node, the more efficient the training can be and the more compute you effectively have access to. Because NVLink bandwidth on the same board is so much higher than normal networking between servers, GB200s have an advantage over B200s in training.

The benefit of the Grace CPU here is vertical control and NVLink. On other systems, you connect a CPU to GPUs over PCIe lanes and have 1-2 CPUs managing around 8 or so GPUs. PCIe Gen 5 x16 bandwidth is 128 GB/s, and the upcoming PCIe Gen 6 doubles that to 256GB/s. But that’s still a fraction of what Grace offers in a GB200, where NVLink C2C links the CPU to each GPU at 900GB/s and the GPUs talk to each other over NVLink at 3.6TB/s. To limit latency, you need direct chip-to-chip connections, and that requires some sort of vertical control. Nvidia has that in GB200 and is able to cut latency and raise bandwidth, which makes training more efficient.
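A quick back-of-envelope comparison of those host-to-GPU links, as a sketch only: the payload size is arbitrary (480GB just because that’s the Grace LPDDR5X capacity mentioned above), and it ignores latency and real-world protocol overheads.

```python
# How long does it take to push a chunk of data from the host CPU to a GPU
# over each interconnect? Illustrative only; ignores latency and overheads.

LINKS_GBPS = {
    "PCIe Gen 5 x16": 128,   # GB/s (from the text)
    "PCIe Gen 6 x16": 256,   # GB/s (from the text)
    "NVLink-C2C":     900,   # GB/s Grace <-> Blackwell link
}

PAYLOAD_GB = 480  # arbitrary payload; happens to match Grace's LPDDR5X capacity

for name, bw in LINKS_GBPS.items():
    print(f"{name:>15}: {PAYLOAD_GB / bw:5.2f} s to move {PAYLOAD_GB} GB")
# NVLink-C2C works out to ~7x PCIe Gen 5 and ~3.5x PCIe Gen 6, before even
# counting the latency advantage of a direct board-level connection.
```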

If you want the most effective compute with the best memory bandwidth to train the best models, some version of GB200 makes the most sense. 


Nvidia B200

B200, unlike GB200, is a single GPU. Each GPU has 180GB of memory and is, as far as I can tell, the same GPU die used in the GB200. This is closer to a standard replacement for the H100: it scales up performance and memory with the Blackwell architecture but still lets data centers use, and this is the important part, x86 CPUs.

GB200 is Arm through and through: it’s 2 GPUs connected at the board level to an Arm CPU. There is no choice, no option. If you want the best training on Blackwell, you need to use the Arm CPU. While that makes sense for large-scale training, it may not for inference.

I think it’s safe to say the majority of AI inference today runs on GPUs connected to x86 CPUs, and I don’t think that changes. You also don’t need 72 GPUs connected to each other to run inference on a model like Grok 3, GPT-4o, DeepSeek R1, Claude 3.5 Sonnet, etc. These are all estimated to be between 300B and 800B parameters, or roughly 600GB to 1.2TB of memory to hold the weights, depending on precision.

It would be inefficient, though really fast, to run these on a GB200 NVL72. Frankly, it makes sense to just get a node of 8x B200 connected to an Intel or AMD CPU and use that for inference. I believe this makes sense for hyperscalers and enterprises looking to install their own inference framework, due to the maturity of x86 software, maintenance costs, and practicality.
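As a sanity check on those numbers, here’s a minimal sketch, assuming weights-only memory (no KV cache or activation overhead) and the 180GB-per-B200 figure from above. The parameter counts are the rough public estimates used in this post, not official figures.

```python
import math

# Weights-only memory estimate: parameters * bytes per parameter.
# Parameter counts are rough estimates (as in the text); real deployments
# also need KV cache and activation memory on top of this.

B200_MEM_GB = 180  # per-GPU memory quoted above

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    # 1e9 params per "billion" cancels against 1e9 bytes per GB
    return params_billions * bytes_per_param

for params_b in (300, 600, 800):
    for precision, bpp in (("FP8", 1), ("BF16", 2)):
        gb = weights_gb(params_b, bpp)
        gpus = math.ceil(gb / B200_MEM_GB)
        print(f"{params_b:>4}B params @ {precision:>4}: ~{gb:6.0f} GB of weights "
              f"-> at least {gpus} x B200")
# An 8x B200 node has 1,440 GB of HBM, so every case here fits except ~800B
# at BF16 (~1,600 GB), which is where quantization or a bigger system comes in.
```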


So, after all of this, the difference is that GB200 is a system connecting 2 GPUs to an Arm CPU, while B200 is a single GPU you can put into any data center system. GB200 is best for training; B200 is the best value for inference. You can use GB200 for inference, just as you can use B200 for training, but you want to use these pieces of silicon in the most effective way possible.

There’s a lot more that goes into the GB200 systems, like the NVLink switch chip, the networking spine, etc., but in terms of comparing the Blackwell family, this is the easiest way to understand it!

What I think will be most interesting going forward is seeing which one has the larger installed base over the next 18 months, GB200 or B200. Do hyperscalers feel training silicon is more important than inference? Where and when do we see the pivot? Which size of customer buys which piece of silicon?

The expectation is that many large organizations will begin to deploy AI solutions at scale in the second half of the year. While I’m not convinced that’s the reality of today’s AI models and ecosystem, it matters because inference at scale means more ROI in terms of customer dollars and a way to monetize infrastructure investments. But many customers are still chasing training and will need what GB200 offers in performance and efficiency. As the year rolls on, the mix of GB200 vs. B200 will be something we are watching closely.
