That’s it, the Nvidia Ampere GTC keynote is over. Jensen Huang was back in his trademark black leather jacket, and all was instantly right with the world. We all got to see inside the Nvidia CEO’s kitchen, and marvel at the number of spatulas he has (honestly, why so many spatulas?).

Kitchen paraphernalia aside, the keynote was made up of eight videos, covering everything from autonomous driving and medical research to conversational AI and robotics, all of which matter to Nvidia but not so much to us gamers. After all of that, you may be left thinking there was no real information about Nvidia’s next gaming graphics cards. And while there were a few mentions of gaming graphics, there were no direct details about the hardware itself.

What the GTC keynote did have, though, was a whole lot of detail on the A100. While the A100 is designed for data centres, the same architecture will make its way into next-gen GeForce graphics cards some time in the future.

Nvidia A100 specs (peak performance)
Transistor count: 54 billion
Die size: 826mm²
FP64 CUDA cores: 3,456
FP32 CUDA cores: 6,912
Tensor Cores: 432
Streaming Multiprocessors (SMs): 108
GPU memory: 40GB
GPU memory bandwidth: 1.6TB/s
Interconnect: NVLink 600GB/s | PCIe Gen 4 64GB/s
Multi-Instance GPU (MIG): various instance sizes, with up to 7 MIG instances at 5GB each
Form factor: 4 or 8 SXM GPUs in HGX A100
Max power: 400W (SXM)

The big takeaway about the A100 is that it’s a big chip. I’m not just talking about its physical size, although at 826mm² it’s beefy; it’s more the fact that Nvidia has squeezed 54 billion transistors in there. It’s not easy to make direct comparisons with the current Turing architecture, as Nvidia’s previous data centre offering was the Volta V100, but there are still some serious step-ups in performance: Nvidia claims a 20x performance increase over Volta in certain tasks. For comparison, the V100 had 21.1 billion transistors and measured 815mm². So the A100 has roughly 2.5x the transistors, but is only slightly bigger. That’s the jump to TSMC’s 7nm process showing its worth.
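To put the “2.5x the transistors, only slightly bigger” point into numbers, here’s a short Python sketch that works out transistor density from the two chips’ quoted transistor counts and die areas. It uses only the figures given in this article; the `density_mtr_per_mm2` helper is a hypothetical name for illustration, and the density is a simple whole-die average rather than anything Nvidia has published.

```python
# Figures quoted in this article: (transistor count, die area in mm^2).
chips = {
    "Volta V100": (21.1e9, 815),
    "Ampere A100": (54e9, 826),
}

def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Whole-die average density in millions of transistors per mm^2 (illustrative helper)."""
    return transistors / area_mm2 / 1e6

for name, (transistors, area) in chips.items():
    print(f"{name}: {density_mtr_per_mm2(transistors, area):.1f}M transistors/mm²")

v100_t, v100_a = chips["Volta V100"]
a100_t, a100_a = chips["Ampere A100"]
print(f"Transistor ratio: {a100_t / v100_t:.2f}x")
print(f"Die area ratio:   {a100_a / v100_a:.3f}x")
# Volta V100: 25.9M transistors/mm²
# Ampere A100: 65.4M transistors/mm²
# Transistor ratio: 2.56x
# Die area ratio:   1.013x
```

On these figures the A100 packs roughly 65 million transistors into each square millimetre against the V100’s roughly 26 million, about 2.5x the density for a die that’s only around 1% larger.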


If you’re looking for some real-time ray tracing, the second video has plenty of footage of Minecraft RTX, and a delightful marble game called… Marble RTX. There was no talk about RT Cores for ray tracing, though, as they’re not used in data centres, but Tensor Core performance does appear to have increased considerably (although the Tensor Cores themselves have changed, so there isn’t an easy comparison here).

The actual number of Tensor Cores per SM has dropped to four (Turing had eight), but Nvidia still claims an overall performance increase, in some scenarios as much as 2.5x. Those rumours that tensor performance was going to increase appear to be on the money; we just hadn’t expected there to actually be fewer cores overall.
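As a quick sanity check on the four-per-SM figure, the A100’s chip-wide totals can be divided by its 108 SMs. This is a minimal Python sketch that assumes every enabled SM carries an identical set of units; the `per_sm` helper and the dictionary names are purely illustrative.

```python
# Chip-wide totals from the A100 spec table in this article.
A100_SMS = 108
A100_TOTALS = {
    "FP32 CUDA cores": 6912,
    "FP64 CUDA cores": 3456,
    "Tensor Cores": 432,
}

def per_sm(total: int, sms: int = A100_SMS) -> int:
    """Split a chip-wide unit count evenly across SMs (assumes identical SMs)."""
    if total % sms:
        # An uneven split would mean the identical-SM assumption doesn't hold.
        raise ValueError(f"{total} units do not divide evenly across {sms} SMs")
    return total // sms

for unit, total in A100_TOTALS.items():
    print(f"{unit} per SM: {per_sm(total)}")
# FP32 CUDA cores per SM: 64
# FP64 CUDA cores per SM: 32
# Tensor Cores per SM: 4
```

Every total divides cleanly by 108, and the Tensor Core result of four per SM is half the eight per SM quoted for Turing.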

What we don’t know from all of this is what sort of performance we’ll be getting from the next-generation graphics cards. We don’t have the core counts, the clock speeds, or anything like that at this stage, although be sure to check out everything we know about Nvidia Ampere for the latest info.
