In the data center inference results, an Nvidia H200 system that combined eight GPUs with two Intel Xeon CPUs performed best in the new generative AI categories. The system managed just under 14 queries per second for Stable Diffusion XL and about 27,000 tokens per second for Llama 2 70B. Intel's Gaudi 2 accelerator was the only alternative to Nvidia, delivering less than half the performance of the H100 on Stable Diffusion XL in an 8-GPU configuration. However, Intel argues that on a performance-per-dollar basis, the Gaudi 2 is about equal to the H100.
Key takeaways:
- MLPerf has added two massive generative AI benchmarks, Llama 2 70B and Stable Diffusion XL, to its inference tests. Llama 2 70B has 70 billion parameters and requires a different class of hardware, while Stable Diffusion XL has 2.6 billion parameters.
- In the data center inference results, the top performer was an Nvidia H200 system that combined eight GPUs with two Intel Xeon CPUs. It managed just under 14 queries per second for Stable Diffusion XL and about 27,000 tokens per second for Llama 2 70B.
- Intel's Gaudi 2 accelerator was the only alternative to Nvidia in MLPerf's inferencing benchmarks. While it delivered less raw performance than Nvidia's H100, Intel argues that on a performance-per-dollar basis, the Gaudi 2 is about equal to the H100.
- In the edge inferencing results, the top performer was a system using two Nvidia L40S GPUs and an Intel Xeon CPU. In the power consumption category, the contest was between Nvidia and Qualcomm; Qualcomm has focused on energy-efficient inference since introducing its Cloud AI 100 processor.
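
The figures above are system-level, so a rough per-accelerator view can help when comparing them. The sketch below divides the reported H200 numbers by the eight GPUs in the system, assuming throughput splits evenly across them (an illustrative simplification, not something the results state). It also spells out the arithmetic behind a performance-per-dollar argument: two parts offer equal performance per dollar only when their price ratio matches their performance ratio. The function names and the 0.5 relative-performance input are hypothetical; no actual prices are given in the text.

```python
def per_accelerator(system_throughput: float, num_accelerators: int) -> float:
    """Average throughput per accelerator, assuming an even split across devices."""
    if num_accelerators <= 0:
        raise ValueError("num_accelerators must be positive")
    return system_throughput / num_accelerators


def break_even_price_ratio(relative_performance: float) -> float:
    """Price ratio (challenger / baseline) at which performance per dollar is equal.

    perf_c / price_c == perf_b / price_b  implies  price_c / price_b == perf_c / perf_b.
    """
    if relative_performance <= 0:
        raise ValueError("relative_performance must be positive")
    return relative_performance


# Reported figures for the eight-GPU H200 system (both are approximate in the text).
LLAMA2_70B_TOKENS_PER_SEC = 27_000  # "about 27,000"
SDXL_QUERIES_PER_SEC = 14           # "just under 14", so this is an upper bound
NUM_GPUS = 8

llama_per_gpu = per_accelerator(LLAMA2_70B_TOKENS_PER_SEC, NUM_GPUS)
sdxl_per_gpu = per_accelerator(SDXL_QUERIES_PER_SEC, NUM_GPUS)

# Hypothetical input: a challenger at half the baseline's performance.
implied_price_ratio = break_even_price_ratio(0.5)

print(f"Llama 2 70B: about {llama_per_gpu:.0f} tokens/s per GPU")
print(f"Stable Diffusion XL: under {sdxl_per_gpu:.2f} queries/s per GPU")
print(f"Break-even price ratio at 0.5x performance: {implied_price_ratio:.2f}")
```

Read this way, a claim of equal performance per dollar at less than half the performance implies a price of less than half, which is the shape of Intel's argument rather than a verified price comparison.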