Nvidia Tops Llama 2, Stable Diffusion Speed Trials

Mar 27, 2024 - spectrum.ieee.org
The MLPerf benchmark suite has added two new generative AI models, Llama 2 70B and Stable Diffusion XL, to its inferencing tests. The tests, long dominated by computers with Nvidia GPUs, particularly those using the H200 processor, also saw AI accelerators from Intel and Qualcomm in the mix. The Llama 2 model, with 70 billion parameters, requires a different class of hardware, according to MLCommons, the organizer of the tests. Stable Diffusion XL, a text-to-image generation benchmark, has 2.6 billion parameters, less than half the size of GPT-J.

In the data center inference results, an Nvidia H200 system combining eight GPUs with two Intel Xeon CPUs performed best in the new generative AI categories, managing just under 14 queries per second on Stable Diffusion XL and about 27,000 tokens per second on Llama 2 70B. Intel's Gaudi 2 accelerator was the only alternative to Nvidia on the new benchmarks, delivering less than half the performance of the H100 in an 8-accelerator configuration on Stable Diffusion XL. However, Intel argues that measured by performance per dollar, the Gaudi 2 is about equal to the H100.
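Intel's performance-per-dollar argument can be made concrete with a small sketch. Note that MLPerf does not publish prices, so every number below (both system prices and the Gaudi 2 throughput) is a hypothetical placeholder chosen only to illustrate the arithmetic, not an MLPerf or vendor figure:

```python
# Sketch of the performance-per-dollar comparison Intel describes.
# All prices and the Gaudi 2 throughput are HYPOTHETICAL placeholders,
# not MLPerf results or vendor list prices.

def perf_per_dollar(throughput: float, price: float) -> float:
    """Throughput (e.g. queries/s) divided by system price in dollars."""
    return throughput / price

# Hypothetical 8-accelerator Stable Diffusion XL systems:
h100 = {"qps": 13.8, "price": 300_000}    # assumed figures
gaudi2 = {"qps": 6.3, "price": 140_000}   # assumed: under half H100 raw speed

h100_ppd = perf_per_dollar(h100["qps"], h100["price"])
gaudi2_ppd = perf_per_dollar(gaudi2["qps"], gaudi2["price"])

# Under these assumed prices, per-dollar performance comes out roughly
# equal even though raw throughput differs by more than 2x.
```

The point of the sketch is that a chip can lose the raw-throughput comparison by a wide margin yet tie on a cost-normalized metric if its system price is proportionally lower; the conclusion depends entirely on the prices assumed.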

Key takeaways:

  • MLPerf has added two massive generative AI benchmarks, Llama 2 70B and Stable Diffusion XL, to its inferencing tests. Llama 2 has 70 billion parameters, requiring a different class of hardware, while Stable Diffusion XL has 2.6 billion parameters.
  • In the data center inference results, the top performer was an Nvidia H200 system that combined eight H200 GPUs with two Intel Xeon CPUs. It managed just under 14 queries per second for Stable Diffusion XL and about 27,000 tokens per second for Llama 2 70B.
  • Intel's Gaudi 2 accelerator was the only alternative to Nvidia on the new generative AI benchmarks. While it delivered less raw performance than Nvidia's H100, Intel argues that when measuring performance per dollar, the Gaudi 2 is about equal to the H100.
  • In the edge inferencing results, the top performer was a system using two Nvidia L40S GPUs and an Intel Xeon CPU. In the power consumption category, the contest over energy efficiency was between Nvidia and Qualcomm, the latter having focused on energy-efficient inference since introducing its Cloud AI 100 processor.
