
LLM Training and Inference with Intel Gaudi 2 AI Accelerators

Jan 06, 2024 - databricks.com
Databricks has been working to optimize its large language model (LLM) stack to support a variety of machine learning hardware platforms, including Intel's Gaudi family of AI Accelerators. The company found that the Intel Gaudi 2 accelerator has the second-best training performance-per-chip, only surpassed by the NVIDIA H100. The Gaudi 2 accelerator also matches the NVIDIA H100 system in decoding latency, the most expensive phase of LLM inference. Furthermore, based on public, on-demand pricing, the Intel Gaudi 2 offers the best training and inference performance-per-dollar, beating out the NVIDIA A100-40GB, A100-80GB, and H100.

The blog post also covers the Gaudi 2's hardware and software, its LLM training and inference performance, and its training convergence results. The Gaudi 2 accelerator supports both deep learning training and inference for AI models like LLMs and is built on a 7nm process technology. Its heterogeneous compute architecture includes dual matrix multiplication engines and 24 programmable tensor processor cores. The Intel Gaudi SynapseAI software suite enables PyTorch programs to run on Gaudi devices with minimal modifications. The post concludes that the Intel Gaudi 2 is a compelling option for LLM training and inference due to its high performance and cost-effectiveness.
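As a rough illustration of what "minimal modifications" means in practice, the sketch below moves a standard PyTorch module onto Gaudi's `hpu` device via the `habana_frameworks` bridge (part of the SynapseAI stack), falling back to CPU when no Gaudi hardware is present. This is a hedged example, not code from the Databricks post; the model and tensor shapes are placeholders.

```python
import torch

# Try the Intel Gaudi PyTorch bridge (installed with the SynapseAI stack);
# fall back to CPU when running on a machine without a Gaudi accelerator.
try:
    import habana_frameworks.torch.core as htcore
    device = torch.device("hpu")
except ImportError:
    htcore = None
    device = torch.device("cpu")

# An ordinary PyTorch model: the only Gaudi-specific change is the device.
model = torch.nn.Linear(16, 4).to(device)
x = torch.randn(8, 16, device=device)
out = model(x)

if htcore is not None:
    # Gaudi uses lazy execution; mark_step() flushes the accumulated
    # graph to the device, typically once per training iteration.
    htcore.mark_step()

print(out.shape)
```

The pattern mirrors moving a model to `cuda`: swap the device string and, for Gaudi's lazy mode, call `mark_step()` at iteration boundaries.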

Key takeaways:

  • Databricks has been working to optimize its machine learning stack to support a variety of hardware platforms, including the Intel® Gaudi® family of AI Accelerators.
  • The Intel® Gaudi® 2 accelerator has the second-best training performance-per-chip of the accelerators Databricks tested, bested only by the NVIDIA H100, and offers the best training and inference performance-per-dollar.
  • Intel® Gaudi SynapseAI software suite enables PyTorch programs to run seamlessly on Gaudi devices with minimal modifications, making it easier for developers to train custom AI models.
  • The upcoming Intel Gaudi 3 is expected to have more FLOP/s and memory bandwidth than all the major competitors, making it a strong competitor in the AI training and inference market.