
Databricks reveals Intel's Gaudi 2 excels in large language model training and inference, offering a robust alternative to traditional ML hardware with superior performance and efficiency.
At Databricks, we're always looking for ways to help our customers build and deploy generative AI applications efficiently while maintaining data privacy and control. One key area of focus is optimizing machine learning (ML) hardware, and today we’re excited to share our findings on Intel's Gaudi 2 AI accelerators.
Intel’s Gaudi family of AI accelerators offers a compelling alternative for training and inference workloads. These accelerators are available via AWS (first-generation Gaudi), the Intel Developer Cloud (Gaudi 2), and on-premises through Supermicro and WiWynn. In our tests, Gaudi 2 performed well on both LLM training and inference, making it a strong contender in the AI hardware market.
We evaluated the Intel Gaudi 2 for large language model (LLM) training using our open-source LLM Foundry. Here’s what we found:
For inference, we used the open-source Optimum Habana library to profile the performance of the LLaMa2-70B model on an 8 x Gaudi 2 system. The results were impressive:

Since the Intel Gaudi 2 is available via the Intel Developer Cloud (IDC), we could also estimate performance per dollar. Based on public, on-demand pricing from Lambda and Intel, the Gaudi 2 stands out as a cost-effective option for both training and inference workloads.
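To make the performance-per-dollar idea concrete, here is a minimal sketch of the arithmetic, assuming the metric is simply sustained throughput divided by the on-demand hourly price. The function name `perf_per_dollar` and the input values are illustrative placeholders, not figures from our benchmarks or from any provider's price list.

```python
def perf_per_dollar(throughput_per_sec: float, price_per_hour: float) -> float:
    """Units of work (for example, tokens) obtained per dollar of rental.

    Assumes throughput is sustained for the full billed hour.
    """
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    # Work done in one hour, divided by the cost of that hour.
    return throughput_per_sec * 3600.0 / price_per_hour


# Placeholder inputs for illustration only (not measured results or real prices).
example = perf_per_dollar(throughput_per_sec=1000.0, price_per_hour=10.0)
print(example)  # 360000.0
```

Comparing two systems then comes down to the ratio of their `perf_per_dollar` values for the same workload, which is why a cheaper accelerator can come out ahead even when its raw throughput is lower.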
All our results were measured using SynapseAI 1.12 and BF16 mixed-precision training. We're looking forward to SynapseAI 1.13, which will introduce support for FP8 training. This is a significant improvement:
The Intel Gaudi 2 AI accelerators offer robust performance for both LLM training and inference, making them a viable option for organizations looking to optimize their AI workloads. With the upcoming enhancements in SynapseAI 1.13, we expect even better results, particularly with FP8 support.
Stay tuned for future updates as we continue to explore optimizations on various hardware platforms, including NVIDIA H100 with FP8 support.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
5 January 2024