
A look at how large language models scale on TPUs and GPUs, drawing on the "Scaling Book" by Google DeepMind’s Jacob Austin and colleagues, and at what determines performance and efficiency in model training.
Training large language models (LLMs) can feel like an arcane process, but understanding and optimizing their performance doesn't have to be mysterious. This guide, part of the "Scaling Book" by Jacob Austin and his colleagues at Google DeepMind, aims to demystify the science behind scaling LLMs on TPUs (Tensor Processing Units) and GPUs. It provides a systems view of how these accelerators work, how they communicate with each other, and how you can parallelize your models for efficient training and inference at scale.
The book is authored by a team of experts from Google DeepMind and other institutions.
To get the most out of this guide, you should have a basic understanding of LLMs and the Transformer architecture. Familiarity with LLM training and some experience with JAX (a numerical computation library) will also be helpful. If you need to brush up on these topics, it is worth doing so before continuing.
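The roofline model is a simple way to reason about the performance limits of your hardware. It compares an operation's arithmetic intensity (FLOPs performed per byte moved) with the ratio of the chip's peak compute throughput to its memory bandwidth, which tells you whether the operation is compute-bound or memory-bound and therefore where optimization effort should go.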
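As a rough illustration of the roofline idea, here is a minimal Python sketch that classifies a matrix multiplication as compute-bound or memory-bound. The `Accelerator` class, the function names, and the hardware numbers are all hypothetical placeholders chosen for this example, not figures from the book or from any real chip. The cost model assumes each matrix is read or written exactly once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical chip description; fill in real numbers from a datasheet."""
    peak_flops_per_s: float  # peak arithmetic throughput (FLOPs/s)
    mem_bytes_per_s: float   # memory bandwidth (bytes/s)


def matmul_cost(m, k, n, bytes_per_element=2):
    """Return (FLOPs, bytes moved) for an (m x k) @ (k x n) matmul.

    Assumes each input is read once and the output is written once.
    """
    flops = 2 * m * k * n
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops, bytes_moved


def roofline(hw, flops, bytes_moved):
    """Classify an operation and give a lower bound on its runtime."""
    t_compute = flops / hw.peak_flops_per_s
    t_memory = bytes_moved / hw.mem_bytes_per_s
    return {
        "intensity": flops / bytes_moved,  # FLOPs per byte moved
        "critical_intensity": hw.peak_flops_per_s / hw.mem_bytes_per_s,
        "lower_bound_s": max(t_compute, t_memory),
        "bound": "compute" if t_compute >= t_memory else "memory",
    }


# Placeholder round numbers, not a real chip: 100 TFLOPs/s and 1 TB/s.
hw = Accelerator(peak_flops_per_s=1e14, mem_bytes_per_s=1e12)

for m in (1, 4096):
    flops, bytes_moved = matmul_cost(m, 4096, 4096)
    print(m, roofline(hw, flops, bytes_moved)["bound"])
```

With these placeholder numbers, the single-row case (`m = 1`) comes out memory-bound and the square case (`m = 4096`) comes out compute-bound, which is the usual intuition for why larger batches use the hardware more efficiently.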

Choosing the right parallelism scheme, meaning how the model's parameters and computation are split across devices, is crucial for efficient scaling.
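One concrete way a parallelism choice shows up is in per-device memory. The sketch below compares keeping a full replica of the weights on every device (as in pure data parallelism) with splitting the weights evenly across devices. The function names, the model size, and the device memory figure are hypothetical, and the estimate deliberately ignores gradients, optimizer state, and activations, so treat it as a first-order check rather than a sizing tool.

```python
def weight_bytes_per_device(num_params, bytes_per_param, num_devices, sharded):
    """Bytes of model weights each device must hold.

    sharded=False: every device keeps a full replica of the weights.
    sharded=True:  the weights are split evenly across all devices.
    """
    total = num_params * bytes_per_param
    return total / num_devices if sharded else total


def weights_fit(num_params, bytes_per_param, num_devices, device_mem_bytes, sharded):
    """True if the weights alone fit in one device's memory."""
    needed = weight_bytes_per_device(num_params, bytes_per_param, num_devices, sharded)
    return needed <= device_mem_bytes


# Hypothetical setup: 70B parameters at 2 bytes each, 8 devices with 80 GB each.
params, width, devices, mem = 70_000_000_000, 2, 8, 80_000_000_000

for sharded in (False, True):
    per_device = weight_bytes_per_device(params, width, devices, sharded)
    fits = weights_fit(params, width, devices, mem, sharded)
    print(f"sharded={sharded}: {per_device / 1e9:.1f} GB per device, fits={fits}")
```

In this made-up setup the replicated weights do not fit on a single device while the sharded weights do, which is the kind of constraint that rules some schemes out before performance is even considered.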
Efficient communication between devices is essential for parallelized training, because every step that splits work across devices also has to move data between them.
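To get a feel for communication cost, the sketch below estimates the time of a ring all-reduce, a common way to sum gradients across devices. Ring all-reduce is used here as an illustrative choice rather than something this article specifies. The estimate is bandwidth-only: it uses the standard result that each device sends about 2(N - 1)/N times the buffer size, and it ignores latency and any overlap with computation. The function name and the numbers are hypothetical.

```python
def ring_all_reduce_seconds(num_bytes, num_devices, link_bytes_per_s):
    """Bandwidth-only time estimate for a ring all-reduce.

    A ring all-reduce is a reduce-scatter followed by an all-gather. In each
    phase every device sends (N - 1) / N of the buffer, so the total traffic
    per device is 2 * (N - 1) / N * num_bytes.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if num_devices == 1:
        return 0.0  # nothing to exchange on a single device
    traffic = 2 * (num_devices - 1) / num_devices * num_bytes
    return traffic / link_bytes_per_s


# Hypothetical numbers: 1 GB of gradients, 100 GB/s per-device link bandwidth.
for n in (2, 8, 64):
    t = ring_all_reduce_seconds(1e9, n, 1e11)
    print(f"{n} devices: {t * 1e3:.2f} ms")
```

Under this model the time approaches twice the buffer size divided by the link bandwidth as the number of devices grows, so it depends mainly on how much data is reduced rather than on how many devices take part.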
Hardware-specific features of TPUs and GPUs can also be leveraged to optimize performance.
By the end of this guide, you should feel confident in estimating the best parallelism scheme for a
Original Sources
↗ https://jax-ml.github.io/scaling-book/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
5 February 2025