
ThunderMittens ports ThunderKittens to Apple Silicon, bringing high-performance machine learning kernels to devices like the M2 Pro and pushing the efficiency of edge AI.
With the increasing demand for on-edge training and inference, optimizing machine learning (ML) models for edge devices has become a critical challenge. Traditional data center GPUs offer massive compute power but are overkill for many edge use cases. Enter ThunderMittens, a project by HazyResearch that ports ThunderKittens (TK) to Apple Silicon using Metal Shading Language (MSL). This initiative aims to bring high-performance ML to the Apple M2 Pro, a chip with unique hardware properties that require a tailored approach.
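The Apple M2 Pro is known for its high memory bandwidth relative to its compute. It offers around 200GB/s of memory bandwidth and approximately 6.5 TFLOPs of compute, or about 32.5 FLOPs per byte. For context, a consumer-grade NVIDIA RTX 4090 provides about 1000GB/s of memory bandwidth and 82.58 TFLOPs of compute, or about 83 FLOPs per byte, which is roughly 2.5x the M2 Pro's ratio. This means the M2 Pro has about 2.5x more memory bandwidth per FLOP than the 4090, so a kernel needs less arithmetic per byte loaded before it stops being memory-bound.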
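The flops-to-byte comparison is a roofline argument. The sketch below is a minimal illustration, assuming only the approximate peak figures quoted above (not measured numbers): it computes each device's ridge point, the arithmetic intensity at which a kernel switches from memory-bound to compute-bound, and shows a hypothetical 40 FLOPs/byte kernel landing on opposite sides of that point on the two chips.

```python
# Roofline sketch using the approximate peak figures quoted in the article.
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    peak_tflops: float    # peak compute, TFLOPs
    bandwidth_gbs: float  # peak memory bandwidth, GB/s

    def ridge_point(self) -> float:
        """FLOPs per byte at which the roofline turns from memory- to compute-bound."""
        return (self.peak_tflops * 1e12) / (self.bandwidth_gbs * 1e9)

    def attainable_tflops(self, intensity: float) -> float:
        """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
        return min(self.peak_tflops, self.bandwidth_gbs * 1e9 * intensity / 1e12)


m2_pro = Device("M2 Pro", 6.5, 200.0)
rtx_4090 = Device("RTX 4090", 82.58, 1000.0)

for dev in (m2_pro, rtx_4090):
    print(f"{dev.name}: ridge point = {dev.ridge_point():.1f} FLOPs/byte")
print(f"4090 / M2 Pro ratio = {rtx_4090.ridge_point() / m2_pro.ridge_point():.2f}x")

# A hypothetical kernel at 40 FLOPs/byte: compute-bound on the M2 Pro,
# memory-bound on the 4090.
intensity = 40.0
for dev in (m2_pro, rtx_4090):
    print(f"{dev.name} @ {intensity} FLOPs/byte: "
          f"{dev.attainable_tflops(intensity):.2f} TFLOPs attainable")
```

The lower ridge point is what "more bandwidth per FLOP" means in practice: on the M2 Pro, fewer kernels are starved for memory, so compute efficiency matters relatively more.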
To port TK to MSL, the team had to address several hardware-specific challenges.

One of the key challenges was maintaining high occupancy, especially for FlashAttention (FA) kernels. For example, an FA kernel with head dimension D=128 needs more registers per thread than one with a smaller head dimension, which reduces occupancy and can significantly hurt performance. The team had to balance these factors carefully to get the best performance.
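To make the register/occupancy trade-off concrete, here is a minimal register-limited occupancy model. Every hardware number in it (register-file size, resident-thread limit, per-thread overhead, and the 16-row output tile per SIMD-group) is a hypothetical placeholder, not a published M2 Pro figure; only the 32-thread SIMD-group width reflects Apple GPUs. It shows how doubling D doubles the per-thread FA output accumulator and can push a kernel past the point where the full thread limit fits in the register file.

```python
# Hypothetical occupancy model. All constants except SIMD_WIDTH are
# illustrative assumptions, not published Apple M2 Pro specifications.
REGISTER_FILE_PER_CORE = 64 * 1024  # 32-bit registers per GPU core (assumed)
MAX_THREADS_PER_CORE = 1024         # resident-thread limit per core (assumed)
SIMD_WIDTH = 32                     # threads per SIMD-group on Apple GPUs
OVERHEAD_REGS = 32                  # Q/K/V fragments, indices, etc. (assumed)


def regs_per_thread(head_dim: int, rows_per_simdgroup: int = 16) -> int:
    """Registers per thread when each SIMD-group keeps a rows x head_dim fp32
    output accumulator (FlashAttention's running O tile) in registers."""
    accumulator = rows_per_simdgroup * head_dim // SIMD_WIDTH
    return accumulator + OVERHEAD_REGS


def occupancy(head_dim: int) -> float:
    """Fraction of the thread limit that fits in the register file,
    rounded down to whole SIMD-groups."""
    regs = regs_per_thread(head_dim)
    threads = min(MAX_THREADS_PER_CORE, REGISTER_FILE_PER_CORE // regs)
    threads -= threads % SIMD_WIDTH
    return threads / MAX_THREADS_PER_CORE


for d in (64, 128):
    print(f"D={d}: {regs_per_thread(d)} regs/thread, occupancy {occupancy(d):.0%}")
```

With these placeholder numbers, D=64 fits the full thread limit while D=128 does not; the real thresholds depend on the actual hardware and on how a kernel tiles its work.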
The team conducted extensive benchmarks to evaluate the performance of ThunderMittens on the M2 Pro; the detailed results are in the original HazyResearch post.
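Kernel benchmarks like these are usually reported as achieved TFLOPs or as a fraction of peak. The sketch below shows one common FLOP-accounting convention for a non-causal attention forward pass (two N x N x D matrix products per head, softmax ignored), with the article's approximate 6.5 TFLOPs M2 Pro peak as the reference. The workload shapes are illustrative assumptions, and the printed times are lower bounds at peak compute, not measurements.

```python
# FLOP accounting for benchmark reporting. PEAK_TFLOPS is the approximate
# M2 Pro figure quoted in the article; workload shapes are illustrative.
PEAK_TFLOPS = 6.5


def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k


def attention_fwd_flops(batch: int, heads: int, seq_len: int, head_dim: int) -> int:
    """Non-causal attention forward: Q @ K^T is (N x D) @ (D x N) and
    P @ V is (N x N) @ (N x D) per head. Softmax work is ignored."""
    per_head = (matmul_flops(seq_len, seq_len, head_dim)
                + matmul_flops(seq_len, head_dim, seq_len))
    return batch * heads * per_head


def achieved_tflops(flops: int, seconds: float) -> float:
    return flops / seconds / 1e12


def fraction_of_peak(flops: int, seconds: float,
                     peak_tflops: float = PEAK_TFLOPS) -> float:
    return achieved_tflops(flops, seconds) / peak_tflops


# Lower bound on runtime if the kernel ran at peak compute (not a measurement).
for d in (64, 128):
    flops = attention_fwd_flops(batch=16, heads=16, seq_len=1024, head_dim=d)
    min_ms = flops / (PEAK_TFLOPS * 1e12) * 1e3
    print(f"D={d}: {flops / 1e9:.1f} GFLOPs, >= {min_ms:.1f} ms at peak")
```

Given a measured kernel time from a real benchmark, `fraction_of_peak(flops, seconds)` turns it into the utilization figure that is easiest to compare across chips.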
ThunderMittens is a significant step towards bringing high-performance ML to edge devices like the Apple M2 Pro. By addressing hardware-specific challenges and optimizing for performance, this project paves the way for more efficient on-edge training and inference. The insights gained from this port can inform future developments in edge AI, making it more accessible and powerful.
Original Sources
↗ https://hazyresearch.stanford.edu/blog/2024-11-28-tk-mlx?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
29 November 2024