
Unsloth's update for Meta’s Llama 3 models slashes finetuning time and VRAM usage, making large-scale language model training more accessible on limited hardware.
Unsloth has just released a significant update for finetuning Meta’s latest Llama 3 models, bringing substantial improvements in speed, memory efficiency, and context length. These enhancements are particularly noteworthy for practitioners working with large-scale language models (LLMs) on limited hardware resources.
Here is how Unsloth compares with a Hugging Face + Flash Attention 2 (FA2) setup, which serves as the 1x baseline:
| Model | VRAM | Unsloth Speed | VRAM Reduction | Longer Context | Hugging Face + FA2 |
|-------------|--------|-------------------|--------------------|--------------------|------------------------|
| Llama-3 8B | 24GB | 2x | 63% | 3x longer | 1x |
| Llama-3 70B | 80GB | 1.8x | 68% | 6x longer | 1x |
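To make the table's ratios concrete, the short sketch below converts each row into the fraction of baseline VRAM and training time that remains. It assumes, as a reading of the table rather than something the source states outright, that "VRAM Reduction" is a percentage saved relative to the Hugging Face + FA2 baseline and that "Unsloth Speed" is a throughput multiplier. The function names are illustrative only.

```python
def remaining_fraction(vram_reduction_pct: float) -> float:
    """Fraction of baseline VRAM still needed after a percentage reduction."""
    return 1.0 - vram_reduction_pct / 100.0


def relative_time(speedup: float) -> float:
    """Fraction of baseline wall-clock time, assuming speedup is a throughput multiplier."""
    return 1.0 / speedup


# (speedup, VRAM reduction in percent), copied from the comparison table
rows = {
    "Llama-3 8B": (2.0, 63.0),
    "Llama-3 70B": (1.8, 68.0),
}

for model, (speedup, reduction) in rows.items():
    print(
        f"{model}: {remaining_fraction(reduction):.2f}x baseline VRAM, "
        f"{relative_time(speedup):.2f}x baseline time"
    )
```

Under those assumptions, the 8B row works out to roughly 0.37x the baseline memory and half the baseline training time.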
To achieve these improvements, Unsloth leverages several optimizations:

8B Model on Tesla T4:
70B Model on A100 80GB:
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
26 April 2024