
Unsloth cuts fine-tuning time for large language models roughly in half and reduces memory usage by around 40%, which makes it a practical addition for developers working within the Hugging Face ecosystem.
If you've been frustrated by the painfully slow process of fine-tuning large language models (LLMs), you're not alone. Enter Unsloth, a lightweight library developed by the community to significantly speed up LLM fine-tuning while reducing memory usage and maintaining accuracy. This tool is fully compatible with the Hugging Face ecosystem, including the Hub, transformers, PEFT, and TRL libraries.
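As a rough illustration of that compatibility, the sketch below loads a model through Unsloth and wraps it with LoRA adapters. It assumes the `unsloth` package is installed on a machine with a supported NVIDIA GPU. The checkpoint name, sequence length, and LoRA hyperparameters are illustrative assumptions rather than recommendations from this article, and argument names may differ between Unsloth releases.

```python
# Illustrative LoRA settings (assumed values, not taken from the article).
LORA_CONFIG = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0,
    "bias": "none",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def load_unsloth_model(model_name="unsloth/mistral-7b-bnb-4bit",
                       max_seq_length=2048):
    """Load a 4-bit base model via Unsloth and attach LoRA adapters.

    The import is deferred so this module can be imported on machines
    without a GPU or without Unsloth installed.
    """
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,      # assumed example checkpoint on the Hub
        max_seq_length=max_seq_length,
        dtype=None,                 # let Unsloth pick a dtype for the GPU
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        max_seq_length=max_seq_length,
        **LORA_CONFIG,
    )
    return model, tokenizer
```

The returned model is a PEFT-wrapped model, so it should be usable wherever a TRL trainer expects a `model` and `tokenizer`, subject to the TRL version in use.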
Unsloth is a library designed to optimize the fine-tuning process of LLMs by overwriting certain parts of the modeling code with highly optimized operations. The key improvements are:
Unsloth supports a wide range of NVIDIA GPUs, from GTX 1070 to H100, making it accessible for both hobbyists and professionals. It also integrates seamlessly with the entire trainer suite from the TRL library, including:
At the time of writing, Unsloth supports the following model architectures:

Unsloth achieves its performance gains through a combination of manual backpropagation and Triton kernel optimization. Here’s a breakdown:
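To make "manual backpropagation" concrete, here is a toy sketch in plain Python. It is not Unsloth's code: it hand-derives the gradients for a single LoRA-style linear layer (the choice of layer and the loss are assumptions made for illustration) and checks them against finite differences. Deriving gradients by hand like this lets an implementation skip the bookkeeping of a general-purpose autograd engine.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def scale(a, s):
    return [[s * v for v in row] for row in a]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_forward(x, W, A, B, s):
    """y = x W + s (x A) B, with W frozen and A, B trainable."""
    return add(matmul(x, W), scale(matmul(matmul(x, A), B), s))


def loss_fn(y):
    """Toy loss L = 0.5 * sum(y^2), so dL/dy = y."""
    return 0.5 * sum(v * v for row in y for v in row)


def lora_backward(x, A, B, s, dY):
    """Hand-derived gradients of the loss with respect to A and B."""
    xA = matmul(x, A)
    dB = scale(matmul(transpose(xA), dY), s)                        # (r, out)
    dA = scale(matmul(transpose(x), matmul(dY, transpose(B))), s)   # (in, r)
    return dA, dB


def numeric_grad(f, M, eps=1e-6):
    """Central finite differences of scalar f() with respect to matrix M."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            old = M[i][j]
            M[i][j] = old + eps
            f_plus = f()
            M[i][j] = old - eps
            f_minus = f()
            M[i][j] = old
            grad[i][j] = (f_plus - f_minus) / (2 * eps)
    return grad


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x = rand_matrix(3, 4, rng)   # batch of 3 inputs, 4 features
W = rand_matrix(4, 5, rng)   # frozen base weight
A = rand_matrix(4, 2, rng)   # LoRA down-projection, rank 2
B = rand_matrix(2, 5, rng)   # LoRA up-projection
s = 0.5                      # LoRA scaling factor

y = lora_forward(x, W, A, B, s)
dA, dB = lora_backward(x, A, B, s, dY=y)

f = lambda: loss_fn(lora_forward(x, W, A, B, s))
num_dA, num_dB = numeric_grad(f, A), numeric_grad(f, B)
max_err = max(abs(u - v)
              for g, n in ((dA, num_dA), (dB, num_dB))
              for rg, rn in zip(g, n) for u, v in zip(rg, rn))
print(f"max abs difference vs. finite differences: {max_err:.2e}")
```

A production implementation would fuse these steps into GPU kernels rather than loop in Python; the point here is only that the backward pass is written out explicitly instead of being recorded by autograd.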
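Let’s look at some benchmarks to see how Unsloth performs compared to standard Hugging Face methods and other optimizations like Flash Attention 2. Speeds are expressed as multiples of the Hugging Face baseline (1x), and the last column gives the change in VRAM use with Unsloth.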
| Model | Dataset | 🤗 Hugging Face | 🤗 + Flash Attention 2 | 🦥 Unsloth | 🦥 VRAM Reduction |
| --- | --- | --- | --- | --- | --- |
| Code Llama 34b | Slim Orca | 1x | 1.01x | 1.94x | -22.7% |
| Llama-2 7b | Slim Orca | 1x | 0.96x | 1.87x | -39.3% |
| Mistral 7b | Slim Orca | 1x | 1.17x | 1.88x | -65.9% |
| Tiny Llama 1.1b | Alpaca | 1x | 1.55x | 2.74x | -57.8% |
| DPO with Zephyr | Ultra Chat | 1x | 1.24x | 1.88x | -11.6% |

| Model | Dataset | 🤗 Hugging Face | 🤗 + PyTorch 2.1.1 | 🦥 Unsloth | 🦥 VRAM Reduction |
| --- | --- | --- | --- | --- | --- |
| Llama-2 7b | OASST | 1x | 1.19x | 1.95x | -43.3% |
| Mistral 7b | Alpaca | 1x | 1.07x | 1.56x | -13.7 |
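To relate the speedup factors in the first table to wall-clock savings, note that a run that is k times faster takes 1/k of the baseline time, so the time saved is 1 - 1/k. The short sketch below applies that arithmetic to the Unsloth column; the helper name `time_saved_pct` is made up for this example.

```python
def time_saved_pct(speedup):
    """Percentage of wall-clock time saved relative to the 1x baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


# Unsloth speedups copied from the first benchmark table above.
unsloth_speedups = {
    "Code Llama 34b": 1.94,
    "Llama-2 7b": 1.87,
    "Mistral 7b": 1.88,
    "Tiny Llama 1.1b": 2.74,
    "DPO with Zephyr": 1.88,
}

for model, speedup in unsloth_speedups.items():
    saved = time_saved_pct(speedup)
    print(f"{model}: {speedup}x faster -> {saved:.1f}% less training time")
```

A speedup just under 2x therefore corresponds to slightly less than half the training time saved, which is where the "roughly half" figure comes from.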
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
11 January 2024