
Using FSDP and QLoRA, Answer.AI has developed a system that trains massive 70B-parameter LLMs on desktop GPUs, making advanced AI capabilities accessible to researchers and developers worldwide.
Today, Answer.AI is proud to announce the release of a groundbreaking open-source system that can efficiently train a 70-billion-parameter large language model (LLM) on a regular desktop computer equipped with two or more standard gaming GPUs (RTX 3090 or 4090). This achievement, made possible through a collaboration with Tim Dettmers from the University of Washington and Hugging Face’s Titus von Koeller and Sourab Mangrulkar, marks a significant step towards democratizing AI.
The key innovation lies in the combination of Fully Sharded Data Parallel (FSDP) and Quantized LoRA (QLoRA). Here's a breakdown:
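Fully Sharded Data Parallel (FSDP): Shards a model's parameters, gradients, and optimizer state across multiple GPUs, so each GPU stores only a fraction of the model and gathers the full weights of a layer just before computing it.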
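To make the sharding idea concrete, here is a minimal toy sketch in plain Python. It is not the PyTorch FSDP API: the `shard` and `all_gather` helpers are hypothetical names, the "GPUs" are just list slots, and real FSDP also pads shards to equal size and shards gradients and optimizer state, which this sketch leaves out.

```python
def shard(params, world_size):
    """Split a flat parameter list into contiguous, nearly equal shards."""
    shard_len = -(-len(params) // world_size)  # ceiling division
    return [params[i * shard_len:(i + 1) * shard_len] for i in range(world_size)]


def all_gather(shards):
    """Rebuild the full parameter list from every rank's shard."""
    return [p for rank_shard in shards for p in rank_shard]


# One layer's parameters, flattened (toy values).
layer_params = [0.1 * i for i in range(10)]

# At rest, each of the two simulated GPUs keeps only its own shard.
shards = shard(layer_params, world_size=2)

# Just before the layer is computed, the ranks gather the full weights;
# afterwards the gathered copy can be discarded again.
full_params = all_gather(shards)

print([len(s) for s in shards], len(full_params))
```

The point of the design is that the full weights of any one layer exist on a GPU only briefly, so peak memory per GPU is set by the shard size plus one gathered layer rather than by the whole model.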
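Quantized LoRA (QLoRA): Stores the frozen base model in 4-bit precision and trains only small low-rank adapter matrices on top of it, which sharply reduces the memory needed for fine-tuning.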
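The following toy sketch illustrates the two ingredients of QLoRA in plain Python, under loud simplifying assumptions: it uses a uniform absmax 4-bit quantizer with one scale per weight row, not the NF4 data type or block sizes that the real QLoRA implementation uses, and the class name `ToyQLoRALinear` and its methods are hypothetical, not part of any library.

```python
import random


def quantize_absmax_4bit(block):
    """Map floats to integer codes in [-7, 7], scaled by the block's absolute maximum."""
    scale = max(abs(v) for v in block) / 7 or 1.0  # fall back to 1.0 for an all-zero block
    return [round(v / scale) for v in block], scale


def dequantize(codes, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [c * scale for c in codes]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyQLoRALinear:
    """Frozen quantized base weight plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        # Frozen base: each row is quantized as one block (codes, scale).
        self.q_rows = [quantize_absmax_4bit(row) for row in weight]
        # LoRA adapters: A starts small and random, B starts at zero,
        # so the adapter initially leaves the layer's output unchanged.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def forward(self, x):
        base = matvec([dequantize(codes, scale) for codes, scale in self.q_rows], x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B would receive gradients; the quantized base stays frozen.
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


rng = random.Random(0)
weight = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
layer = ToyQLoRALinear(weight, rank=2)
x = [1.0] * 8
y = layer.forward(x)
print(len(y), layer.trainable_params())
```

At these toy dimensions the adapters are not much smaller than the base weight; the savings come at realistic layer sizes, where the adapter size grows with rank × (d_in + d_out) while the base weight grows with d_in × d_out.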
This system is a game-changer for the open-source community and small labs. Traditionally, training LLMs required data center GPUs like the NVIDIA A100 or H100, and a multi-GPU server built around them can cost hundreds of thousands of dollars. In contrast, a desktop setup with dual RTX 4090 GPUs costs under $10,000 (or even less if using second-hand parts).
Despite the significant price difference, gaming GPUs like the RTX 4090 offer performance comparable to their data center counterparts. The primary limitation has been memory capacity: while data center GPUs can have up to 80GB of memory, the RTX 3090 and 4090 are capped at 24GB. QLoRA addresses this by shrinking the model's weights, and FSDP by splitting them across GPUs, so that training fits on these lower-memory devices.
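A rough back-of-the-envelope calculation shows why both techniques are needed. The sketch below counts weight storage only and assumes 16 bits per parameter for an unquantized model and 4 bits for a quantized one; it deliberately ignores activations, gradients, optimizer state, adapter weights, and quantization metadata, all of which add to the real footprint. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 70e9       # 70-billion-parameter model
GPU_MEMORY_GB = 24    # RTX 3090 / 4090
N_GPUS = 2

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # unquantized weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights
per_gpu_gb = int4_gb / N_GPUS             # quantized weights sharded across both GPUs

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"4-bit weights per GPU when sharded over {N_GPUS} GPUs: {per_gpu_gb:.1f} GB")
print("fits in one 24GB GPU unsharded:", int4_gb <= GPU_MEMORY_GB)
print("fits per GPU when sharded:", per_gpu_gb <= GPU_MEMORY_GB)
```

Under these assumptions, quantization alone still leaves the weights too large for a single 24GB card, and sharding alone still leaves 16-bit weights too large for two cards; only the combination brings the per-GPU share under 24GB, with the remaining headroom available for everything the estimate leaves out.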

Teknium, known for creating popular OpenHermes models and datasets (with over half a million downloads), has already embraced this new capability:
“With this capability, we can take huge models to new heights locally, and gigantic, hundreds of billions of parameter models are now accessible by small labs.”
At Answer.AI, our mission is to make useful AI available to everyone. While using pre-trained models from others is valuable, the ability to create personalized models empowers users to control their own AI systems.
The release of this open-source system combining FSDP and QLoRA represents a significant leap forward in making large language model training accessible. It opens up new possibilities for researchers, developers, and small labs, ensuring that cutting-edge AI technology is no longer limited to those with deep pockets.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
11 March 2024