
Tülu 3 405B builds on its predecessor with groundbreaking post-training techniques, outclassing DeepSeek V3 in safety and math skills while setting new standards for open-weight models.
January 30, 2025
Following the successful launch of Tülu 3 in November, we are excited to announce Tülu 3 405B, a significant milestone in applying fully open post-training recipes to the largest open-weight models. This release scales our post-training techniques to 405 billion parameters and outperforms DeepSeek V3 and other leading models on various benchmarks, particularly in safety and mathematical problem-solving.
The primary goal of this release was to test and refine our RLVR approach and training infrastructure at a large scale. The training recipe for the 405B model closely follows the methods used for the 8B and 70B models.

Reinforcement Learning with Verifiable Rewards (RLVR) is a key component of our post-training recipe. This method trains language models on tasks with verifiable outcomes, such as solving mathematical problems or following instructions accurately.
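To make the idea of a verifiable outcome concrete, here is a minimal sketch of a reward function for math problems with numeric answers. It is an illustration, not the Tülu 3 implementation: the function names, the "last number in the completion" extraction rule, and the default reward value are all assumptions made for this example.

```python
import re
from typing import Optional

# Matches integers and simple decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(completion: str) -> Optional[str]:
    """Return the last number in the completion, or None if there is none.

    Assumption: the model states its final answer last. Thousands
    separators are dropped so that "1,234" parses as 1234.
    """
    matches = _NUMBER.findall(completion.replace(",", ""))
    return matches[-1] if matches else None


def verifiable_reward(completion: str, ground_truth: str,
                      reward_value: float = 1.0) -> float:
    """Binary reward: reward_value if the final number equals the
    (numeric) ground truth, otherwise 0.0. No learned reward model."""
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0
    return reward_value if float(predicted) == float(ground_truth) else 0.0


print(verifiable_reward("So the total is 42.", "42"))
print(verifiable_reward("x = 7", "8"))
```

The point of the sketch is that the reward comes from a deterministic check against a known answer rather than from a learned preference model, which is what makes the outcome "verifiable".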
To scale RLVR to 405B parameters, we split the run's 256 GPUs into two pools: 16 GPUs served the model for inference through vLLM with 16-way tensor parallelism, and the remaining 240 GPUs were used for training. After each RLVR iteration, the updated weights are synchronized to the vLLM engine with an NCCL broadcast, which keeps the inference copy consistent with the weights being trained.
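A rough back-of-the-envelope calculation shows why the inference copy has to be sharded so widely. The sketch below assumes the weights are stored in a 2-byte format such as bf16 (an assumption, not stated in the announcement) and counts weights only, ignoring KV cache and activations. The helper name is hypothetical.

```python
def weight_shard_gb(num_params: int, bytes_per_param: int,
                    tensor_parallel: int) -> float:
    """Approximate weight memory per GPU in decimal GB when the
    parameters are split evenly across tensor-parallel ranks."""
    return num_params * bytes_per_param / tensor_parallel / 1e9


INFERENCE_GPUS = 16   # vLLM tensor-parallel degree from the text
TRAINING_GPUS = 240   # GPUs used for training, from the text

total_gpus = INFERENCE_GPUS + TRAINING_GPUS

# Assumption: 2 bytes per parameter (e.g. bf16), weights only.
shard_gb = weight_shard_gb(405_000_000_000, 2, INFERENCE_GPUS)

print(f"total GPUs: {total_gpus}")
print(f"weights per inference GPU: {shard_gb:.3f} GB")
```

Under these assumptions each inference GPU holds roughly 50 GB of weights before any KV cache is allocated, and the full 810 GB of weights has to reach the inference engine on every synchronization step.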
Tülu 3 405B represents a major step forward in scaling post-training techniques to large models. By leveraging our RLVR framework and other advanced training methods, we have achieved superior performance across various benchmarks, making this model a strong contender in the field of AI research and application.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.