
DeepScaleR-1.5B-Preview outperforms OpenAI’s O1-Preview on the AIME 2024 and MATH 500 math benchmarks, showing how reinforcement learning can scale the reasoning ability of a small, open-source model.
DeepScaleR, a project led by Michael Luo and Sijun Tan, introduces DeepScaleR-1.5B-Preview, a language model fine-tuned with reinforcement learning (RL) from the base model DeepSeek-R1-Distill-Qwen-1.5B. The 1.5-billion-parameter model reaches 43.1% Pass@1 accuracy on AIME 2024, surpassing OpenAI’s O1-Preview (40.0%) and improving on the base model by 14.3 percentage points (from 28.8%). The team has open-sourced its dataset, code, and training logs to support further work on scaling intelligence with RL.
| Model | AIME 2024 (Pass@1) | MATH 500 (Pass@1) | AMC 2023 (Pass@1) | Minerva Math (Pass@1) | Olympiad Bench (Pass@1) | Avg. Pass@1 |
| --- | --- | --- | --- | --- | --- | --- |
| DeepScaleR-1.5B-Preview | 43.1% | 87.8% | 73.6% | 30.2% | 50.0% | 57.0% |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.8% | 82.8% | 62.9% | 26.5% | 43.3% | 48.9% |
| O1-Preview | 40.0% | 81.4% | - | - | - | - |
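The article does not say how its Pass@1 numbers are computed. A common approach, assumed here, is to sample several answers per problem and apply the unbiased pass@k estimator introduced alongside OpenAI's HumanEval benchmark; with k = 1 it reduces to the fraction of correct samples, averaged over problems. The sketch below illustrates that estimator using hypothetical helper names (`pass_at_k`, `benchmark_pass_at_1`) and made-up sample counts, then recomputes the AIME 2024 margins from the table.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n: answers sampled for the problem, c: how many were correct,
    k: attempt budget. (Hypothetical helper for illustration.)
    """
    if n - c < k:
        # Every size-k subset of the samples contains a correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct_counts: list[int], n: int) -> float:
    """Mean Pass@1 over a benchmark; with k=1 each term reduces to c / n."""
    return sum(pass_at_k(n, c, 1) for c in correct_counts) / len(correct_counts)


# Hypothetical counts: 4 problems, 16 sampled answers each (not real data).
toy = benchmark_pass_at_1([16, 8, 4, 0], n=16)
print(f"toy Pass@1 = {toy:.4f}")

# AIME 2024 Pass@1 figures (%) taken from the table.
deepscaler, base, o1_preview = 43.1, 28.8, 40.0
print(f"gain over base model: {deepscaler - base:.1f} percentage points")
print(f"margin over O1-Preview: {deepscaler - o1_preview:.1f} percentage points")
```

Because the estimator is unbiased, drawing more samples per problem lowers the variance of the score without changing its expected value, which matters on a benchmark as small as AIME 2024 (30 problems).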
The team leveraged RL to fine-tune the base model, focusing on improving its performance on competition-level math benchmarks. Key aspects of their approach include:

One of the primary challenges in scaling RL is the high computational cost. Directly replicating DeepSeek-R1’s experiments, which involve context lengths of 32K tokens and around 8,000 training steps, would require at least 70,000 A100 GPU hours for a 1.5B model. To mitigate this, the team employed several strategies:
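To put the 70,000 A100 GPU-hour estimate in perspective, the back-of-the-envelope conversion below turns it into wall-clock time for a few cluster sizes and into an average cost per training step. It assumes perfectly linear scaling across GPUs, which real RL pipelines rarely achieve, so the results are optimistic lower bounds; the cluster sizes and the helper `wall_clock_days` are illustrative and not from the article.

```python
def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Days of continuous training if the work splits perfectly across num_gpus."""
    return gpu_hours / num_gpus / 24


# Figures from the article: at least 70,000 A100 GPU-hours for roughly
# 8,000 RL steps at a 32K-token context on a 1.5B model.
GPU_HOURS = 70_000
STEPS = 8_000

# Illustrative cluster sizes (assumption, not from the article).
for gpus in (8, 32, 64):
    days = wall_clock_days(GPU_HOURS, gpus)
    print(f"{gpus:>3} A100s: ~{days:.0f} days")

# Average compute implied per RL training step.
print(f"~{GPU_HOURS / STEPS:.2f} A100 GPU-hours per step")
```

On a single 8-GPU node the estimate works out to roughly a year of continuous training, which is why reducing this cost is the central engineering problem.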
The DeepScaleR team is committed to transparency and community involvement. They have open-sourced their dataset, code, and training logs so that others can reproduce their results and build on their work. This includes:
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
The Engineer, This Week's Edition: 13 February 2025