
Extra compute can go into RL training (RL-scaling) or into longer reasoning at deployment (inference-scaling). Which of the two drives a model's gains shapes both what it costs to build and what it costs to run.
Two kinds of scaling have emerged for reasoning models trained with Reinforcement Learning (RL): RL-scaling, which spends more compute on RL training, and inference-scaling, which spends more compute at deployment by letting the model produce longer chains of thought. The distinction matters to practitioners because it directly affects both the development and the deployment costs of AI models.
The initial breakthrough in this area came with OpenAI’s announcement of their first reasoning model, o1. The famous chart from that announcement highlights how both RL-scaling and inference-scaling contributed to the model's performance gains.
In an earlier analysis, Toby Ord showed that the initial move from a base model to a reasoning model benefited primarily from inference-scaling. RL training did provide a notable boost to performance even with the number of tokens in the chain of thought held fixed. This is the small blue arrow on the left side of the chart, which lifts the base model onto the trend line for the reasoning model.
The larger effect, however, came from the ability to use much longer chains of thought, approximately 30 times longer in this example. These extended chains contributed a significantly larger boost to performance than the RL training alone.
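To put the 30-fold figure in perspective, here is a rough back-of-the-envelope calculation. It assumes that inference cost grows linearly with the number of generated tokens and that the chart's compute axis is logarithmic. The symbols $n$ (tokens in the short chain of thought) and $C$ (inference cost per query) are introduced here for illustration and do not come from the original analysis.

```latex
\[
\frac{C_{\text{long}}}{C_{\text{short}}} \approx \frac{30\,n}{n} = 30,
\qquad
\log_{10} 30 \approx 1.48
\]
```

Under those assumptions, the longer chains amount to roughly one and a half orders of magnitude more inference compute on every query.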

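Understanding where these capability gains come from matters because scaling inference compute and scaling training compute have very different cost implications. Training compute is a one-off cost that is spread across every later use of the model, whereas inference compute is paid again on every query.
For AI developers and practitioners, this distinction bears directly on what a model costs to deploy.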
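The cost asymmetry can be made concrete with a minimal sketch. It assumes a simple linear cost model in which training is paid once and amortised over all queries served, while inference is paid per generated token. The function `cost_per_query` and every number below are hypothetical placeholders for illustration. Only the 30x chain-of-thought multiplier comes from the example discussed above.

```python
def cost_per_query(training_cost, queries_served, tokens_per_query, cost_per_token):
    """Return the total cost attributable to one query.

    Assumed (hypothetical) cost model:
      - training_cost is paid once and amortised over queries_served
      - inference cost is tokens_per_query * cost_per_token, paid on every query
    """
    if queries_served <= 0:
        raise ValueError("queries_served must be positive")
    amortised_training = training_cost / queries_served
    inference = tokens_per_query * cost_per_token
    return amortised_training + inference


if __name__ == "__main__":
    # Hypothetical placeholder values, not real prices or budgets.
    TRAINING_COST = 1_000_000.0   # one-off cost, arbitrary currency units
    COST_PER_TOKEN = 0.00001      # inference cost per generated token
    BASE_TOKENS = 1_000           # short chain of thought
    COT_MULTIPLIER = 30           # chains about 30x longer, as in the article's example

    for queries in (10**3, 10**6, 10**9):
        short = cost_per_query(TRAINING_COST, queries, BASE_TOKENS, COST_PER_TOKEN)
        long = cost_per_query(
            TRAINING_COST, queries, BASE_TOKENS * COT_MULTIPLIER, COST_PER_TOKEN
        )
        print(f"queries={queries:>13,}  short CoT={short:.6f}  long CoT={long:.6f}")
```

In this toy model the amortised training term shrinks as more queries are served, while the inference term stays fixed per query. That is why gains bought with longer chains of thought keep costing money at deployment, whereas gains bought with training do not.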
The interplay between RL-scaling and inference-scaling is a key factor in the development and deployment of advanced AI models. While RL training provides an initial boost, it is the ability to use longer chains of thought during inference that drives significant performance gains. As we continue to push the boundaries of AI capabilities, keeping these scaling dynamics in mind will be essential for optimizing both cost and performance.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
31 October 2025