
Researchers at a16z and OpenRouter analyzed more than 100 trillion tokens of real-world usage to show how large language models now apply multi-step inference to complex problem-solving and creative roleplay, marking a significant leap in AI capabilities.
The landscape of large language models (LLMs) has undergone a significant transformation over the past year, marked by the shift from single-pass pattern generation to multi-step deliberation inference. This change was catalyzed by the release of OpenAI's o1 reasoning model on December 5th, 2024. To better understand this evolution and its real-world applications, a team of researchers from a16z and OpenRouter conducted an empirical study analyzing over 100 trillion tokens of LLM interactions across various tasks, geographies, and time.
Before December 2024, state-of-the-art LLMs were primarily single-pass autoregressive predictors. These models, such as Anthropic's Sonnet 2.1 & 3 and Cohere's Command R, excelled at sophisticated tool use and Retrieval-Augmented Generation (RAG). Open source projects like those by Reflection explored supervised chain-of-thought and self-critique loops during training. However, the fundamental inference procedure remained a single generation pass with no separate deliberation phase: the model produced surface-level outputs learned from data rather than performing iterative internal computation before answering.
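To make the contrast concrete, here is a minimal control-flow sketch in Python. It is an illustration under loud assumptions, not a description of how o1 or any other model works internally: `toy_generate`, `single_pass`, and `deliberate` are hypothetical names, the "model" is a stub that only counts calls, and real reasoning models deliberate inside their own generated tokens rather than in an outer loop like this one.

```python
from typing import Callable, List

# Hypothetical stand-in for a model call: takes the context so far and
# returns the next chunk of text. A real system would call an LLM here.
Generate = Callable[[str], str]


def single_pass(generate: Generate, prompt: str) -> str:
    """One generation call: the answer comes straight from the prompt."""
    return generate(prompt)


def deliberate(generate: Generate, prompt: str, steps: int) -> str:
    """Several generation calls: intermediate notes are appended to the
    context before the final answer is produced."""
    context = prompt
    for i in range(steps):
        note = generate(f"{context}\n[think {i + 1}]")
        context = f"{context}\n{note}"
    return generate(f"{context}\n[answer]")


calls: List[str] = []


def toy_generate(context: str) -> str:
    """Toy model that only records the contexts it was called with."""
    calls.append(context)
    return f"output {len(calls)}"


direct = single_pass(toy_generate, "What is 17 * 24?")
single_pass_calls = len(calls)

calls.clear()
reasoned = deliberate(toy_generate, "What is 17 * 24?", steps=3)
deliberate_calls = len(calls)

print(single_pass_calls, deliberate_calls)
```

The point of the sketch is only the shape of the computation: the deliberative path spends extra inference-time work on intermediate notes, and the final answer is conditioned on them.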
The release of OpenAI's o1 model marked a turning point. This reasoning model introduced multi-step deliberation inference, allowing for more complex and nuanced interactions.

The study uses data from OpenRouter's platform to analyze real-world interactions with LLMs. One key insight concerns user retention:
The retention analysis identified foundational cohorts: early users whose engagement persists far longer than that of later adopters. This phenomenon, termed the "Glass Slipper" effect, suggests that early exposure to multi-step inference models can lead to deeper and more sustained user engagement.
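For readers unfamiliar with cohort retention, the following is a minimal sketch of how such a table can be computed. It assumes a hypothetical activity log of `(user_id, month)` pairs. The data is invented for illustration, and neither the log format nor the `cohort_retention` function comes from the study itself.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, months_since_model_launch) pairs.
# All values are made up for illustration; none come from the study.
events = [
    ("u1", 0), ("u1", 1), ("u1", 2), ("u1", 3),
    ("u2", 0), ("u2", 1), ("u2", 3),
    ("u3", 2), ("u3", 3),
    ("u4", 2),
]


def cohort_retention(events):
    """Return {cohort_month: {offset: fraction of the cohort active}}."""
    active = defaultdict(set)  # user -> set of months with activity
    for user, month in events:
        active[user].add(month)

    cohorts = defaultdict(list)  # first active month -> users in cohort
    for user, months in active.items():
        cohorts[min(months)].append(user)

    table = {}
    for start, users in sorted(cohorts.items()):
        last = max(max(active[u]) for u in users)
        table[start] = {
            offset: sum(1 for u in users if start + offset in active[u]) / len(users)
            for offset in range(last - start + 1)
        }
    return table


retention = cohort_retention(events)
for start, row in retention.items():
    cells = ", ".join(f"+{k}m={v:.0%}" for k, v in row.items())
    print(f"cohort {start}: {cells}")
```

Each user is assigned to the cohort of their first active month, and each row reports the share of that cohort still active at a given offset. In a table like this, a foundational cohort would appear as an early row whose fractions stay high at offsets where later rows have already dropped.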
The shift from single-pass to multi-step deliberation inference has opened up new possibilities for LLMs. By analyzing over 100 trillion tokens of real-world interactions, this study provides valuable insights into how developers and end-users engage with these models. The findings highlight the importance of a data-driven approach to designing and deploying LLM systems that can meet the evolving needs of users.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
5 December 2025