OpenRouter's 100 Trillion Token Study Reveals Multi-Step Inference and Creative Roleplay in LLMs

The Engineer

5 Dec 2025 · 3 min read

Researchers from a16z and OpenRouter analyzed more than 100 trillion tokens of LLM interactions to show how usage has shifted toward multi-step inference, with agentic workflows and creative roleplay emerging as major use cases alongside coding.

The landscape of large language models (LLMs) has undergone a significant transformation over the past year, marked by the shift from single-pass pattern generation to multi-step deliberation inference. This change was catalyzed by the release of OpenAI's o1 reasoning model on December 5th, 2024. To better understand this evolution and its real-world applications, a team of researchers from a16z and OpenRouter conducted an empirical study analyzing over 100 trillion tokens of LLM interactions across various tasks, geographies, and time.

Key Findings

Substantial Adoption of Open-Weight Models: The study highlights the growing popularity of open-weight models, whose trained weights are publicly released so that developers can download, inspect, fine-tune, and self-host them. Open weights do not necessarily mean the training data is published.
Creative Roleplay Beyond Productivity Tasks: Contrary to common assumptions, creative roleplay has emerged as a major use case for LLMs, alongside coding assistance.
Rise of Agentic Inference: The study observes an increase in agentic inference, where models can plan and execute tasks autonomously.
Foundational Cohorts and the "Glass Slipper" Effect: Early users of LLMs show higher retention rates compared to later adopters.

Background

Before December 2024, state-of-the-art LLMs were primarily single-pass autoregressive predictors. Notable examples, such as Anthropic's Claude 2.1 and Claude 3 Sonnet and Cohere's Command R, excelled at sophisticated tool use and Retrieval-Augmented Generation (RAG). Open source projects like those by Reflection explored supervised chain-of-thought and self-critique loops during training. However, the fundamental inference procedure remained a single pass of token-by-token prediction with no separate deliberation phase, producing outputs that reflected patterns learned from data rather than iterative internal computation.

The Shift to Multi-Step Inference

The release of OpenAI's o1 model marked a significant turning point. This reasoning model introduced multi-step deliberation inference, in which the model works through intermediate steps before committing to an answer. Its main characteristics are:

Multi-Step Deliberation: Unlike single-pass models, o1 can break down tasks into multiple steps, each involving internal computation and decision-making.
Agentic Behavior: The model can plan and execute sequences of actions autonomously, enhancing its ability to handle complex tasks.
Tool Use and RAG: Enhanced capabilities for using external tools and integrating retrieved information into the reasoning process.
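
The plan, act, and observe cycle described above can be sketched at the application level. This is a minimal illustration, not how o1 works internally and not code from the study: `stub_model`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stand-in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; the tool names and behavior are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

Step = Tuple[str, str, str]  # (action, argument, observation)


def stub_model(task: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for an LLM call: picks the next (action, argument) from history."""
    if not history:
        return ("add", "2+3")
    if len(history) == 1:
        return ("upper", f"result is {history[-1][2]}")
    return ("finish", history[-1][2])


def run_agent(task: str,
              model: Callable[[str, List[Step]], Tuple[str, str]] = stub_model,
              max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the tool, feed the result back."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = model(task, history)
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)
        history.append((action, arg, observation))
    raise RuntimeError("step budget exhausted before the model finished")


print(run_agent("demo task"))
```

The step budget (`max_steps`) is a design choice worth keeping in any real loop of this kind: each iteration is another model call, so an unbounded loop has unbounded cost.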

Real-World Applications

The study leverages OpenRouter's platform to analyze real-world interactions with LLMs. Key insights include:

Creative Roleplay: Surprisingly, creative roleplay has become a significant use case, indicating that users are exploring LLMs for more than just productivity tasks.
Coding Assistance: The popularity of coding assistance continues to grow, with LLMs helping developers write and debug code more efficiently.
Geographical Distribution: The study reveals diverse usage patterns across different regions, highlighting the global impact of LLMs.

User Retention Analysis

The retention analysis identified foundational cohorts: early users whose engagement persists far longer than that of later adopters. This phenomenon, termed the "Glass Slipper" effect, suggests that early exposure to multi-step inference models can lead to deeper and more sustained user engagement.
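
A cohort retention table of the kind such an analysis relies on can be computed in a few lines. The sketch below is an assumption about the general technique, not the study's actual method or data: the `(user_id, month_index)` event format and the function name `cohort_retention` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cohort_retention(events: Iterable[Tuple[str, int]]) -> Dict[int, Dict[int, float]]:
    """Group users by first active month and measure how many stay active.

    events: (user_id, month_index) activity records (hypothetical format).
    Returns {cohort_month: {months_since_first_use: fraction_of_cohort_active}}.
    """
    # Collect the set of active months for each user.
    active: Dict[str, Set[int]] = defaultdict(set)
    for user, month in events:
        active[user].add(month)

    # A user's cohort is the first month in which they were active.
    cohorts: Dict[int, List[Set[int]]] = defaultdict(list)
    for months in active.values():
        cohorts[min(months)].append(months)

    retention: Dict[int, Dict[int, float]] = {}
    for start, members in cohorts.items():
        horizon = max(max(m) for m in members) - start
        retention[start] = {
            offset: sum(1 for m in members if start + offset in m) / len(members)
            for offset in range(horizon + 1)
        }
    return retention


# Toy data: users "a" and "b" start in month 0, user "c" starts in month 1.
events = [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("c", 1), ("c", 2)]
table = cohort_retention(events)
print(table)
```

Comparing rows of this table across cohort start months is what would reveal whether earlier cohorts retain better than later ones.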

Implications for Model Builders and Developers

Data-Driven Design: Understanding real-world usage patterns can inform better design and deployment of LLM systems.
User Experience: Enhancing the user experience by focusing on creative and agentic capabilities can drive higher adoption and retention.
Infrastructure Needs: As multi-step inference becomes more prevalent, infrastructure providers will need to adapt to support these more complex interactions.

Conclusion

The shift from single-pass to multi-step deliberation inference has opened up new possibilities for LLMs. By analyzing over 100 trillion tokens of real-world interactions, this study provides valuable insights into how developers and end-users engage with these models. The findings highlight the importance of a data-driven approach to designing and deploying LLM systems that can meet the evolving needs of users.