
o1 challenges conventional AI models by prioritizing lengthy internal thought processes over rapid responses, improving performance on complex tasks such as mathematics and coding through stronger reasoning.
OpenAI’s latest model, o1, has been making waves in the AI community. While it was initially believed to be a post-trained version of GPT-4o, the real intrigue lies in its unique approach to generating and exploring long internal chains of thought before responding. This shift is particularly significant for tasks that require deeper reasoning, such as math, coding, and research.
Some problems inherently demand more "thinking time" or partial work to reach the correct solution. This is especially true in domains like mathematics, programming, and scientific research. By training a model to use tokens specifically for thinking, o1 aims to achieve higher performance at the expense of longer generation times.
According to OpenAI’s launch post and system card, o1 clearly outperforms GPT-4o on complex tasks.
When GPT-4 was first announced, OpenAI kept its implementation details under wraps. Over time, it was widely reported, though never confirmed by OpenAI, that GPT-4 is a mixture-of-experts (MoE) model combining 8 experts of roughly 220B parameters each. This approach was initially seen as a fallback, something to reach for once more innovative ideas were exhausted.

George Hotz famously quipped, “mixture[-of-experts] models are what you do when you run out of ideas”. However, the reality is more nuanced. The Switch Transformers paper demonstrated that MoE models have distinct scaling properties and can be more compute-efficient.
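To make the compute-efficiency point concrete, here is a minimal sketch of top-1 ("switch") routing in plain Python. It is an illustration under stated assumptions, not the Switch Transformers implementation or anything known about GPT-4: the experts are stand-in scaling functions, and `make_expert`, `switch_route`, and the router weights are hypothetical names and values chosen for the example. The idea it shows is that the router scores every expert but only one expert actually runs per token, so total parameters can grow without per-token compute growing with them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a stand-in for a feed-forward block."""
    def expert(token_vec):
        return [scale * x for x in token_vec]
    return expert


def switch_route(token_vec, router_weights, experts):
    """Top-1 routing: score all experts, run only the highest-scoring one."""
    # One router logit per expert (dot product of its weight row with the token).
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(experts)), key=lambda i: probs[i])
    # Only the selected expert is evaluated; the others cost nothing for this token.
    expert_out = experts[best](token_vec)
    # Scale by the gate probability, as in switch-style routing.
    return best, [probs[best] * y for y in expert_out]


# Illustrative values only (assumptions, not real model weights).
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
chosen, output = switch_route([0.5, 2.0], router_weights, experts)
print(f"token routed to expert {chosen}; {len(experts) - 1} experts skipped")
```

With four experts, this layer holds four experts' worth of parameters but spends one expert's worth of compute per token, which is the sense in which MoE models can be more compute-efficient.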
o1 takes a different approach by focusing on how it uses compute at test time, specifically the number of response tokens generated for each user query.
This strategy shifts the paradigm from simply increasing model size to optimizing how compute is allocated during inference. Instead of building a bigger bowl (i.e., a larger model), o1 uses multiple bowls (i.e., more tokens) to achieve better results.
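The bowl analogy can be put into rough numbers. The sketch below uses the common rule of thumb that a forward pass costs about 2 FLOPs per parameter per generated token; it ignores prompt processing and the attention cost that grows with context length. The function names and both model sizes are hypothetical, chosen only to illustrate the trade-off, and say nothing about o1's actual size or token counts.

```python
def inference_flops(params, generated_tokens):
    """Rough decode cost: ~2 FLOPs per parameter per generated token.

    For an MoE model, pass the parameters active per token, not the total.
    """
    return 2 * params * generated_tokens


def tokens_for_budget(budget_flops, params):
    """How many tokens a model of the given size can generate within a FLOP budget."""
    return budget_flops // (2 * params)


# Hypothetical "bigger bowl": a 1T-parameter model giving a 500-token answer.
big_short = inference_flops(params=1_000_000_000_000, generated_tokens=500)

# Hypothetical "more bowls": a 100B-parameter model generating 5,000 tokens,
# most of them spent on intermediate reasoning.
small_long = inference_flops(params=100_000_000_000, generated_tokens=5_000)

print(big_short == small_long)  # same compute, spent differently
print(tokens_for_budget(big_short, 100_000_000_000))
```

Under this approximation, a model ten times smaller can spend ten times as many tokens per query for the same inference compute. Whether those extra tokens buy better answers depends on whether the model was trained to use them for reasoning, which is the bet o1 makes.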
For practitioners, this means:
OpenAI’s o1 represents a significant step forward in how we approach long chain-of-thought reasoning and test-time compute. By training models to use tokens more effectively for reasoning, o1 opens up new possibilities for solving complex problems. As the AI landscape continues to evolve, it will be interesting to see how this approach influences future model architectures and applications.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
11 December 2024