
OpenAI's breakthrough LLM solves complex IMO problems with gold medal performance, showcasing advanced creative thinking and problem-solving skills previously thought unattainable for AI.
OpenAI has announced a significant breakthrough in artificial intelligence: its latest experimental reasoning large language model (LLM) achieved gold medal-level performance on the 2025 International Mathematical Olympiad (IMO). The result marks a major milestone in AI's ability to handle complex, creative problem-solving tasks that have long eluded machines.
The new LLM has demonstrated an unprecedented level of sustained creative thinking and reasoning, surpassing previous benchmarks in both depth and duration.
The model was evaluated under the same conditions as human contestants in the 2025 IMO.

The success of this project is credited to a dedicated team at OpenAI, including @SherylHsu02, @polynoamial, and many others. Alexander Wei, one of the lead researchers, expressed his excitement about the achievement and the collaborative effort behind it.
The IMO gold LLM is an experimental research model and is not yet available for public use. OpenAI plans to release GPT-5 soon, but it does not intend to release a version with this level of mathematical capability for several months.
In 2021, Alexander Wei's PhD advisor @JacobSteinhardt had him forecast AI's progress in mathematics by July 2025. At the time, Wei predicted that AI would reach 30% on the MATH benchmark. Reality far exceeded that forecast: instead of merely clearing the predicted MATH score, the model achieved gold medal-level performance on the IMO.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
21 July 2025