OpenAI’s LLM Achieves Gold Medal Performance on IMO, Pushing AI Reasoning to New Heights

Models & Research

The Engineer

21 Jul 2025

OpenAI's breakthrough LLM solves complex IMO problems with gold medal performance, showcasing advanced creative thinking and problem-solving skills previously thought unattainable for AI.

OpenAI has announced that its latest experimental reasoning large language model (LLM) achieved gold medal-level performance on the 2025 International Mathematical Olympiad (IMO). The result marks a major milestone in AI's ability to handle complex, creative problem-solving tasks that have long eluded machines.

What Changed Technically?

The new LLM sustains creative reasoning over far longer stretches than earlier benchmarks demanded. Here’s a breakdown of the key technical advancements:

Problem-Solving Horizon: The benchmarks that reasoning models can handle have progressed from short, straightforward problems (GSM8K, about 0.1 minutes for top humans) to harder tasks (MATH benchmark, ~1 minute), advanced competitions (AIME, ~10 minutes), and now the IMO (~100 minutes of sustained effort).
Complexity of Solutions: IMO problems often require multi-page proofs that are difficult to verify. The model can now craft intricate, watertight arguments similar to those produced by human mathematicians.
General-Purpose Approach: Unlike task-specific methodologies, this LLM leverages general-purpose reinforcement learning and test-time compute scaling to achieve its capabilities. This approach is more flexible and potentially applicable to a wider range of problems.
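
The time horizons in the first item above grow by roughly a factor of ten at each step. A minimal Python sketch of that arithmetic follows; the figures are the approximate ones quoted above, and the names `HORIZON_MINUTES` and `growth_factors` are illustrative rather than anything published by OpenAI.

```python
import math

# Approximate time for top humans, in minutes, as quoted in the list above.
HORIZON_MINUTES = {
    "GSM8K": 0.1,
    "MATH": 1.0,
    "AIME": 10.0,
    "IMO": 100.0,
}


def growth_factors(horizons):
    """Return the ratio between each consecutive pair of horizons."""
    values = list(horizons.values())
    return [later / earlier for earlier, later in zip(values, values[1:])]


factors = growth_factors(HORIZON_MINUTES)
# Orders of magnitude separating the shortest and longest horizon.
total_orders = math.log10(HORIZON_MINUTES["IMO"] / HORIZON_MINUTES["GSM8K"])

for name, minutes in HORIZON_MINUTES.items():
    print(f"{name:>5}: ~{minutes:g} min")
print("step factors:", [round(f) for f in factors])
print("orders of magnitude, GSM8K to IMO:", round(total_orders))
```

Each step is about a tenfold increase, so the span from GSM8K to the IMO covers three orders of magnitude in reasoning time.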

Evaluation Details

The model was evaluated under the same conditions as human contestants in the 2025 IMO:

Exam Format: Two 4.5-hour sessions
Tools Allowed: None (no internet or calculators)
Problem Statements: Official problem statements were used
Proof Submission: Natural language proofs
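
As a rough sanity check on the time budget, the sketch below works out the average time available per problem. It assumes the standard IMO layout of three problems per session, which this article does not state, and the variable names are illustrative.

```python
SESSIONS = 2               # from the exam format above
HOURS_PER_SESSION = 4.5    # from the exam format above
PROBLEMS_PER_SESSION = 3   # assumption: standard IMO layout, not stated in this article

total_minutes = SESSIONS * HOURS_PER_SESSION * 60
total_problems = SESSIONS * PROBLEMS_PER_SESSION
minutes_per_problem = total_minutes / total_problems

print(f"total exam time: {total_minutes:.0f} min")
print(f"average budget per problem: {minutes_per_problem:.0f} min")
```

Under that assumption the average comes to 90 minutes per problem, in line with the ~100-minute horizon quoted earlier.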

Performance Highlights

Problems Solved: The model successfully solved 5 out of the 6 problems on the 2025 IMO.
Grading Process: Each solution was independently graded by three former IMO medalists, with final scores determined by unanimous consensus.
Total Score: The model earned a total of 35 out of 42 points, which is sufficient for a gold medal.
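
The score arithmetic can be checked with a short sketch. It assumes the standard IMO marking of 0 to 7 points per problem, and that the model received full marks on each of the five solved problems and none on the sixth; the article gives only the totals, so the per-problem breakdown below is hypothetical.

```python
MAX_POINTS_PER_PROBLEM = 7  # assumption: standard IMO marking, 0-7 per problem
NUM_PROBLEMS = 6


def total_score(scores):
    """Sum per-problem scores after checking they fit the assumed marking scheme."""
    if len(scores) != NUM_PROBLEMS:
        raise ValueError(f"expected {NUM_PROBLEMS} scores, got {len(scores)}")
    if any(not 0 <= s <= MAX_POINTS_PER_PROBLEM for s in scores):
        raise ValueError("each score must be between 0 and 7")
    return sum(scores)


# Assumption: full marks on the five solved problems, none on the unsolved one.
# Which slot holds the zero is arbitrary here.
hypothetical_scores = [7, 7, 7, 7, 7, 0]

score = total_score(hypothetical_scores)
max_score = NUM_PROBLEMS * MAX_POINTS_PER_PROBLEM
print(f"{score} / {max_score}")
```

Five full-mark solutions give 35 of a possible 42 points, matching the reported total.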

Why It Matters

Creative Reasoning: IMO problems require sustained creative thinking and the ability to construct multi-step proofs. This level of reasoning has historically been a significant challenge for AI.
Verification Challenges: Complex, multi-page proofs are hard to verify, so they fall outside the reinforcement learning (RL) paradigm of clear-cut, easily checked rewards. According to OpenAI, its model nonetheless produces intricate, watertight arguments of the kind human mathematicians write.
General-Purpose Capabilities: By using a general-purpose approach, the model can potentially be applied to other complex reasoning tasks beyond mathematics.

Team and Future Plans

The success of this project is credited to a dedicated team at OpenAI, including @SherylHsu02, @polynoamial, and many others. Alexander Wei, one of the lead researchers, expressed his excitement about the achievement and the collaborative effort behind it.

The IMO gold LLM is an experimental research model and is not available for public use. OpenAI plans to release GPT-5 soon, but it does not intend to release a model with this level of mathematical capability for several months.

Reflections on Progress

In 2021, Alexander Wei's PhD advisor @JacobSteinhardt had him forecast AI’s progress in mathematics by July 2025. At the time, Wei predicted that AI would score 30% on the MATH benchmark. Reality far exceeded that forecast: the model reached gold medal-level performance on the IMO, a much harder test than MATH.