GPT-5.2 and GPT-5.3 Codex: OpenAI's Latest Breakthroughs on NVIDIA Infrastructure

The Engineer

16 Dec 2025

OpenAI’s GPT-5.2 and GPT-5.3 Codex, trained on NVIDIA infrastructure, set new state-of-the-art results on several reasoning and coding benchmarks and extend what AI can do in professional knowledge work.

In December, OpenAI launched GPT-5.2, the latest in its series of highly capable models for professional knowledge work. The model was trained and deployed on NVIDIA infrastructure, including NVIDIA Hopper architecture GPUs and GB200 NVL72 systems. In February, OpenAI followed with GPT-5.3 Codex, an agentic coding model that helped build itself; it too was trained and served entirely on GB200 NVL72.

Technical Advances and Benchmarks

GPT-5.2 achieved top scores on demanding evaluations such as GPQA-Diamond (graduate-level science questions), AIME 2025 (competition mathematics), and Tau2 Telecom (agentic tool use in a telecom customer-support setting). These benchmarks test a model’s ability to handle complex tasks that require multistep reasoning and problem solving. On ARC-AGI-2, a benchmark of novel abstract-reasoning puzzles intended to measure progress toward artificial general intelligence (AGI), GPT-5.2 set a new state of the art.

GPT-5.3 Codex: Enhanced Coding and Reasoning

GPT-5.3 Codex combines the coding capabilities of GPT-5.2-Codex with the reasoning abilities of GPT-5.2, yielding a 25% performance improvement over its predecessor. The model was evaluated on four benchmarks covering coding, agentic, and real-world capabilities:

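SWE-Bench Pro (real-world software engineering tasks): Set a new industry high.
Terminal-Bench (agentic tasks in a command-line terminal): Set a new industry high, outperforming existing models.
OSWorld (computer use in desktop operating-system environments): Demonstrated strong performance.
GDPval (economically valuable knowledge-work tasks across occupations): Showed strong results.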

Pretraining: The Foundation of Advanced AI

The success of these models draws on the three AI scaling laws: pretraining, post-training, and test-time scaling. Pretraining is especially important because it lays the foundation of intelligence on which reasoning models build. Those models then spend additional compute at inference time to work through complex queries, often with multiple networks working together.
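The article does not describe how GPT-5.2 or GPT-5.3 Codex spend their inference compute. As a generic illustration of test-time scaling, the sketch below uses self-consistency: sample several answers to the same query and keep the most common one. The `noisy_solver` stand-in, its wrong-answer set, and its 40% success rate are hypothetical and do not model either GPT system.

```python
import random
from collections import Counter
from typing import Callable, List


def majority_vote(sample: Callable[[], str], n_samples: int) -> str:
    """Draw n_samples answers for one query and return the most frequent one.

    Spending more samples (more inference compute) on a single query is one
    simple form of test-time scaling, often called self-consistency.
    """
    answers: List[str] = [sample() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def noisy_solver(rng: random.Random, correct: str, p_correct: float) -> Callable[[], str]:
    """Hypothetical stand-in for a model: returns the correct answer with
    probability p_correct, otherwise one of several distinct wrong answers."""
    wrong_answers = ["A", "B", "C", "D"]

    def sample() -> str:
        if rng.random() < p_correct:
            return correct
        return rng.choice(wrong_answers)

    return sample


def accuracy(n_samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated queries answered correctly by majority vote."""
    rng = random.Random(seed)
    solver = noisy_solver(rng, correct="42", p_correct=0.4)
    hits = sum(majority_vote(solver, n_samples) == "42" for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(f"{n:>2} samples per query -> accuracy {accuracy(n):.2f}")
```

Because the wrong answers are scattered while the correct one recurs, accuracy rises as more samples are drawn per query: more inference compute buys better answers, which is the core trade-off of test-time scaling.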

Training such models from scratch is no small feat. It takes tens of thousands, sometimes hundreds of thousands, of GPUs working in concert, and at that scale every layer of the system has to excel:

World-class accelerators: High-performance GPUs, such as those built on the NVIDIA Hopper architecture and the Blackwell GPUs in GB200 NVL72 systems.
Advanced networking: Fast, efficient communication between GPUs so that data moves between them without stalling computation.
Optimized software stack: Libraries, frameworks, and tooling tuned end to end so the hardware runs close to its full potential.
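The article does not say which communication algorithms OpenAI’s training runs used. To make the networking point concrete, here is a pure-Python simulation of ring all-reduce, a standard collective that data-parallel training uses to sum gradients across GPUs (NVIDIA’s NCCL library implements ring-based all-reduce among other algorithms). The worker count and vectors are hypothetical, and real systems run this over NVLink and network fabrics rather than Python lists.

```python
from typing import List


def ring_allreduce(data: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over len(data) workers arranged in a ring.

    Each worker's vector is split into n chunks. In every step, each worker
    sends exactly one chunk to its right-hand neighbour, so the traffic is
    spread evenly and no single worker becomes a bottleneck.
    """
    n = len(data)
    length = len(data[0])
    if any(len(v) != length for v in data):
        raise ValueError("all workers must hold vectors of the same length")
    # Chunk c covers indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [c * length // n for c in range(n + 1)]
    buf = [list(v) for v in data]  # copy so the caller's data is untouched

    def indices(c: int) -> range:
        return range(bounds[c], bounds[c + 1])

    def ring_step(chunk_of, combine) -> None:
        # Snapshot every outgoing chunk first: all sends in a step happen at once.
        msgs = [(chunk_of(i), [buf[i][k] for k in indices(chunk_of(i))]) for i in range(n)]
        for i, (c, vals) in enumerate(msgs):
            dst = (i + 1) % n
            for k, v in zip(indices(c), vals):
                buf[dst][k] = combine(buf[dst][k], v)

    # Phase 1, reduce-scatter: after n-1 steps worker i holds the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        ring_step(lambda i: (i - step) % n, lambda old, new: old + new)

    # Phase 2, all-gather: circulate the finished chunks until every worker has all of them.
    for step in range(n - 1):
        ring_step(lambda i: (i + 1 - step) % n, lambda old, new: new)

    return buf


def per_worker_send_elements(n: int, size: float) -> float:
    """About how many elements each worker sends in a ring all-reduce of `size` elements."""
    return 2 * (n - 1) * size / n


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
    print(ring_allreduce(grads))
    for n in (8, 1024):
        print(n, "workers:", per_worker_send_elements(n, 1_000_000), "elements sent per worker")
```

Each worker sends roughly 2(n-1)/n times the vector size however many workers join, so per-GPU traffic stays nearly flat as the cluster grows, while the number of sequential steps grows with n. Link bandwidth and latency therefore set the pace, which is why networking is as important as the accelerators themselves.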

NVIDIA’s Role in Scaling AI

NVIDIA’s full-stack AI infrastructure underpins these advances. In the latest MLPerf Training industry benchmarks, GB200 NVL72 systems trained the largest model tested 3x faster than the previous generation. This gain underscores NVIDIA’s focus on high-performance infrastructure for large-scale AI projects.
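To put a 3x training speedup in perspective, the back-of-envelope sketch below uses the common approximation that training a dense transformer costs about 6 FLOPs per parameter per token. Every input (parameter count, token count, GPU count, per-GPU peak throughput, utilization) is a hypothetical round number, not a figure for GPT-5.2 or GPT-5.3 Codex, and treating the MLPerf result as a uniform 3x throughput multiplier is a simplifying assumption.

```python
SECONDS_PER_DAY = 86_400


def training_flops(params: float, tokens: float) -> float:
    """Approximate dense-transformer training compute: ~6 FLOPs per parameter
    per token (about 2 for the forward pass, 4 for the backward pass)."""
    return 6.0 * params * tokens


def training_days(params: float, tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float, utilization: float,
                  speedup: float = 1.0) -> float:
    """Wall-clock days for one training run.

    utilization: fraction of peak throughput actually achieved (hypothetical).
    speedup: multiplier on effective throughput, e.g. 3.0 for a 3x faster system.
    """
    effective_flops_per_s = n_gpus * peak_flops_per_gpu * utilization * speedup
    return training_flops(params, tokens) / effective_flops_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # All inputs are hypothetical placeholders, not figures for any OpenAI model.
    run = dict(params=1e12, tokens=3e13, n_gpus=50_000,
               peak_flops_per_gpu=1e15, utilization=0.4)
    base = training_days(**run)
    faster = training_days(**run, speedup=3.0)
    print(f"baseline: {base:.1f} days, at 3x: {faster:.1f} days")
```

With these placeholder inputs, the baseline run takes about 104 days and about 35 days at 3x throughput; with a tenth of the GPUs the same run would take over 1,000 days, which is why pretraining at this scale calls for tens of thousands of GPUs.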

Conclusion

GPT-5.2 and GPT-5.3 Codex are prime examples of how leading AI builders use NVIDIA’s infrastructure to train and deploy models at scale. These models push the boundaries of what AI can do and set new standards for performance and capability in professional knowledge work.