
OpenAI's GPT-5.2 and GPT-5.3 Codex, trained on NVIDIA’s cutting-edge hardware, mark significant strides in AI capabilities, setting new benchmarks and pushing the boundaries of professional knowledge work.
In December, OpenAI unveiled GPT-5.2, the latest in its series of highly capable models for professional knowledge work. The model was trained and deployed on NVIDIA infrastructure, including systems based on the NVIDIA Hopper architecture and GB200 NVL72 systems. In February, OpenAI followed with GPT-5.3 Codex, an agentic coding model that helped build itself, which was also trained and served entirely on GB200 NVL72.
GPT-5.2 achieved top scores on demanding industry evaluations such as GPQA-Diamond, AIME 2025, and Tau2 Telecom. These benchmarks measure how well a model handles complex tasks that require advanced reasoning and problem solving. On ARC-AGI-2, a benchmark designed to measure progress toward artificial general intelligence (AGI), GPT-5.2 set a new state of the art.
GPT-5.3 Codex combines the coding capabilities of GPT-5.2-Codex with the reasoning abilities of GPT-5.2, yielding a 25% performance improvement over its predecessor. The model was evaluated on four benchmarks covering coding, agentic, and real-world capabilities.

The success of these models can be attributed to the three scaling laws in AI: pretraining, post-training, and test-time scaling. Pretraining is especially important because it forms the foundation of intelligence for reasoning models. Reasoning models then spend additional compute at inference time to work through complex queries, using multiple networks working together.
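To make the idea of test-time scaling concrete, here is a toy sketch of one well-known form of it: sample several independent answers and take a majority vote. This is an illustrative assumption, not a description of how GPT-5.2 or GPT-5.3 Codex reason internally. The model below assumes each sample is independently correct with probability `p` and that the answer is binary; the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent samples is correct.

    Toy model (an assumption, not any vendor's method): each sample is
    independently correct with probability p, and n is odd so ties
    cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    need = n // 2 + 1  # smallest number of correct samples that wins the vote
    # Sum the binomial probabilities of every winning outcome.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    # Each extra sample is extra inference compute spent on the same question.
    for n in (1, 5, 21, 101):
        print(f"n={n:>3}  accuracy={majority_vote_accuracy(0.6, n):.3f}")
```

With `p = 0.6`, accuracy climbs as `n` grows, which is the basic intuition behind trading inference compute for answer quality. The same formula also shows the limit of this naive approach: if `p` is below 0.5, adding more votes makes the result worse, not better.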
Training models this advanced from scratch is no small feat. It requires tens of thousands, sometimes hundreds of thousands, of GPUs working in concert, and operating at that scale demands excellence across several dimensions.
NVIDIA’s full-stack AI infrastructure plays a central role in enabling these advances. In the latest MLPerf Training industry benchmarks, GB200 NVL72 systems delivered 3x faster training performance than the previous generation on the largest model tested. That gain reflects NVIDIA’s focus on high-performance systems for large-scale AI.
GPT-5.2 and GPT-5.3 Codex are prime examples of how leading AI builders leverage NVIDIA’s infrastructure to train and deploy models at scale. These advancements not only push the boundaries of what is possible with AI but also set new standards for performance and capability in professional knowledge work.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.