OpenAI Launches GPT-4.1 with Enhanced Coding, Instruction Following, and Long Context Capabilities

Products & Applications

The Engineer

15 Apr 2025 · 3 min read

GPT-4.1 boasts major leaps in coding efficiency and instruction adherence, alongside a vast increase in contextual memory, positioning it as the pinnacle of conversational AI with unparalleled versatility and depth.

Today, OpenAI has announced the launch of three new models in their API lineup: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. These models bring significant improvements across coding, instruction following, and long context understanding, along with a larger context window supporting up to 1 million tokens. The refreshed knowledge cutoff for these models is June 2024, making them more current and relevant.

Key Improvements

Coding: GPT-4.1 scores 54.6% on the SWE-bench Verified benchmark, a 21.4% absolute improvement over GPT-4o and a 26.6% absolute improvement over GPT-4.5. This makes it one of the leading models for coding tasks.
Instruction Following: On Scale’s MultiChallenge benchmark, GPT-4.1 scores 38.3%, representing a 10.5% absolute increase over GPT-4o.
Long Context Understanding: On the Video-MME benchmark, GPT-4.1 sets a new state-of-the-art result with a score of 72.0% in the long, no subtitles category, a 6.7% absolute improvement over GPT-4o.

Real-World Utility

While benchmarks are useful for evaluating performance, OpenAI focused on real-world utility during the training process. Close collaboration with the developer community helped optimize these models for practical applications. This approach ensures that the new models not only perform well in tests but also excel in everyday tasks.

Cost and Performance

The GPT-4.1 model family offers exceptional performance at a lower cost, pushing the boundaries of efficiency at every point on the latency curve.

GPT-4.1 Mini: This smaller model is a significant leap forward, often outperforming GPT-4o in various benchmarks. It matches or exceeds GPT-4o in intelligence evaluations while reducing latency by nearly half and cutting costs by 83%.
GPT-4.1 Nano: The fastest and cheapest model available, GPT-4.1 nano is ideal for tasks requiring low latency. Despite its small size, it supports a 1 million token context window and scores impressively on benchmarks:
- MMLU: 80.1%
- GPQA: 50.3%
- Aider Polyglot Coding: 9.8%

Use Cases

The improvements in instruction following reliability and long context comprehension make GPT-4.1 models particularly effective for powering agents-systems that can independently accomplish tasks on behalf of users. These enhancements are crucial for applications like chatbots, virtual assistants, and automated workflows.

Implementation Details

Context Window: The 1 million token context window is a significant leap from previous models, allowing for more comprehensive understanding and use of long-form content.
Latency and Cost: GPT-4.1 mini and nano are optimized for low latency and cost efficiency, making them suitable for real-time applications and budget-conscious projects.

Try It Out

You can try out the new models in OpenAI’s Playground to see how they perform on your specific tasks.