HEADLINE: OpenAI’s gpt-oss Models Offer Efficiency and Competitive Intelligence for Smaller Footprints

Models & Research

The Engineer

7 Aug 2025 · 3 min read

OpenAI's new gpt-oss models strike a balance between computational efficiency and cognitive prowess, offering developers powerful AI tools without the need for massive computing resources.

OpenAI has released two open-weight models, gpt-oss-120b and gpt-oss-20b. Both pair strong benchmark scores with a small compute footprint, which makes them appealing to practitioners who need to deploy capable models on limited hardware.

Technical Overview

gpt-oss-120b:

Total Parameters: 116.8B
Active Parameters: 5.1B
Intelligence Index Score: 58

gpt-oss-20b:

Total Parameters: 20.9B
Active Parameters: 3.6B
Intelligence Index Score: 48

Both models are released in MXFP4 precision, which significantly reduces their size and memory requirements:

gpt-oss-120b: 60.8GB
gpt-oss-20b: 12.8GB

At these sizes, gpt-oss-120b fits on a single NVIDIA H100 (80GB), and gpt-oss-20b can be deployed on consumer GPUs or laptops with more than 16GB of RAM.
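The reported checkpoint sizes can be sanity-checked with a little arithmetic. The sketch below assumes 1GB = 10^9 bytes and uses illustrative helper names that are not part of any library. It derives the average number of bits stored per parameter from the figures above, then estimates what the same parameter counts would occupy at 16 bits per parameter.

```python
# Back-of-the-envelope check of the reported checkpoint sizes.
# Assumption: 1 GB = 1e9 bytes. Helper names are illustrative only.

def bits_per_param(size_gb: float, params_billions: float) -> float:
    """Average bits stored per parameter, given file size and parameter count."""
    return size_gb * 8 / params_billions


def size_gb_at(params_billions: float, bits: float) -> float:
    """Approximate size in GB if every parameter used `bits` bits."""
    return params_billions * bits / 8


# Figures quoted in the article.
MODELS = {
    "gpt-oss-120b": {"params_b": 116.8, "size_gb": 60.8},
    "gpt-oss-20b": {"params_b": 20.9, "size_gb": 12.8},
}

for name, m in MODELS.items():
    avg_bits = bits_per_param(m["size_gb"], m["params_b"])
    as_16bit = size_gb_at(m["params_b"], 16)
    print(f"{name}: ~{avg_bits:.1f} bits/param, vs ~{as_16bit:.0f}GB at 16 bits/param")
```

Both averages fall between 4 and 5 bits per parameter, which is plausibly what a mostly 4-bit checkpoint with some higher-precision components would look like. At 16 bits per parameter the same models would need roughly 234GB and 42GB.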

Efficiency and Speed

The efficiency of these models comes largely from their sparse architecture. gpt-oss-120b activates only 4.4% of its total parameters per forward pass (5.1B of 116.8B), so each generated token costs far less compute than it would in a dense model of the same size. Similarly, the 20B model can generate dozens of output tokens per second on recent MacBooks, because only 3.6B of its 20.9B parameters (about 17%) are active per token.
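The sparsity figures follow directly from the parameter counts. The sketch below recomputes them and adds a rough per-token compute estimate using the common rule of thumb of about 2 FLOPs per active parameter per generated token. That rule is an assumption introduced here for illustration, not a figure from OpenAI, and the function names are hypothetical.

```python
# Fraction of parameters active per forward pass, from the figures quoted above.

def active_fraction(active_billions: float, total_billions: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_billions / total_billions


def approx_gflops_per_token(params_billions: float) -> float:
    """Rough compute per generated token.

    Assumption: ~2 FLOPs per parameter that participates in the forward pass.
    """
    return 2 * params_billions


MODELS = {
    "gpt-oss-120b": {"active_b": 5.1, "total_b": 116.8},
    "gpt-oss-20b": {"active_b": 3.6, "total_b": 20.9},
}

for name, m in MODELS.items():
    frac = active_fraction(m["active_b"], m["total_b"])
    sparse = approx_gflops_per_token(m["active_b"])
    dense = approx_gflops_per_token(m["total_b"])
    print(f"{name}: {frac:.1%} active, ~{sparse:.1f} vs ~{dense:.1f} GFLOPs/token if dense")
```

Under that rule of thumb, the sparse models need roughly the per-token compute of a 5B or 4B dense model while keeping the capacity of a much larger one.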

Intelligence and Performance

Both models score impressively well for their size and sparsity:

gpt-oss-120b: Scores higher than o3-mini but trails behind o4-mini and o3.
gpt-oss-20b: Scores consistently across the individual evaluations, with no significant weak spots.

The gpt-oss-120b stands out as the most intelligent model that can be run on a single H100, while the 20B version is the smartest option for consumer GPUs. This makes them highly attractive for practitioners looking to balance performance and resource constraints.

Comparison with Other Models

While gpt-oss-120b doesn't surpass DeepSeek R1 (score of 59) or Qwen3 235B (score of 64), it is significantly more efficient:

DeepSeek R1:
- Total Parameters: 671B
- Active Parameters: 37B
- Precision: FP8
- File size: Over 10x larger than gpt-oss-120b (60.8GB)

Qwen3 235B:
- Total Parameters: 235B
- Active Parameters: 22B
- Precision: Likely higher than MXFP4
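The DeepSeek R1 comparison can be reproduced with simple ratios. The sketch below assumes FP8 stores exactly one byte per parameter and ignores any non-weight overhead in the checkpoint, so the resulting size is an estimate rather than a published figure. Variable and function names are illustrative.

```python
# Rough comparison of DeepSeek R1 and gpt-oss-120b using the figures above.
# Assumption: FP8 = 1 byte per parameter, no checkpoint overhead.

GPT_OSS_120B = {"total_b": 116.8, "active_b": 5.1, "size_gb": 60.8}
DEEPSEEK_R1 = {"total_b": 671.0, "active_b": 37.0}


def fp8_size_gb(params_billions: float) -> float:
    """Estimated weight size in GB at one byte per parameter."""
    return params_billions * 1.0


r1_size_gb = fp8_size_gb(DEEPSEEK_R1["total_b"])
size_ratio = r1_size_gb / GPT_OSS_120B["size_gb"]
total_ratio = DEEPSEEK_R1["total_b"] / GPT_OSS_120B["total_b"]
active_ratio = DEEPSEEK_R1["active_b"] / GPT_OSS_120B["active_b"]

print(f"Estimated R1 size: ~{r1_size_gb:.0f}GB ({size_ratio:.1f}x gpt-oss-120b)")
print(f"Total parameters:  {total_ratio:.1f}x")
print(f"Active parameters: {active_ratio:.1f}x")
```

The ratios differ by measure: about 11x in estimated file size, but closer to 6x in total parameters and 7x in active parameters. The gap in file size is wider because FP8 uses roughly twice as many bits per weight as MXFP4.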

Both gpt-oss models are text-only, similar to competing models from DeepSeek, Alibaba, and others.

Architecture Details

The gpt-oss models use a standard Mixture of Experts (MoE) architecture:

Router: Selects the top 4 experts for each token.
Layers:
- gpt-oss-120b: 36 layers
- gpt-oss-20b: 24 layers
Query Heads: 64 per layer
Attention Mechanism: Grouped Query Attention with 8 KV heads
Positional Embeddings: Rotary embeddings (RoPE), with YaRN used to extend the context window to 128k tokens
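Two of the items above, top-4 routing and grouped query attention, can be illustrated in a few lines. This is a minimal sketch and not OpenAI's implementation. The expert count and logits in the example are hypothetical, since the article does not state how many experts each model has, and normalizing with a softmax over only the selected experts is one common convention assumed here.

```python
import math


def route_top_k(router_logits, k=4):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs, with weights summing to 1.
    Assumption: weights come from a softmax over the selected logits only.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def kv_head_for_query_head(q_head, n_query_heads=64, n_kv_heads=8):
    """Map a query head to the KV head it shares under grouped query attention."""
    group_size = n_query_heads // n_kv_heads  # 64 / 8 = 8 query heads per KV head
    return q_head // group_size


# Hypothetical router scores for a toy layer with 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.9, 0.0]
selected = route_top_k(logits, k=4)
print([i for i, _ in selected])          # indices of the 4 chosen experts
print(kv_head_for_query_head(0), kv_head_for_query_head(63))
```

Only the four selected experts run for that token, which is where the small active parameter count comes from. With 64 query heads sharing 8 KV heads, each KV head serves a group of 8 query heads, so the key/value cache is one eighth the size it would be if every query head had its own.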

Conclusion

OpenAI's gpt-oss models represent a significant step forward in balancing intelligence and efficiency. The gpt-oss-120b and gpt-oss-20b offer competitive performance while being highly resource-efficient, making them ideal for a wide range of applications from high-end servers to consumer devices.