Alibaba's Qwen3.5 Small Models Outperform OpenAI’s GPT-OSS-120B with Superior Multimodal Capabilities

The Engineer

3 Mar 2026

Alibaba's compact Qwen3.5 models defy size limitations, outstripping OpenAI’s larger GPT-OSS with superior multimodal skills and efficiency for edge devices, redefining AI performance standards.

Alibaba’s Qwen Team, the AI research group within the e-commerce giant, has released its latest batch of open-source models, the Qwen3.5 Small Model Series. The models are designed to be lightweight and efficient enough for edge devices, where battery life and performance are critical. Despite their size, they are strong at multimodal tasks and reasoning, and the largest of them, Qwen3.5-9B, outperforms OpenAI’s far larger GPT-OSS-120B on several benchmarks.

Key Models and Their Capabilities

The Qwen3.5 Small Model Series includes:

- Qwen3.5-0.8B and 2B: Pitched as the "tiny" and "fast" options, these suit prototyping and deployment on resource-constrained devices, especially where battery life is a significant concern.
- Qwen3.5-4B: A strong multimodal base for lightweight agents. It supports a 262,144-token (256K) context window, which matters for long documents and other long input sequences.
- Qwen3.5-9B: A compact reasoning model that outperforms OpenAI’s GPT-OSS-120B on several key benchmarks, including multilingual knowledge and graduate-level reasoning. This is notable because Qwen3.5-9B has only 9 billion parameters, against roughly 120 billion in GPT-OSS-120B.
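
As a rough back-of-the-envelope sketch of why these sizes matter on edge hardware, the snippet below estimates the memory needed just to hold the weights at a few common precisions. The parameter counts are the nominal figures from the model names above. The precision table and the function names are illustrative assumptions, and a real deployment also needs room for activations and the attention cache.

```python
# Nominal parameter counts, taken from the model names in this article.
PARAMS = {
    "Qwen3.5-0.8B": 0.8e9,
    "Qwen3.5-2B": 2e9,
    "Qwen3.5-4B": 4e9,
    "Qwen3.5-9B": 9e9,
    "GPT-OSS-120B": 120e9,
}

# Bytes needed to store one weight at some common precisions (assumption:
# no per-block scaling overhead, which real quantization formats add).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, precision):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def footprint_table():
    """Map each model name to its weight memory at every precision."""
    return {
        name: {p: weight_memory_gb(n, p) for p in BYTES_PER_PARAM}
        for name, n in PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = "  ".join(f"{p}: {gb:6.1f} GB" for p, gb in row.items())
        print(f"{name:<14} {cells}")
```

On these nominal figures the 9B model needs about 13 times less weight memory than a 120B model at the same precision, which is the gap that separates a laptop or phone from a multi-GPU server.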

Technical Foundation: Efficient Hybrid Architecture

The technical underpinnings of the Qwen3.5 small models represent a significant departure from traditional Transformer architectures. Alibaba’s researchers have developed an Efficient Hybrid Architecture that combines Gated Delta Networks (a form of linear attention) with sparse Mixture-of-Experts (MoE).

- Gated Delta Networks: These help address the "memory wall" that often limits small models. Because linear attention keeps a fixed-size state instead of a cache that grows with every token, the model moves less data per generated token, which raises throughput and significantly lowers inference latency.
- Sparse Mixture-of-Experts (MoE): A router sends each token to a small subset of expert sub-networks, so only a fraction of the model's parameters is active for any given input and compute is spent where it is needed.
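
To make the two ideas above concrete, here is a minimal pure-Python sketch of one gated delta rule step and of top-k expert routing. It follows a common formulation from the linear-attention and MoE literature and is not Qwen3.5's actual implementation. The function names, the scalar gates `alpha` and `beta`, and the hand-supplied router scores are all assumptions made for illustration; in a real model these values come from learned layers.

```python
import math


def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of a gated delta rule (illustrative form).

    S is a d_v x d_k memory matrix stored as a list of rows. The update is
        S_new = alpha * S + beta * (v - alpha * S k) k^T
    alpha in [0, 1] decays old memory; beta in [0, 1] sets how strongly the
    value stored under key k is overwritten by v. Keys are assumed to be
    unit length. The state size never depends on sequence length.
    """
    recalled = [sum(s * kj for s, kj in zip(row, k)) for row in S]  # S k
    S_new = [
        [alpha * s + beta * (vi - alpha * ri) * kj for s, kj in zip(row, k)]
        for row, vi, ri in zip(S, v, recalled)
    ]
    out = [sum(s * qj for s, qj in zip(row, q)) for row in S_new]  # S_new q
    return S_new, out


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    chosen = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract for stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Run only the k selected experts and mix their outputs by router weight."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yj for o, yj in zip(out, y)]
    return out


if __name__ == "__main__":
    S = [[0.0, 0.0], [0.0, 0.0]]
    S, out = gated_delta_step(S, q=[1.0, 0.0], k=[1.0, 0.0], v=[3.0, 4.0],
                              alpha=1.0, beta=1.0)
    print("read after first write:", out)

    # Four toy experts that scale the input; only two are evaluated per call.
    experts = [lambda x, s=s: [s * xi for xi in x] for s in range(4)]
    print("MoE output:", moe_forward([1.0, 2.0], experts, [0.0, 5.0, 5.0, -1.0]))
```

Writing a new value under an existing key replaces the old one rather than adding to it, which is what distinguishes the delta rule from plain additive linear attention. In the routing half, the experts that are not selected are never called, which is where the compute saving comes from.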

Native Multimodality

One of the standout features of Qwen3.5 is its native multimodal capability. Unlike previous models that often "bolted on" a vision encoder to a text model, Qwen3.5 was trained using early fusion on multimodal tokens. This means that visual and textual data were processed together, in the same token stream, from the start of training. As a result, the 4B and 9B models handle tasks that span modalities, such as image captioning and visual question answering, better than bolted-on designs.
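
The idea of early fusion can be sketched in a few lines: image patches and text pieces are turned into tokens and placed in one sequence that a single model consumes. The example below is a toy illustration only. The `Token` class, the whitespace "tokenizer", the raw-pixel patches, and the function names are all hypothetical stand-ins; a real system would use a subword tokenizer and a learned patch embedding, and nothing here reflects Qwen3.5's actual preprocessing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    modality: str    # "text" or "image"
    payload: object  # a word for text, a flattened pixel patch for images


def patchify(image: List[List[int]], patch: int) -> List[Tuple[int, ...]]:
    """Split an H x W pixel grid into non-overlapping patch x patch tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tiles.append(tuple(
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ))
    return tiles


def early_fusion_sequence(text_before: str, image: List[List[int]],
                          text_after: str, patch: int = 2) -> List[Token]:
    """Build one mixed sequence: text tokens, then image patches, then text."""
    seq = [Token("text", word) for word in text_before.split()]
    seq += [Token("image", tile) for tile in patchify(image, patch)]
    seq += [Token("text", word) for word in text_after.split()]
    return seq


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
    sequence = early_fusion_sequence("Describe this:", image, "in one sentence")
    print([token.modality for token in sequence])
```

The contrast with a bolted-on design is that there is no separate vision model whose output is attached afterwards: both kinds of token sit in the same sequence and pass through the same layers.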

Availability and Licensing

The weights for these models are released under the Apache 2.0 license, which permits commercial use, modification, and redistribution, making them suitable for enterprise deployments. The models are available on Hugging Face and ModelScope, so developers can customize and build on them as needed.

Benchmarks and Performance

To put the performance of Qwen3.5 in perspective, Qwen3.5-9B outperforms OpenAI’s GPT-OSS-120B on key third-party benchmarks, including:
- Multilingual knowledge: understanding and answering questions across multiple languages.
- Graduate-level reasoning: the multi-step reasoning needed for complex tasks.

These models are among the smallest general-purpose models recently released by any lab globally. They are comparable in size to the LFM2 series from MIT offshoot LiquidAI, whose models also have from several hundred million to a few billion parameters, rather than the trillion-parameter scale reported for flagship models from OpenAI, Anthropic, and Google's Gemini series.

Conclusion

Alibaba’s Qwen3.5 Small Model Series represents a significant advancement in AI research, particularly for applications