Qwen3.5: A Native Multimodal Agent with Efficient Sparse Mixture-of-Experts and Linear Attention

Models & Research

The Engineer

17 Feb 2026 · 2 min read

Qwen3.5 introduces a groundbreaking hybrid architecture that combines linear attention and sparse mixture-of-experts, making it highly efficient and capable of advanced multimodal tasks.

We're excited to announce the official release of Qwen3.5, particularly the open-weight model Qwen3.5-397B-A17B. This new version marks a significant step forward in multimodal AI, offering impressive capabilities across reasoning, coding, agent functionalities, and multimodal understanding. Here's what you need to know:

Technical Overview

Hybrid Architecture

Linear Attention via Gated Delta Networks: Qwen3.5 leverages linear attention mechanisms, which are more efficient for long sequences compared to traditional self-attention (O(N) vs O(N^2)).
Sparse Mixture-of-Experts (MoE): The model uses a sparse MoE architecture where only 17 billion parameters out of the total 397 billion are activated per forward pass. This approach significantly reduces computational overhead while maintaining high performance.

Language and Dialect Support

Expanded from 119 to 201 languages and dialects, making Qwen3.5 more accessible and useful for a global audience.

Performance Benchmarks

Qwen3.5-397B-A17B was evaluated against leading models in various tasks, demonstrating competitive and often superior performance:

Knowledge Tasks

MMLU-Pro: 87.8 (vs GPT5.2: 87.4)
MMLU-Redux: 94.9 (vs Gemini-3 Pro: 95.9)
SuperGPQA: 70.4 (vs K2.5-1T-A32B: 69.2)

Instruction Following

IFEval: 92.6 (vs Claude 4.5 Opus: 90.9)
IFBench: 76.5 (vs GPT5.2: 75.4)
MultiChallenge: 67.6 (vs K2.5-1T-A32B: 62.7)

Long Context

AA-LCR: 68.7 (vs K2.5-1T-A32B: 70.0)
LongBench v2: 63.2 (vs Gemini-3 Pro: 68.2)

STEM Tasks

GPQA: 88.4 (vs GPT5.2: 92.4)
HLE: 28.7 (vs Claude 4.5 Opus: 30.8)
HLE-Verified: 37.6 (vs Claude 4.5 Opus: 38.8)

Reasoning

LiveCodeBench v6: 83.6 (vs GPT5.2: 87.7)
HMMT Feb 25: 94.8 (vs Gemini-3 Pro: 97.3)
HMMT Nov 25: 94.8 (vs Gemini-3 Pro: 93.3)

Hosted Model: Qwen3.5-Plus

For those who prefer a hosted solution, Qwen3.5-Plus is available via Alibaba Cloud Model Studio:

1M Context Window: By default, it supports long-context tasks efficiently.
Built-in Tools and Adaptive Tool Use: Comes with official tools to enhance its capabilities.

Why It Matters

For practitioners, Qwen3.5 represents a significant advancement in multimodal AI. The combination of linear attention and sparse MoE ensures that the model is both powerful and efficient, making it suitable for real-world applications. Whether you're working on language understanding, coding tasks, or building complex agents, Qwen3.5 offers a robust foundation.

Getting Started

You can try out Qwen3.5 through various platforms:

QWEN CHAT
GitHub
[Hugging Face](https://huggingface.co/Qwen/Qwen3