
Share
Training massive models like Qwen2.5-Max on 20 trillion tokens pushes the boundaries of AI scalability, offering new insights into efficient MoE architectures and fine-tuning techniques that could revolutionize natural language processing.
January 28, 2025 · 3 min read · 561 words · By Qwen Team
The quest for more intelligent models often involves scaling both data and model sizes. However, the practical challenges of training extremely large models-whether dense or Mixture-of-Expert (MoE)-remain significant. Recent advancements in this area, such as DeepSeek V3's detailed disclosures, have provided valuable insights. In parallel, we've been developing Qwen2.5-Max, a large-scale MoE model trained on over 20 trillion tokens and further refined with Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). Today, we're excited to share the performance results of Qwen2.5-Max and announce its availability via Alibaba Cloud's API.
Qwen2.5-Max has been evaluated against leading models, both proprietary and open-weight, across a range of benchmarks that are crucial for assessing model intelligence. These include:
We focus on the performance of instruct models, which are suitable for downstream applications like chat and coding. Here’s how Qwen2.5-Max stacks up against state-of-the-art models:
Qwen2.5-Max outperforms DeepSeek V3 in several key benchmarks:
Competitive Results:
When comparing base models, we face limitations due to proprietary restrictions on models like GPT-4o and Claude-3.5-Sonnet. Therefore, our evaluations include:

Qwen2.5-Max leverages the MoE architecture to efficiently scale to a large number of parameters without incurring excessive computational costs. Key technical details include:
Training Data:
Model Architecture:
Quantization:
Benchmarks:
Qwen2.5-Max is now available via Alibaba Cloud's API, making it accessible for developers and researchers to integrate into their applications. We also invite you to explore Qwen2.5-Max on Qwen Chat for a hands-on experience.
Looking ahead, we will continue to refine and expand Qwen2.5-Max's capabilities, focusing on further improvements in efficiency, performance, and usability. We believe that large-scale MoE models like Qwen2.5-Max represent a significant step forward in the quest for more intelligent AI systems.
Tags
Original Sources
↗ https://qwenlm.github.io/blog/qwen2.5-max/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
30 January 2025
88 articles
Related Articles
Related Articles
More Stories