
The Qwen team unveils the open-source release of Qwen2.5-1M, featuring two new models with 1 million token context support and an efficient inference framework, marking a significant leap in AI model scalability.
Two months after the release of Qwen2.5-Turbo, which extended context length support to one million tokens, the Qwen team is back with a significant update: the open-source release of Qwen2.5-1M models and their corresponding inference framework. This new series includes two checkpoints, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, both designed to handle 1M-token contexts. Here’s what you need to know:
These two models are the first in the Qwen open-source lineup to support context lengths of up to one million tokens. That is a significant step up from earlier open-source Qwen2.5 models, which supported contexts of up to 128K tokens.
To make these long-context models practical to deploy, the team has fully open-sourced an inference framework based on vLLM. The framework integrates sparse attention methods, which cut the amount of attention computation needed for very long sequences.
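To give a rough sense of why sparse attention matters at this scale, here is a minimal sketch that counts how many query-key pairs get scored under full causal attention versus a simple sparse pattern. The pattern used here (a sliding window plus a few always-visible leading tokens) is a generic illustration chosen for simplicity; it is an assumption, not the specific sparse attention method used in the Qwen framework, and the function names and parameters are hypothetical.

```python
def dense_causal_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full causal attention: query i sees keys 0..i."""
    return n_tokens * (n_tokens + 1) // 2


def sparse_causal_pairs(n_tokens: int, window: int, sinks: int) -> int:
    """Pairs scored when each query sees only the first `sinks` tokens plus
    the `window` most recent tokens (itself included), never future tokens.

    Illustrative pattern only; not the method used by the Qwen framework.
    """
    total = 0
    for q in range(n_tokens):
        prefix = min(sinks, q + 1)               # always-visible leading tokens
        local = min(window, q + 1)               # sliding window ending at q
        window_start = q + 1 - local
        overlap = max(0, prefix - window_start)  # tokens counted by both sets
        total += prefix + local - overlap
    return total


if __name__ == "__main__":
    n, window, sinks = 8, 3, 1  # toy sizes so the result is easy to check by hand
    print(dense_causal_pairs(n))                  # 36
    print(sparse_causal_pairs(n, window, sinks))  # 26
```

The dense count grows quadratically with sequence length, while the sparse count grows roughly linearly (about `window + sinks` pairs per query). At toy sizes the gap is small, but it widens quickly as the sequence length approaches the million-token range.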
The team has also published a technical report covering the design of the training and inference frameworks, along with ablation experiments. It is the place to look for the technical underpinnings of Qwen2.5-1M.
Online demos of the Qwen2.5-1M models are also available.

Let’s look at the performance of the Qwen2.5-1M series models on both long-context and short-text tasks.
The Qwen2.5-1M models are designed to excel at handling long sequences.
While the primary focus is on long contexts, the Qwen2.5-1M series also performs well on short-text tasks.
For those interested in exploring more, the team recently introduced Qwen Chat, an advanced AI assistant that leverages the Qwen2.5-Turbo model.
Notably, Qwen Chat supports long-context processing with a context length of up to 1M tokens, making it a powerful tool for a variety of applications.
The release of Qwen2.5-1M marks a significant milestone in the development of large-context models. By open-sourcing these models and their inference framework,
Original Sources
↗ https://qwenlm.github.io/blog/qwen2.5-1m/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
27 January 2025