Qwen2.5-1M: Open-Sourcing 1M-Token Context Models and an Efficient Inference Framework

Models & Research

The Engineer

27 Jan 2025 · 3 min read

The Qwen team unveils the open-source release of Qwen2.5-1M, featuring two new models with 1 million token context support and an efficient inference framework, marking a significant leap in AI model scalability.

Two months after the release of Qwen2.5-Turbo, which extended context length support to one million tokens, the Qwen team is back with a significant update: the open-source release of Qwen2.5-1M models and their corresponding inference framework. This new series includes two checkpoints, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, both designed to handle 1M-token contexts. Here’s what you need to know:

Open-Source Models

The Qwen2.5-1M series introduces two new models:

Qwen2.5-7B-Instruct-1M: A 7-billion-parameter model.
Qwen2.5-14B-Instruct-1M: A 14-billion-parameter model.

These models are the first in our open-source lineup to support context lengths of up to one million tokens. This is a significant leap from earlier open-source Qwen2.5 models, which supported contexts of up to 128K tokens.

Inference Framework

To ensure efficient deployment of these large-context models, we’ve fully open-sourced an inference framework based on vLLM. This framework integrates sparse attention methods, which are crucial for handling long sequences. Here’s what you can expect:

Performance Boost: The framework processes 1M-token inputs 3x to 7x faster than a comparable setup without these optimizations.
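Integration with Sparse Attention: Full attention scores every token against every earlier token, so its cost grows quadratically with input length. Sparse attention computes only a selected subset of these interactions, which keeps the computational cost of 1M-token inputs manageable.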
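To make the quadratic-cost point concrete, the sketch below counts how many query-key scores causal attention computes for a 1M-token input, first with full attention and then with a simple sparse pattern. The sparse pattern (a sliding window of recent tokens plus a few "global" tokens at the start of the sequence) is a generic, hypothetical example chosen for illustration; it is not the sparse attention method used in the Qwen2.5-1M framework, and the window size and global-token count are arbitrary assumptions.

```python
"""Toy comparison of dense vs. sparse causal attention cost.

The sparse pattern here (a local sliding window plus a few global tokens) is a
generic, hypothetical example, not the pattern used by Qwen2.5-1M.
"""


def dense_causal_pairs(n: int) -> int:
    """Query-key scores computed by full causal attention: token i sees tokens 0..i."""
    return n * (n + 1) // 2


def sparse_causal_pairs(n: int, window: int, num_global: int) -> int:
    """Query-key scores computed when token i sees only the `window` most recent
    tokens (itself included) plus the first `num_global` tokens of the sequence."""
    total = 0
    for i in range(n):
        local_start = max(0, i - window + 1)
        local = i - local_start + 1  # keys local_start..i
        # Global keys already inside the local window are not counted twice.
        global_only = min(num_global, local_start)
        total += local + global_only
    return total


if __name__ == "__main__":
    n = 1_000_000  # roughly a 1M-token input
    dense = dense_causal_pairs(n)
    # Illustrative, made-up settings: 4,096-token window, 64 global tokens.
    sparse = sparse_causal_pairs(n, window=4096, num_global=64)
    print(f"dense : {dense:,} query-key scores")
    print(f"sparse: {sparse:,} query-key scores")
    print(f"ratio : ~{dense / sparse:.0f}x fewer with the sparse pattern")
```

With these made-up settings the sparse pattern computes roughly 120x fewer scores than full attention, and its cost grows linearly rather than quadratically with input length. Real long-context sparse attention methods choose which interactions to keep far more carefully, since a fixed window can miss relevant distant tokens.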

Technical Report

We’ve also published a technical report covering design insights for both the training and inference frameworks, along with ablation experiments. It is a valuable resource for anyone looking to understand the technical underpinnings of Qwen2.5-1M.

Online Demos

You can experience the Qwen2.5-1M models firsthand through our online demos:

Hugging Face: Qwen2.5-1M Demo
ModelScope: Qwen2.5-1M Demo

Model Performance

Let’s dive into the performance of the Qwen2.5-1M series models on both long-context and short-text tasks.

Long-Context Tasks

The Qwen2.5-1M models are designed to excel in handling long sequences. Here are some key points:

Evaluation: We evaluated these models on various long-context tasks, such as summarization, question answering, and document understanding.
Results: The models demonstrated strong performance, maintaining coherence and relevance even with very long inputs.

Short-Text Tasks

While the primary focus is on long contexts, the Qwen2.5-1M series also performs well on short-text tasks:

Versatility: These models can handle a wide range of tasks, from concise responses to complex reasoning.
Consistency: They maintain consistent performance across different task types and input lengths.

Additional Resources

For those interested in exploring more, we’ve recently introduced Qwen Chat, an advanced AI assistant that leverages the Qwen2.5-Turbo model. Qwen Chat can:

Engage in natural conversations.
Write code.
Perform searches.
Generate images and videos.
Utilize various tools.

Notably, Qwen Chat supports long-context processing with a context length of up to 1M tokens, making it a powerful tool for a variety of applications.

Conclusion

The release of Qwen2.5-1M marks a significant milestone in the development of large-context models. By open-sourcing these models and their inference framework,