
Qwen2 debuts in five model sizes, led by a 72-billion-parameter variant, with broader multilingual coverage and stronger reported results on coding and math benchmarks.
After months of development, the Qwen team has released Qwen2, the successor to Qwen1.5. The new release brings enhanced multilingual support, state-of-the-art results on the team's reported benchmarks, and improved capabilities in coding and mathematics. Let's dive into the technical details and what they mean for practitioners.
Qwen2 comes in five model sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B (a mixture-of-experts model with 14B parameters active per token), and the largest, Qwen2-72B. Together they cover a range of use cases, from resource-constrained environments to high-performance applications.
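All models use Grouped Query Attention (GQA), in which several query heads share each key/value head. This shrinks the key/value cache, which reduces memory usage and speeds up inference. For the two smallest models, Qwen2-0.5B and Qwen2-1.5B, the embedding weights are tied, which further reduces the parameter count.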
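To make the memory claim concrete, here is a minimal back-of-the-envelope sketch of how the key/value cache scales with the number of key/value heads. It is not an implementation of GQA. The layer count, head counts, head dimension, and 16-bit storage are hypothetical values chosen for illustration, not Qwen2's published configuration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate key/value cache size for one sequence, in bytes."""
    # The leading factor of 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration for illustration only (not Qwen2's actual config).
layers, query_heads, head_dim, seq_len = 32, 32, 128, 32_768

# Standard multi-head attention: one key/value head per query head.
mha = kv_cache_bytes(layers, num_kv_heads=query_heads, head_dim=head_dim, seq_len=seq_len)

# Grouped query attention: here 4 query heads share each of 8 key/value heads.
gqa = kv_cache_bytes(layers, num_kv_heads=8, head_dim=head_dim, seq_len=seq_len)

print(f"MHA cache: {mha / 2**30:.1f} GiB")
print(f"GQA cache: {gqa / 2**30:.1f} GiB")
print(f"Reduction: {mha // gqa}x")
```

Under these assumptions the cache shrinks in direct proportion to the number of key/value heads, so the saving grows with sequence length and batch size.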
One of the standout features of Qwen2 is its expanded multilingual capabilities. In addition to English and Chinese, Qwen2 has been trained on data in 27 additional languages. This makes it a powerful tool for applications requiring broad language coverage, such as translation services, content generation, and cross-lingual information retrieval.

The Qwen team reports state-of-the-art results for Qwen2 across a range of benchmarks. Notably, the Qwen2-72B model excels in coding and mathematics tasks, with significant improvements over its Qwen1.5 predecessor. These gains matter for applications like code completion, bug detection, and mathematical problem-solving.
Another key improvement is extended context length support. The Qwen2-7B-Instruct and Qwen2-72B-Instruct models can handle up to 128K tokens, a substantial increase from the 32K-token limit of their base counterparts. The two smallest models remain at 32K, and Qwen2-57B-A14B supports 64K. Long context is essential for tasks such as summarizing lengthy documents or generating coherent long-form narratives.
Here's a detailed overview of the Qwen2 models:
| Models | Qwen2-0.5B | Qwen2-1.5B | Qwen2-7B | Qwen2-57B-A14B | Qwen2-72B |
| --- | --- | --- | --- | --- | --- |
| # Params | 0.49B | 1.54B | 7.07B | 57.41B | 72.71B |
| # Non-Emb Params | 0.35B | 1.31B | 5.98B | 56.32B | 70.21B |
| GQA | True | True | True | True | True |
| Tie Embedding | True | True | False | False | False |
| Context Length | 32K | 32K | 128K | 64K | 128K |
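The gap between the two parameter rows gives a rough sense of how much of each model sits in its embeddings. The sketch below simply subtracts the table's figures. The helper name `embedding_share` is hypothetical, and how tied embeddings are counted in the published totals is an assumption not confirmed by the source.

```python
# (total params, non-embedding params) in billions, copied from the table above.
models = {
    "Qwen2-0.5B": (0.49, 0.35),
    "Qwen2-1.5B": (1.54, 1.31),
    "Qwen2-7B": (7.07, 5.98),
    "Qwen2-57B-A14B": (57.41, 56.32),
    "Qwen2-72B": (72.71, 70.21),
}


def embedding_share(total, non_embedding):
    """Return (embedding params in billions, embedding fraction of total)."""
    embedding = total - non_embedding
    return embedding, embedding / total


shares = {name: embedding_share(*params) for name, params in models.items()}

for name, (emb, frac) in shares.items():
    print(f"{name:>15}: {emb:.2f}B embedding params, {frac:.1%} of total")
```

By this arithmetic, embeddings account for more than a quarter of Qwen2-0.5B but only a few percent of the largest models, which is consistent with the table tying embeddings only for the two smallest sizes.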
Original Sources
↗ https://qwenlm.github.io/blog/qwen2/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
7 June 2024