
Qwen2-Math tackles complex mathematical reasoning and is reported to surpass both open-source and proprietary competitors on math benchmarks, thanks to specialized training on mathematical data.
The Qwen team has spent the past year working to improve the reasoning capabilities of large language models (LLMs) on arithmetic and mathematical problems. Today, we are excited to introduce Qwen2-Math, a series of math-specialized LLMs that significantly outperform existing open-source models, and even closed-source models such as GPT-4o, on math benchmarks. The series comprises the Qwen2-Math base models and their instruction-tuned variants, Qwen2-Math-1.5B/7B/72B-Instruct.
The Qwen2-Math base models are initialized from the Qwen2 LLMs in three sizes: 1.5B, 7B, and 72B parameters. These models are then pretrained on a carefully curated mathematics-specific corpus. The corpus includes:
The Qwen2-Math base models are evaluated on several widely used benchmarks:
English math benchmarks:
Chinese math benchmarks:
All evaluations are conducted using few-shot chain-of-thought prompting to assess the models' reasoning capabilities.
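As a rough illustration of what few-shot chain-of-thought prompting looks like in practice, the sketch below assembles a prompt from a handful of worked examples followed by the target question. It is a minimal sketch, not the Qwen team's evaluation harness: the `Exemplar` class, the `build_few_shot_cot_prompt` function, the "Question:/Answer:" layout, and the exemplars themselves are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example: a question, its step-by-step reasoning, and the answer."""
    question: str
    reasoning: str
    answer: str


def build_few_shot_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    """Concatenate worked exemplars, then the target question with an open answer slot."""
    blocks = [
        f"Question: {ex.question}\nAnswer: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The last block stops at "Answer:" so the model continues with its own reasoning.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical exemplars written for this sketch (not taken from any benchmark).
EXEMPLARS = [
    Exemplar(
        question="Tom has 3 boxes with 4 pencils each. How many pencils does he have?",
        reasoning="Each box holds 4 pencils and there are 3 boxes, so 3 * 4 = 12.",
        answer="12",
    ),
    Exemplar(
        question="A train travels 60 km in 1.5 hours. What is its average speed in km/h?",
        reasoning="Average speed is distance divided by time, so 60 / 1.5 = 40.",
        answer="40",
    ),
]

prompt = build_few_shot_cot_prompt(EXEMPLARS, "What is 15% of 80?")
print(prompt)
```

Because each exemplar shows its reasoning before the final answer, the model is nudged to reason step by step on the new question too. The number of exemplars (the "shots") is a per-benchmark choice that the text above does not specify.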

To further enhance the performance of Qwen2-Math, we developed instruction-tuned variants. The process involves:
The largest model, Qwen2-Math-72B-Instruct, demonstrates superior performance across all benchmarks compared to state-of-the-art models:
For the Chinese benchmarks, Qwen2-Math also excels:
Qwen2-Math represents a significant step forward in enhancing the mathematical reasoning capabilities of LLMs. By focusing on specialized training and instruction tuning, we have created models that can effectively solve complex mathematical problems, outperforming even leading commercial models. We hope these advancements will contribute to the broader AI community and help solve real-world challenges.
Original Sources
↗ https://qwenlm.github.io/blog/qwen2-math/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
9 August 2024