
Chinese AI company MiniMax says its upcoming M3 model uses a custom sub-quadratic framework for long-context decoding, with responses up to 15.6 times faster at one million tokens.
Among the many Chinese AI companies competing for global market share, MiniMax stands out for building frontier models across several modalities. Its Hailuo series, for instance, is strong at video generation and is released under permissive open-source licenses. Now MiniMax has published a detailed technical report on its M2 series of language models, along with an early look at the upcoming M3.
The M2 series, which includes the popular M2, M2.5, and M2.7 models, has consistently posted top benchmark results among open-source models. Although MiniMax has been overshadowed by other Chinese labs such as DeepSeek and Xiaomi, the new report offers useful insight into its engineering innovations and design choices.
At the core of M3 is a new sparse attention mechanism designed to speed up decoding over long contexts. MiniMax says it can make responses up to 15.6 times faster at a context length of one million tokens, which the company argues makes deploying AI agents with ultra-long contexts economically viable.
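The source does not say how M3 decides which cached tokens each new token attends to, so what follows is only a generic illustration of why sparse attention cuts decoding cost, not MiniMax's mechanism. It is a minimal pure-Python sketch, assuming block-level selection: each block of cached keys is summarised by its mean key, the new query scores those summaries, and full softmax attention runs only over the `top_k` best-matching blocks. The names `build_block_summaries`, `block_sparse_decode`, `block_size` and `top_k` are hypothetical.

```python
import math
import random

# Hypothetical sketch: a generic block-sparse decode step, NOT MiniMax M3's actual mechanism.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_attend(q, keys, values):
    """Scaled dot-product attention for a single query over the given keys."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += (w / total) * x
    return out


def build_block_summaries(keys, block_size):
    """Mean key per block (an assumed summary), maintained as the KV cache grows."""
    summaries = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        summaries.append([sum(col) / len(block) for col in zip(*block)])
    return summaries


def block_sparse_decode(q, keys, values, summaries, block_size, top_k):
    """Full attention over only the top_k blocks whose summary best matches q.

    Returns (output vector, number of cached keys actually read)."""
    ranked = sorted(range(len(summaries)),
                    key=lambda b: dot(q, summaries[b]), reverse=True)
    chosen = sorted(ranked[:top_k])  # keep selected blocks in positional order
    idx = [j for b in chosen
           for j in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    out = softmax_attend(q, [keys[j] for j in idx], [values[j] for j in idx])
    return out, len(idx)


random.seed(0)
dim, n_ctx, block_size = 8, 4096, 64
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_ctx)]
values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_ctx)]
q = [random.gauss(0, 1) for _ in range(dim)]

summaries = build_block_summaries(keys, block_size)
dense_out = softmax_attend(q, keys, values)
sparse_out, keys_read = block_sparse_decode(q, keys, values, summaries, block_size, top_k=8)
print(f"dense reads {n_ctx} keys; sparse reads {keys_read} keys + {len(summaries)} summaries")
# prints: dense reads 4096 keys; sparse reads 512 keys + 64 summaries
```

With `top_k` blocks of `block_size` keys each, the work per decoded token falls from N cached keys to `top_k × block_size` keys plus N / `block_size` summaries; in this toy setting that is 512 + 64 instead of 4,096. Selecting whole blocks rather than individual tokens keeps memory reads contiguous, a common choice in sparse-attention kernels, though the source does not say whether M3 works this way, and the 15.6x figure is MiniMax's own end-to-end measurement rather than something this toy reproduces.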

Adina Yakup of Hugging Face noted on X, "Beyond the benchmarks, they’ve done some really solid work on MoE efficiency and agent-oriented design. Excited to see where M3 goes next!"
If MiniMax's figures hold up, M3 could set a new bar for long-context decoding. By combining sparse attention with efficient parameter activation, it offers a potentially compelling option for businesses that want to deploy advanced AI agents at scale.
Original Sources
MiniMax teases M3 model with new sparse attention mechanism, 15.6X long-context response speed boost
https://venturebeat.com/technology/minimax-teases-upcoming-m3-model-with-new-sparse-attention-mechanism-and-15-6x-response-speed-boost
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
3 June 2026