
Mistral AI’s new Mixtral 8x22B cuts inference costs through sparse activation, using just 39 billion of its 141 billion parameters at a time while delivering strong multilingual performance.
Mistral AI has just released Mixtral 8x22B, their latest open-source model that sets a new standard for performance and efficiency. This sparse Mixture-of-Experts (SMoE) model uses only 39 billion active parameters out of a total of 141 billion, making it highly cost-efficient while delivering top-tier performance.
Mixtral 8x22B relies on sparse activation: for any given input, only a subset of the model's parameters takes part in the computation. This significantly reduces the compute needed per token and makes the model faster than dense models of comparable scale, such as 70-billion-parameter models.
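The sparse activation described above can be illustrated with a toy routing layer. This is a minimal sketch, not Mistral AI's implementation: it assumes top-2 routing over eight experts (the article does not state the routing scheme), and the experts, the router scores, and the names `route_top_k` and `moe_layer` are all made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(x, experts, router_logits, k=2):
    """Combine the outputs of only the k selected experts for one token."""
    output = 0.0
    for index, weight in route_top_k(router_logits, k):
        output += weight * experts[index](x)  # unselected experts never run
    return output

# Eight toy "experts": each just scales its input by a different factor.
experts = [lambda x, scale=s: scale * x for s in range(1, 9)]

# Hypothetical router scores for one token. In a real model a learned
# gating network would produce these from the token's hidden state.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.2]

selected = route_top_k(logits, k=2)
print(selected)  # experts 1 and 4 are chosen, with weights summing to 1
print(moe_layer(3.0, experts, logits))
```

Only two of the eight expert functions are called for this token, which is the source of the savings: the total parameter count sets the memory footprint, while the active subset sets the compute per token. With the article's figures, 39 of 141 billion parameters is roughly 28% of the model.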
The model is fluent in multiple languages, including English, French, Italian, German, and Spanish. This multilingual capability is crucial for applications that need to handle diverse user bases or content.
One of the standout features of Mixtral 8x22B is its native ability to call functions. Combined with the constrained output mode on la Plateforme, this enables developers to build sophisticated applications and modernize tech stacks more efficiently.
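To make the function-calling idea concrete, here is a minimal sketch of the application side: the model emits a structured call, and the application validates and dispatches it. The tool description format, the `get_weather` tool, and the shape of `model_output` are hypothetical and are not taken from la Plateforme's API.

```python
import json

# Hypothetical tool description in JSON Schema style; the exact format a
# given API expects may differ.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city):
    """Stand-in implementation; a real one would query a weather service."""
    return {"city": city, "temperature_c": 18}

# Registry mapping each tool name to its description and implementation.
TOOLS = {"get_weather": (WEATHER_TOOL, get_weather)}

def dispatch(tool_call_json):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    schema, function = TOOLS[name]
    arguments = call.get("arguments", {})
    missing = [key for key in schema["parameters"]["required"]
               if key not in arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return function(**arguments)

# Assumed shape of a structured call produced by the model.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(model_output)
print(result)  # {'city': 'Paris', 'temperature_c': 18}
```

A constrained output mode matters here because `json.loads` and the required-argument check only succeed when the model's reply is well-formed; without such a guarantee the application would need retry or repair logic around `dispatch`.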
With a context window of up to 64,000 tokens, Mixtral 8x22B can process and recall information from large documents with high precision. This is particularly useful for tasks that require understanding long-form content, such as summarization or question-answering.
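In practice, a long document still has to be checked against that limit before it is sent. The sketch below estimates a token count and splits oversized input into pieces that fit. The four-characters-per-token ratio and the `reserved_tokens` value are rough assumptions for illustration; a real pipeline would count tokens with the model's own tokenizer.

```python
CONTEXT_WINDOW = 64_000   # tokens, the figure quoted in the article
CHARS_PER_TOKEN = 4       # rough assumption, not a property of the model

def estimate_tokens(text):
    """Crude token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)

def split_to_fit(text, reserved_tokens=2_000):
    """Split text into pieces that each fit the window.

    reserved_tokens leaves room for the prompt and the reply; the
    default is an arbitrary example value.
    """
    budget_chars = (CONTEXT_WINDOW - reserved_tokens) * CHARS_PER_TOKEN
    return [text[i:i + budget_chars]
            for i in range(0, len(text), budget_chars)]

document = "word " * 100_000      # 500,000 characters of filler text
print(estimate_tokens(document))  # 125000
chunks = split_to_fit(document)
print(len(chunks))                # 3
```

This version cuts at fixed character offsets and can split a word or sentence in two; splitting on paragraph boundaries would be kinder to summarization and question-answering tasks.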
Mistral AI believes in the power of openness to drive innovation and collaboration. Mixtral 8x22B is therefore released under the Apache 2.0 license, one of the most permissive open-source licenses. It allows anyone to use, modify, and distribute the model, including commercially, subject only to light conditions such as preserving the license and attribution notices.
Mistral AI has a track record of building models that offer unmatched cost efficiency for their sizes. Mixtral 8x22B continues this tradition by outperforming dense models while maintaining a lower computational footprint. Here’s how it stacks up:

Mixtral 8x22B excels in reasoning tasks, as demonstrated by its performance on various benchmarks.
The model's multilingual capabilities are particularly strong: it outperforms LLaMA 2 70B on several benchmarks covering French, German, Spanish, and Italian.
In addition to its linguistic prowess, Mixtral 8x22B excels in coding and mathematics tasks. It outperforms other open models on popular benchmarks for these domains.
Mixtral 8x22B is a significant step forward in the development of efficient and capable AI models. Its sparse activation patterns, multilingual fluency, native function calling, and large context window make it a versatile tool for a wide range of applications. The model's open-source nature under Apache 2.0 ensures that it can be widely adopted and adapted by developers and researchers alike.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
18 April 2024