
Codestral 25.01 from Mistral AI slashes coding time with an optimized architecture and advanced tokenizer, excelling in FIM tasks and setting new benchmarks across various programming languages.
Mistral AI has just released a significant upgrade to their state-of-the-art (SOTA) coding model, Codestral. The new version, Codestral 25.01, introduces an optimized architecture and an improved tokenizer, making it faster and more efficient for code generation tasks. This update is particularly notable for its performance in fill-in-the-middle (FIM) scenarios, where it outperforms leading models across multiple programming languages.
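The fill-in-the-middle task mentioned above can be illustrated with a short sketch. An editor plugin splits the file at the cursor into a prefix and a suffix, sends both to the model, and splices the returned "middle" back in. The payload fields (`model`, `prompt`, `suffix`, `max_tokens`) and the `codestral-latest` model alias reflect our reading of Mistral's FIM API and should be checked against the current API reference. The helper names are hypothetical, and nothing here makes a network call.

```python
import json


def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split a file into the code before and after the editor cursor."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor out of range")
    return source[:cursor], source[cursor:]


def build_fim_payload(prefix: str, suffix: str,
                      model: str = "codestral-latest",
                      max_tokens: int = 64) -> str:
    """Serialize a FIM request body (field names are an assumption; verify them).

    The model is asked to generate only the code that belongs between
    `prompt` (the prefix) and `suffix`.
    """
    return json.dumps({
        "model": model,
        "prompt": prefix,   # code before the gap
        "suffix": suffix,   # code after the gap
        "max_tokens": max_tokens,
    })


def splice(prefix: str, middle: str, suffix: str) -> str:
    """Reassemble the file once the model returns the missing middle."""
    return prefix + middle + suffix


if __name__ == "__main__":
    source = "def add(a, b):\n    \n\nprint(add(2, 3))\n"
    # Place the cursor inside the empty, indented function body.
    cursor = source.index("    \n") + 4
    prefix, suffix = split_at_cursor(source, cursor)
    print(build_fim_payload(prefix, suffix))
    # Stand-in for the model's answer: the missing function body.
    print(splice(prefix, "return a + b", suffix))
```

The key point is that the model sees code on both sides of the gap. A completion can therefore respect what comes after the cursor (here, the `print(add(2, 3))` call), which plain left-to-right completion cannot.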
For developers, these improvements translate into significant productivity boosts. Codestral 25.01 can handle a wide range of tasks, including:

To validate these claims, Mistral AI benchmarked Codestral 25.01 against leading sub-100B-parameter coding models that are widely considered best-in-class for FIM tasks. Here's a breakdown of the results:
| Model | Context Length | HumanEval (Python) | MBPP | CruxEval | LiveCodeBench | RepoBench | Spider (SQL) | CanItEdit | HumanEval (Average) | HumanEvalFIM (Average) |
|-------|----------------|--------------------|------|----------|---------------|-----------|--------------|-----------|---------------------|------------------------|
| Codestral-2501 | 256k | 86.6% | 80.2% | 55.5% | 37.9% | 38.0% | 66.5% | 50.5% | 71.4% | 85.9% |
| Codestral-2405 22B | 32k | 81.1% | 78.2% | 51.3% | 31.5% | 34.0% | 63.5% | 50.5% | 65.6% | 82.1% |
| Codellama 70B instruct | 4k | 67.1% | 70.8% | 47.3% | 20.0% | 11.4% | 37.0% | 29.5% | 55.3% | - |
| DeepSeek Coder 33B instruct | 16k | 77.4% | 80.2% | 49.5% | 27.0% | 28.4% | 60.0% | 47.6% | 65.1% | 85.3% |
| DeepSeek Coder V2 lite | 128k | 83.5% | 83.2% | 49.7% | 28.1% | 20.0% | 72.0% | 41.0% | 65.9% | 84.1% |
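To put the upgrade in concrete terms, the two Codestral rows of the benchmark table can be compared directly. The short sketch below transcribes those scores and computes the change per benchmark in percentage points (not relative percent). The variable and function names are illustrative only.

```python
# Percent scores transcribed from the benchmark table; only the two Codestral rows.
BENCHMARKS = [
    "HumanEval (Python)", "MBPP", "CruxEval", "LiveCodeBench", "RepoBench",
    "Spider", "CanItEdit", "HumanEval (Average)", "HumanEvalFIM (Average)",
]
CODESTRAL_2501 = [86.6, 80.2, 55.5, 37.9, 38.0, 66.5, 50.5, 71.4, 85.9]
CODESTRAL_2405 = [81.1, 78.2, 51.3, 31.5, 34.0, 63.5, 50.5, 65.6, 82.1]


def point_deltas(new: list[float], old: list[float]) -> dict[str, float]:
    """Percentage-point change per benchmark, rounded to the table's precision."""
    return {name: round(n - o, 1) for name, n, o in zip(BENCHMARKS, new, old)}


if __name__ == "__main__":
    deltas = point_deltas(CODESTRAL_2501, CODESTRAL_2405)
    for name, delta in deltas.items():
        print(f"{name:<24} {delta:+.1f} pts")
    # The benchmark with the biggest improvement over Codestral-2405.
    print("largest gain:", max(deltas, key=deltas.get))
```

On these numbers, LiveCodeBench shows the largest jump (+6.4 points) and HumanEval Python gains 5.5 points. CanItEdit is the one benchmark where the new version does not improve, holding at 50.5%.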
| Model | HumanEval Python | HumanEval C++ | HumanEval Java | HumanEval Javascript | Human
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
14 January 2025