Codestral 25.01: A Major Upgrade for High-Speed Code Generation and FIM Tasks

The Engineer

14 Jan 2025

Codestral 25.01 from Mistral AI slashes coding time with an optimized architecture and advanced tokenizer, excelling in FIM tasks and setting new benchmarks across various programming languages.

Mistral AI has released a significant upgrade to its coding model, Codestral. The new version, Codestral 25.01, introduces an optimized architecture and an improved tokenizer, making it faster and more efficient for code generation tasks. The update is particularly notable for fill-in-the-middle (FIM) scenarios, where it posts the highest average HumanEvalFIM score (85.9%) among the models compared in the benchmarks below.

What Changed Technically?

Architecture Optimization: Codestral 25.01 features a more efficient architecture that reduces computational overhead and improves latency.
Improved Tokenizer: The new tokenizer is better at handling code-specific syntax, leading to faster and more accurate code generation.
Performance Gains: Together, these changes let the model generate and complete code about twice as fast as its predecessor.

Why It Matters for Practitioners

For developers, these improvements translate into significant productivity boosts. Codestral 25.01 can handle a wide range of tasks, including:

Fill-in-the-Middle (FIM): Completing partial code snippets with high accuracy.
Code Correction: Identifying and fixing errors in existing code.
Test Generation: Automatically generating test cases for new or modified code.
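To make the FIM task concrete, here is a minimal sketch of how an editor integration might prepare a fill-in-the-middle request: the code before the cursor becomes the prompt, the code after it becomes the suffix, and the model's completion is spliced between the two. The names `FimRequest`, `split_at_cursor`, and `splice` are hypothetical helpers invented for this illustration, and the completion string is a hard-coded stand-in rather than real model output. No API call is made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimRequest:
    """Hypothetical container for the two halves of a FIM request."""
    prompt: str  # code that appears before the cursor
    suffix: str  # code that appears after the cursor


def split_at_cursor(source: str, cursor: int) -> FimRequest:
    """Split a source file into prompt and suffix at a cursor offset."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the source text")
    return FimRequest(prompt=source[:cursor], suffix=source[cursor:])


def splice(request: FimRequest, completion: str) -> str:
    """Insert the model's completion between the prompt and the suffix."""
    return request.prompt + completion + request.suffix


# A function with an empty body; the cursor sits after the indentation.
source = "def add(a, b):\n    \n\nprint(add(2, 3))\n"
cursor = source.index("    ") + 4
request = split_at_cursor(source, cursor)

# Stand-in for what a FIM-capable model might return (assumption).
completion = "return a + b"
filled = splice(request, completion)
print(filled)
```

In a real integration, `request.prompt` and `request.suffix` would be sent to the model, and the returned text would take the place of the stand-in `completion`.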

Benchmarks

To validate these claims, Mistral AI benchmarked Codestral 25.01 against leading sub-100B parameter coding models that are widely considered best-in-class for FIM tasks. Here’s a breakdown of the results:

Overview

| Model | Context Length | HumanEval (Python) | MBPP (Python) | CruxEval | LiveCodeBench | RepoBench | Spider (SQL) | CanItEdit | HumanEval (Average) | HumanEvalFIM (Average) |
|-------|----------------|--------------------|---------------|----------|---------------|-----------|--------------|-----------|---------------------|------------------------|
| Codestral-2501 | 256k | 86.6% | 80.2% | 55.5% | 37.9% | 38.0% | 66.5% | 50.5% | 71.4% | 85.9% |
| Codestral-2405 22B | 32k | 81.1% | 78.2% | 51.3% | 31.5% | 34.0% | 63.5% | 50.5% | 65.6% | 82.1% |
| Codellama 70B instruct | 4k | 67.1% | 70.8% | 47.3% | 20.0% | 11.4% | 37.0% | 29.5% | 55.3% | - |
| DeepSeek Coder 33B instruct | 16k | 77.4% | 80.2% | 49.5% | 27.0% | 28.4% | 60.0% | 47.6% | 65.1% | 85.3% |
| DeepSeek Coder V2 lite | 128k | 83.5% | 83.2% | 49.7% | 28.1% | 20.0% | 72.0% | 41.0% | 65.9% | 84.1% |
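To read the generation-over-generation change directly from the overview table, the short sketch below subtracts the Codestral-2405 scores from the Codestral-2501 scores. The numbers are copied from the table; the dictionary layout and the `point_gains` helper are assumptions made for this illustration, not part of Mistral's evaluation tooling.

```python
# Scores in percent, copied from the overview table.
codestral_2501 = {
    "HumanEval (Python)": 86.6,
    "MBPP": 80.2,
    "CruxEval": 55.5,
    "LiveCodeBench": 37.9,
    "RepoBench": 38.0,
    "Spider": 66.5,
    "CanItEdit": 50.5,
    "HumanEval (Average)": 71.4,
    "HumanEvalFIM (Average)": 85.9,
}
codestral_2405 = {
    "HumanEval (Python)": 81.1,
    "MBPP": 78.2,
    "CruxEval": 51.3,
    "LiveCodeBench": 31.5,
    "RepoBench": 34.0,
    "Spider": 63.5,
    "CanItEdit": 50.5,
    "HumanEval (Average)": 65.6,
    "HumanEvalFIM (Average)": 82.1,
}


def point_gains(new: dict[str, float], old: dict[str, float]) -> dict[str, float]:
    """Percentage-point difference per benchmark, rounded to one decimal."""
    return {name: round(new[name] - old[name], 1) for name in new}


gains = point_gains(codestral_2501, codestral_2405)
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} {gain:+.1f} pts")
```

By this arithmetic the largest gain is on LiveCodeBench (6.4 points), the FIM average rises by 3.8 points, and CanItEdit is unchanged.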

Per-Language Performance