
This benchmark pits MiniMax M2.7 against Claude Opus 4.6 on real-world coding tasks and finds that MiniMax M2.7 costs far less while matching Claude Opus 4.6 on bug detection and security fixes.
MiniMax M2.7, launched on March 18, scored 56.22% on the SWE-Pro benchmark, close to Claude Opus 4.6's result. To see how the two models compare on real-world coding tasks, we ran both through three tests in Kilo Code, an AI-assisted coding environment for VS Code. The results stand out most on cost-effectiveness.
We created three TypeScript codebases to test both models in Kilo Code's Code mode, ensuring they received the same prompts without any hints. Each model was scored independently after all tests were completed.
For the first test, a build from a written specification, we gave both models the following prompt:
"Build a real-time event processing system in TypeScript from the specification in @SPEC.md. Use Hono for the web framework, Prisma with SQLite for the database, Zod for input validation, and ws for WebSocket support."
The spec required 7 components:
Both models implemented all 7 components. The difference in their scores came from code organization and test coverage.

Claude Opus 4.6:
MiniMax M2.7:
In the second test, both models had to trace 6 bugs from production log output to their root causes and fix them. Both identified and fixed all 6 bugs, but Claude Opus 4.6 gave more detailed explanations and documented its fixes better.
In the third test, a security audit, both models had to find and fix 10 planted security vulnerabilities across a team collaboration API. Both found and fixed all 10, with Claude Opus 4.6 again providing more thorough documentation and additional test cases.
The cost difference between MiniMax M2.7 and Claude Opus 4.6 is substantial:
Across the coding tasks, MiniMax M2.7 cost a total of $0.27, while Claude Opus 4.6 cost $3.67. That makes Claude Opus 4.6 roughly 13.6 times as expensive, or put another way, MiniMax M2.7 cut spend by about 93% with both models completing every task.
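To make the arithmetic behind that comparison explicit, here is a minimal TypeScript sketch that derives the cost ratio and the percentage saved from the two totals reported above. The helper names `costRatio` and `savingsFraction` are hypothetical and exist only for illustration; they are not part of Kilo Code or of either model's tooling.

```typescript
// Totals reported in the article for the three coding tasks (USD).
const miniMaxCostUsd = 0.27;
const opusCostUsd = 3.67;

// Hypothetical helper: how many times more the expensive option costs.
function costRatio(expensive: number, cheap: number): number {
  return expensive / cheap;
}

// Hypothetical helper: fraction of spend saved by picking the cheaper option.
function savingsFraction(expensive: number, cheap: number): number {
  return 1 - cheap / expensive;
}

const ratio = costRatio(opusCostUsd, miniMaxCostUsd);
const savings = savingsFraction(opusCostUsd, miniMaxCostUsd);

console.log(`Claude Opus 4.6 cost ${ratio.toFixed(1)}x as much as MiniMax M2.7`);
console.log(`MiniMax M2.7 saved ${(savings * 100).toFixed(1)}% of the spend`);
```

The same two helpers could be pointed at per-test costs instead of totals, if those figures are available, to see whether the gap is uniform across the three tests.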
Both MiniMax M2.7 and Claude Opus 4.6 performed well in our
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
23 March 2026