
The METR chart tracks AI models' growing ability to complete complex programming tasks, from simple fixes to intricate coding challenges, and shows that capability growing exponentially across successive model releases.
If you’ve been following AI over the past year, you’ve likely come across the famous "METR chart." METR (Model Evaluation and Threat Research) is a group based in Berkeley, California, and this chart has become its signature. It compares AI models by the complexity of the software engineering tasks they can complete, where a task's complexity is measured by how long it takes a human programmer to do the same task.
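Turning per-task results into the single length a chart can plot takes one more step. METR's published work reports the task length at which a model is predicted to succeed about half the time, estimated by fitting a logistic curve to success against the logarithm of human task time. What follows is a minimal sketch assuming that approach: the helper `time_horizon_minutes`, its plain gradient-ascent fit, and the success counts are all hypothetical illustrations, not METR's code or data.

```python
import math


def time_horizon_minutes(minutes, successes, attempts, p=0.5, steps=5000, lr=0.1):
    """Hypothetical helper: fit P(success) = sigmoid(c + b * z), where
    z = log2(minutes) - mean, by gradient ascent on the binomial
    log-likelihood, then return the task length at which P(success) = p."""
    xs = [math.log2(m) for m in minutes]
    total = sum(attempts)
    # Center the feature (weighted by attempts) so the fit is well conditioned.
    mean = sum(x * n for x, n in zip(xs, attempts)) / total
    zs = [x - mean for x in xs]
    c, b = 0.0, 0.0
    for _ in range(steps):
        grad_c = grad_b = 0.0
        for z, k, n in zip(zs, successes, attempts):
            pred = 1.0 / (1.0 + math.exp(-(c + b * z)))
            grad_c += k - n * pred          # d(log-likelihood)/dc
            grad_b += (k - n * pred) * z    # d(log-likelihood)/db
        c += lr * grad_c / total
        b += lr * grad_b / total
    # Solve c + b * z = logit(p) for z, then undo the centering and the log2.
    z_star = (math.log(p / (1.0 - p)) - c) / b
    return 2.0 ** (z_star + mean)


# Illustrative data only: 10 attempts per bucket of human task length.
lengths = [1, 2, 4, 8, 16, 32, 64, 128, 256]   # human minutes
wins = [10, 9, 9, 7, 5, 3, 1, 1, 0]             # successes out of 10
print(round(time_horizon_minutes(lengths, wins, [10] * 9), 2))  # 16.0
```

Because the made-up success counts are symmetric around the 16-minute bucket, the fitted 50% point lands exactly there. Fitting on log2(minutes) rather than raw minutes mirrors the chart's own logarithmic scale.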
Here’s a quick breakdown of the key data points:
The most striking figure is the estimate for Claude Opus 4.6: a task length twice that of the previous leader, GPT-5.2, which had been released just two months earlier. A doubling in so short a window has done much to fuel the recent perception that AI development is accelerating.
The METR chart plots task length on a logarithmic scale, so a straight line indicates exponential growth: each equal step forward in time multiplies the task length by the same factor. This makes the trend easy to read, but it also introduces some complexities:
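The log-scale reading comes down to two small formulas. If a task length h grows exponentially with doubling time T, then h(t) = h0 × 2^(t/T), so log2 h(t) = log2 h0 + t/T: a straight line with slope 1/T. The sketch below computes T from two observations and checks that constant slope; the function names and every number in it are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def doubling_time(t1, h1, t2, h2):
    """Months for the horizon to double, assuming exponential growth
    between two observations (t in months, h in human minutes)."""
    return (t2 - t1) / math.log2(h2 / h1)


def project(h0, doubling, t):
    """Horizon after t months if it doubles every `doubling` months."""
    return h0 * 2 ** (t / doubling)


# A 2x jump between releases two months apart implies a two-month
# doubling time over that interval.
print(doubling_time(0, 1.0, 2, 2.0))  # 2.0

# Pure arithmetic, not a forecast: twelve months at that pace is 2**6.
print(project(1.0, 2.0, 12))  # 64.0

# Exponential growth is a straight line on a log scale: log2 of a
# hypothetical series rises by the same amount (1/T) every month.
series = [project(30.0, 7.0, t) for t in range(13)]
steps = [math.log2(b) - math.log2(a) for a, b in zip(series, series[1:])]
print(all(abs(s - 1 / 7.0) < 1e-9 for s in steps))  # True
```

The same two-point calculation applies to any pair of models on the chart, which is also why a single doubling between two releases says little on its own about the long-run slope.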
The types of tasks used to evaluate AI models have evolved:

Benchmarking AI performance is becoming more complex:
For software engineers and researchers, the challenges in measuring AI performance have several implications:
The METR chart has been instrumental in highlighting how quickly AI models are progressing. But as the tasks they can handle grow longer and more diverse, the methods for measuring performance must evolve too. Keeping measurement ahead of capability will be crucial both for advancing the field and for ensuring that AI continues to serve human needs.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer
This Week's Edition
3 April 2026