
Jeremy Berman's latest system combines evolutionary test-time compute with multi-agent collaboration, running 25 times more efficiently than the previous best on ARC v1 and setting a new state-of-the-art score on ARC v2.
Jeremy Berman has once again claimed the top spot on the ARC-AGI benchmark. This time, his approach not only beats previous records but also introduces a method that combines multi-agent collaboration with evolutionary test-time compute (ETTC). The new system scores 79.6% on ARC v1 at $8.42 per task, 25 times more efficient than the previous best, and sets a new state-of-the-art (SoTA) of 29.4% on ARC v2.
Berman’s latest achievement builds upon his earlier work with ETTC, which he used to win ARC-AGI v1 last December. However, this time he replaced Python functions with plain English instructions, a move that significantly enhances the system's flexibility and efficiency.
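To make the idea concrete, here is a minimal sketch of a generic evolutionary search over plain-English instructions. It is not Berman's implementation, and the article does not describe his loop in detail. Everything named here is a hypothetical stand-in: `TOY_INTERPRETER` plays the role of an LLM that follows an instruction, `revise` plays the role of an LLM that rewrites a weak instruction, and `fitness` simply counts how many training pairs an instruction reproduces exactly.

```python
import random
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)

# Hypothetical stand-in for an LLM that follows a plain-English instruction.
# A real system would send the instruction and the grid to a model instead.
TOY_INTERPRETER: Dict[str, Callable[[Grid], Grid]] = {
    "leave the grid unchanged": lambda g: [row[:] for row in g],
    "mirror the grid left to right": lambda g: [row[::-1] for row in g],
    "mirror the grid top to bottom": lambda g: [row[:] for row in g[::-1]],
    "swap rows and columns": lambda g: [list(col) for col in zip(*g)],
}


def follow(instruction: str, grid: Grid) -> Grid:
    """Apply an instruction to a grid (toy lookup, assumed for illustration)."""
    return TOY_INTERPRETER[instruction](grid)


def fitness(instruction: str, examples: List[Example]) -> float:
    """Fraction of training pairs the instruction reproduces exactly."""
    hits = sum(follow(instruction, x) == y for x, y in examples)
    return hits / len(examples)


def revise(instruction: str, rng: random.Random) -> str:
    """Stand-in for asking an LLM to rewrite an instruction: pick a different one."""
    return rng.choice([i for i in TOY_INTERPRETER if i != instruction])


def evolve(
    examples: List[Example],
    generations: int = 10,
    population_size: int = 3,
    keep: int = 1,
    seed: int = 0,
) -> Tuple[str, float]:
    """Score candidates, keep the fittest, revise them, and repeat."""
    rng = random.Random(seed)
    vocabulary = list(TOY_INTERPRETER)
    population = rng.sample(vocabulary, k=min(population_size, len(vocabulary)))
    best, best_score = population[0], fitness(population[0], examples)
    for _ in range(generations):
        ranked = sorted(population, key=lambda i: fitness(i, examples), reverse=True)
        top_score = fitness(ranked[0], examples)
        if top_score > best_score:
            best, best_score = ranked[0], top_score
        if best_score == 1.0:  # every training pair is solved, stop early
            break
        parents = ranked[:keep]
        children = [revise(rng.choice(parents), rng) for _ in range(population_size - keep)]
        population = parents + children
    return best, best_score


if __name__ == "__main__":
    train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]]), ([[5, 0, 0]], [[0, 0, 5]])]
    print(evolve(train, population_size=4))
```

The point of the sketch is the shape of the loop: candidates are natural-language strings rather than Python functions, so the only things that change between generations are the instruction text and its score on the training examples.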

The success of Berman's approach suggests that combining multi-agent collaboration with evolutionary test-time compute can lead to significant advances in AI reasoning. As LLMs continue to improve, this approach could pave the way for more robust and versatile AI systems capable of tackling a wide range of complex tasks.
Original Sources
↗ https://jeremyberman.substack.com/p/how-i-got-the-highest-score-on-arc-agi-again?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
17 September 2025