
AB-MCTS, a new inference-time algorithm from Sakana AI, lets several AI models work on a problem together. On the ARC-AGI-2 benchmark, the combination outperforms each model working alone.
At Sakana AI, we've been exploring how to leverage the collective intelligence of multiple AI models to achieve better performance during inference. Building on our 2024 research on evolutionary model merging, which combined existing open-source models through evolutionary computation and model merging, we now present AB-MCTS (Adaptive Branching Monte Carlo Tree Search). This new algorithm allows multiple frontier AI models to cooperate effectively, significantly outperforming individual models on the ARC-AGI-2 benchmark.
AB-MCTS is an inference-time scaling algorithm designed to let multiple AI models work on the same problem in a coordinated manner.
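To make the "adaptive branching" idea concrete, here is a minimal sketch of one way a tree search can decide, at each node, whether to go wider (generate a fresh answer) or deeper (refine an existing one). It assumes a Thompson-sampling rule over Beta posteriors, which is an assumption of this sketch and not a detail given in this article. `Node`, `choose_action`, `search_step`, and `toy_generate` are hypothetical names, and `toy_generate` stands in for a real model call plus a scorer.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    answer: str
    score: float  # quality in [0, 1], e.g. fraction of test cases passed
    children: list = field(default_factory=list)
    # Beta posterior for "refining below this node pays off" (go deeper).
    alpha: float = 1.0
    beta: float = 1.0
    # Beta posterior for "a fresh child of this node pays off" (go wider).
    gen_alpha: float = 1.0
    gen_beta: float = 1.0


def choose_action(node, rng):
    """Return None to go wider, or an existing child to go deeper."""
    best, best_draw = None, rng.betavariate(node.gen_alpha, node.gen_beta)
    for child in node.children:
        draw = rng.betavariate(child.alpha, child.beta)
        if draw > best_draw:
            best, best_draw = child, draw
    return best


def search_step(root, generate, rng):
    """Walk down the tree, add one new node, and back up its score."""
    path, node = [root], root
    while (nxt := choose_action(node, rng)) is not None:
        node = nxt
        path.append(node)
    answer, score = generate(node.answer, rng)
    child = Node(answer, score)
    node.children.append(child)
    # The "go wider" action at this node produced `score`.
    node.gen_alpha += score
    node.gen_beta += 1.0 - score
    # Every node on the path (and the new child) sees the same evidence.
    for visited in path + [child]:
        visited.alpha += score
        visited.beta += 1.0 - score
    return child


def toy_generate(parent_answer, rng):
    """Hypothetical stand-in for an LLM call followed by a scorer."""
    return parent_answer + "+", rng.random()


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)


if __name__ == "__main__":
    rng = random.Random(0)
    root = Node(answer="", score=0.0)
    for _ in range(20):
        search_step(root, toy_generate, rng)
    print(count_nodes(root))  # 21: the root plus one node per step
```

The point of the design is that the tree's shape is not fixed in advance. Branches whose answers score well attract more refinement, while a node whose children keep disappointing is more likely to get a fresh attempt.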
The ability to combine multiple frontier AI models during inference is significant because the combined system can surpass what any single model achieves on its own.
We tested AB-MCTS on the ARC-AGI-2 benchmark, which is designed to evaluate AI systems' ability to solve complex reasoning tasks. The combination of o4-mini, Gemini-2.5-Pro, and R1-0528 (all current frontier AI models) outperformed each of the individual models.

For those interested in the technical details, here’s a breakdown of the AB-MCTS implementation:
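One piece of any multi-model search is deciding which model to call next. The sketch below shows one plausible approach, again assuming Thompson sampling over per-model Beta posteriors (an assumption of this sketch, not a detail confirmed in this article). `ModelSelector` and `simulate` are hypothetical names, and the success rates are made-up numbers used only to drive the simulation. They are not benchmark results.

```python
import random


class ModelSelector:
    """Pick which model to call next via Thompson sampling (illustrative)."""

    def __init__(self, model_names, rng):
        self.rng = rng
        # One Beta(alpha, beta) posterior per model, starting uniform.
        self.posteriors = {name: [1.0, 1.0] for name in model_names}

    def pick(self):
        # Draw one sample per model and call the model with the highest draw.
        draws = {
            name: self.rng.betavariate(a, b)
            for name, (a, b) in self.posteriors.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, score):
        # `score` in [0, 1]: how good the answer from `name` turned out to be.
        self.posteriors[name][0] += score
        self.posteriors[name][1] += 1.0 - score


def simulate(made_up_success_rate, rounds, seed=0):
    """Run the selector against simulated models; return call counts."""
    rng = random.Random(seed)
    selector = ModelSelector(list(made_up_success_rate), rng)
    calls = {name: 0 for name in made_up_success_rate}
    for _ in range(rounds):
        name = selector.pick()
        calls[name] += 1
        # Simulated outcome: 1.0 on success, 0.0 on failure.
        solved = 1.0 if rng.random() < made_up_success_rate[name] else 0.0
        selector.update(name, solved)
    return calls, selector


if __name__ == "__main__":
    # Made-up rates for illustration only; NOT measured on any benchmark.
    rates = {"o4-mini": 0.6, "Gemini-2.5-Pro": 0.3, "R1-0528": 0.1}
    calls, _ = simulate(rates, rounds=300)
    print(calls)
```

Under this rule no model is hard-coded as the favourite. A model that keeps producing good answers for a given problem is sampled more often, while the others still get occasional calls in case they turn out to suit a later step better.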
AB-MCTS represents a significant step forward in leveraging collective intelligence among AI models. By enabling multiple frontier models to cooperate during inference, we can achieve better performance and solve more complex problems. The initial results on the ARC-AGI-2 benchmark are promising, and we look forward to further advancements in this area.
For more details, you can check out our paper and implementation:
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
4 July 2025