
This article explores how Anthropic enhanced Claude’s capabilities for long-running autonomous tasks through a novel multi-agent harness design, inspired by GAN principles, to ensure consistent performance over extended periods.
At Anthropic, we're always pushing the boundaries of what AI can do. One of our key focuses has been long-running autonomous applications: software that can operate without human intervention over extended periods. This article describes how we improved Claude's performance in frontend design and long-running application development using a multi-agent harness inspired by Generative Adversarial Networks (GANs).
Over the past few months, I've been tackling two interconnected problems: getting Claude to produce high-quality frontend designs and building complete applications autonomously. Our earlier work on frontend design skills and long-running coding agent harnesses showed significant improvements through prompt engineering and harness design. However, both approaches eventually hit performance ceilings.
To overcome these limitations, I drew inspiration from GANs, which produce high-quality outputs by pitting a generator against a discriminator that judges its output. In our context, this meant creating a multi-agent system in which one agent generates the work and a separate evaluator agent grades it and returns feedback.
For frontend design, the challenge was to turn subjective judgments into concrete, gradable terms. We developed a set of criteria that could objectively evaluate aspects like layout, color harmony, and user experience. This evaluator provided feedback to the generator, helping it refine its designs over multiple iterations.
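The generate-then-grade loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness Anthropic built: the criterion names come from the text, but the weights, the 0 to 10 scale, the threshold, and the `generate` and `evaluate` callables (stand-ins for model calls) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical weights. The criterion names come from the article; the numbers do not.
CRITERIA_WEIGHTS: Dict[str, float] = {
    "layout": 0.4,
    "color_harmony": 0.3,
    "user_experience": 0.3,
}


@dataclass
class Evaluation:
    scores: Dict[str, float]  # per-criterion score, assumed to be on a 0-10 scale
    feedback: str             # written critique handed back to the generator

    def total(self) -> float:
        """Weighted sum of the per-criterion scores."""
        return sum(w * self.scores[name] for name, w in CRITERIA_WEIGHTS.items())


def refine(
    generate: Callable[[str], str],
    evaluate: Callable[[str], Evaluation],
    max_rounds: int = 5,
    threshold: float = 8.0,
) -> Tuple[str, List[float]]:
    """Alternate generation and grading; return the best design and the score history."""
    feedback = ""  # the first round has no critique to work from
    history: List[float] = []
    best_design, best_score = "", float("-inf")
    for _ in range(max_rounds):
        design = generate(feedback)
        result = evaluate(design)
        score = result.total()
        history.append(score)
        if score > best_score:
            best_design, best_score = design, score
        if score >= threshold:
            break  # good enough, stop iterating
        feedback = result.feedback  # the next round sees the evaluator's critique
    return best_design, history
```

Keeping the best-scoring design, rather than the last one, guards against a late iteration that regresses. Whether the real harness does this is not stated in the article.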
Applying this multi-agent approach to long-running autonomous coding involved carrying over lessons from our earlier harness work.

The final architecture consisted of three agents.
Previous approaches to long-running autonomous coding often fell short due to issues like context loss and drift over time. In an earlier experiment, we used an initializer agent to decompose a product spec into a task list, followed by a coding agent that implemented tasks one at a time. While this method improved performance, it still struggled with complex tasks.
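The earlier two-agent setup described here, an initializer that turns a spec into a task list and a coding agent that works through it one task at a time, can be sketched as follows. This is an illustrative sketch only: `Task`, `run_harness`, and the `initializer` and `coder` callables are hypothetical names standing in for model calls, and passing the coder a list of completed tasks is an assumption about how state might be carried between steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    description: str
    done: bool = False
    notes: str = ""  # whatever the coding agent reports back for this task


def run_harness(
    spec: str,
    initializer: Callable[[str], List[str]],
    coder: Callable[[str, List[str]], str],
) -> List[Task]:
    """Decompose the spec once, then implement the tasks strictly in order."""
    tasks = [Task(description) for description in initializer(spec)]
    for task in tasks:
        # The coder sees only the current task plus a summary of finished work,
        # not the full history of every earlier step (an assumption of this sketch).
        completed = [t.description for t in tasks if t.done]
        task.notes = coder(task.description, completed)
        task.done = True
    return tasks
```

Nothing in this loop checks whether a task was actually implemented correctly, which is the gap the evaluator agent is meant to close.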
By introducing the evaluator agent, we addressed these issues.
The result was a robust system capable of producing rich full-stack applications over multi-hour autonomous coding sessions. This approach not only improved the quality of the output but also increased reliability and consistency.
Our multi-agent harness design has significantly enhanced Claude's capabilities in both frontend design and long-running application development. By borrowing the generator-versus-evaluator structure of GANs and pairing it with structured feedback, we've created a more reliable and efficient system for autonomous software engineering.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
25 March 2026