
As Anthropic reveals, aligning AI with human intent isn't just ethical; it's essential for creating truly capable and useful intelligent systems, challenging the notion that raw technical prowess alone suffices.
The ongoing debate over whether alignment constrains capable artificial intelligence systems may be missing the mark. Emerging evidence suggests that alignment is, in fact, an intrinsic component of capability at a deeper level. This means that models which excel on benchmarks but fail to understand human intent are inherently less useful and, by extension, not truly artificial general intelligence (AGI) systems.
Anthropic and OpenAI have been exploring the relationship between alignment and capability for over two years, each taking a distinct approach. The results are beginning to offer valuable insight into what makes an AI system both capable and aligned with human values.
At Anthropic, alignment researchers are deeply embedded in capability work, leaving no clear split between the two. This integrated approach is meant to ensure that alignment is treated not as a separate or secondary concern but as a fundamental aspect of model development.
Jan Leike, formerly co-lead of OpenAI's Superalignment team and now at Anthropic, highlighted this integration:
"Some people have been asking what we did to make Opus 4.5 more aligned. There are lots of details we're planning to write up, but most important is that alignment researchers are pretty deeply involved in post-training and get a lot of leeway to make changes."
Sam Bowman, an alignment researcher at Anthropic, further emphasized the seamless integration:
"Second: Alignment researchers are involved in every part of training. We don't have a clear split between alignment research and applied finetuning. Alignment-focused researchers are deeply involved in designing and staffing production training runs."
A critical aspect of this approach is the development of a coherent identity within the model. Sam Bowman noted:
"It's becoming increasingly clear that a model's self-image or self-concept has some real influence on how its behavior generalizes to novel settings."
To build that identity, Anthropic uses a 14,000-token document, referred to as the "soul document," designed to give Claude, its AI model, a thorough understanding of Anthropic’s goals and reasoning. The aim is alignment through deep understanding rather than through external constraints.

In contrast, OpenAI has taken a more traditional route, often treating alignment as a separate phase following the initial development of capabilities. While both companies are making significant strides, Anthropic’s integrated approach may offer a more holistic and effective solution to creating broadly useful AI systems.
How alignment relates to capability is crucial for developing AI that can be trusted and widely adopted. A model that excels on benchmarks but fails to understand human values and intentions is of limited use. The emerging definition of AGI as "broadly useful and providing economic value across many tasks" underscores the importance of an integrated approach.
Despite the promising results from Anthropic, there are significant risks associated with this integrated approach:
The opportunity lies in creating AI systems that are not only highly capable but also deeply aligned with human values and intentions. This could lead to:
As results from Anthropic’s approach continue to emerge, they increasingly suggest that the future of AI development lies in the seamless integration of alignment and capability. This approach not only makes AI systems more useful but also keeps them aligned with human values, making them more trustworthy and broadly applicable.
About the author
Marcus began tracking AI's market implications in 2016, noticing AI-related patent filings accelerating ahead of earnings upgrades before most of the sell-side had caught on. A former fixed-income quantitative analyst, he spent two decades building models that priced risk across emerging markets before pivoting to cover the economic impact of AI full-time. His writing translates opaque technical developments into clear risk/reward terms — and he's rarely diplomatic about the gap between AI valuations and underlying fundamentals. He believes most market participants still underestimate AI's long-run deflationary effect on knowledge work.
This Week's Edition
9 December 2025