
As concerns mount over Claude Code's performance decline, new analysis suggests the issues may be more complex than initial reports indicated, challenging assumptions about Anthropic's intentions and practices.
On April 2, a GitHub issue raised concerns about the performance of Claude Code, Anthropic's coding tool, and backed them with detailed metrics. The user reported analyzing 6,852 session files, 17,871 thinking blocks, and 234,760 tool calls, and concluded that Claude had started reading less code before editing, stopping earlier, looping more often, and requiring more human correction on complex engineering tasks. The issue quickly gained traction and fed broader accusations that Anthropic had "nerfed" the tool. A closer look at the evidence points to a different problem: the black-box nature of Claude Code's operations.
The public case for a secret nerfing of Claude is weak. There is no concrete proof that Anthropic secretly swapped model weights, lowered precision, or degraded Opus 4.6 to save computational resources. The viral benchmark drop, while concerning, does not provide strong evidence on its own. Anthropic has previously denied demand-based quality reduction, and the strongest public reports still lack independent raw data.
However, the real risk lies in how opaque Claude Code's operations are. Adaptive thinking, effort defaults, cache duration, context compaction, quota policies, and status incidents can all significantly affect the user experience without any change to the model name. This black-box design makes it difficult for users to work out what has changed and why their workflows are regressing.
The strongest evidence against Claude Code is not the viral BridgeBench chart but a detailed GitHub issue that made a measurable claim about one demanding workflow. The reported fall in reads before edits, from 6.6 to 2.0, is a drop of roughly 70 percent, which is significant for a coding agent. An agent that reads less of the surrounding code before editing has less context to work from, so its changes are more likely to be inaccurate and to need human correction.
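To make the reads-before-edits figure concrete, here is a minimal sketch of how such a ratio could be computed from a session's tool-call log. The tool labels ("read", "edit") and the flat list format are hypothetical, and the GitHub issue's exact method is not described here, so treat this as one plausible reading of the metric rather than a reproduction of the user's analysis.

```python
from typing import Iterable, Optional

# Hypothetical tool labels; a real session log may name its tools differently.
READ_TOOLS = {"read"}
EDIT_TOOLS = {"edit"}


def reads_per_edit(tool_calls: Iterable[str]) -> Optional[float]:
    """Return read calls divided by edit calls, or None if there were no edits."""
    calls = list(tool_calls)
    reads = sum(1 for name in calls if name in READ_TOOLS)
    edits = sum(1 for name in calls if name in EDIT_TOOLS)
    if edits == 0:
        return None  # the ratio is undefined for a session without edits
    return reads / edits


def relative_drop(before: float, after: float) -> float:
    """Fractional decline from `before` to `after`."""
    return (before - after) / before


# Illustrative session: four reads and two edits.
example_session = ["read", "read", "read", "edit", "read", "edit"]
example_ratio = reads_per_edit(example_session)

# The figures reported in the GitHub issue: 6.6 falling to 2.0.
reported_drop = relative_drop(6.6, 2.0)
```

Under this reading, the reported change from 6.6 to 2.0 works out to a decline of about 70 percent.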

Anthropic has an opportunity to address these concerns by providing more transparency. Implementing session telemetry would allow customers to see which operating conditions have changed, thereby fostering trust and enabling users to optimize their workflows. This move could also help Anthropic gather valuable feedback for continuous improvement.
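As a rough illustration of what session telemetry might look like, the sketch below records a few operating conditions per session and reports which ones differ between two sessions. Every field name and value is hypothetical, chosen to mirror the factors discussed in this article. None of it reflects an actual Anthropic API or log format.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class SessionConditions:
    """Hypothetical per-session operating conditions (not a real schema)."""
    model_name: str
    adaptive_thinking: bool
    effort_level: str
    cache_ttl_seconds: int
    compaction_count: int


def changed_conditions(
    before: SessionConditions, after: SessionConditions
) -> Dict[str, Tuple[Any, Any]]:
    """Map each field that differs to its (before, after) pair."""
    old, new = asdict(before), asdict(after)
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}


# Made-up values for illustration only.
last_month = SessionConditions("example-model", True, "high", 3600, 1)
this_week = SessionConditions("example-model", True, "medium", 300, 1)

diff = changed_conditions(last_month, this_week)
```

In this made-up example the model name is identical across both sessions, yet the diff still surfaces a changed effort level and cache duration, which is the kind of visibility users currently lack.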
The GitHub issue at the heart of this controversy makes a narrow, measurable claim about a specific workflow: Claude reads far less code before making edits than it used to. That behavior is crucial for maintaining code integrity and reducing human intervention, which is why the reported drop from 6.6 to 2.0 reads before edits is particularly alarming.
While the viral BridgeBench chart that circulated on social media platforms like X (formerly Twitter) garnered attention, it does not offer the same level of detail or reliability as the GitHub issue. Its impact may have been exaggerated, but it highlights broader user dissatisfaction and the need for more transparent performance metrics.
The public case for a secret nerfing of Claude is thin, but the product has changed in ways that affect user experience. The black-box nature of Claude Code's operations makes those changes hard for users to understand and adapt to. Anthropic can reduce that risk by exposing session telemetry, which would address user concerns, build trust, and support continuous improvement.
About the author
Marcus began tracking AI's market implications in 2016, noticing AI-related patent filings accelerating ahead of earnings upgrades before most of the sell-side had caught on. A former fixed-income quantitative analyst, he spent two decades building models that priced risk across emerging markets before pivoting to cover the economic impact of AI full-time. His writing translates opaque technical developments into clear risk/reward terms — and he's rarely diplomatic about the gap between AI valuations and underlying fundamentals. He believes most market participants still underestimate AI's long-run deflationary effect on knowledge work.
16 April 2026