Claude Code Performance Decline: A Closer Look at Anthropic's Black Box

Security & Risk

The Analyst

16 Apr 2026 · 3 min read

As concerns mount over Claude Code's performance decline, new analysis suggests the issues may be more complex than initial reports indicated, challenging assumptions about Anthropic's intentions and practices.

Why It Matters

On April 2, a GitHub issue with detailed metrics raised concerns about the performance of Claude Code, Anthropic's coding tool. The user reported analyzing 6,852 session files, 17,871 thinking blocks, and 234,760 tool calls, and concluded that Claude had started reading less code before editing, stopping earlier, looping more often, and requiring more human correction in complex engineering tasks. The issue quickly gained traction and fed broader accusations that Anthropic had "nerfed" the tool. A closer examination of the evidence points to a more nuanced problem: the black box nature of Claude Code's operations.

Key Risks

The public case for a secret nerfing of Claude is weak. There is no concrete proof that Anthropic secretly swapped model weights, lowered precision, or degraded Opus 4.6 to save computational resources. The viral benchmark drop, while concerning, is not strong evidence on its own. Anthropic has previously denied reducing quality in response to demand, and the strongest public reports still lack independently verifiable raw data.

However, the real risk lies in the opacity of Claude Code's operations. Adaptive thinking, effort defaults, cache duration, context compaction, quota policies, and status incidents can each change the user experience significantly while the model name stays the same. Because these conditions are not visible to users, it is difficult to tell what has changed or why a workflow is regressing.

The Opportunity

The strongest evidence against Claude Code is not the viral BridgeBench chart but a detailed GitHub issue that made a measurable claim about one demanding workflow. The reported fall in reads before edits, from 6.6 to 2.0 (a drop of roughly 70%), is particularly significant for a coding agent. Reading before editing matters because it directly affects the tool's ability to understand and modify code accurately.

Anthropic has an opportunity to address these concerns by providing more transparency. Implementing session telemetry would allow customers to see which operating conditions have changed, thereby fostering trust and enabling users to optimize their workflows. This move could also help Anthropic gather valuable feedback for continuous improvement.
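As a rough illustration of what such session telemetry might look like, the sketch below records a handful of operating conditions per session and reports which ones differ between two sessions. Every field name and value here is a hypothetical placeholder chosen for illustration; none of it reflects an actual Anthropic API, log format, or reported setting.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SessionConditions:
    # Hypothetical fields, loosely mirroring the factors discussed in this article.
    model: str
    effort: str
    adaptive_thinking: bool
    cache_ttl_seconds: int
    compactions: int


def changed_conditions(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    old_d, new_d = asdict(old), asdict(new)
    return {k: (old_d[k], new_d[k]) for k in old_d if old_d[k] != new_d[k]}


# Placeholder values only: the model name is identical, other conditions differ.
before = SessionConditions("same-model-name", "high", False, 3600, 0)
after = SessionConditions("same-model-name", "medium", True, 300, 2)

for field, (old, new) in changed_conditions(before, after).items():
    print(f"{field}: {old} -> {new}")
```

In this sketch the model name never appears in the diff, which is the point: a user could see that the operating conditions changed even though the model label did not.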

Detailed Analysis

The GitHub issue at the heart of this controversy makes a narrow, measurable claim about a specific workflow. The user reported that Claude was reading substantially less code before making edits, a behavior that is crucial for maintaining code integrity and reducing human intervention. The reported drop in reads before edits, from 6.6 to 2.0, is particularly alarming.
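To make the metric concrete, here is a minimal sketch of how a reads-before-edits ratio could be computed from a list of tool calls, along with the relative change implied by the reported figures. The tool labels and the flat list format are assumptions made for illustration; the issue's actual session file format and methodology are not described here and may differ.

```python
from collections import Counter

# Hypothetical tool labels; real session logs may name these calls differently.
READ_TOOLS = {"Read"}
EDIT_TOOLS = {"Edit", "Write"}


def reads_per_edit(tool_calls):
    """Total read calls divided by total edit calls, or None if there are no edits."""
    counts = Counter(tool_calls)
    reads = sum(counts[name] for name in READ_TOOLS)
    edits = sum(counts[name] for name in EDIT_TOOLS)
    return reads / edits if edits else None


def relative_change(before, after):
    """Fractional change from before to after (negative means a decline)."""
    return (after - before) / before


# The two figures reported in the GitHub issue.
print(f"{relative_change(6.6, 2.0):.1%}")  # -69.7%
```

A simple ratio like this cannot show whether the reads were the right ones, so it is best treated as a coarse signal rather than a measure of quality.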

While the viral benchmark chart that circulated on social media platforms like X (formerly Twitter) garnered attention, it does not offer the same level of detail or reliability as the GitHub issue. The chart's significance may have been exaggerated, but its spread highlights broader user dissatisfaction and the need for more transparent performance metrics.

Conclusion

The public case for a secret nerfing of Claude is thin, but the product has changed in ways that affect user experience. The black box nature of Claude Code's operations makes it difficult for users to understand and adapt to these changes, and that is where the real risk lies. Anthropic can mitigate this risk by providing more transparency through session telemetry, which would address user concerns, build trust, and give the company better feedback for continuous improvement.