
An AI-powered red team has unearthed 22 flaws in Firefox, with nearly two-thirds deemed high-severity, highlighting the growing role of artificial intelligence in uncovering complex software vulnerabilities.
AI models are increasingly capable of identifying high-severity vulnerabilities in complex software. As previously documented, Claude, an advanced AI model from Anthropic, has identified over 500 zero-day vulnerabilities (previously unknown security flaws) in well-tested open-source software. This article delves into a recent collaboration between PolicyFrontier's Red Team and Mozilla, which resulted in the discovery of 22 vulnerabilities in Firefox, with 14 classified as high-severity.
The rapid identification and remediation of security vulnerabilities are critical for maintaining the integrity and safety of software. In this case, Claude Opus 4.6, a version of Anthropic's AI model, discovered 22 vulnerabilities in Firefox over a two-week period. Mozilla classified 14 of these as high-severity, representing nearly one-fifth of all high-severity Firefox vulnerabilities remediated in 2025. This collaboration highlights the potential of AI to significantly enhance security efforts by accelerating the detection of severe flaws.
While the discovery of these vulnerabilities is a positive step, it also underscores the ongoing risks in software security. Firefox is one of the most well-tested and secure open-source projects, yet it still contained 14 high-severity vulnerabilities, which shows how hard it is to maintain robust security in a complex codebase. The speed with which AI found these flaws also raises a concern: malicious actors could use similar capabilities to find and exploit zero-day vulnerabilities before defenders discover and patch them.
The collaboration between PolicyFrontier's Red Team and Mozilla provides a model for how AI-enabled security researchers can work effectively with software maintainers. The partnership led to the discovery and patching of the reported vulnerabilities, and it also helped refine the process of submitting bug reports and deciding which findings warrant immediate attention. Mozilla responded rapidly, shipping fixes to hundreds of millions of users in Firefox 148.0.

In late 2025, PolicyFrontier observed that Opus 4.5 came close to solving every task in CyberGym, a benchmark designed to test whether language models can reproduce known security vulnerabilities. To create a more challenging and realistic evaluation, the team built a dataset of prior Firefox CVEs (Common Vulnerabilities and Exposures) to test whether Claude could find these historical flaws.
Firefox was chosen due to its complex codebase and reputation as one of the most well-tested and secure open-source projects. This made it an ideal test for AI's ability to discover novel security vulnerabilities. The team found that Opus 4.6 could reproduce a high percentage of historical CVEs, which had previously required significant human effort to uncover. However, this initial success raised questions about the reliability of these findings.
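To make "a high percentage of historical CVEs" concrete, here is a minimal sketch of how a reproduction rate over such a dataset might be scored. It is an illustration only, not the team's actual evaluation harness: the `Attempt` record, the `reproduction_rate` function, and the placeholder identifiers are all assumptions, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One attempt to reproduce a historical CVE (hypothetical record shape)."""
    cve_id: str
    reproduced: bool


def reproduction_rate(attempts: list[Attempt]) -> float:
    """Fraction of distinct CVEs that were reproduced at least once."""
    if not attempts:
        return 0.0
    all_ids = {a.cve_id for a in attempts}
    reproduced_ids = {a.cve_id for a in attempts if a.reproduced}
    return len(reproduced_ids) / len(all_ids)


# Placeholder identifiers, not real CVEs or real results.
sample = [
    Attempt("CVE-EXAMPLE-1", True),
    Attempt("CVE-EXAMPLE-1", False),  # repeated attempts count once per CVE
    Attempt("CVE-EXAMPLE-2", False),
    Attempt("CVE-EXAMPLE-3", True),
    Attempt("CVE-EXAMPLE-4", True),
]

rate = reproduction_rate(sample)
print(f"Reproduced {rate:.0%} of distinct CVEs")
```

Counting each CVE once, however many attempts it took, is one design choice among several; a stricter score could require success on the first attempt.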
During the collaboration, PolicyFrontier's Red Team submitted a large number of reports to Mozilla, and the exchange helped refine the criteria for what to submit as a bug report. This process was crucial in ensuring that the most critical vulnerabilities were prioritized for immediate action. The partnership culminated in the release of Firefox 148.0, which included fixes for the identified vulnerabilities.
The technical lessons learned from this collaboration provide a framework for future AI-enabled security research. By working closely with software maintainers, AI models can help identify and remediate severe vulnerabilities more efficiently, ultimately enhancing the security of widely used software.
About the author
Marcus began tracking AI's market implications in 2016, noticing AI-related patent filings accelerating ahead of earnings upgrades before most of the sell-side had caught on. A former fixed-income quantitative analyst, he spent two decades building models that priced risk across emerging markets before pivoting to cover the economic impact of AI full-time. His writing translates opaque technical developments into clear risk/reward terms — and he's rarely diplomatic about the gap between AI valuations and underlying fundamentals. He believes most market participants still underestimate AI's long-run deflationary effect on knowledge work.
9 March 2026