
New research shows that offensive cybersecurity agents face a broader and more mutable risk landscape than previously thought, challenging the adequacy of current static assessment methods.
The "bubble" of risk surrounding offensive cybersecurity agents is larger, cheaper to reach, and more dynamic than many current assessments suggest. At Princeton's POLARIS Lab, our research shows that adversaries can adapt and modify models in ways that push risk well beyond the safety profile captured by static evaluations. This matters most when AI systems are evaluated for their potential use in cyberattacks.
Most frontier models today undergo some form of safety testing to determine whether they can assist adversaries in launching costly cyberattacks. However, these assessments often overlook a critical factor: the adaptability of adversaries. Adversaries can manipulate open-source or fine-tunable models to bypass safeguards, expanding the risk far beyond what static evaluations capture.
For example, our recent research, "Dynamic Risk Assessments for Offensive Cybersecurity Agents," demonstrates that with only 8 H100 GPU-hours of compute (approximately $36), an adversary could improve an agent's success rate on InterCode-CTF by over 40% relative to the baseline using relatively simple methods. This flexibility means that model safety is not fixed; there is a "bubble" of risk defined by the degrees of freedom an adversary has to enhance an agent.
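To make the arithmetic behind these figures concrete, here is a minimal Python sketch. It derives the hourly rental price implied by the quoted budget and cost, and shows how a relative improvement would be computed. It assumes the 40% figure is a relative gain over the baseline agent. The baseline and adapted success rates are hypothetical placeholders, not results from the research, and the function names are illustrative only.

```python
"""Back-of-the-envelope check of the quoted compute budget and gain."""

GPU_HOURS = 8            # compute budget quoted in the article
QUOTED_COST_USD = 36.0   # approximate cost quoted in the article


def implied_hourly_rate(total_cost_usd: float, gpu_hours: float) -> float:
    """Rental price per GPU-hour implied by a total cost and a budget."""
    if gpu_hours <= 0:
        raise ValueError("gpu_hours must be positive")
    return total_cost_usd / gpu_hours


def relative_improvement(baseline_rate: float, adapted_rate: float) -> float:
    """Fractional gain of the adapted agent over the baseline agent."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (adapted_rate - baseline_rate) / baseline_rate


rate = implied_hourly_rate(QUOTED_COST_USD, GPU_HOURS)
print(f"Implied price: ${rate:.2f} per H100 GPU-hour")

# Hypothetical success rates, chosen only to illustrate a 40% relative gain.
baseline, adapted = 0.50, 0.70
gain = relative_improvement(baseline, adapted)
print(f"Relative improvement: {gain:.0%}")
```

The point of the calculation is the ratio: the adversary's cost sits in the tens of dollars, while the capability gain is a large fraction of the baseline, which is why a one-time static evaluation can understate risk.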
Cybersecurity tasks are particularly susceptible to growing the risk bubble due to several key factors:

The expanding risk bubble poses several significant risks:
Addressing the expanding risk bubble requires a shift from static to dynamic risk assessments. Here are some strategies:
The "bubble" of risk associated with offensive cybersecurity agents is a pressing concern. By recognizing the adaptability of adversaries and the dynamic nature of cyber threats, we can develop more robust and effective risk assessments. This shift from static to dynamic evaluations is crucial for ensuring the safety and security of AI systems in an increasingly complex threat landscape.
About the author
Marcus began tracking AI's market implications in 2016, noticing AI-related patent filings accelerating ahead of earnings upgrades before most of the sell-side had caught on. A former fixed-income quantitative analyst, he spent two decades building models that priced risk across emerging markets before pivoting to cover the economic impact of AI full-time. His writing translates opaque technical developments into clear risk/reward terms — and he's rarely diplomatic about the gap between AI valuations and underlying fundamentals. He believes most market participants still underestimate AI's long-run deflationary effect on knowledge work.
16 July 2025