Expanding Risk Bubble: Dynamic Assessments for Offensive Cybersecurity Agents

Security & Risk

The Analyst

16 Jul 2025 · 3 min read

New research shows that offensive cybersecurity agents pose a broader and more mutable risk than previously thought, challenging the adequacy of current static assessment methods.

The "bubble" of risk associated with offensive cybersecurity agents is larger, cheaper to reach, and more dynamic than many assessments currently suggest. At Princeton's POLARIS Lab, our research reveals that adversaries can adapt and modify models in ways that push risk well beyond the safety profile captured by static evaluations. This is particularly concerning when evaluating AI systems for their potential use in cyberattacks.

The Problem with Static Assessments

Most frontier models today undergo some form of safety testing to determine whether they can assist adversaries in launching costly cyberattacks. However, these assessments often overlook a critical factor: the adaptability of adversaries. Adversaries can manipulate open-source or fine-tunable models to bypass safeguards, expanding the risk far beyond what static evaluations capture.

For example, our recent research, "Dynamic Risk Assessments for Offensive Cybersecurity Agents," demonstrates that using only 8 H100 GPU-hours of compute (approximately $36), an adversary could improve an agent's success rate on InterCode-CTF by over 40% using relatively simple methods. Model safety is therefore not fixed; there is a "bubble" of risk defined by the degrees of freedom an adversary has to enhance an agent.
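To make the cost figure concrete, the numbers above imply a rate of roughly $4.50 per H100 GPU-hour. The sketch below works through that arithmetic and shows how cost would scale with a larger budget. It assumes simple linear pricing, and the larger budgets (80 and 800 GPU-hours) are hypothetical values chosen for illustration, not figures from the study.

```python
# Figures quoted in the article: 8 H100 GPU-hours for approximately $36.
GPU_HOURS = 8
TOTAL_COST_USD = 36.0


def implied_hourly_rate(total_cost_usd: float, gpu_hours: float) -> float:
    """Return the per-GPU-hour price implied by a total cost."""
    return total_cost_usd / gpu_hours


def budget_cost(gpu_hours: float, hourly_rate_usd: float) -> float:
    """Return the cost of a compute budget, assuming linear pricing."""
    return gpu_hours * hourly_rate_usd


if __name__ == "__main__":
    rate = implied_hourly_rate(TOTAL_COST_USD, GPU_HOURS)
    print(f"Implied rate: ${rate:.2f} per GPU-hour")
    # Hypothetical larger budgets, for illustration only.
    for hours in (8, 80, 800):
        print(f"{hours:>4} GPU-hours -> ${budget_cost(hours, rate):,.2f}")
```

Even a budget one hundred times larger than the one in the study would, under this assumed pricing, stay in the low thousands of dollars.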

Why Cybersecurity Tasks Amplify Risk

Cybersecurity tasks are especially prone to expanding the risk bubble, for two key reasons:

Built-in Success Signals: When a vulnerability is successfully exploited, the attacker receives clear feedback, enabling fast, iterative improvements. This allows for simple retries and rapid enhancement of attack capabilities.
Financial Incentives: The financial incentives for cyberattacks are substantial. Ransomware attacks alone generate over $1 billion annually, making it economically viable for adversaries to invest in compute resources.

Together, these factors create a perfect storm in which adversaries can quickly scale up performance to deploy offensive cybersecurity agents.
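To illustrate why built-in success signals matter, consider a simplified model of retries. If a single attempt succeeds with probability p, and attempts are assumed to be independent, the chance of at least one success in k attempts is 1 - (1 - p)^k. Because the attacker can tell when an attempt has worked, they can simply stop at the first success. The sketch below is an illustrative model only: the independence assumption is a simplification, and the per-attempt rates are hypothetical, not results from the study.

```python
def success_within_k(p: float, k: int) -> float:
    """Probability of at least one success in k independent attempts.

    p is the per-attempt success probability (hypothetical input).
    Independence between attempts is a simplifying assumption.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical per-attempt success rates, for illustration only.
    for p in (0.05, 0.20):
        for k in (1, 5, 10, 20):
            print(f"p={p:.2f}, k={k:>2}: {success_within_k(p, k):.3f}")
```

Under this model, even a low per-attempt success rate compounds quickly with retries, which is why a single-attempt evaluation can understate what a persistent adversary achieves.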

Key Risks

The expanding risk bubble poses several significant risks:

Increased Attack Surface: If model providers offer fine-tuning APIs or allow repeated queries, the attack surface dramatically increases. This is particularly true for AI systems used in offensive cybersecurity.
Dynamic Threat Landscape: The adaptability of adversaries means that the threat landscape is constantly evolving. Static assessments fail to capture this dynamic nature, leading to underestimation of potential risks.
Economic Viability: Low adaptation costs (tens of dollars of compute in our study) combined with high financial returns make it economically feasible for adversaries to invest in improving their attack capabilities.

The Opportunity

Addressing the expanding risk bubble requires a shift from static to dynamic risk assessments. Here are some strategies:

Continuous Monitoring: Implement continuous monitoring of model usage to detect and mitigate adversarial manipulation.
Adaptive Safeguards: Develop adaptive safeguards that can evolve alongside adversary tactics.
Collaborative Efforts: Foster collaboration between researchers, policymakers, and industry stakeholders to stay ahead of emerging threats.
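As one illustration of what continuous monitoring could look like, a provider might flag accounts that issue an unusually high number of requests within a short window, a pattern consistent with automated retries. The following is a minimal sketch under stated assumptions, not a production design: the class name RetryMonitor, the window length, and the request threshold are all hypothetical, and timestamps are assumed to arrive in non-decreasing order for each account.

```python
from collections import defaultdict, deque


class RetryMonitor:
    """Sliding-window request counter per account (hypothetical sketch)."""

    def __init__(self, window_seconds: float = 60.0, max_requests: int = 20):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        # Maps account id -> timestamps of requests still inside the window.
        self._events = defaultdict(deque)

    def record(self, account_id: str, timestamp: float) -> bool:
        """Record one request and return True if the account should be flagged.

        Assumes timestamps for an account arrive in non-decreasing order.
        """
        events = self._events[account_id]
        events.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_requests


if __name__ == "__main__":
    monitor = RetryMonitor(window_seconds=10.0, max_requests=3)
    for t in (0.0, 1.0, 2.0, 3.0, 100.0):
        flagged = monitor.record("account-a", t)
        print(f"t={t:>5}: flagged={flagged}")
```

A fixed threshold like this is a static rule in its own right, so in practice it would be only one input to the adaptive safeguards described above.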

Conclusion

The "bubble" of risk associated with offensive cybersecurity agents is a pressing concern. By recognizing the adaptability of adversaries and the dynamic nature of cyber threats, we can develop more robust and effective risk assessments. This shift from static to dynamic evaluations is crucial for ensuring the safety and security of AI systems in an increasingly complex threat landscape.