
This article describes how to build a maliciousness classifier that inspects the internals of a language model to improve security in AI-driven systems, and it addresses key questions about monitoring and testing.
AI agents are evolving quickly, and securing them is essential. As developers and organizations integrate AI into their workflows, they need to ask the right questions about the security systems in place.
Zenity Labs has taken a transparent approach to addressing these questions. This article looks at their research and tools for improving agentic security.
The core business case for the system is straightforward: classify user inputs (whether single prompts or multi-turn conversations) as malicious or benign. This classification enables real-time alerting or blocking based on predefined severity levels.
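The mapping from a classification to an alert or a block can be sketched as a small policy function. This is a minimal illustration, not Zenity Labs' implementation: the `Action` enum, the `SeverityPolicy` thresholds (0.5 and 0.9), the assumption that the classifier emits a score in [0, 1], and the choice to judge a multi-turn conversation by its highest-scoring turn are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Action(Enum):
    ALLOW = "allow"
    ALERT = "alert"
    BLOCK = "block"


@dataclass(frozen=True)
class SeverityPolicy:
    # Hypothetical thresholds on a maliciousness score in [0, 1].
    alert_at: float = 0.5
    block_at: float = 0.9

    def __post_init__(self) -> None:
        if not 0.0 <= self.alert_at <= self.block_at <= 1.0:
            raise ValueError("need 0 <= alert_at <= block_at <= 1")


DEFAULT_POLICY = SeverityPolicy()


def decide(score: float, policy: SeverityPolicy = DEFAULT_POLICY) -> Action:
    """Map one classifier score to an action using the policy thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= policy.block_at:
        return Action.BLOCK
    if score >= policy.alert_at:
        return Action.ALERT
    return Action.ALLOW


def decide_conversation(
    turn_scores: Sequence[float], policy: SeverityPolicy = DEFAULT_POLICY
) -> Action:
    """Assumption: a conversation is judged by its highest-scoring turn."""
    if not turn_scores:
        raise ValueError("conversation has no scored turns")
    return decide(max(turn_scores), policy)
```

Keeping the thresholds in a separate policy object means alerting and blocking levels can be tuned per deployment without touching the classifier itself.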
Definition of "Malicious":
Input Feeding:
Activations Collection:

The robustness of the system hinges on its training data and testing procedures:
Transparency in AI decision-making builds trust and supports accountability. The system reports why an input was classified as malicious or benign, which makes its decisions easier to interpret:
Data Representation:
OOD Performance:
Operational Challenges:
By leveraging advanced LLM internals and transparent methodologies, organizations can significantly enhance their agentic security. This approach not only improves the reliability and safety of AI agents but also fosters a culture of openness and collaboration within the
About the author
Marcus began tracking AI's market implications in 2016, noticing AI-related patent filings accelerating ahead of earnings upgrades before most of the sell-side had caught on. A former fixed-income quantitative analyst, he spent two decades building models that priced risk across emerging markets before pivoting to cover the economic impact of AI full-time. His writing translates opaque technical developments into clear risk/reward terms — and he's rarely diplomatic about the gap between AI valuations and underlying fundamentals. He believes most market participants still underestimate AI's long-run deflationary effect on knowledge work.
19 February 2026