
OpenAI's new system card reveals how the company safeguards its Deep Research model during web browsing, addressing safety concerns and providing transparency for AI practitioners.
OpenAI has released a detailed system card outlining the safety measures and risk assessments it conducted before launching its new agentic capability, Deep Research. The model is designed to perform multi-step research tasks on the internet and is powered by an early version of OpenAI's o3 model optimized for web browsing. Here’s what changed technically and why it matters to practitioners:
Model Capabilities:
Safety Enhancements:
OpenAI uses a Preparedness Framework to evaluate and mitigate risks. Here’s the scorecard for Deep Research:
Only models with a post-mitigation score of "medium" or below can be deployed. Models with a post-mitigation score of "high" or below can be developed further, but cannot be deployed unless their score is brought down to "medium" or lower.
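The gating rule above can be sketched in a few lines of Python. This is only an illustration, not OpenAI's tooling: the `RiskLevel` enum, the function names, and the rule that the overall score is the highest score across categories are assumptions made for the example, and the category names and scores are placeholders rather than the actual Deep Research scorecard.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """Ordered risk levels; the ordering itself is what the gate relies on."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


def overall_score(category_scores: dict[str, RiskLevel]) -> RiskLevel:
    # Assumption: the overall score is the highest score in any category.
    return max(category_scores.values())


def can_deploy(post_mitigation: RiskLevel) -> bool:
    # Deployment requires a post-mitigation score of "medium" or below.
    return post_mitigation <= RiskLevel.MEDIUM


def can_develop_further(post_mitigation: RiskLevel) -> bool:
    # Further development requires a post-mitigation score of "high" or below.
    return post_mitigation <= RiskLevel.HIGH


# Placeholder scores for a hypothetical model, not real evaluation results.
example_scores = {
    "category_a": RiskLevel.LOW,
    "category_b": RiskLevel.MEDIUM,
}
example_overall = overall_score(example_scores)
print(example_overall.name, can_deploy(example_overall), can_develop_further(example_overall))
```

Modelling the levels as an ordered enum keeps both thresholds as simple comparisons, which makes the asymmetry visible: a "high" model may still be worked on, but only "medium" or below clears the deployment gate.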
Deep Research is an agentic capability built for multi-step research tasks on the internet. Powered by an early version of OpenAI o3, it uses reasoning to search, interpret, and analyze large volumes of online material, including text, images, and PDFs. Key features include:

OpenAI believes Deep Research will be valuable in a wide range of applications, from academic research to business intelligence.
Before making Deep Research available to Pro users, OpenAI conducted rigorous safety testing:
During safety testing, OpenAI identified opportunities to enhance its methods:
These efforts ensured that Deep Research was thoroughly evaluated and improved before its launch.
OpenAI's Deep Research system card provides a transparent look at the safety measures and risk assessments conducted for this powerful new capability. By addressing key areas of concern through rigorous testing and mitigation, OpenAI aims to ensure that Deep Research is both effective and safe for users.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
26 February 2025