
Share
OpenAI's new Responses API simplifies model interactions by retaining conversation context, allowing for more natural and efficient dialogues without sending full chat histories with each query.
Six months ago, OpenAI rolled out the Responses API, a significant upgrade to their previous /chat/completions API. This new API introduces stateful inference, which means you don’t have to send the entire conversation history with each request. Instead, you can pass around an ID representing the current state of the conversation, and OpenAI will manage it for you. This change has sparked a lot of discussion among developers and researchers.
OpenAI is heavily promoting the Responses API, emphasizing its performance and cost benefits. They also suggest that some advanced functionalities, like agentic behavior (models acting more autonomously), are better supported by this stateful approach. However, the real reason for the push might be less about technical superiority and more about a specific limitation of OpenAI's models.
Despite OpenAI’s claims, there is nothing inherently superior about a stateful API compared to a stateless one in terms of basic functionality. Prefix caching and parallel tool execution can be achieved with either approach. So why the strong push?
The key lies in reasoning traces. Most advanced models today are reasoning models, which means they think through problems before providing answers. These chains of thought are crucial for maintaining context and improving model performance over time.

Because OpenAI does not expose the chain of thought for their models, developers using the /chat/completions API cannot pass this context between requests. This means that while GPT-5 might be very capable internally, it appears less so when used through the /chat/completions API.
The Responses API solves this issue by maintaining the state of the conversation, including the reasoning traces, on OpenAI’s servers. This allows developers to build more coherent and context-aware applications using OpenAI’s models without needing to manage complex state themselves.
While the Responses API introduces some complexity, it addresses a critical limitation in OpenAI’s model ecosystem. By maintaining state and reasoning traces, it enables developers to build more sophisticated applications with OpenAI’s models. Whether this justifies the additional complexity is up to individual use cases, but for those who need the full capabilities of GPT-5, the Responses API is a clear step forward.
Tags
Original Sources
↗ https://www.seangoedecke.com/responses-api/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
10 September 2025
133 articles
Related Articles
Related Articles
More Stories