OpenAI's Responses API: The Stateful Upgrade for Model Conversations

Models & Research

The Engineer

10 Sept 2025 · 3 min read

OpenAI's new Responses API simplifies model interactions by retaining conversation context, allowing for more natural and efficient dialogues without sending full chat histories with each query.

Six months ago, OpenAI rolled out the Responses API, a significant upgrade to their previous /chat/completions API. This new API introduces stateful inference, which means you don’t have to send the entire conversation history with each request. Instead, you can pass around an ID representing the current state of the conversation, and OpenAI will manage it for you. This change has sparked a lot of discussion among developers and researchers.

What Changed Technically?

Stateful Inference: The Responses API is designed to maintain the state of conversations across multiple requests. You provide an ID that references the ongoing conversation, and the API handles the context management.
Built-in Tools: The new API comes with a suite of built-in tools that can be used within the conversation flow, enhancing the functionality without additional setup.

Why Does This Matter?

OpenAI is heavily promoting the Responses API, emphasizing its performance and cost benefits. They also suggest that some advanced functionalities, like agentic behavior (models acting more autonomously), are better supported by this stateful approach. However, the real reason for the push might be less about technical superiority and more about a specific limitation of OpenAI's models.

The Secret Behind Stateful Inference

Despite OpenAI’s claims, there is nothing inherently superior about a stateful API compared to a stateless one in terms of basic functionality. Prefix caching and parallel tool execution can be achieved with either approach. So why the strong push?

The key lies in reasoning traces. Most advanced models today are reasoning models, which means they think through problems before providing answers. These chains of thought are crucial for maintaining context and improving model performance over time.

Reasoning Models: Models like Claude, DeepSeek, and Qwen expose their chain of thought in API responses. This allows developers to include these thoughts in the conversation history, giving the model more context.
OpenAI’s Secret: OpenAI, however, keeps the reasoning traces of their models (like GPT-5-Thinking) secret. This is likely due to concerns about safety and potential leaks of private information or implementation details.

The Problem with /chat/completions

Because OpenAI does not expose the chain of thought for their models, developers using the /chat/completions API cannot pass this context between requests. This means that while GPT-5 might be very capable internally, it appears less so when used through the /chat/completions API.

The Solution: Responses API

The Responses API solves this issue by maintaining the state of the conversation, including the reasoning traces, on OpenAI’s servers. This allows developers to build more coherent and context-aware applications using OpenAI’s models without needing to manage complex state themselves.

Performance and Cost: Stateful management can lead to better performance and cost efficiency, as less data needs to be transmitted with each request.
Advanced Functionality: The API also supports advanced features like parallel tool execution and agentic behavior more seamlessly.

Conclusion

While the Responses API introduces some complexity, it addresses a critical limitation in OpenAI’s model ecosystem. By maintaining state and reasoning traces, it enables developers to build more sophisticated applications with OpenAI’s models. Whether this justifies the additional complexity is up to individual use cases, but for those who need the full capabilities of GPT-5, the Responses API is a clear step forward.