
OpenAI's Realtime API, now generally available, and its `gpt-realtime` model promise clearer audio and more natural conversation flow for developers building voice agents for enterprises.
OpenAI has announced the general availability of its Realtime API, accompanied by a suite of new features designed to empower developers and enterprises in building reliable, production-ready voice agents. Among these updates is the introduction of gpt-realtime, OpenAI's most advanced speech-to-speech model yet. This release marks significant improvements in audio quality, instruction following, and natural language processing, making it easier to deploy sophisticated voice applications.
The Realtime API now supports several new features that extend its functionality and flexibility.
gpt-realtime represents a significant step forward in speech-to-speech technology. One key improvement is instruction following: the model follows complex instructions and executes multi-step tasks with precision. It can handle detailed commands, such as narrowing down listings based on specific criteria or guiding users through financial calculations.
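To make the listing example concrete, here is a minimal Python sketch of how a voice agent might expose a listing filter as a tool and run it when the model asks for it. Everything named here is an assumption for illustration: the `filter_listings` tool, its parameters, and the sample data are hypothetical, and the `session.update` field names reflect one reading of the Realtime API's JSON event format, so check them against the current API reference before relying on them. No network connection is opened; the sketch only builds the event and handles a tool call locally.

```python
import json

# Hypothetical tool definition; the name and parameters are illustrative only.
FILTER_LISTINGS_TOOL = {
    "type": "function",
    "name": "filter_listings",
    "description": "Narrow home listings by price and bedroom count.",
    "parameters": {
        "type": "object",
        "properties": {
            "max_price": {"type": "integer"},
            "min_bedrooms": {"type": "integer"},
        },
    },
}


def build_session_update(instructions, tools):
    """Serialize a session configuration event (field names are assumed)."""
    event = {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "instructions": instructions,
            "tools": tools,
        },
    }
    return json.dumps(event)


def filter_listings(listings, max_price=None, min_bedrooms=None):
    """Keep only listings that satisfy every criterion that was supplied."""
    matches = []
    for home in listings:
        if max_price is not None and home["price"] > max_price:
            continue
        if min_bedrooms is not None and home["bedrooms"] < min_bedrooms:
            continue
        matches.append(home)
    return matches


def handle_tool_call(name, arguments_json, listings):
    """Dispatch a tool call whose arguments arrive as a JSON string."""
    if name != "filter_listings":
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    return filter_listings(listings, **args)


# Made-up sample data, not real listings.
SAMPLE_LISTINGS = [
    {"id": 1, "price": 450000, "bedrooms": 3},
    {"id": 2, "price": 700000, "bedrooms": 4},
    {"id": 3, "price": 380000, "bedrooms": 2},
]

if __name__ == "__main__":
    print(build_session_update("Help users narrow home listings.", [FILTER_LISTINGS_TOOL]))
    print(handle_tool_call(
        "filter_listings",
        '{"max_price": 500000, "min_bedrooms": 3}',
        SAMPLE_LISTINGS,
    ))
```

Keeping the filter as a plain function, separate from the event plumbing, means the multi-step logic can be tested without audio or a live session. In a real agent the result would be sent back to the model so it can describe the matches aloud.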
The enhancements in gpt-realtime and the Realtime API have already been put to use by early adopters. For instance, Zillow, a leading real estate platform, has integrated these technologies into their services:
“The new speech-to-speech model in OpenAI's Realtime API shows stronger reasoning and more natural speech, allowing it to handle complex, multi-step requests like narrowing listings by lifestyle needs or guiding affordability discussions with tools like our BuyAbility score. This could make searching for a home on Zillow or exploring financing options feel as natural as a conversation with a friend, helping simplify decisions like buying, selling, and renting a home.”
– Josh Weisberg, Head of AI at Zillow
gpt-realtime is built on an advanced transformer architecture optimized for real-time processing. This allows it to handle audio streams efficiently, reducing latency and improving response times.
gpt-realtime processes and generates audio directly through a single model. This streamlined approach not only reduces latency but also preserves the nuance in speech, leading to more natural and expressive responses.
The introduction of gpt-realtime and the updated Realtime API by OpenAI marks a significant step forward in the development of voice agents. These enhancements offer developers and enterprises the tools they need to build sophisticated, reliable, and high-quality voice applications that can handle complex tasks with ease. Whether it's customer support, personal assistance, or education, gpt-realtime is set to revolutionize how we interact with voice technology.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
29 August 2025