
Hibiki, developed by Kyutai, performs simultaneous speech-to-speech translation, decoding chunks of speech as they arrive rather than waiting for the speaker to finish.
Hibiki, a new decoder-only model introduced by researchers from Kyutai, tackles the challenging task of simultaneous speech-to-speech translation. Unlike offline models, which wait for the entire source utterance to complete before translating, Hibiki translates speech in real time, chunk by chunk. This capability is crucial for applications such as live interpretation and real-time communication across languages.
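To make the difference between the two modes concrete, here is a minimal Python sketch. It is not Hibiki's code or API: the function names, the toy word-for-word lexicon, and the fixed `lag` parameter are all assumptions made for illustration, and a real system works on audio and decides how long to wait adaptively rather than using a fixed lag. The sketch only records the order in which source chunks are read and translated chunks are written.

```python
from collections import deque
from typing import Iterable, List, Tuple

# Hypothetical toy lexicon standing in for a real translation model.
TOY_LEXICON = {"bonjour": "hello", "le": "the", "monde": "world"}

Event = Tuple[str, str]  # ("read" | "write", chunk)


def toy_translate(chunk: str) -> str:
    """Translate one chunk with the toy lexicon (illustration only)."""
    return TOY_LEXICON.get(chunk, chunk)


def offline_log(chunks: Iterable[str]) -> List[Event]:
    """Offline mode: read the whole utterance, then write every output."""
    log: List[Event] = []
    buffered: List[str] = []
    for chunk in chunks:
        log.append(("read", chunk))
        buffered.append(chunk)
    for chunk in buffered:
        log.append(("write", toy_translate(chunk)))
    return log


def simultaneous_log(chunks: Iterable[str], lag: int = 1) -> List[Event]:
    """Simultaneous mode: write output while input is still arriving.

    A fixed lag of `lag` chunks is an assumed simplification.
    """
    log: List[Event] = []
    pending: deque = deque()
    for chunk in chunks:
        log.append(("read", chunk))
        pending.append(chunk)
        if len(pending) > lag:
            log.append(("write", toy_translate(pending.popleft())))
    while pending:  # flush what is left once the speaker stops
        log.append(("write", toy_translate(pending.popleft())))
    return log


def first_write_index(log: List[Event]) -> int:
    """Position of the first output event in the log."""
    return next(i for i, (kind, _) in enumerate(log) if kind == "write")
```

With the input `["bonjour", "le", "monde"]`, both functions write the same translated chunks, but the simultaneous version emits its first output before the last source chunk has been read, whereas the offline version only starts writing after the full utterance is in.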

On a French-English simultaneous speech translation task, Hibiki demonstrates state-of-the-art performance in translation quality, speaker fidelity, and naturalness.
The ability to perform high-fidelity simultaneous speech-to-speech translation opens up a range of applications, including live interpretation and real-time communication across languages.
Hibiki is a significant advance in speech-to-speech translation. Its decoder-only design, multistream processing, and adaptive inference process make it well suited to real-time translation, and its results on French-English translation point to its potential for practical applications.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
7 February 2025