
TangoFlux harnesses Flow Matching and CLAP-Ranked Preference Optimization for text-to-audio generation, producing high-quality audio in record time and setting a new standard for synthetic sound.
TangoFlux, a new Text-to-Audio (TTA) generative model developed by researchers from DeCLaRe Lab at the Singapore University of Technology and Design (SUTD), NVIDIA, and Lambda Labs, is making waves in the audio generation community. The 515M-parameter model can generate up to 30 seconds of 44.1 kHz stereo audio in just 3.7 seconds on a single A40 GPU. Its key innovations are Flow Matching and CLAP-Ranked Preference Optimization (CRPO), which together significantly improve the quality of the generated audio and its alignment with the text prompt.
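To put the headline numbers in perspective, here is a small back-of-the-envelope calculation. It uses only the figures quoted above; the function and constant names are illustrative and do not come from the TangoFlux codebase.

```python
# Figures quoted in the article.
AUDIO_SECONDS = 30.0        # maximum clip length
GENERATION_SECONDS = 3.7    # reported wall-clock time on a single A40
SAMPLE_RATE_HZ = 44_100     # 44.1 kHz
CHANNELS = 2                # stereo


def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of compute."""
    return audio_seconds / generation_seconds


def total_samples(audio_seconds: float, sample_rate_hz: int, channels: int) -> int:
    """Number of individual sample values in the clip, across all channels."""
    return int(audio_seconds * sample_rate_hz) * channels


rtf = real_time_factor(AUDIO_SECONDS, GENERATION_SECONDS)
samples = total_samples(AUDIO_SECONDS, SAMPLE_RATE_HZ, CHANNELS)

print(f"real-time factor: {rtf:.1f}x")      # real-time factor: 8.1x
print(f"samples per clip: {samples:,}")     # samples per clip: 2,646,000
```

In other words, a full-length clip comes out roughly eight times faster than it takes to play back, and each clip contains about 2.6 million sample values.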

| Text Description | Stable Audio Open | TANGO 2 | AudioLDM2 | AudioBox | TangoFlux (Ours) |
| --- | --- | --- | --- | --- | --- |
| Melodic human whistling harmonizing with natural birdsong | [audio sample] | [audio sample] | [audio sample] | [audio sample] | [audio sample] |
| A basketball bounces rhythmically on a court, shoes squeak against the floor, and a referee’s whistle cuts through the air. | [audio sample] | [audio sample] | [audio sample] | [audio sample] | [audio sample] |
TangoFlux represents a significant advance in text-to-audio generation, combining Flow Matching and CLAP-Ranked Preference Optimization to produce high-quality, contextually aligned audio roughly eight times faster than real time on a single A40 GPU. Because the code and models are open source, researchers and practitioners can explore and build on this work.
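The article does not spell out how CRPO works internally. As a rough sketch only, assume the "CLAP-ranked" step means that several candidate clips are generated per prompt, each is scored by CLAP text-audio similarity, and the best and worst candidates become a preference pair for training. The `Candidate` class, the `build_preference_pair` helper, and the scores below are all hypothetical and are not taken from the TangoFlux code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    audio_id: str      # hypothetical identifier for a generated clip
    clap_score: float  # text-audio similarity, assumed to be precomputed


def build_preference_pair(candidates: list[Candidate]) -> tuple[Candidate, Candidate]:
    """Return (winner, loser): the highest- and lowest-scored candidates."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=lambda c: c.clap_score, reverse=True)
    return ranked[0], ranked[-1]


# Made-up scores for three clips generated from the same prompt.
candidates = [
    Candidate("clip_a", 0.41),
    Candidate("clip_b", 0.63),
    Candidate("clip_c", 0.27),
]
winner, loser = build_preference_pair(candidates)
print(winner.audio_id, loser.audio_id)  # clip_b clip_c
```

Pairs built this way could then feed a preference-optimization loss. The actual training objective is beyond what the article covers, so it is left out here.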
Original Sources
↗ https://tangoflux.github.io/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
6 January 2025