
Auffusion combines diffusion models and LLMs to generate more accurate, higher-quality audio from text, and is reported to outperform current systems in both generation quality and text-audio alignment on complex inputs.
Recent progress in diffusion models and large language models (LLMs) has significantly advanced the field of Artificial Intelligence Generated Content (AIGC). One exciting application within this domain is Text-to-Audio (TTA), which aims to generate audio from natural language prompts. However, existing TTA systems often struggle with generation quality and text-audio alignment, especially for complex inputs. Enter Auffusion, a new TTA system developed by researchers at Beijing University of Posts and Telecommunications.
The Auffusion system involves back-and-forth transformations between four feature spaces: audio, spectrogram, pixel, and latent space. Here’s a breakdown of the process:
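To make the idea of moving between feature spaces concrete, here is a minimal Python sketch of three of the hops: audio to spectrogram, spectrogram to pixel, and pixel back to spectrogram. It is an illustration under stated assumptions, not Auffusion's actual code: the function names, the plain log-magnitude STFT, the window and hop sizes, and the 8-bit scaling are all hypothetical choices made for this example. The latent-space step and the final spectrogram-to-audio step are left out, since the article gives no detail on them.

```python
import numpy as np

def audio_to_spectrogram(wave, n_fft=256, hop=64):
    """Log-magnitude STFT, shape (n_fft // 2 + 1, n_frames). Parameters are illustrative."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log(magnitude + 1e-5)  # small offset avoids log(0)

def spectrogram_to_pixels(spec, lo, hi):
    """Map values in [lo, hi] to 8-bit greyscale pixels (assumed scaling)."""
    scaled = (np.clip(spec, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)

def pixels_to_spectrogram(pixels, lo, hi):
    """Invert the pixel scaling, up to 8-bit quantisation error."""
    return pixels.astype(np.float32) / 255 * (hi - lo) + lo

# One second of a 440 Hz tone stands in for real audio.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 440 * t)

spec = audio_to_spectrogram(wave)
lo, hi = float(spec.min()), float(spec.max())
pixels = spectrogram_to_pixels(spec, lo, hi)
restored = pixels_to_spectrogram(pixels, lo, hi)
```

The round trip through pixel space is lossy only by the 8-bit quantisation step, which is why treating a spectrogram as an image is workable for an image-style diffusion model in the first place.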

To evaluate Auffusion’s effectiveness, the researchers conducted both objective and subjective assessments. Key findings include:
Auffusion’s strong text-audio alignment enables several advanced applications:
Auffusion represents a significant step forward in the field of Text-to-Audio generation. By leveraging the strengths of diffusion models and LLMs, it achieves high-quality audio generation and precise text-audio alignment. The system’s efficiency and versatility make it a promising tool for various applications, from content creation to audio repair.
Original Sources
https://auffusion.github.io/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
4 January 2024