
AudioCraft now offers stereo sound generation at no extra training or inference cost, thanks to a technique that interleaves the left- and right-channel codebooks.
Facebook Research's AudioCraft project has recently seen a significant update with the addition of stereo models. This new feature introduces left/right codebook interleaving using a variant of the delay pattern, enabling stereo generation without any additional computational overhead during both training and inference. Let’s dive into the technical details and what this means for practitioners.
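The announcement does not spell out the exact token layout, so what follows is a minimal sketch of one plausible reading of "left/right codebook interleaving with a variant of the delay pattern". It assumes that each channel is encoded separately into K codebooks, that the codebooks are interleaved as L0, R0, L1, R1, ..., and that each left/right pair shares a single delay. The helper names (`interleave_stereo`, `stereo_delays`, `apply_delay`) and the `PAD` value are hypothetical illustrations, not AudioCraft's API.

```python
from typing import List

# Hypothetical placeholder for positions where a stream has no code yet.
PAD = -1


def interleave_stereo(left: List[List[int]], right: List[List[int]]) -> List[List[int]]:
    """Interleave per-channel codebook streams as [L0, R0, L1, R1, ...].

    `left` and `right` are K x T grids of codes (one row per codebook),
    assumed to come from encoding each channel separately.
    """
    assert len(left) == len(right), "both channels need the same number of codebooks"
    streams: List[List[int]] = []
    for l_row, r_row in zip(left, right):
        streams.append(list(l_row))
        streams.append(list(r_row))
    return streams


def stereo_delays(num_codebooks: int) -> List[int]:
    """Assumed delay variant: L_k and R_k share delay k, giving [0, 0, 1, 1, ...]."""
    return [i // 2 for i in range(2 * num_codebooks)]


def apply_delay(streams: List[List[int]], delays: List[int]) -> List[List[int]]:
    """Shift stream i right by delays[i] steps and pad every row to T + max(delays).

    After shifting, each column is one autoregressive step in which all
    streams are predicted in parallel.
    """
    T = len(streams[0])
    total = T + max(delays)
    return [[PAD] * d + row + [PAD] * (total - T - d) for row, d in zip(streams, delays)]


left = [[1, 2, 3], [4, 5, 6]]       # K = 2 codebooks, T = 3 frames
right = [[7, 8, 9], [10, 11, 12]]
delayed = apply_delay(interleave_stereo(left, right), stereo_delays(len(left)))
for row in delayed:
    print(row)
# [1, 2, 3, -1]
# [7, 8, 9, -1]
# [-1, 4, 5, 6]
# [-1, 10, 11, 12]
print(len(delayed[0]))  # 4 steps = T + K - 1, the same as mono with K = 2
```

Because each left/right pair shares a delay, the pattern spans T + K - 1 steps exactly as in the mono case. Giving all 2K streams distinct delays would instead stretch it to T + 2K - 1. Under this reading, the only change per step is predicting 2K tokens instead of K. That would add little compute if, as in mono MusicGen, codebook embeddings are summed at the input and each codebook has its own small output head.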
Along with the core feature of stereo generation, several other enhancements have been made to improve the overall user experience and demo quality:

The pull request includes a series of commits that address various aspects of the project:
For practitioners working with audio generation models, this update is significant for several reasons:
The latest update to AudioCraft’s MusicGen introduces stereo generation at no extra cost at either training or inference time. Together with several smaller improvements to the user experience and demo quality, this makes the model more capable without making it more expensive to train or run. Whether you’re a researcher or a developer working with audio models, this update is worth checking out.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
The Engineer, This Week's Edition: 9 November 2023