
DiffuSSM is a diffusion model that drops attention in favor of a state space model backbone, aiming to generate high-resolution images efficiently and to ease the scalability problems of current DDPMs.
Denoising Diffusion Probabilistic Models (DDPMs) have become a cornerstone of high-fidelity image generation, but applying them at high resolutions is computationally expensive, in large part because the cost of attention grows quadratically with sequence length. Common workarounds such as patchifying in UNet and Transformer architectures speed up computation, but often at the cost of representational capacity. A new paper from Jing Nathan Yan, Jiatao Gu, and Alexander M. Rush introduces the Diffusion State Space Model (DiffuSSM), which replaces attention mechanisms with a more scalable state space model backbone. According to the authors, this approach handles higher resolutions without resorting to global compression and preserves detailed image representation throughout the diffusion process.
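To make the patchifying trade-off concrete, the short sketch below counts how many tokens a square grid produces at different patch sizes, how many pairwise interactions full self-attention would compute over them, and how many raw values each token has to summarize. The 64x64 grid, the 4 channels, and the function names are illustrative assumptions, not figures from the paper.

```python
def sequence_length(height: int, width: int, patch_size: int = 1) -> int:
    """Number of tokens when a height x width grid is cut into square patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("patch_size must divide both height and width")
    return (height // patch_size) * (width // patch_size)


def attention_pairs(length: int) -> int:
    """Pairwise token interactions in full self-attention: length squared."""
    return length * length


def values_per_token(patch_size: int, channels: int) -> int:
    """Raw values folded into a single token by patchifying."""
    return patch_size * patch_size * channels


if __name__ == "__main__":
    # Hypothetical 64x64 grid with 4 channels, chosen only for illustration.
    for patch_size in (1, 2, 4):
        tokens = sequence_length(64, 64, patch_size)
        print(patch_size, tokens, attention_pairs(tokens), values_per_token(patch_size, 4))
```

Doubling the patch size cuts the token count by four and the attention pair count by sixteen, but each token then has to stand in for four times as many raw values. That compression is the capacity cost the paragraph above refers to.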
State Space Model Backbone: The core innovation in DiffuSSM is the use of a state space model (SSM) backbone in place of attention. SSMs are efficient sequence models, and they can be applied to 2D images by flattening the image representation into a one-dimensional sequence.
Scalability and Efficiency: The cost of an SSM layer grows close to linearly with sequence length, whereas the cost of attention grows quadratically, so DiffuSSM reduces the computational burden of high-resolution image generation. This is particularly important for large-scale datasets like ImageNet and LSUN.
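As a rough illustration of the two points above, here is a minimal sketch of a discrete linear state space recurrence run over a flattened image. It is a generic diagonal SSM written in plain Python, not the authors' implementation; the function names and the toy parameter values are assumptions made for the example.

```python
from typing import List


def flatten_image(image: List[List[float]]) -> List[float]:
    """Flatten a 2D grid (a list of rows) into a 1D sequence, row by row."""
    return [value for row in image for value in row]


def ssm_scan(u: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Run a discrete linear SSM with a diagonal state matrix.

    State update (elementwise over the state dimension):
        x_k = a * x_{k-1} + b * u_k
    Output:
        y_k = sum(c * x_k)
    """
    state = [0.0] * len(a)
    outputs = []
    for u_k in u:  # one pass over the sequence: work grows linearly with its length
        state = [a_i * x_i + b_i * u_k for a_i, x_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


# Toy 2x2 "image" with a single bright pixel; all parameter values are illustrative.
image = [[1.0, 0.0], [0.0, 0.0]]
sequence = flatten_image(image)
outputs = ssm_scan(sequence, a=[0.5, 0.25], b=[1.0, 1.0], c=[1.0, 1.0])
```

The loop visits each sequence element once and carries a fixed-size state, so the work is proportional to sequence length times state size. Full self-attention instead compares every element with every other one. In this toy run the single bright pixel produces an output that decays at each later position, which shows how the state carries information forward along the sequence.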
Model Architecture:
Training and Inference:

DiffuSSM is a notable step for image generation. By replacing attention mechanisms with state space models, the authors present a model designed to be computationally efficient while still generating high-quality images at high resolutions. The approach targets the limitations of current methods and points the way toward more scalable diffusion models.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
4 December 2023