
Researchers present a novel method for generating stylistically consistent images without the need for fine-tuning, streamlining the process for artists and designers.
By Amir Hertz*, Andrey Voynov*, Shlomi Fruchter†, and Daniel Cohen-Or†
1 Google Research, 2 Tel Aviv University
*Indicates Equal Contribution, †Indicates Equal Advising
Large-scale Text-to-Image (T2I) models have become a cornerstone in creative fields, generating visually compelling images from textual prompts. However, keeping a consistent style across multiple images remains a significant challenge. Traditional methods often require fine-tuning and manual intervention to disentangle content and style. In the paper "Style Aligned Image Generation via Shared Attention," researchers from Google Research and Tel Aviv University introduce StyleAligned, a technique that generates style-consistent images from a pretrained diffusion model without fine-tuning.
State-of-the-art T2I models often produce images that diverge significantly in their interpretations of the same stylistic descriptor. For example, given the style description "minimal origami," standard T2I generation might output images with vastly different styles (left). StyleAligned addresses this by making the model's generation style persistent (right).

StyleAligned enables style-consistent content generation across different prompts without fine-tuning. Here are some key findings:
StyleAligned is versatile and can be easily combined with other methods to enhance its capabilities:
For practitioners in the field of generative AI, StyleAligned offers several advantages:
"Style Aligned Image Generation via Shared Attention" addresses a practical gap in text-to-image generation. By combining minimal attention sharing with reference image inversion, it achieves style consistency without fine-tuning. That could make style-consistent image generation more dependable in creative workflows.
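The attention sharing mentioned above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation. The function names (`attention`, `shared_attention`) and the toy vectors are hypothetical, and the sketch assumes the simplest form of the idea: each generated image's queries attend to its own keys and values plus those of a reference image. Further details described in the paper, such as normalizing the target's queries and keys against the reference, are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def shared_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref):
    """Hypothetical helper: target tokens attend to their own keys/values
    and to the reference image's keys/values (simple list concatenation)."""
    return attention(q_tgt, k_tgt + k_ref, v_tgt + v_ref)


# Toy data (assumed for illustration): one target token, one reference token.
q_tgt = [[1.0, 0.0]]
k_tgt = [[0.0, 1.0]]
v_tgt = [[0.0, 0.0]]
k_ref = [[50.0, 0.0]]   # strongly aligned with the target query
v_ref = [[1.0, 1.0]]    # stands in for the reference image's "style" features

own_only = attention(q_tgt, k_tgt, v_tgt)
with_reference = shared_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref)
print("own keys only:  ", own_only)
print("with reference: ", with_reference)
```

In this toy setup the target's output is pulled toward the reference value once the reference keys and values join the attention pool, which is the intuition behind aligning style without touching the model's weights.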
Original Sources
↗ https://style-aligned-gen.github.io/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
6 December 2023