
Researchers have introduced HAD, a dataset of annotated human artifacts, and HADM, a set of models that detect those artifacts in text-to-image outputs, with the goal of reducing distorted, missing, or extra body parts in generated images.
Recent advancements in text-to-image generation models have been impressive, but one persistent issue remains: human artifacts. These artifacts manifest as poorly generated human bodies, including distorted, missing, or extra body parts, which can significantly impair the visual fidelity of the images. A new paper by Kaihong Wang, Lingzhi Zhang, and Jianming Zhang introduces a solution to this problem through the Human Artifact Dataset (HAD) and the Human Artifact Detection Models (HADM).
The researchers have created HAD, which they describe as the first large-scale dataset specifically designed to identify and localize human artifacts in generated images. The dataset comprises over 37,000 images from several popular text-to-image models, each annotated for artifact localization. Using HAD, they trained HADM, a set of detection models that can find various types of artifacts across multiple generative domains.
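To make "annotated for artifact localization" concrete, here is a minimal sketch of how a localized artifact could be represented and how a predicted region might be compared against an annotated one. The `ArtifactBox` fields, the label strings, and the 0.5 overlap threshold are illustrative assumptions, not HAD's actual schema or HADM's evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactBox:
    """One localized artifact: a labelled bounding box in pixel coordinates.

    Hypothetical structure for illustration; not the HAD annotation format.
    """
    label: str          # assumed label, e.g. "hand" or "extra_leg"
    x0: float
    y0: float
    x1: float
    y1: float
    score: float = 1.0  # 1.0 for a human annotation, model confidence otherwise


def area(box: ArtifactBox) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box.x1 - box.x0) * max(0.0, box.y1 - box.y0)


def iou(a: ArtifactBox, b: ArtifactBox) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    inter_w = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    inter_h = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: ArtifactBox, truth: ArtifactBox,
             iou_threshold: float = 0.5) -> bool:
    """A prediction matches an annotation if labels agree and boxes overlap enough."""
    return pred.label == truth.label and iou(pred, truth) >= iou_threshold
```

Intersection over union is a standard way to score box-level localization; whether the paper uses this exact criterion is not stated in this article.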
Dataset Creation:
Model Training:
Applications:

Finetuning Diffusion Models:
Iterative Inpainting Framework:
Quantitative Analysis:
Qualitative Analysis:
The introduction of HAD and HADM represents a significant step forward in addressing one of the most challenging issues in text-to-image generation. By providing a large-scale dataset and a robust detection model, researchers and practitioners can now better understand and mitigate human artifacts, leading to higher-quality generated images. The applications of this work, from finetuning generative models to iterative inpainting, demonstrate its practical utility in improving the fidelity of AI-generated content.
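The iterative inpainting idea mentioned above can be sketched as a detect-then-repair loop. This is a simplified illustration under stated assumptions, not the authors' implementation: `detect` and `inpaint` are hypothetical callables standing in for an artifact detector and an inpainting model, and the round limit is an arbitrary choice.

```python
from typing import Callable, List, Tuple, TypeVar

Image = TypeVar("Image")
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), an assumed box format


def iterative_inpaint(
    image: Image,
    detect: Callable[[Image], List[Box]],          # hypothetical detector
    inpaint: Callable[[Image, List[Box]], Image],  # hypothetical inpainter
    max_rounds: int = 3,
) -> Tuple[Image, int]:
    """Repeatedly repair detected artifact regions.

    Stops when the detector reports no artifacts or after max_rounds
    repair passes. Returns the final image and the number of passes run.
    """
    rounds = 0
    for _ in range(max_rounds):
        boxes = detect(image)
        if not boxes:
            break  # nothing left to fix
        image = inpaint(image, boxes)
        rounds += 1
    return image, rounds
```

Capping the number of rounds keeps the loop from running indefinitely when an inpainting pass introduces new artifacts of its own.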
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
25 November 2024