
MVInpainter targets realistic 2D-to-3D editing by enforcing multi-view consistency, which lets it handle unpredictable real-world scenes rather than only controlled lab conditions.
Authors: Chenjie Cao, Chaohui Yu, Yanwei Fu, Fan Wang, Xiangyang Xue
Affiliations: Fudan University, Alibaba DAMO Academy, Hupan Lab
Links: arXiv, Code and Model
Recent advancements in Novel View Synthesis (NVS) and 3D generation have been impressive, but they often fall short when applied to real-world, uncontrolled environments. These methods typically focus on specific categories or synthetic assets and rely heavily on camera poses, which limits their practicality.
To address these challenges, researchers from Fudan University, Alibaba DAMO Academy, and Hupan Lab have introduced MVInpainter, a novel approach that reformulates 3D editing as a multi-view 2D inpainting task. By partially inpainting multiple views with reference guidance, MVInpainter simplifies the complexity of in-the-wild NVS and leverages unmasked clues instead of explicit pose conditions.
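To make the reformulation concrete, here is a minimal sketch of how a multi-view inpainting input could be assembled. It is not the authors' code: the `ViewInput` type, the `build_multiview_inputs` helper, and the choice to leave the reference view fully unmasked are all assumptions for illustration, and images are toy nested lists rather than real tensors.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # toy grayscale image as nested lists
Mask = List[List[int]]     # 1 = region to inpaint, 0 = pixel to keep


@dataclass
class ViewInput:
    masked_image: Image   # image with the masked region blanked out
    mask: Mask            # which pixels the model is asked to fill
    is_reference: bool    # True for the view that guides the others


def apply_mask(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Blank out every pixel whose mask value is non-zero."""
    return [
        [fill if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def build_multiview_inputs(
    views: List[Image], masks: List[Mask], reference_index: int = 0
) -> List[ViewInput]:
    """Keep the reference view intact and mask the remaining views.

    Assumption: the reference view carries the edit and is left unmasked,
    so the other views are filled from it and from their own unmasked
    pixels, with no camera poses passed in.
    """
    if len(views) != len(masks):
        raise ValueError("need exactly one mask per view")
    inputs = []
    for i, (image, mask) in enumerate(zip(views, masks)):
        if i == reference_index:
            empty = [[0] * len(row) for row in mask]
            inputs.append(ViewInput(apply_mask(image, empty), empty, True))
        else:
            inputs.append(ViewInput(apply_mask(image, mask), mask, False))
    return inputs
```

Note that no pose matrix appears anywhere in the input: in this sketch, the only geometric hints available are the unmasked pixels of each view.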
MVInpainter comes in two variants that share a common SD-inpainting backbone but differ in their LoRA/motion weights and masking strategies.
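As a rough illustration of how two variants can share one backbone while differing only in LoRA weights, the sketch below applies the standard LoRA update, W' = W + scale * (up @ down), to a shared base matrix. The variant names, matrix sizes, and values are hypothetical and have nothing to do with the released checkpoints.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(base: Matrix, up: Matrix, down: Matrix, scale: float = 1.0) -> Matrix:
    """Return base + scale * (up @ down); the base matrix is not modified.

    base is (out x in), up is (out x r), down is (r x in), with a small rank r.
    """
    delta = matmul(up, down)
    return [
        [w + scale * d for w, d in zip(base_row, delta_row)]
        for base_row, delta_row in zip(base, delta)
    ]


# One frozen backbone weight shared by both (hypothetical) variants.
shared_base = [[1.0, 0.0], [0.0, 1.0]]

# Each variant carries only its own rank-1 adapter.
variant_a = lora_weight(shared_base, up=[[1.0], [0.0]], down=[[0.0, 2.0]], scale=0.5)
variant_b = lora_weight(shared_base, up=[[0.0], [1.0]], down=[[2.0, 0.0]], scale=0.5)
```

The appeal of this layout is that the large backbone is stored once, while each variant adds only a small set of low-rank matrices.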

To ensure accurate mask shapes for inference, MVInpainter employs a masking adaption technique. The process starts from a simple 4-point bottom face of the object and uses dense matching to perspective-warp that mask into the correct shape for each view. As a result, the inpainted region stays consistent across the multiple views.
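The perspective-warping step can be sketched as a homography estimated from point correspondences and applied to the four corners of the bottom face. This is an illustrative stand-in rather than the authors' pipeline: it assumes four matched point pairs are already available (in practice they would come from a dense matcher), and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Estimate the 3x3 homography mapping four src points onto four dst points.

    The bottom-right entry is fixed to 1, leaving eight unknowns.
    """
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h: List[List[float]], pt: Point) -> Point:
    """Apply the homography to one 2D point, including the perspective divide."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


def warp_quad(h: List[List[float]], quad: Sequence[Point]) -> List[Point]:
    """Warp the four corners of a bottom-face mask into the target view."""
    return [warp_point(h, p) for p in quad]
```

With the warped corners in hand, the mask for the target view can be rasterized from the resulting quadrilateral. A real pipeline would fit the transform robustly to many matches instead of trusting exactly four.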
MVInpainter is demonstrated on a range of scene editing tasks. The multi-view inpainting results show high fidelity and consistency across different views, most visibly in complex, real-world scenarios where traditional methods often struggle.
MVInpainter also supports 3D generation, producing coherent and realistic 3D scenes from multiple inpainted views. Generating high-quality 3D content without explicit pose information is what sets it apart from pose-dependent methods.
MVInpainter represents a significant step forward in bridging the gap between 2D and 3D editing. By leveraging multi-view consistent inpainting, it simplifies the complexity of real-world NVS tasks and opens up new possibilities for practical applications in 3D generation and scene editing.
Original Sources
↗ https://ewrfcas.github.io/MVInpainter/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
19 August 2024