
Researchers from leading institutions have developed LVSM, a transformer-based model for high-quality novel view synthesis that operates with minimal 3D bias, marking a significant advancement in computer vision.
The Large View Synthesis Model (LVSM) is a transformer-based approach to novel view synthesis from sparse input views. Developed by researchers from Cornell University, The University of Texas at Austin, Adobe Research, and MIT, LVSM produces high-quality results in a single feed-forward pass with minimal 3D inductive bias.
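LVSM introduces two main architectures: an encoder-decoder LVSM, which encodes the input image tokens into a fixed number of latent tokens and decodes novel views from them, and a decoder-only LVSM, which maps input images directly to novel-view outputs with no intermediate scene representation.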
Both models bypass the 3D inductive biases used in earlier methods, from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps), and adopt a fully data-driven approach. This matters because it addresses the limitations of previous methods that rely heavily on 3D geometry.
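To get a feel for what the two designs mean in practice, here is a rough token-count sketch. It is not taken from the LVSM code. It assumes ViT-style square patches, one target view per forward pass, and a fixed latent bottleneck for the encoder-decoder variant. The class `TokenBudget` and every number fed into it are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBudget:
    """Hypothetical bookkeeping for transformer sequence lengths."""

    image_size: int         # side of a square image in pixels (assumed)
    patch_size: int         # side of a square patch in pixels (assumed)
    num_input_views: int    # how many posed input images are given
    num_latent_tokens: int  # assumed size of the encoder-decoder bottleneck

    @property
    def tokens_per_view(self) -> int:
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        side = self.image_size // self.patch_size
        return side * side

    def decoder_only_sequence(self) -> int:
        # All input-view tokens and the target-view tokens share one sequence.
        return (self.num_input_views + 1) * self.tokens_per_view

    def encoder_decoder_decode_sequence(self) -> int:
        # The decoder sees only the latent tokens plus the target-view tokens,
        # regardless of how many input views were encoded.
        return self.num_latent_tokens + self.tokens_per_view


two_views = TokenBudget(image_size=256, patch_size=8,
                        num_input_views=2, num_latent_tokens=1024)
four_views = TokenBudget(image_size=256, patch_size=8,
                         num_input_views=4, num_latent_tokens=1024)

print(two_views.decoder_only_sequence())             # 3072
print(four_views.decoder_only_sequence())            # 5120
print(four_views.encoder_decoder_decode_sequence())  # 2048
```

Under these assumptions the decoder-only sequence grows with every extra input view, while the encoder-decoder decode step stays the same length. That is the trade-off a fixed latent bottleneck buys.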
LVSM outperforms previous state-of-the-art methods by 1.5 to 3.5 dB PSNR, and it does so even with reduced computational resources (1-2 GPUs).
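For readers who do not work with PSNR daily, the sketch below shows the standard definition, 10·log10(MAX² / MSE), applied to two flat lists of pixel values. It assumes pixel intensities normalized to [0, 1]. The function name `psnr` and the sample values are illustrative and not part of any LVSM release.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ground_truth = [0.0, 1.0, 0.5, 0.25]
rendered = [0.1, 0.9, 0.6, 0.15]  # every pixel is off by 0.1
print(f"{psnr(ground_truth, rendered):.2f} dB")
```

Because the scale is logarithmic, a gain of a few dB corresponds to a sizeable drop in squared error rather than a marginal one.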
| Method         | PSNR (dB) |
|----------------|-----------|
| Previous SOTA  | 28.0      |
| LVSM Encoder   | 30.5      |
| LVSM Decoder   | 31.5      |
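One way to read the table is to turn each dB gain into an error ratio. For a fixed peak value, a PSNR gain of Δ dB means the mean squared error shrinks by a factor of 10^(Δ/10). The snippet below applies that to the figures in the table. The helper `mse_reduction_factor` is a name made up for this example.

```python
# PSNR values copied from the table above.
psnr_db = {"Previous SOTA": 28.0, "LVSM Encoder": 30.5, "LVSM Decoder": 31.5}


def mse_reduction_factor(gain_db: float) -> float:
    """Factor by which MSE shrinks for a PSNR gain in dB (same peak value)."""
    return 10.0 ** (gain_db / 10.0)


baseline = psnr_db["Previous SOTA"]
gains = {name: value - baseline
         for name, value in psnr_db.items() if name != "Previous SOTA"}

for name, gain in gains.items():
    factor = mse_reduction_factor(gain)
    print(f"{name}: +{gain:.1f} dB, MSE about {factor:.2f}x lower")
# LVSM Encoder: +2.5 dB, MSE about 1.78x lower
# LVSM Decoder: +3.5 dB, MSE about 2.24x lower
```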

LVSM is particularly effective in handling sparse input views. Here are some key results:
Scene-Level Novel View Synthesis (2 Views):
Object-Level Novel View Synthesis (4 Views):
Encoder-Decoder LVSM:
Decoder-Only LVSM:
LVSM has been evaluated across multiple datasets, including:
In all these datasets, LVSM consistently outperforms previous methods in terms of PSNR and visual quality.
LVSM represents a significant advancement in novel view synthesis. By relying on transformers and minimal 3D inductive bias, it delivers scalable, efficient, high-quality results on both scene-level and object-level tasks.
Original Sources
↗ https://haian-jin.github.io/projects/LVSM/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
29 October 2024