
Researchers have developed Prompt Depth Anything, a technique that uses iPhone LiDAR as a prompt to produce high-resolution metric depth maps, showing how inexpensive hardware can yield precise depth data.
The team behind Prompt Depth Anything has introduced a method that uses prompting to produce high-resolution, accurate metric depth maps. Inspired by the success of prompting in vision-language models (VLMs) and large language models (LLMs), the approach uses low-cost iPhone LiDAR depth as the prompt that guides a depth foundation model, reaching up to 4K output resolution.
The core innovation is the prompt fusion design, which integrates the LiDAR data at multiple scales within the depth decoder of the Depth Anything model. Because the prompt is injected at every scale, the model can keep drawing on the accurate depth information from the LiDAR even at high output resolutions.
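To make the idea of multi-scale prompt fusion concrete, here is a minimal, dependency-free sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `resize_nearest`, `fuse_prompt` and `fuse_multiscale` are hypothetical, nearest-neighbour resizing is used only for brevity, and the per-channel `gains` are a stand-in for whatever learned projection the real decoder applies.

```python
def resize_nearest(depth, out_h, out_w):
    """Resize a 2D depth grid (list of rows) with nearest-neighbour sampling."""
    in_h, in_w = len(depth), len(depth[0])
    return [
        [depth[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_prompt(features, lidar, gains):
    """Add the resized LiDAR prompt to every channel of one decoder scale.

    features: nested lists shaped [C][H][W].
    gains: C per-channel weights (assumption: a stand-in for a learned layer).
    """
    height, width = len(features[0]), len(features[0][0])
    prompt = resize_nearest(lidar, height, width)
    return [
        [
            [features[k][r][c] + gains[k] * prompt[r][c] for c in range(width)]
            for r in range(height)
        ]
        for k in range(len(features))
    ]


def fuse_multiscale(pyramid, lidar, gains_per_scale):
    """Apply fuse_prompt to each decoder scale in the feature pyramid."""
    return [
        fuse_prompt(features, lidar, gains)
        for features, gains in zip(pyramid, gains_per_scale)
    ]
```

With all gains set to zero the fused features equal the original ones, so in this sketch the prompt only influences the decoder to the extent the weights allow. The same low-resolution LiDAR grid is resized separately for each scale, which is the sense in which the fusion is multi-scale.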
Training a depth estimation model requires large and diverse datasets. To address this, the team developed a scalable data pipeline.
This pipeline ensures that the model can be trained on a wide range of scenarios, improving its generalization and robustness.
Prompt Depth Anything sets new state-of-the-art results on the ARKitScenes and ScanNet++ datasets.

Monocular depth methods can generate high-resolution depth maps but often lack a consistent metric scale. Even after their output is aligned with LiDAR data, these methods may still fall short of the accuracy and consistency provided by Prompt Depth Anything.
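The article does not say how the monocular baselines are aligned with LiDAR. A common choice is a least-squares fit of a single scale and shift over the pixels where LiDAR has a valid reading, and the sketch below assumes that approach. The function name `align_scale_shift`, the flat-list inputs, and the convention that a LiDAR value of 0 means "no measurement" are all assumptions made for illustration.

```python
def align_scale_shift(pred, lidar):
    """Fit s, t minimising sum((s * pred + t - lidar) ** 2) over valid pixels.

    pred:  flat list of relative depth predictions.
    lidar: flat list of metric LiDAR depths; values <= 0 are treated as
           missing (assumed convention) and ignored.
    """
    pairs = [(p, d) for p, d in zip(pred, lidar) if d > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid LiDAR pixels")

    mean_p = sum(p for p, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in pairs)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undefined")

    # Closed-form ordinary least squares for a line d = s * p + t.
    cov = sum((p - mean_p) * (d - mean_d) for p, d in pairs)
    scale = cov / var_p
    shift = mean_d - scale * mean_p
    return scale, shift


def apply_alignment(pred, scale, shift):
    """Map relative predictions to metric depth with the fitted parameters."""
    return [scale * p + shift for p in pred]
```

A global fit like this corrects only one scale and one offset for the whole image, so errors that vary across the frame survive the alignment. That is consistent with the point above that aligned monocular output can still be less accurate than a model that uses the LiDAR as an input.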
The ARKit LiDAR provides high-quality depth data but is limited by its hardware capabilities. Prompt Depth Anything enhances this data, producing metric depth that is both higher in resolution and more accurate.
The high-resolution, accurate metric depth maps generated by Prompt Depth Anything have implications for applications such as 3D reconstruction and robotics.
Prompt Depth Anything represents a significant advancement in depth estimation by leveraging the power of prompting with low-cost LiDAR data. The method's high-resolution and accurate metric depth maps open up new possibilities for 3D reconstruction and robotic applications, making it a valuable tool for researchers and practitioners alike.
Original Sources
↗ https://promptda.github.io/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
20 December 2024