Rethinking Inductive Biases for Efficient Surface Normal Estimation

The Engineer

4 Mar 2024

Researchers at Imperial College London challenge traditional methods by using per-pixel ray directions and the relative rotations between the normals of neighboring pixels to estimate surface normals more efficiently, cutting data and compute needs.

In a paper accepted to CVPR 2024, researchers from the Dyson Robotics Lab at Imperial College London introduce a new approach to surface normal estimation that rethinks the inductive biases built into the model. The work makes two key contributions: it feeds the per-pixel ray direction to the network, and it estimates surface normals by learning the relative rotation between the normals of nearby pixels. Together these reduce the data and compute needed for training, making the approach a compelling alternative to existing methods.

Key Technical Changes

Per-Pixel Ray Direction: The authors propose using the per-pixel ray direction as an additional input to the network. The ray tells the network from which direction each pixel is viewed, which constrains the orientation of the surface it sees. This matters most near occluding boundaries, where normals should be perpendicular to the rays.
Rotation Estimation: Instead of only regressing each surface normal directly, the model learns to estimate the relative rotation between the normals of nearby pixels. Recasting the problem this way lets the model reason about how a surface bends locally rather than predicting every pixel in isolation.
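
As a rough illustration of the first input, per-pixel ray directions can be computed from camera intrinsics. The sketch below assumes a pinhole camera with known focal lengths and principal point. The function name `pixel_ray_directions` and the example intrinsics are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def pixel_ray_directions(height, width, fx, fy, cx, cy):
    """Return an (H, W, 3) array of unit ray directions for a pinhole camera.

    Assumption: pixel centers sit at half-integer coordinates and the camera
    looks along +z. Other conventions only change offsets or signs.
    """
    # Pixel-center coordinates: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    # Back-project through the inverse intrinsics: K^-1 [u, v, 1]^T.
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    # Normalize so that every ray is a unit vector.
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)

# Hypothetical intrinsics for a tiny 4x6 image.
rays = pixel_ray_directions(height=4, width=6, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
```

The resulting array has one unit vector per pixel and could be concatenated with the image features as an extra input. How the paper encodes the rays for the network is not covered here.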

Why It Matters

Existing methods for surface normal estimation often require large datasets and extensive computational resources. For instance, Omnidata v2, which is based on the DPT architecture, was trained on 12 million images over two weeks using four NVIDIA V100 GPUs. In contrast, the model proposed in this paper was trained on just 160,000 images for 12 hours on a single NVIDIA 4090 GPU. This lower training cost makes the approach more accessible and practical for real-world applications.
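
To put those figures side by side, the short calculation below uses only the numbers quoted above. The variable names are illustrative. A V100 and a 4090 are different hardware, so the GPU-hour ratio is only a rough indication and not a like-for-like benchmark.

```python
# Training set sizes quoted in the article.
omnidata_images = 12_000_000
proposed_images = 160_000

# Training time expressed as GPU-hours (days * hours per day * number of GPUs).
omnidata_gpu_hours = 14 * 24 * 4   # two weeks on four V100 GPUs
proposed_gpu_hours = 12 * 1        # 12 hours on one 4090 GPU

image_ratio = omnidata_images / proposed_images
gpu_hour_ratio = omnidata_gpu_hours / proposed_gpu_hours

print(f"Images: {image_ratio:.0f}x fewer")
print(f"GPU-hours: {gpu_hour_ratio:.0f}x fewer (different GPU models)")
```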

Implementation Details

Ray ReLU Activation: To build in the constraint provided by the ray direction, the authors introduce a Ray ReLU activation function. A visible surface must face the camera, so its normal cannot point in the same direction as the viewing ray. Enforcing this rules out half of the sphere of possible normals before the model has learned anything from data.
Relative Rotation Estimation: By focusing on relative rotations, the model can better capture local surface structure. Nearby pixels usually have related normals: on a flat wall they are identical (zero rotation), while on a curved surface they differ by a small rotation.
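
The two ideas above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and not the authors' implementation. `ray_relu` is one plausible way to keep normals facing the camera: it removes the part of a normal that points along the viewing ray and then renormalizes. `rotate_axis_angle` applies the standard Rodrigues rotation formula to turn a neighbor's normal by a given axis and angle. Both function names, the sign convention (rays point from the camera into the scene) and the example values are assumptions made for illustration.

```python
import numpy as np

def ray_relu(normals, rays, eps=1e-8):
    """Keep normals in the camera-facing half-space (assumed form, see lead-in).

    normals, rays: arrays of shape (..., 3); rays are unit vectors pointing
    from the camera into the scene, so visible normals satisfy n . r <= 0.
    """
    # Signed component of each normal along its viewing ray.
    along = np.sum(normals * rays, axis=-1, keepdims=True)
    # Remove only the positive (away-facing) part, analogous to a ReLU.
    clipped = normals - np.maximum(along, 0.0) * rays
    # Renormalize; eps guards against a normal exactly parallel to the ray.
    return clipped / (np.linalg.norm(clipped, axis=-1, keepdims=True) + eps)

def rotate_axis_angle(vectors, axes, angles):
    """Rotate vectors about unit axes by angles (radians), Rodrigues formula."""
    angles = np.asarray(angles)[..., None]
    cos, sin = np.cos(angles), np.sin(angles)
    cross = np.cross(axes, vectors)
    dot = np.sum(axes * vectors, axis=-1, keepdims=True)
    return vectors * cos + cross * sin + axes * dot * (1.0 - cos)

ray = np.array([0.0, 0.0, 1.0])

# A prediction that wrongly faces away from the camera is pushed back
# onto the boundary of the valid half-space.
away_facing = np.array([0.6, 0.0, 0.8])
fixed = ray_relu(away_facing, ray)

# A neighbor's normal rotated by 90 degrees about the x-axis gives a
# candidate normal for the current pixel.
neighbor_normal = np.array([0.0, 0.0, -1.0])
candidate = rotate_axis_angle(neighbor_normal, np.array([1.0, 0.0, 0.0]), np.pi / 2)
```

In a full model, the rotation axis and angle would be predicted by the network for each pixel and the rotated neighbor normals combined into a final estimate. How the paper parameterizes and combines them is not shown here.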

Motivation

Surface normal estimation is a fundamental task in computer vision with applications in various domains:

Image Generation: Enhancing realism and detail in generated images.
Object Grasping: Improving robotic manipulation by understanding object surfaces.
Multi-task Learning: Integrating surface normals into multi-modal learning systems.
Depth Estimation: Enhancing the accuracy of depth maps.
Simultaneous Localization and Mapping (SLAM): Improving the robustness of SLAM algorithms.
Human Body Shape Estimation: Accurately modeling human body shapes for applications like AR/VR.
CAD Model Alignment: Aligning 3D models with real-world scenes.

Despite its importance, there has been limited discussion on the right inductive biases needed for surface normal estimation. This paper addresses that gap by proposing practical and efficient solutions.

Demonstration

The researchers have provided a video demonstration of their model's performance on input videos from the DAVIS dataset. The predictions are made per frame, and the video can be viewed at 4K resolution. Here is the link to the demo:

[CVPR 2024] Rethinking Inductive Biases for Surface Normal Estimation - YouTube

Conclusion

By introducing per-pixel ray direction and relative rotation estimation as key inductive biases, this work shows that a surface normal estimator can be trained with far less data and compute than prior methods such as Omnidata v2. The efficiency gains make it a promising approach for both research and practical applications.