DeepMesh: Enhancing 3D Mesh Generation with Reinforcement Learning and Auto-Regressive Transformers

The Engineer

21 Mar 2025

DeepMesh combines reinforcement learning with an auto-regressive transformer to generate high-quality 3D meshes from point clouds and images. Its authors report better precision and detail than current techniques.

DeepMesh is a framework from researchers at Tsinghua University and ShengShu for generating high-quality 3D meshes. It combines an auto-regressive transformer with reinforcement learning (RL) to produce intricate, precise meshes conditioned on point clouds and images. The authors report that the generated meshes improve on existing state-of-the-art methods in both visual appeal and geometric accuracy.

Technical Overview

Key Innovations

Efficient Pre-training Strategy: DeepMesh introduces an improved tokenization algorithm that generates discrete mesh tokens, which are used for pre-training the model. This strategy also includes enhancements in data curation and processing.
Reinforcement Learning (RL) Integration: DeepMesh applies Direct Preference Optimization (DPO) to align its outputs with human preferences, targeting both visual appeal and geometric accuracy.

Methodology

Auto-Regressive Transformer

DeepMesh is built on an auto-regressive transformer with both self-attention and cross-attention layers. The model generates a mesh one discrete token at a time, with each prediction conditioned on the tokens produced so far. Two components support pre-training:

Tokenization Algorithm: The improved tokenization algorithm converts 3D meshes into sequences of discrete tokens. These tokens are then used to pre-train the transformer model.
Data Curation and Processing: Enhanced data curation techniques ensure that the training data is clean and diverse, leading to better generalization.
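
To make the idea of discrete mesh tokens concrete, the following is a minimal sketch of the simplest kind of mesh tokenization: each vertex coordinate is quantized to an integer bin and the faces are flattened into one sequence. It is an illustration only and not DeepMesh's improved tokenization algorithm, which this article does not describe in detail. The function names, the assumption that coordinates are normalized to [-1, 1], and the 128-bin resolution are all hypothetical choices.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def quantize(value: float, bins: int) -> int:
    """Map a coordinate in [-1, 1] to an integer bin in [0, bins - 1]."""
    clamped = max(-1.0, min(1.0, value))
    return min(bins - 1, int((clamped + 1.0) / 2.0 * bins))


def dequantize(token: int, bins: int) -> float:
    """Map a bin index back to the centre of its interval in [-1, 1]."""
    return (token + 0.5) / bins * 2.0 - 1.0


def tokenize_mesh(
    vertices: Sequence[Vertex], faces: Sequence[Face], bins: int = 128
) -> List[int]:
    """Flatten a triangle mesh into 9 tokens per face (3 vertices x 3 coordinates)."""
    tokens: List[int] = []
    for face in faces:
        for vertex_index in face:
            for coordinate in vertices[vertex_index]:
                tokens.append(quantize(coordinate, bins))
    return tokens


def detokenize_mesh(tokens: Sequence[int], bins: int = 128) -> List[List[Vertex]]:
    """Rebuild a list of triangles (a 'triangle soup') from a token sequence."""
    if len(tokens) % 9 != 0:
        raise ValueError("expected 9 tokens per triangle")
    coords = [dequantize(token, bins) for token in tokens]
    triangles: List[List[Vertex]] = []
    for start in range(0, len(coords), 9):
        triangle = [
            tuple(coords[start + 3 * k : start + 3 * k + 3]) for k in range(3)
        ]
        triangles.append(triangle)
    return triangles
```

A sequence built this way repeats shared vertices and grows by nine tokens per face, which is one reason more compact tokenization schemes are attractive for pre-training on detailed meshes.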

Reinforcement Learning (RL)

To further refine the generated meshes, DeepMesh incorporates RL through Direct Preference Optimization (DPO). Here’s how it works:

Scoring Standard: The team designed a scoring standard that combines 3D metrics with human evaluation. It guides the collection of the preference pairs used to train the model.
Preference Pairs: The team annotated 5,000 preference pairs, where each pair consists of two generated meshes. These pairs are then used to post-train the model using DPO, aligning its outputs with human preferences.
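
The DPO step can be sketched with the standard DPO objective, which compares how strongly the trained policy and a frozen reference model prefer the chosen mesh over the rejected one. The sketch below computes the loss for a single preference pair from sequence log-probabilities. The function names and the value of beta are assumptions for illustration; the article does not give DeepMesh's hyperparameters or training code.

```python
import math
from typing import Sequence


def sequence_log_prob(token_log_probs: Sequence[float]) -> float:
    """Log-probability of a whole token sequence: the sum of per-token log-probs."""
    return sum(token_log_probs)


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the log-probability of a full mesh token sequence
    (chosen or rejected) under the policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # loss = -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy still equals the reference model, the margin is zero and the loss is log 2. The loss falls as the policy assigns relatively more probability to the preferred mesh than the reference does.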

Implementation Details

Model Architecture

Self-Attention Layers: Capture dependencies within the sequence of mesh tokens.
Cross-Attention Layers: Allow the model to condition on additional inputs like point clouds and images, enhancing the context for mesh generation.
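
The difference between the two layer types comes down to where the keys and values come from. The sketch below implements plain scaled dot-product attention in pure Python to show this. It omits the learned projections, multiple heads, and residual connections of a real transformer layer, and the tiny embeddings in the usage lines are made-up values, not DeepMesh's.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention(
    queries: Matrix, keys: Matrix, values: Matrix, causal: bool = False
) -> Matrix:
    """Scaled dot-product attention.

    With causal=True, query i only sees keys 0..i, as in auto-regressive
    self-attention. With causal=False, every query sees every key.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs: Matrix = []
    for i, query in enumerate(queries):
        visible = keys[: i + 1] if causal else keys
        scores = [
            scale * sum(q * k for q, k in zip(query, key)) for key in visible
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(weight * values[j][d] for j, weight in enumerate(weights))
                for d in range(len(values[0]))
            ]
        )
    return outputs


# Hypothetical embeddings: two mesh tokens and one conditioning feature
# (for example, an encoded point cloud).
mesh_tokens = [[1.0, 0.0], [0.0, 1.0]]
condition = [[0.5, -0.5]]

# Self-attention: queries, keys, and values all come from the mesh tokens.
self_out = attention(mesh_tokens, mesh_tokens, mesh_tokens, causal=True)
# Cross-attention: queries come from the mesh tokens, keys and values
# from the conditioning input.
cross_out = attention(mesh_tokens, condition, condition)
```

The causal mask is what makes the self-attention auto-regressive: each mesh token can attend only to itself and earlier tokens, while cross-attention lets every token read the full conditioning input.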

Training Process

Pre-training: The model is pre-trained on a large dataset of discrete mesh tokens generated by the improved tokenization algorithm.
Post-training with DPO: After pre-training, the model is fine-tuned on preference pairs annotated according to the scoring standard. This step is intended to make the generated meshes visually appealing as well as geometrically accurate.

Results and Benchmarks

According to the authors, DeepMesh outperforms existing state-of-the-art methods in both precision and quality. The evaluation covers two kinds of metric:

Geometric Accuracy: Measured using 3D metrics such as Chamfer Distance and Normal Consistency.
Visual Appeal: Assessed through human evaluation, where DeepMesh-generated meshes are preferred over those from other methods.
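
As an illustration of the geometric side, the following is a minimal sketch of Chamfer Distance between two point sets, such as points sampled from a generated mesh and from a reference shape. Conventions vary between papers (squared or unsquared distances, sums or means), and the article does not say which variant DeepMesh uses. This version assumes unsquared Euclidean distances and adds the two directional means; surface sampling is left out.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _mean_nearest_distance(source: Sequence[Point], target: Sequence[Point]) -> float:
    """Average distance from each source point to its nearest target point."""
    return sum(
        min(math.dist(p, q) for q in target) for p in source
    ) / len(source)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance: lower means the point sets are closer."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return _mean_nearest_distance(a, b) + _mean_nearest_distance(b, a)
```

This brute-force version compares every pair of points, so it is only practical for small point sets. Larger evaluations typically use a spatial index or a batched GPU implementation.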

Demo and Animation

A demo video showcases the mesh generation process, demonstrating how DeepMesh creates high-quality meshes conditioned on point clouds. The animation shows the sequential generation of all faces of the mesh, highlighting the model’s ability to capture intricate details and precise topology.

Conclusion

DeepMesh advances 3D mesh generation by combining an auto-regressive transformer with reinforcement learning. DPO post-training aligns the generated meshes with human preferences while preserving geometric accuracy, which makes the framework a valuable tool for a range of 3D applications.