DeepSeek-Prover-V1.5: Enhancing Theorem Proving with Proof Assistant Feedback and Monte-Carlo Tree Search


The Engineer

19 Aug 2024

Upgraded with new training techniques and Monte-Carlo Tree Search, DeepSeek-Prover-V1.5 generates diverse proof paths and significantly outperforms its predecessor on high-school and undergraduate-level theorem-proving benchmarks.

DeepSeek-Prover-V1.5 is the latest iteration in a series of models designed to assist in theorem proving within the Lean 4 proof assistant. This new version builds on its predecessor, DeepSeek-Prover-V1, by optimizing both training and inference processes and introducing a novel approach to generating diverse proof paths. The result? Significant improvements in performance on high school and undergraduate-level benchmarks.

What Changed Technically?

DeepSeek-Prover-V1.5 introduces several key changes that enhance its theorem-proving capabilities:

Enhanced Training Data: The model starts from DeepSeekMath-Base, a math-specialized base language model, and is further pre-trained with a specialization in formal mathematical languages. This is followed by supervised fine-tuning on an enhanced formal theorem-proving dataset derived from DeepSeek-Prover-V1.
Reinforcement Learning from Proof Assistant Feedback (RLPAF): A new reinforcement learning stage in which the Lean 4 proof assistant checks the proofs the model generates. Its verdict serves as the training signal, so the model is pushed toward proofs that Lean actually accepts rather than proofs that merely look plausible.
RMaxTS for Diverse Proof Paths: Instead of generating a single proof path in one pass, DeepSeek-Prover-V1.5 uses RMaxTS, a variant of Monte-Carlo tree search (MCTS). RMaxTS employs an intrinsic-reward-driven exploration strategy to generate multiple diverse proof paths, increasing the likelihood of finding a valid proof.
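The reason multiple proof paths raise the chance of success can be made concrete with a small calculation. The sketch below computes the probability that at least one of k attempts yields a valid proof. It assumes each attempt succeeds independently with probability p, which is an idealization. Attempts sampled from the same model tend to be correlated, and the diversity RMaxTS aims for is what keeps them from collapsing onto the same failed approach. The numbers used are hypothetical.

```python
def p_at_least_one(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds (idealized)."""
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("p must be in [0, 1] and k must be >= 0")
    # Complement of "all k attempts fail".
    return 1.0 - (1.0 - p) ** k


# Hypothetical per-attempt success probability of 5%.
print(p_at_least_one(0.05, 1))   # a single attempt
print(p_at_least_one(0.05, 32))  # 32 independent attempts: roughly 0.806
```

If attempts were perfectly correlated, every attempt would succeed or fail together and the probability would stay at p no matter how large k is. Real search sits somewhere between these two extremes.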

Why It Matters to Practitioners

For researchers and practitioners in formal theorem proving, these changes offer several practical benefits:

Improved Performance: DeepSeek-Prover-V1.5 outperforms its predecessor on key benchmarks:
- miniF2F (High School Level): Achieves a success rate of 63.5% on the test set.
- ProofNet (Undergraduate Level): Achieves a success rate of 25.3% on the test set.
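Enhanced Robustness: RLPAF trains the model directly on whether Lean accepts its proofs. RMaxTS keeps exploring alternative branches when one line of reasoning fails. Together they make the prover less likely to get stuck on a single unproductive approach to a difficult problem.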
Versatility: By generating multiple proof paths, the model can adapt to different problem structures and provide a richer set of solutions.

Technical Details

Training Process

Pre-training: The model is initialized from DeepSeekMath-Base, a base language model specialized in mathematics, and is further pre-trained with a focus on formal mathematical languages.
Fine-tuning: Supervised fine-tuning is performed on an enhanced dataset derived from DeepSeek-Prover-V1. This dataset covers a diverse set of theorems and proofs and is intended to help the model handle a wide range of problems.
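For readers new to Lean 4, here is a toy theorem-proof pair of the general kind such datasets contain. It is our own illustrative example, not taken from the DeepSeek-Prover datasets. It uses only core Lean 4 (no Mathlib), and each tactic line moves the proof state one step closer to a closed goal.

```lean
-- Toy statement and tactic proof (illustrative; not from the actual dataset).
theorem toy_trans (a b c : Nat) (h₁ : a = b) (h₂ : b = c) : a = c := by
  rw [h₁]   -- rewrite `a` to `b`; the goal becomes `b = c`
  exact h₂  -- close the goal with hypothesis h₂
```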

Reinforcement Learning

RLPAF Mechanism: During training, proofs generated by the model are checked by the Lean 4 proof assistant. The verification outcome serves as the reward signal used to update the model's parameters, encouraging it to generate proofs that Lean accepts.
Reward System: Separately from RLPAF, the RMaxTS search used at inference time relies on an intrinsic-reward-driven exploration strategy. This strategy steers the search toward unexplored branches of the search tree, increasing the diversity of generated proofs.
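The RLPAF idea of learning only from the proof assistant's verdict can be sketched with a toy policy. In this sketch, a softmax distribution over four hypothetical candidate proofs is trained using a reward of 1 when a mock checker accepts the sampled proof and 0 otherwise.

Several parts are our own assumptions and do not come from the article:
- `mock_lean_verify` is a stand-in for the real Lean 4 checker.
- The candidate names are hypothetical.
- The update rule is a plain REINFORCE step with a group-mean baseline. The exact optimizer DeepSeek uses is not described here and is not reproduced.

```python
import math
import random

# Hypothetical candidate proofs for one theorem; only one of them "compiles".
CANDIDATES = ["candidate_a", "candidate_b", "candidate_c", "candidate_d"]
VALID = {"candidate_c"}


def mock_lean_verify(proof: str) -> bool:
    """Assumed stand-in for running the Lean 4 checker on a generated proof."""
    return proof in VALID


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rlpaf_style_step(logits, rng, lr=0.5, group_size=8):
    """One policy-gradient step driven only by the verifier's pass/fail signal."""
    probs = softmax(logits)
    picks = rng.choices(range(len(logits)), weights=probs, k=group_size)
    # Binary reward: 1 if the (mock) proof assistant accepts the proof, else 0.
    rewards = [1.0 if mock_lean_verify(CANDIDATES[i]) else 0.0 for i in picks]
    baseline = sum(rewards) / group_size  # group mean as a simple baseline
    grads = [0.0] * len(logits)
    for i, r in zip(picks, rewards):
        advantage = r - baseline
        for j in range(len(logits)):
            # d log softmax(logits)[i] / d logits[j] = 1[i == j] - probs[j]
            grads[j] += advantage * ((1.0 if i == j else 0.0) - probs[j])
    return [x + lr * g / group_size for x, g in zip(logits, grads)]


rng = random.Random(0)
logits = [0.0] * len(CANDIDATES)
for _ in range(300):
    logits = rlpaf_style_step(logits, rng)
print(round(softmax(logits)[CANDIDATES.index("candidate_c")], 3))
```

The baseline matters in this design. When every sampled proof in a group passes, or every one fails, the advantages are all zero and no update happens. Learning comes only from groups where the verifier separates good proofs from bad ones.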

Inference Process

RMaxTS Algorithm:
- Initialization: Start with an initial state representing the theorem to be proved.
- Selection: Starting from the root, use a selection policy to choose the next action (proof step) at each node, descending the tree based on the current state.
- Expansion: Expand the search tree by adding new states that result from applying the chosen action.
- Simulation: Simulate the proof process from the expanded state, using the intrinsic reward system to guide exploration.
- Backpropagation: Update the values of nodes in the search tree based on the outcome of the simulation.
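The five steps above can be sketched as a small Monte-Carlo tree search over a toy "proof" problem. Integer states stand in for Lean proof states, and three hypothetical tactics stand in for proof steps.

The following choices are our own assumptions, and DeepSeek's RMaxTS differs in its details:
- Selection uses the standard UCB1 rule.
- A random rollout stands in for the language model completing the proof.
- The intrinsic reward is simplified to 1 when an expansion reaches a state not yet in the tree, and 0 otherwise. This mimics the idea of rewarding exploration of new territory.

```python
import math
import random

TACTICS = ("add1", "add3", "double")  # hypothetical stand-ins for Lean tactics
START, TARGET = 1, 23                 # toy "theorem": reach exactly 23 from 1


def apply_tactic(state, tactic):
    """Toy proof-assistant step: the next state, or None if the tactic fails."""
    nxt = {"add1": state + 1, "add3": state + 3, "double": state * 2}[tactic]
    return nxt if nxt <= TARGET else None


class Node:
    def __init__(self, state, parent=None, tactic=None):
        self.state, self.parent, self.tactic = state, parent, tactic
        self.children = {}  # tactic -> Node
        self.visits = 0
        self.value = 0.0    # summed intrinsic reward backed up through this node

    def untried(self):
        return [t for t in TACTICS
                if t not in self.children and apply_tactic(self.state, t) is not None]

    def path(self):
        steps, node = [], self
        while node.parent is not None:
            steps.append(node.tactic)
            node = node.parent
        return steps[::-1]


def ucb(child, parent, c=1.4):
    """UCB1 score (an assumed selection rule)."""
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def search(iterations=2000, rollout_depth=10, seed=0):
    rng = random.Random(seed)
    root, seen = Node(START), {START}  # Initialization: root holds the theorem
    for _ in range(iterations):
        # Selection: descend through fully expanded nodes by UCB score.
        node = root
        while node.children and not node.untried():
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(ch, parent))
        untried = node.untried()
        if not untried:
            continue  # no applicable tactic from this state
        # Expansion: add one child for an untried tactic.
        tactic = rng.choice(untried)
        child = Node(apply_tactic(node.state, tactic), parent=node, tactic=tactic)
        node.children[tactic] = child
        # Simulation: a random rollout stands in for the model finishing the proof.
        state, rollout = child.state, []
        for _ in range(rollout_depth):
            if state == TARGET:
                break
            options = [t for t in TACTICS if apply_tactic(state, t) is not None]
            t = rng.choice(options)
            state = apply_tactic(state, t)
            rollout.append(t)
        if state == TARGET:
            return child.path() + rollout  # a complete, valid "proof"
        # Intrinsic reward: 1 only if this expansion reached an unseen state.
        reward = 1.0 if child.state not in seen else 0.0
        seen.add(child.state)
        # Backpropagation: update counts and values from the new node to the root.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return None


proof = search()
print(proof)
```

Here, duplicate states earn no intrinsic reward. UCB therefore gradually favors branches that keep producing new states, which is the exploration behavior the intrinsic reward is meant to encourage.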

Benchmarks and Performance

DeepSeek-Prover-V1.5 has been evaluated on two key benchmarks:

miniF2F (High School Level): A benchmark consisting of high school-level mathematical problems. DeepSeek-Prover-V1.5 achieves a success rate of 63.5% on the test set, significantly outperforming its predecessor.
ProofNet (Undergraduate Level): A more challenging benchmark with undergraduate-level theorems. The model achieves a success rate of 25.3% on the test set, setting a new state-of-the-art result.
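Success rates like these are typically computed by counting a problem as solved when at least one attempt produces a proof that Lean accepts. The reported figures therefore depend on how many attempts are allowed per problem, a budget this article does not state. The sketch below shows that counting rule. The `success_rate` helper and the attempt data are hypothetical.

```python
def success_rate(results):
    """Percent of problems with at least one verified attempt.

    `results` maps a problem id to a list of booleans, one per attempt,
    True when the proof assistant accepted that attempt (hypothetical format).
    """
    if not results:
        raise ValueError("no problems to score")
    solved = sum(1 for attempts in results.values() if any(attempts))
    return 100.0 * solved / len(results)


# Hypothetical attempt outcomes for four problems.
attempts = {
    "p1": [False, True],
    "p2": [False, False],
    "p3": [True],
    "p4": [False, False, False],
}
print(success_rate(attempts))  # 2 of 4 problems solved
```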

Conclusion

DeepSeek-Prover-V1.5 represents a significant step forward in automated theorem proving, thanks to its