Prover-Verifier Games Enhance Legibility of Language Model Outputs

The Engineer

18 Jul 2024 · 3 min read

Researchers at OpenAI have developed Prover-Verifier Games, a training technique that prioritizes clarity in AI-generated text so that it is easier for both humans and other models to verify.

July 17, 2024

At OpenAI, we've been exploring ways to make language models not just more accurate but also more understandable. Our latest research introduces an approach called Prover-Verifier Games, which improves the legibility of AI-generated text. Solutions produced by advanced models trained this way are easier for both humans and weaker models to verify, which makes them more practical for real-world applications.

The Problem: Optimizing for Correctness vs. Legibility

When we optimize language models solely for correctness, the resulting outputs can become complex and difficult to understand. In tasks like solving math problems, for instance, human assessors with limited time struggle to judge highly optimized solutions: in our experiments, they made nearly twice as many errors when assessing these solutions as when assessing less optimized ones. This highlights the trade-off between accuracy and clarity.

The Solution: Prover-Verifier Games

To address this, we developed a training framework called Prover-Verifier Games. In this setup, two models are involved:

Prover: A stronger model that generates a solution.
Verifier: A weaker model that checks the solution for accuracy.

The goal is to train the prover to produce solutions that are not only correct but also easy for the verifier to validate. Optimizing for both objectives at once pushes the prover toward text that stays legible and verifiable, which in turn benefits human evaluators.
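One way to picture this dual objective is as a reward that pays out only for correct solutions and pays more when the verifier is confident in them. The sketch below is a minimal illustration under that assumption, not the reward used in the research; `prover_reward`, `verifier_score`, and `legibility_weight` are hypothetical names introduced here.

```python
def prover_reward(
    is_correct: bool,
    verifier_score: float,
    legibility_weight: float = 0.5,
) -> float:
    """Hypothetical combined reward for a single prover solution.

    is_correct:        whether the final answer matches the ground truth.
    verifier_score:    the verifier's confidence, in [0, 1], that the
                       solution is correct (assumed interface).
    legibility_weight: share of the reward tied to the verifier's
                       confidence (assumed hyperparameter).
    """
    if not 0.0 <= verifier_score <= 1.0:
        raise ValueError("verifier_score must be in [0, 1]")
    if not 0.0 <= legibility_weight <= 1.0:
        raise ValueError("legibility_weight must be in [0, 1]")

    # An incorrect solution earns nothing, however convincing it looks.
    correctness = 1.0 if is_correct else 0.0

    # A correct solution earns a base amount, plus a bonus that grows
    # with how confidently the verifier accepts it.
    return correctness * ((1.0 - legibility_weight) + legibility_weight * verifier_score)
```

With the default weight, a correct solution that the verifier fully accepts scores 1.0, a correct but hard-to-check solution scores less, and an incorrect solution scores 0.0.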

Implementation Details

Training Procedure:
- We used a chain-of-thought (CoT) approach for grade-school math problems.
- The prover generates step-by-step solutions, while the verifier checks each step.
- Both models are trained together, with the prover learning to produce verifiable steps and the verifier improving its accuracy in validation.

Performance Metrics:
- We compared the performance of models optimized for correctness alone versus those trained using Prover-Verifier Games.
- Models trained with this method achieved about half of the performance boost seen when optimizing only for answer correctness, while keeping their solutions legible.

Human Evaluation:
- Human evaluators were asked to assess solutions from both types of models.
- Evaluators made fewer errors on the solutions generated by Prover-Verifier Games and found them more understandable.
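The human-evaluation comparison above comes down to an error rate per model: how often an evaluator's verdict disagrees with the ground truth. The following is a minimal sketch of that metric, assuming each judgment is recorded as a pair of booleans; the function name and the toy judgments are hypothetical and are not data from the study.

```python
from typing import Iterable, Tuple


def evaluator_error_rate(judgments: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of judgments where the evaluator's verdict was wrong.

    Each judgment is (is_actually_correct, judged_correct):
      is_actually_correct: ground-truth correctness of the solution.
      judged_correct:      what the evaluator decided.
    """
    judgments = list(judgments)
    if not judgments:
        raise ValueError("at least one judgment is required")

    # An error is any mismatch: accepting a wrong solution
    # or rejecting a correct one.
    errors = sum(1 for truth, judged in judgments if truth != judged)
    return errors / len(judgments)


# Illustrative inputs only (made up for this sketch).
toy_judgments = [
    (True, True),    # correct solution, accepted
    (False, True),   # wrong solution, accepted: an evaluator error
    (True, True),    # correct solution, accepted
    (False, False),  # wrong solution, rejected
]
toy_rate = evaluator_error_rate(toy_judgments)  # 1 error out of 4 judgments
```

Computing this rate separately for each model's solutions gives the numbers that the comparison relies on.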

Example: Solving a Math Problem

Let's consider a simple math problem to illustrate the difference:

Question: Shawna’s father is five times as old as Shawna. Shawna is currently three times as old as Aliya. If Aliya is 3 years old, how old is Shawna’s father?

| Model | Solution |
| --- | --- |
| Initial Model (Optimized for Correctness) | Shawna = 3 * 3 = 9; Father = 5 * 9 = 45 |
| Prover-Verifier Game Model | Step 1: Aliya is 3 years old. <br> Step 2: Shawna is 3 times as old as Aliya, so Shawna = 3 * 3 = 9 years old. <br> Step 3: Shawna’s father is 5 times as old as Shawna, so Father = 5 * 9 = 45 years old. |

The Prover-Verifier Game model provides a more detailed and clear step-by-step solution, making it easier for both humans and weaker models to follow and verify.
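To see why explicit steps are easy to verify, consider a very simple checker that only knows how to confirm claims of the form `a * b = c`. This is a toy stand-in written for this example, not the verifier model from the research; `check_steps` and `STEP_PATTERN` are hypothetical names.

```python
import re

# Matches arithmetic claims such as "3 * 3 = 9".
STEP_PATTERN = re.compile(r"(\d+)\s*\*\s*(\d+)\s*=\s*(\d+)")


def check_steps(solution: str) -> list[tuple[str, bool]]:
    """Return every 'a * b = c' claim in the text and whether it holds."""
    results = []
    for match in STEP_PATTERN.finditer(solution):
        a, b, c = (int(group) for group in match.groups())
        results.append((match.group(0), a * b == c))
    return results


# The step-by-step solution from the example above.
solution = (
    "Step 1: Aliya is 3 years old. "
    "Step 2: Shawna is 3 times as old as Aliya, so Shawna = 3 * 3 = 9 years old. "
    "Step 3: Shawna's father is 5 times as old as Shawna, "
    "so Father = 5 * 9 = 45 years old."
)

checks = check_steps(solution)
all_steps_hold = all(ok for _, ok in checks)
```

Here `checks` contains the two claims `3 * 3 = 9` and `5 * 9 = 45`, both marked as valid. Because each step states its own arithmetic, even a checker this limited can confirm the reasoning one claim at a time.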

Conclusion

Improving the legibility of language model outputs makes them more useful in practice, because an answer that cannot be checked is hard to trust. By training models using Prover-Verifier Games, we can achieve a balance between accuracy and clarity, leading to more effective and trustworthy AI applications. This matters most in fields where precise and clear communication is essential, such as education, healthcare, and legal documentation.