Understanding Why Language Models Hallucinate and How to Mitigate Hallucinations


The Engineer

8 Sept 2025 · 3 min read

Our research examines how and why AI language models generate convincing yet inaccurate information, and it challenges conventional training and evaluation practices in order to reduce these misleading outputs.

At OpenAI, we're continuously working on making AI systems more reliable and useful. One of the most persistent challenges in this effort is hallucinations: instances where a model confidently generates an answer that is not true. Our new research paper examines why language models hallucinate and suggests that standard training and evaluation methods may be part of the problem.

What Are Hallucinations?

Hallucinations in language models are plausible but false statements. These can manifest in surprising ways, even for straightforward questions. For example, when we asked a widely used chatbot for the title of the PhD dissertation of Adam Tauman Kalai (an author of this paper), it confidently produced three different answers, none of which was correct. Similarly, when queried about his birthday, the model provided three different dates, all of them incorrect.

The Root Cause: Evaluation Methods

Hallucinations persist partly because current evaluation methods set the wrong incentives. While evaluations themselves do not directly cause hallucinations, they often measure model performance in a way that encourages guessing over acknowledging uncertainty.

Consider this analogy: Imagine taking a multiple-choice test. If you don’t know the answer but take a wild guess, you might get lucky and be right, whereas leaving it blank guarantees a zero. Similarly, when models are graded solely on accuracy (the share of questions they get exactly right), they are incentivized to guess rather than say “I don’t know.”

For instance, if a language model is asked for someone’s birthday but doesn’t know the answer, guessing “September 10” gives it a 1-in-365 chance of being correct. Saying “I don’t know,” however, guarantees zero points. Over thousands of test questions, a model that guesses will generally score higher on accuracy metrics than one that admits uncertainty.
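The arithmetic behind the birthday example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: grading awards 1 point for an exact match and 0 otherwise, the guess is right with probability 1/365, and the question count of 10,000 and the function name `expected_accuracy_score` are hypothetical choices made for this sketch.

```python
from fractions import Fraction


def expected_accuracy_score(p_correct: Fraction, abstain: bool) -> Fraction:
    """Expected points per question under accuracy-only grading.

    Assumed grading rule: 1 point if the answer is right, 0 otherwise.
    An abstention ("I don't know") is never right, so it always scores 0.
    """
    return Fraction(0) if abstain else p_correct


# Probability that a blind birthday guess is correct (from the example above).
p_guess = Fraction(1, 365)

# Hypothetical number of test questions, chosen only for illustration.
n_questions = 10_000

guess_total = n_questions * expected_accuracy_score(p_guess, abstain=False)
abstain_total = n_questions * expected_accuracy_score(p_guess, abstain=True)

print(f"always guess:   {float(guess_total):.1f} expected points")
print(f"always abstain: {float(abstain_total):.1f} expected points")
```

Under these assumptions the guessing strategy expects roughly 27 points while abstaining expects exactly zero, so an accuracy-only leaderboard ranks the guesser higher even though almost all of its answers are wrong.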

Categories of Responses

When evaluating models, we can consider three categories of responses:

Accurate Responses: The model provides the correct answer.
Errors: The model gives an incorrect but confident answer (hallucination).
Abstentions: The model does not hazard a guess and instead expresses uncertainty.
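As a rough sketch, the three categories above could be tallied as follows. The representation is an assumption made for this example, not something the research specifies: an abstention is modeled as `None`, correctness is a case-insensitive exact match, and the names `categorize` and `summarize` as well as the toy data are hypothetical.

```python
from collections import Counter
from typing import Optional

CATEGORIES = ("accurate", "error", "abstention")


def categorize(response: Optional[str], truth: str) -> str:
    """Place one response in one of the three categories.

    Assumptions for this sketch: None means the model abstained, and
    correctness is a case-insensitive exact match against the truth.
    """
    if response is None:
        return "abstention"
    if response.strip().lower() == truth.strip().lower():
        return "accurate"
    return "error"


def summarize(pairs: list[tuple[Optional[str], str]]) -> dict[str, float]:
    """Return the fraction of responses that fall in each category."""
    counts = Counter(categorize(response, truth) for response, truth in pairs)
    return {name: counts[name] / len(pairs) for name in CATEGORIES}


# Toy (response, truth) pairs, invented purely for illustration.
toy_pairs = [("Paris", "paris"), (None, "42"), ("blue", "red"), ("7", "7")]
rates = summarize(toy_pairs)
print(rates)
```

Reporting the error rate and the abstention rate next to accuracy makes it visible when two models with similar accuracy differ sharply in how often they answer wrongly.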

Mitigation Strategies

To reduce hallucinations, we need to change how models are trained and evaluated. Here are some strategies:

Rewarding Humility: Modify evaluation metrics to reward models that express uncertainty when they don’t know the answer. This can be done by assigning partial credit for abstentions or penalizing confident but incorrect answers more heavily.
Training with Uncertainty: During training, include data points where the correct response is “I don’t know” and ensure the model learns to recognize these cases.
Human-in-the-Loop Feedback: Incorporate human feedback during both training and evaluation to help models learn when they should be uncertain.
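To make the first strategy concrete, here is a minimal sketch of one possible scoring rule. It is an assumption for illustration rather than the metric proposed in the paper: a correct answer earns 1 point, a wrong answer loses `wrong_penalty` points, and an abstention earns `abstain_credit` points. All function and parameter names are hypothetical.

```python
def expected_guess_score(p_correct: float, wrong_penalty: float = 1.0) -> float:
    """Expected points for answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct - (1.0 - p_correct) * wrong_penalty


def break_even_confidence(wrong_penalty: float = 1.0,
                          abstain_credit: float = 0.0) -> float:
    """Confidence above which answering beats abstaining.

    Solves p - (1 - p) * wrong_penalty > abstain_credit for p.
    """
    return (abstain_credit + wrong_penalty) / (1.0 + wrong_penalty)


def should_answer(p_correct: float, wrong_penalty: float = 1.0,
                  abstain_credit: float = 0.0) -> bool:
    """True if answering has a higher expected score than abstaining."""
    return expected_guess_score(p_correct, wrong_penalty) > abstain_credit


# The blind birthday guess from earlier in the article.
blind_guess = 1 / 365
print(f"expected score of a blind guess: {expected_guess_score(blind_guess):.3f}")
print(f"break-even confidence: {break_even_confidence():.2f}")
print(f"answer at 90% confidence? {should_answer(0.9)}")
print(f"answer on a blind guess?  {should_answer(blind_guess)}")
```

With a penalty of 1 and no abstention credit, the blind birthday guess has a negative expected score, so abstaining becomes the better choice unless the model is more than 50% confident. Raising `wrong_penalty` or `abstain_credit` pushes that threshold higher.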

Progress in GPT-5

While hallucinations remain a fundamental challenge for all large language models, we have made significant progress with GPT-5. This model hallucinates far less often, especially when reasoning, but hallucinations still occur. Our ongoing research aims to reduce them further and make AI systems more reliable.

Conclusion

Hallucinations in language models are a complex issue that requires careful consideration of training and evaluation methods. By incentivizing models to be humble and uncertain when appropriate, we can move closer to building more robust and trustworthy AI systems.