
Our research explains how and why AI language models generate convincing yet inaccurate information, and argues that conventional training and evaluation practices need to change to reduce these misleading outputs.
At OpenAI, we're continuously working on making AI systems more reliable and useful. One of the most persistent challenges in this effort is hallucinations: instances where a model confidently generates an answer that is simply not true. Our new research paper examines why language models hallucinate and suggests that standard training and evaluation methods may be part of the problem.
Hallucinations in language models are plausible but false statements. They can appear in surprising ways, even for straightforward questions. For example, when we asked a widely used chatbot for the title of the PhD dissertation of Adam Tauman Kalai (an author of this paper), it confidently produced three different answers, none of which was correct. Similarly, when asked for his birthday, the model gave three different dates, all of them wrong.
Hallucinations persist partly because current evaluation methods set the wrong incentives. While evaluations themselves do not directly cause hallucinations, they often measure model performance in a way that encourages guessing over acknowledging uncertainty.
Consider this analogy: imagine taking a multiple-choice test. If you don’t know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. Similarly, when models are graded solely on accuracy (how many questions they get exactly right), they are incentivized to guess rather than say “I don’t know.”
For instance, if a language model is asked for someone’s birthday but doesn’t know the answer, guessing “September 10” gives it a 1-in-365 chance of being correct. Saying “I don’t know,” however, guarantees zero points. Over thousands of test questions, a model that guesses will generally score higher on accuracy metrics than one that admits uncertainty.
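The arithmetic behind this example can be sketched in a few lines of Python. This is an illustrative calculation, not code from the paper: the function name `expected_accuracy_score` and the figure of 1,000 questions are assumptions made here, and leap years are ignored.

```python
from fractions import Fraction


def expected_accuracy_score(p_correct: Fraction, n_questions: int, guess: bool) -> Fraction:
    """Expected points under accuracy-only grading: 1 per correct answer, 0 otherwise."""
    if not guess:
        # "I don't know" never earns a point under this grading scheme.
        return Fraction(0)
    return p_correct * n_questions


p_birthday = Fraction(1, 365)  # chance that a blind birthday guess is right
n = 1000                       # hypothetical number of birthday-style questions

guessing = expected_accuracy_score(p_birthday, n, guess=True)
abstaining = expected_accuracy_score(p_birthday, n, guess=False)

print(f"expected points when guessing:   {float(guessing):.2f}")
print(f"expected points when abstaining: {float(abstaining):.2f}")
```

The guessing model comes out ahead by only a few points, but an accuracy-only leaderboard still ranks it above the model that abstains, and it does not record the hundreds of wrong answers produced along the way.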

When evaluating models, we can consider three categories of responses: accurate answers, errors, and abstentions, where the model declines to guess.
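These categories can be tallied separately rather than collapsed into a single accuracy number. The sketch below assumes each graded response carries one of the labels `correct`, `incorrect`, or `abstain`; the label names, the `summarize` helper, and both example models are hypothetical and are not results from the paper.

```python
from collections import Counter


def summarize(labels: list[str]) -> dict[str, float]:
    """Return the share of correct, incorrect, and abstained responses."""
    if not labels:
        raise ValueError("need at least one graded response")
    counts = Counter(labels)
    total = len(labels)
    return {
        "accuracy": counts["correct"] / total,
        "error_rate": counts["incorrect"] / total,
        "abstention_rate": counts["abstain"] / total,
    }


# Made-up response logs for two imaginary models (illustration only).
cautious = ["correct"] * 20 + ["incorrect"] * 10 + ["abstain"] * 70
guesser = ["correct"] * 25 + ["incorrect"] * 75

print("cautious:", summarize(cautious))
print("guesser: ", summarize(guesser))
```

In this invented example the guesser has the higher accuracy, yet it is wrong far more often. Reporting the error rate and abstention rate next to accuracy makes that trade-off visible.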
To reduce hallucinations, we need to change how models are trained and evaluated, so that a confident error costs more than an honest expression of uncertainty.
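One way to make the incentive argument concrete is a grading rule that subtracts points for wrong answers, similar to negative marking on some standardized tests. The sketch below is an assumption for illustration, not a scheme specified in this article: the `wrong_penalty` parameter and the function names are hypothetical.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected points for answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only when the expected score beats the 0 points for abstaining."""
    return expected_score(p_correct, wrong_penalty) > 0.0


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining have equal expected score."""
    return wrong_penalty / (1.0 + wrong_penalty)


p_birthday = 1 / 365  # blind birthday guess

# With no penalty (accuracy-only grading) guessing is always worth it.
print(should_answer(p_birthday, wrong_penalty=0.0))
# With a 1-point penalty the blind guess has negative expected score.
print(should_answer(p_birthday, wrong_penalty=1.0))
# A 3-point penalty means answering pays off only above 75% confidence.
print(break_even_confidence(3.0))
```

Under such a rule, saying “I don’t know” becomes the better choice whenever the model's confidence falls below the break-even point, which removes the reward for blind guessing.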
Hallucinations remain a fundamental challenge for all large language models, but we have made progress with GPT-5. The model hallucinates significantly less often, especially when reasoning, though hallucinations still occur. Our ongoing research aims to reduce them further and make AI systems more reliable.
Hallucinations in language models are a complex issue that requires careful consideration of training and evaluation methods. By rewarding models for expressing uncertainty when it is warranted, rather than for guessing, we can move closer to building more robust and trustworthy AI systems.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
8 September 2025