
FACTS Grounding aims to tackle the pesky problem of AI misinformation by assessing whether large language models can reliably provide accurate, context-relevant answers, a crucial step towards more trustworthy AI.
Large language models (LLMs) have revolutionized how we interact with information, but their Achilles' heel remains factual accuracy. These models can sometimes "hallucinate", generating false information, especially when dealing with complex inputs. This issue not only erodes trust in LLMs but also limits their practical use in real-world scenarios.
To address this challenge, the FACTS team at DeepMind has introduced FACTS Grounding, a comprehensive benchmark designed to evaluate how well LLMs can generate factually accurate and contextually grounded responses. The benchmark is complemented by an online leaderboard on Kaggle, providing a transparent way to track progress in the field.
Dataset Size and Structure:
Diverse Input Types:

The initial leaderboard has been populated with scores from leading LLMs. Each score is the average of a model's performance on the public and private sets, so a model cannot rank well by doing well on the public examples alone.
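As a rough illustration of that averaging, here is a minimal sketch. It assumes each split yields a single score between 0 and 1 and that the two splits are weighted equally; the article does not specify the weighting, and the function name and the example numbers are hypothetical, not taken from the leaderboard.

```python
from statistics import mean


def leaderboard_score(public_score: float, private_score: float) -> float:
    """Combine per-split scores into one leaderboard number.

    Assumption: an unweighted mean of the two splits. The real
    leaderboard may weight or aggregate differently.
    """
    for score in (public_score, private_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {score}")
    return mean([public_score, private_score])


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    print(f"{leaderboard_score(0.80, 0.90):.2f}")
```

Keeping one split private means the combined number also reflects examples that model developers cannot see in advance.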
The FACTS team encourages researchers and practitioners to use the public set for evaluating their LLMs and to contribute to the leaderboard. By working together, the community can drive significant progress in improving the factuality and grounding of large language models.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
7 January 2025