Fine-Tuning LLMs for Factuality Without Human Labels Cuts Factual Error Rates by 58% and 40%


The Engineer

16 Nov 2023

Researchers have developed a method that makes large language models more factual without relying on human-labeled data, cutting factual error rates by 58% on biographies and 40% on medical questions compared with Llama-2-chat at the 7B scale.

The widespread adoption of large pre-trained language models (LLMs) has brought about a new era in natural language processing. These models can generate fluent and creative text, often rivaling human output. However, one significant drawback is their tendency to produce factually inaccurate claims, known as "hallucinations." This issue can lead to the spread of misinformation and the perpetuation of misconceptions. Manual fact-checking, while effective, is time-consuming and costly.

In a recent paper titled "Fine-tuning Language Models for Factuality," researchers Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, and Chelsea Finn propose a method to fine-tune LLMs to be more factual without the need for human labels. Their approach leverages two key innovations in NLP: methods for judging factuality and direct preference optimization (DPO).

Key Technical Innovations

Factuality Judgment: Recent work has introduced techniques that assess the factuality of open-ended text, either by measuring its consistency with an external knowledge base or by using a large model's own confidence scores.
Direct Preference Optimization (DPO): This algorithm allows for straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses.
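
To make the preference-based objective concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is an illustration and not the authors' training code: the function name `dpo_loss`, the default `beta=0.1`, and the log-probability values in the usage lines are all assumptions chosen for the example. The inputs are the summed token log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    where each term is a summed log-probability of a full response.
    beta=0.1 is an illustrative default, not a value taken from the article.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)

    # Numerically stable -log(sigmoid(x)) = log(1 + exp(-x)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities: the policy has shifted toward the chosen
# response relative to the reference, so the loss falls below log(2),
# which is the loss when the policy and reference agree exactly.
loss = dpo_loss(policy_logp_chosen=-5.0, policy_logp_rejected=-9.0,
                ref_logp_chosen=-7.0, ref_logp_rejected=-7.0)
print(loss < math.log(2))
```

In practice the loss would be averaged over a batch of pairs and minimized with a gradient-based optimizer; the scalar version above only shows how the preference ranking enters the objective.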

Methodology

The researchers used the following steps to improve the factuality of LLMs:

Automated Factuality Preference Rankings: They generated these rankings through two methods:
- Retrieval Systems: Using existing retrieval systems to compare generated text with known facts.
- Retrieval-Free Approach: A novel method that does not consult an external knowledge base and instead relies on the model's own confidence scores as a signal of factuality.
Fine-Tuning with DPO: The models were fine-tuned using the preference rankings as a guide, optimizing for factuality rather than just fluency or coherence.
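
The two steps above can be connected by turning scored responses into preference pairs. The sketch below assumes that several responses have already been sampled for one prompt and that each has a numeric factuality score from either scoring method. The function name `build_preference_pairs`, the `min_gap` parameter, the record layout, and the example scores are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_preference_pairs(prompt, responses, scores, min_gap=0.0):
    """Turn scored samples for one prompt into preference records.

    For every pair of responses, the one with the higher factuality score
    becomes "chosen" and the other "rejected". Pairs whose scores differ by
    no more than min_gap are skipped, since they carry no clear preference.
    """
    if len(responses) != len(scores):
        raise ValueError("responses and scores must have the same length")

    pairs = []
    for (resp_a, score_a), (resp_b, score_b) in combinations(
            zip(responses, scores), 2):
        if abs(score_a - score_b) <= min_gap:
            continue  # tie or near-tie: no usable preference
        if score_a > score_b:
            chosen, rejected = resp_a, resp_b
        else:
            chosen, rejected = resp_b, resp_a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Placeholder responses and scores, purely for illustration.
pairs = build_preference_pairs(
    prompt="Write a short biography of Ada Lovelace.",
    responses=["draft A", "draft B", "draft C"],
    scores=[0.9, 0.5, 0.5],
)
print(len(pairs))  # draft B and draft C tie, so only 2 pairs remain
```

Records in this shape (prompt, chosen, rejected) are a common input format for DPO-style trainers, which is why the sketch uses it.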

Results

The researchers tested their approach on Llama-2, a popular LLM, at the 7B parameter scale, evaluating on held-out topics. They compared the fine-tuned model against Llama-2-chat, which is trained with reinforcement learning from human feedback (RLHF), and against decoding strategies targeted at factuality.

Biographies: The fine-tuned model showed a 58% reduction in factual error rate relative to Llama-2-chat when generating biographies.
Medical Questions: For answering medical questions, the reduction in factual error rate relative to Llama-2-chat was 40%.

These improvements highlight the effectiveness of their approach in reducing hallucinations and enhancing factuality without the need for expensive human labels.
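
Note that these figures are relative reductions in error rate, not absolute gains in accuracy. The short sketch below shows the arithmetic. The error rates in the example are made up to illustrate a 58% reduction and are not the values reported in the paper; the function name `relative_error_reduction` is likewise hypothetical.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction by which the error rate dropped relative to the baseline."""
    if not 0 < baseline_error <= 1:
        raise ValueError("baseline_error must be in (0, 1]")
    if not 0 <= new_error <= 1:
        raise ValueError("new_error must be in [0, 1]")
    return (baseline_error - new_error) / baseline_error


# Illustrative numbers only: a baseline that gets 50% of claims wrong and a
# fine-tuned model that gets 21% wrong.
baseline_error, new_error = 0.50, 0.21
reduction = relative_error_reduction(baseline_error, new_error)
print(f"relative error reduction: {reduction:.0%}")

# The same change expressed as accuracy is a 29-point gain (50% to 79%),
# which is why "58% fewer errors" should not be read as "58% more accurate".
accuracy_gain_points = (1 - new_error) - (1 - baseline_error)
print(f"accuracy gain: {accuracy_gain_points * 100:.0f} points")
```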

Implementation Details

Model Architecture: The researchers used Llama-2 as the base model, which is known for its strong performance across various NLP tasks.
Training Data: They utilized a combination of synthetic data generated by their methods and existing datasets to create preference rankings.
Evaluation Metrics: Factuality was measured using the percentage of generated claims that were correct, validated against ground truth or external knowledge bases.
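
As a rough sketch of the metric described above, the function below computes the percentage of correct claims from a list of per-claim correctness labels. How a response is split into claims and how each claim is verified is outside the scope of this example; the function name `factuality_stats` and the sample labels are assumptions made for illustration.

```python
def factuality_stats(claim_is_correct):
    """Summarize per-claim correctness labels for one or more responses.

    claim_is_correct: list of booleans, one per extracted claim.
    Returns the claim count, percent correct, and the factual error rate.
    """
    total = len(claim_is_correct)
    if total == 0:
        raise ValueError("at least one claim is required")
    correct = sum(1 for ok in claim_is_correct if ok)
    return {
        "num_claims": total,
        "percent_correct": 100.0 * correct / total,
        "error_rate": 1 - correct / total,
    }


# Hypothetical labels: three of four claims verified as correct.
stats = factuality_stats([True, True, False, True])
print(stats["percent_correct"])  # 75.0
```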

Implications for Practitioners

For practitioners working with LLMs, this research offers several practical benefits:

Reduced Manual Effort: Because the factuality preference data is generated automatically, teams do not need to collect human factuality labels for fine-tuning and can focus on higher-value tasks.
Improved User Trust: Enhancing factuality reduces the risk of spreading misinformation, building user trust and credibility.
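Scalability: Because the retrieval-free approach needs no external knowledge base, it may be easier to extend factuality improvements to other domains and languages, although the reported results cover biographies and medical questions.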

Conclusion

The work by Tian et al. demonstrates that fine-tuning LLMs for factuality without human labels is both feasible and effective. By combining automatically generated factuality rankings with direct preference optimization, they reduced factual error rates by 58% on biographies and 40% on medical questions relative to Llama-2-chat at the 7B scale. The approach offers a practical path toward language models that users can rely on more in real applications.