
Researchers at EPFL show that fine-tuning on the instructions with the longest responses can beat more elaborate data-selection methods for large language models, calling into question how much careful data curation matters for effective model alignment.
Researchers from EPFL have demonstrated that simply selecting the 1,000 instructions with the longest responses from standard datasets can outperform state-of-the-art data-selection methods for instruction fine-tuning of large language models (LLMs). The paper, titled "Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning," was accepted at ICML 2024 and challenges the prevailing consensus that carefully selected high-quality data is essential for effective LLM alignment.
The key insight is that examples with longer responses, which intuitively contain more learnable information and are harder to overfit, can consistently outperform sophisticated selection methods like LIMA (NeurIPS 2023) and AlpaGasus (ICLR 2024). Those methods pick high-quality examples either through manual curation (LIMA) or by using GPT-3.5-Turbo as a quality scorer (AlpaGasus).
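The selection rule described above is simple enough to sketch in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `select_longest`, the `"output"` field name, and the use of whitespace-separated word counts as the length measure are all assumptions made for the example (a tokenizer-based count may be a closer match to the paper's setup). The default of 1,000 mirrors the subset size the paper reports.

```python
from typing import Dict, List


def select_longest(
    examples: List[Dict[str, str]],
    k: int = 1000,
    response_key: str = "output",  # hypothetical field name; adapt to your dataset
) -> List[Dict[str, str]]:
    """Return the k examples with the longest responses, longest first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Assumption: length is measured in whitespace-separated words.
    ranked = sorted(
        examples,
        key=lambda ex: len(ex[response_key].split()),
        reverse=True,  # sorted() stays stable, so ties keep their original order
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from any real dataset.
    toy = [
        {"instruction": "a", "output": "one two three"},
        {"instruction": "b", "output": "one"},
        {"instruction": "c", "output": "one two three four five"},
        {"instruction": "d", "output": "one two"},
    ]
    subset = select_longest(toy, k=2)
    print([ex["instruction"] for ex in subset])  # prints ['c', 'a']
```

The resulting subset would then be passed to an ordinary supervised fine-tuning pipeline in place of the full dataset; no quality scorer or manual review is involved in the selection step.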

To ensure that the improved performance was not merely an artifact of GPT-4, used as the judge, preferring longer responses, the researchers conducted a further analysis. They found that:
This research suggests that fine-tuning on the longest responses should be considered a default baseline for any work on instruction fine-tuning. The simplicity and effectiveness of this approach make it accessible and practical for researchers and practitioners alike.
The findings challenge the notion that data must be meticulously curated or scored to achieve effective LLM alignment. By exploiting the richness of examples with longer responses, researchers and practitioners can achieve strong results with a simple and efficient selection rule. The code for this research is available at this GitHub repository.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
19 February 2024