
Researchers at EPFL and Hugging Face make the case for a simpler way to train machine learning models, arguing that a constant learning rate followed by a cooldown period can match the performance of the widely used cosine schedule while reducing computational costs.
In a recent paper titled "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations," researchers from EPFL (École Polytechnique Fédérale de Lausanne) and Hugging Face challenge the field's conventional reliance on cosine learning rate schedules. The authors, Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra, and Martin Jaggi, argue that this reliance unnecessarily complicates scaling experiments and inflates compute costs: because a cosine schedule is tied to a training length fixed in advance, every training duration studied needs its own run from scratch. Instead, they propose a simpler recipe: train at a constant learning rate and finish with a short cooldown in which the rate decays. They show that this matches the performance of cosine schedules while being more efficient.
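To make the contrast concrete, here is a minimal Python sketch of the two schedules. It is not the authors' code: the function names, the linear warmup, the linear decay to zero and the 20% cooldown fraction are illustrative assumptions, and other cooldown shapes are possible. What it shows is structural: under a cosine schedule the learning rate at every step depends on the total training length, whereas a constant schedule only needs that length once the cooldown begins.

```python
import math

# A minimal sketch, not the authors' implementation. The function names, the
# linear warmup, the linear decay to zero and the 20% cooldown fraction are
# illustrative assumptions.


def cosine_lr(step: int, total_steps: int, lr_max: float,
              warmup_steps: int = 0, lr_min: float = 0.0) -> float:
    """Linear warmup, then cosine decay from lr_max to lr_min by total_steps."""
    if step < warmup_steps:
        return lr_max * (step + 1) / warmup_steps
    # The decay depends on total_steps, so the run length must be fixed up front.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


def constant_with_cooldown_lr(step: int, total_steps: int, lr_max: float,
                              warmup_steps: int = 0,
                              cooldown_frac: float = 0.2) -> float:
    """Linear warmup, constant lr_max, then linear decay over the last cooldown_frac of steps."""
    if step < warmup_steps:
        return lr_max * (step + 1) / warmup_steps
    cooldown_steps = max(1, int(cooldown_frac * total_steps))
    cooldown_start = total_steps - cooldown_steps
    if step < cooldown_start:
        # total_steps plays no role here: the constant phase is length-agnostic.
        return lr_max
    # Fraction of the cooldown completed; clamp so the LR never goes negative.
    progress = (step - cooldown_start) / cooldown_steps
    return lr_max * max(0.0, 1 - progress)


if __name__ == "__main__":
    lr = 1e-3
    short_run, long_run = 1_000, 2_000

    # Constant + cooldown: both runs use identical LRs until the short run's
    # cooldown starts at step 800, so they can share one pre-cooldown checkpoint.
    shared = all(
        constant_with_cooldown_lr(s, short_run, lr)
        == constant_with_cooldown_lr(s, long_run, lr)
        for s in range(800)
    )

    # Cosine: the two schedules already differ after the first step.
    diverge_at = next(
        s for s in range(short_run)
        if cosine_lr(s, short_run, lr) != cosine_lr(s, long_run, lr)
    )
    print(shared, diverge_at)  # True 1
```

In practice, this is what lets a single constant-learning-rate run be checkpointed and cooled down at several lengths, instead of launching a fresh cosine run for each training duration.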
Constant Learning Rate with Cooldowns:
Stochastic Weight Averaging (SWA):
Reduced Compute Costs:
Predictable Scaling:

Performance Parity:
Improved Efficiency:
Training Setup:
Benchmarks:
The paper "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" offers a compelling alternative to the traditional cosine schedule for machine learning practitioners. By decoupling the learning rate schedule from a training length fixed in advance, this approach simplifies training and reduces the compute needed for scaling experiments and model optimization. The authors support their findings with benchmarks and practical implementation details, making the paper a valuable resource for anyone looking to improve their training efficiency.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer: This Week's Edition
31 May 2024