
Starling-7B uses Reinforcement Learning from AI Feedback (RLAIF) to improve helpfulness and safety. Trained with a GPT-4-labeled ranking dataset, it outscores every model except GPT-4 and GPT-4 Turbo on MT Bench with GPT-4 as the judge.
Starling-LM-7B is a new open-source large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). Developed by researchers at UC Berkeley, this 7-billion-parameter model uses Nectar, a new GPT-4-labeled ranking dataset, together with a reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 on the MT Bench evaluation with GPT-4 as the judge, higher than every other model to date except OpenAI’s GPT-4 and GPT-4 Turbo.
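To make the idea of training a reward model from ranked responses more concrete, here is a minimal sketch of one common objective for ranking data: the Plackett-Luce listwise negative log-likelihood. This article does not state which loss the pipeline uses, so the choice of Plackett-Luce, the function name `plackett_luce_nll`, and the toy scores are all assumptions made for illustration.

```python
import math


def plackett_luce_nll(rewards_in_ranked_order):
    """Negative log-likelihood of one ranking under a Plackett-Luce model.

    rewards_in_ranked_order: scalar reward-model scores for K responses to
    the same prompt, listed from the response the judge ranked best to the
    one ranked worst. (Hypothetical helper, not taken from the article.)
    """
    nll = 0.0
    # The last position contributes log(1) = 0, so it is skipped.
    for i in range(len(rewards_in_ranked_order) - 1):
        remaining = rewards_in_ranked_order[i:]
        # Numerically stable log-sum-exp over the responses not yet placed.
        m = max(remaining)
        log_z = m + math.log(sum(math.exp(r - m) for r in remaining))
        # Probability that response i is picked first among the remaining ones.
        nll += log_z - rewards_in_ranked_order[i]
    return nll


# Toy scores: the first list agrees with the judge's ranking, the second reverses it.
agree = plackett_luce_nll([2.0, 1.0, 0.0])
disagree = plackett_luce_nll([0.0, 1.0, 2.0])
```

Scores that agree with the ranking give a lower loss than scores that contradict it, which is the signal a reward model would be trained on. With only two responses, the expression reduces to the familiar pairwise Bradley-Terry loss.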
Reinforcement Learning from AI Feedback (RLAIF)
Nectar Dataset
Reward Model and Policy Tuning Pipeline
Model Architecture:
Training Process:

Enhanced Model Performance:
High-Quality Dataset:
Open Source Contributions:
Starling-7B is a notable step forward for open-source large language models. By combining RLAIF with a high-quality ranking dataset, it scores higher on MT Bench than every model except GPT-4 and GPT-4 Turbo, while remaining open source and accessible to the community. The work both demonstrates the potential of AI-generated feedback and provides resources for future research and development.
Original Sources
↗ https://starling.cs.berkeley.edu/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
28 November 2023