
Researchers unveil GEPA, a Genetic-Pareto prompt optimizer that uses natural language reflection to adapt large language models with far fewer rollouts than traditional reinforcement learning methods, cutting the compute needed for optimization.
Large language models (LLMs) are increasingly being fine-tuned for downstream tasks using reinforcement learning (RL) methods, such as Group Relative Policy Optimization (GRPO). These methods often require thousands of rollouts to learn new tasks, which can be computationally expensive and time-consuming. However, a recent study by researchers from UC Berkeley, Stanford University, and other institutions suggests that leveraging the interpretability of natural language can provide a richer learning medium for LLMs.
The team introduces GEPA (Genetic-Pareto), a novel prompt optimizer that uses natural language reflection to learn high-level rules from trial and error. Unlike traditional RL methods that rely on sparse, scalar rewards, GEPA samples system-level trajectories (e.g., reasoning steps, tool calls, and tool outputs) and reflects on them in natural language to diagnose issues, propose and test prompt updates, and combine complementary lessons from the Pareto frontier of its own attempts.
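The Pareto-frontier idea above can be illustrated with a small sketch. It is not the authors' implementation: the per-instance score table, the function names `pareto_candidates` and `sample_candidate`, and the choice to sample in proportion to the number of instances a prompt wins are all assumptions made for illustration. The point is only that a candidate prompt stays in play if it is the best on at least one task instance, even when its average score is not the highest.

```python
import random
from typing import Dict, List


def pareto_candidates(scores: Dict[str, List[float]]) -> Dict[str, int]:
    """Map each candidate that is best (ties included) on at least one
    task instance to the number of instances it wins.

    `scores[name][i]` is the hypothetical score of prompt `name` on instance i.
    """
    names = list(scores)
    n_instances = len(next(iter(scores.values())))
    wins: Dict[str, int] = {}
    for i in range(n_instances):
        best = max(scores[name][i] for name in names)
        for name in names:
            if scores[name][i] == best:
                wins[name] = wins.get(name, 0) + 1
    return wins


def sample_candidate(wins: Dict[str, int], rng: random.Random) -> str:
    """Pick the next prompt to mutate, weighted by how many instances it wins
    (an assumed weighting, chosen for illustration)."""
    names = list(wins)
    weights = [wins[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical scores for three candidate prompts on three task instances.
scores = {
    "prompt_a": [0.9, 0.2, 0.5],
    "prompt_b": [0.4, 0.8, 0.5],
    "prompt_c": [0.3, 0.1, 0.4],
}
wins = pareto_candidates(scores)
chosen = sample_candidate(wins, random.Random(0))
```

In this toy table, `prompt_a` and `prompt_b` each win on different instances, so both survive, while `prompt_c` is never the best anywhere and drops out. Keeping both survivors is what would let a later step merge their complementary lessons.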
The researchers evaluated GEPA across four tasks, comparing it against GRPO and against MIPROv2, a state-of-the-art prompt optimizer.

Figure 1 illustrates the learning behavior of GEPA compared to MIPROv2 and GRPO. As more rollouts are sampled, GEPA improves much more quickly than both baselines and finishes with a substantially higher final score. The star markers in the figure show test-set scores on a held-out set of questions, where the same advantage appears.
The study raises important questions about how LLMs should be optimized for downstream tasks. Traditional RL methods like GRPO rely on scalar rewards to estimate gradients for policy improvement, but GEPA's use of natural language reflection provides a more interpretable and efficient alternative. This approach not only reduces the number of required rollouts but also enhances the model's ability to learn from its own attempts.
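To make the contrast concrete, here is a minimal sketch of the scalar signal a GRPO-style method works with: each rollout in a group is reduced to one reward, which is then normalized against the group's mean and standard deviation to form an advantage. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions for illustration, not details taken from the study.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each rollout's scalar reward against its group.

    Rollouts above the group mean get a positive advantage, those below get
    a negative one. `eps` (an assumed stabilizer) avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts: two succeeded, two failed.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Everything the policy learns from a rollout here is compressed into one number per attempt. The reasoning steps, tool calls, and tool outputs that GEPA reflects on in natural language never reach the update.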
GEPA represents a significant step forward in LLM prompt optimization by leveraging the interpretability of natural language. Its ability to turn even a few rollouts into substantial quality gains makes it a promising tool for researchers and practitioners who want to adapt LLMs to new applications by optimizing prompts instead of updating model weights. As the field continues to evolve, methods like GEPA could play a crucial role in making LLMs more efficient and effective.
Original Sources
↗ https://arxiv.org/pdf/2507.19457v1
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
31 July 2025