
Researchers at SkyPilot show that coding agents produce better optimizations when they first study the existing literature and competing projects, with substantial improvements arriving within hours.
In a recent experiment, researchers at SkyPilot demonstrated that coding agents produce more effective optimizations when they first conduct a literature search and study competing projects. By adding this research phase to an autoresearch-style optimization loop, built with tools such as autoresearch and pi-autoresearch, they achieved significant performance gains in just a few hours.
Code-only context works well for many tasks, especially when the problem domain is well understood and the codebase is relatively simple. As projects grow more complex, however, its limitations become apparent. Optimizing a large inference engine such as llama.cpp, for instance, requires a deep understanding of both the underlying algorithms and the latest research in the field.
For advanced models and optimizations, code-only context often falls short.
To address these limitations, SkyPilot added a literature search phase to the autoresearch loop, in which the agent studies relevant research and competing projects before it starts proposing changes.
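The control flow of such a loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not SkyPilot's code or the actual autoresearch/pi-autoresearch interface: `Idea`, `research_phase`, `run_loop`, and `try_idea` are hypothetical names, the benchmark and test harness are stubbed out, and the keep-if-faster rule is an assumption. The point it makes is narrow: ideas gathered from papers and competing projects seed the queue the agent works through, instead of the agent starting from the code alone.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Idea:
    """A candidate optimization found during the research phase (hypothetical type)."""
    name: str
    source: str  # where it came from, e.g. "papers" or "competitors"


@dataclass
class LoopState:
    best_time: float  # best benchmark time so far, in seconds (lower is better)
    kept: list[str] = field(default_factory=list)
    rejected: list[str] = field(default_factory=list)


def research_phase(notes: dict[str, list[str]]) -> list[Idea]:
    """Turn research notes into a de-duplicated queue of ideas, in the order found."""
    seen: set[str] = set()
    queue: list[Idea] = []
    for source, techniques in notes.items():
        for name in techniques:
            if name not in seen:
                seen.add(name)
                queue.append(Idea(name=name, source=source))
    return queue


def run_loop(
    queue: list[Idea],
    try_idea: Callable[[Idea], Optional[float]],
    baseline: float,
) -> LoopState:
    """Try each idea once; keep it only if tests pass and it beats the best time.

    try_idea stands in for "apply the change, run tests, run the benchmark":
    it returns the measured time, or None if the tests failed.
    """
    state = LoopState(best_time=baseline)
    for idea in queue:
        measured = try_idea(idea)
        if measured is not None and measured < state.best_time:
            state.best_time = measured
            state.kept.append(idea.name)
        else:
            state.rejected.append(idea.name)
    return state


if __name__ == "__main__":
    # Toy inputs: idea names and timings are invented for illustration only.
    notes = {"papers": ["idea-a", "idea-b"], "competitors": ["idea-b", "idea-c"]}
    fake_times = {"idea-a": 9.0, "idea-b": None, "idea-c": 9.5}
    result = run_loop(research_phase(notes), lambda i: fake_times[i.name], baseline=10.0)
    print(result.kept, result.rejected, result.best_time)  # ['idea-a'] ['idea-b', 'idea-c'] 9.0
```

In the toy run, `idea-c` is rejected even though it beats the original baseline, because it does not beat the improved time already reached by `idea-a`.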
The literature search surfaced several promising optimization techniques.
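One key insight was the importance of memory optimizations. Traditional approaches tend to focus on compute efficiency, but LLM inference is often limited by memory bandwidth and access patterns rather than arithmetic: during token generation, essentially every model weight has to be read from memory for each new token.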
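A back-of-the-envelope roofline estimate shows why memory can dominate. This is standard reasoning about batch-1 decoding, not the specific optimizations SkyPilot's agents found. The hardware figures (1 TFLOP/s peak compute, 100 GB/s memory bandwidth) are illustrative assumptions, the 1.1e9 parameter count approximates TinyLlama 1.1B, and the 4-bit case ignores quantization scale overhead, KV-cache traffic, and activations.

```python
def decode_intensity(bytes_per_param: float) -> float:
    """Arithmetic intensity (FLOPs per byte) of a batch-1 matrix-vector product.

    Each weight is read once and used in one multiply-add (2 FLOPs).
    """
    return 2.0 / bytes_per_param


def is_memory_bound(bytes_per_param: float, peak_gflops: float, bandwidth_gb_s: float) -> bool:
    """Roofline check: below the machine balance (FLOPs per byte), memory is the limit."""
    machine_balance = peak_gflops / bandwidth_gb_s
    return decode_intensity(bytes_per_param) < machine_balance


def max_tokens_per_s(n_params: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every weight is streamed from memory once per token.

    Ignores KV-cache traffic, activations, and on-chip caches, so real numbers will be lower.
    """
    bytes_per_token = n_params * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Assumed, illustrative hardware: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
    PEAK_GFLOPS, BANDWIDTH_GB_S = 1000.0, 100.0
    N_PARAMS = 1.1e9  # roughly TinyLlama 1.1B
    for label, bpp in [("fp16", 2.0), ("4-bit", 0.5)]:
        print(
            label,
            is_memory_bound(bpp, PEAK_GFLOPS, BANDWIDTH_GB_S),
            f"{max_tokens_per_s(N_PARAMS, bpp, BANDWIDTH_GB_S):.1f} tok/s ceiling",
        )
    # fp16 True 45.5 tok/s ceiling
    # 4-bit True 181.8 tok/s ceiling
```

Both cases sit far below the assumed machine balance of 10 FLOPs per byte, so the ceiling is set by how many bytes move per token, not by peak FLOPs. That is why shrinking or reorganizing memory traffic can pay off more than speeding up the arithmetic.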

The experiment ran on 4 cloud VMs over approximately 3 hours. The resulting optimizations were evaluated on the TinyLlama 1.1B model and delivered significant performance improvements.
Not every experiment succeeded.
Adding a research phase to a coding agent's workflow can lead to more effective and innovative optimizations. By drawing on recent academic work and on what competing projects have already tried, agents can find improvements that a code-only loop is likely to miss.
The full setup is available and can be used with any project that has a benchmark and a test suite. Whether you are working on machine learning models, web applications, or other software, the same approach can help improve performance.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
10 April 2026