
Bugbot began as a code review agent and has been steadily refined through continuous experiments, guided by a custom AI-driven metric, to catch more logic bugs, performance issues, and security vulnerabilities.
As coding agents have become more sophisticated, we've found ourselves spending an increasing amount of time on code review. To address this, the team at Cursor built Bugbot, a code review agent designed to catch logic bugs, performance issues, and security vulnerabilities before they hit production. By last summer, it was performing so well that we decided to release it to users.
The development of Bugbot started with qualitative assessments but quickly transitioned to a more systematic approach using a custom AI-driven metric to iteratively improve quality. Since its launch, we've conducted 40 major experiments, which have significantly boosted Bugbot's resolution rate from 52% to over 70%. Additionally, the average number of bugs flagged per run has increased from 0.4 to 0.7, meaning that the number of resolved bugs per pull request (PR) has more than doubled, from about 0.2 to around 0.5.
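The per-PR figure follows from multiplying the two reported numbers. A quick check in Python, assuming one Bugbot run per PR and treating "over 70%" as exactly 0.70 (both are simplifications made here for illustration, and the function name is hypothetical):

```python
def resolved_bugs_per_pr(flagged_per_run: float, resolution_rate: float) -> float:
    """Expected resolved bugs per PR, assuming one Bugbot run per PR."""
    return flagged_per_run * resolution_rate


# Figures quoted in the article; 0.70 stands in for "over 70%".
at_launch = resolved_bugs_per_pr(flagged_per_run=0.4, resolution_rate=0.52)
current = resolved_bugs_per_pr(flagged_per_run=0.7, resolution_rate=0.70)
improvement = current / at_launch

print(f"at launch: {at_launch:.2f} resolved bugs per PR")
print(f"current:   {current:.2f} resolved bugs per PR")
print(f"ratio:     {improvement:.1f}x")
```

Under these assumptions the product moves from roughly 0.2 to roughly 0.5, which is consistent with the "more than doubled" claim.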
When we first attempted to build a code review agent, the models were not advanced enough to provide useful reviews. As baseline models improved, however, we identified several ways to improve the quality of bug reports.
One of the most effective early improvements was running multiple bug-finding passes in parallel and combining their results using majority voting. Each pass received a different ordering of the diff, which encouraged the model to consider different lines of reasoning. When several passes independently flagged the same issue, it served as a strong signal that the bug was real.
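The parallel-pass idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bugbot's implementation: `find_bugs` is a hypothetical stand-in for a model call, bugs are assumed to be hashable identifiers that match exactly across passes (a real system would need to group similar reports), and the strict-majority threshold and pass count are choices made here.

```python
import itertools
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Sequence


def majority_vote_review(
    hunks: Sequence[str],
    find_bugs: Callable[[Sequence[str]], set],  # hypothetical model call
    num_passes: int = 5,
    seed: int = 0,
) -> list[Hashable]:
    """Run several passes over shuffled diff orderings and keep majority bugs."""
    rng = random.Random(seed)
    orderings = []
    for _ in range(num_passes):
        order = list(hunks)
        rng.shuffle(order)  # each pass sees the diff in a different order
        orderings.append(order)

    # Run the passes concurrently; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=num_passes) as pool:
        results = list(pool.map(find_bugs, orderings))

    # One vote per pass per bug, then keep bugs flagged by a strict majority.
    votes = Counter(bug for found in results for bug in set(found))
    threshold = num_passes // 2 + 1
    return sorted((bug for bug, n in votes.items() if n >= threshold), key=repr)


# Toy reviewer: always finds one real bug, plus a unique spurious report per pass.
_noise_ids = itertools.count()


def toy_find_bugs(ordered_hunks: Sequence[str]) -> set:
    found = {("a.py", 3, "null-deref")} if "a.py" in ordered_hunks else set()
    found.add(("noise", next(_noise_ids)))
    return found


confirmed = majority_vote_review(["a.py", "b.py", "c.py"], toy_find_bugs)
print(confirmed)  # [('a.py', 3, 'null-deref')]
```

The spurious reports each receive a single vote and are dropped, while the bug found by every pass survives. The threshold is the main design knob: a lower one keeps more candidate bugs at the cost of more noise.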
After several weeks of internal qualitative iterations, we developed a version of Bugbot that outperformed other code review tools on the market. The final workflow includes:

To make Bugbot practical for real-world use, we had to develop several foundational systems alongside the core review logic:
We released Bugbot's first version in July 2025 and the eleventh version in January 2026. Each new version improved bug detection without a corresponding rise in false positives, making it increasingly reliable and effective.
Bugbot has evolved from an initial prototype to a robust code review tool that significantly reduces the number of bugs reaching production. By leveraging AI-driven metrics and continuous experimentation, we've created a system that not only catches more bugs but does so with high accuracy and minimal false positives. As development practices continue to evolve, Bugbot will remain a critical tool for ensuring code quality.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
19 January 2026