Bugbot: A Code Review Agent That Evolves Through AI-Driven Metrics and Experiments

The Engineer

19 Jan 2026

Bugbot began as a code review agent tuned by qualitative feedback and now improves through a custom AI-driven metric and a steady series of experiments, catching more bugs without a matching rise in false positives.

As coding agents have become more sophisticated, we've found ourselves spending an increasing amount of time on code review. To address this, the team at Cursor built Bugbot, a code review agent designed to catch logic bugs, performance issues, and security vulnerabilities before they hit production. By last summer, it was performing so well that we decided to release it to users.

The Technical Evolution of Bugbot

The development of Bugbot started with qualitative assessments but quickly transitioned to a more systematic approach using a custom AI-driven metric to iteratively improve quality. Since its launch, we've conducted 40 major experiments, which have significantly boosted Bugbot's resolution rate from 52% to over 70%. Additionally, the average number of bugs flagged per run has increased from 0.4 to 0.7, meaning that the number of resolved bugs per pull request (PR) has more than doubled, from about 0.2 to around 0.5.
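As a rough check on how these figures fit together, assume that resolved bugs per PR is approximately the number of bugs flagged per run multiplied by the resolution rate, with one run per PR. That relationship is a simplifying assumption made here for illustration, and the function name is hypothetical:

```python
# Rough consistency check of the reported figures.
# Assumption: resolved bugs per PR ~= bugs flagged per run * resolution rate,
# treating each PR as a single run.

def resolved_per_pr(flagged_per_run: float, resolution_rate: float) -> float:
    """Expected number of flagged bugs that end up resolved, per run."""
    return flagged_per_run * resolution_rate

before = resolved_per_pr(0.4, 0.52)  # figures at launch
after = resolved_per_pr(0.7, 0.70)   # "over 70%", so this is a lower bound

print(f"before: {before:.2f}, after: {after:.2f}, ratio: {after / before:.2f}x")
# prints: before: 0.21, after: 0.49, ratio: 2.36x
```

Under that assumption the product moves from roughly 0.2 to roughly 0.5, which matches the "more than doubled" claim.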

Early Challenges and Initial Improvements

When we first attempted to build a code review agent, the models were not advanced enough to provide useful reviews. However, as baseline models improved, we identified several strategies to enhance bug reporting quality:

Model Configurations: We experimented with different configurations of models, pipelines, filters, and context management techniques.
Internal Polling: Engineers provided feedback on various configurations, helping us identify those with fewer false positives.

One of the most effective early improvements was running multiple bug-finding passes in parallel and combining their results using majority voting. Each pass received a different ordering of the diff, which encouraged the model to consider different lines of reasoning. When several passes independently flagged the same issue, it served as a strong signal that the bug was real.
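The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bugbot's actual implementation: the model call is replaced by a hypothetical `toy_finder` function, the diff is represented as a list of file names, and the default of eight passes is borrowed from the workflow described later in this post.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A "pass" takes an ordered list of diff hunks and returns the bug IDs it found.
# In a real system this would call a model; here it is a hypothetical stand-in.
BugFinder = Callable[[list[str]], set[str]]

def run_passes(hunks: list[str], find_bugs: BugFinder,
               num_passes: int = 8, seed: int = 0) -> Counter:
    """Run find_bugs on differently shuffled copies of the diff, in parallel,
    and count how many passes flagged each bug."""
    rng = random.Random(seed)
    # Each pass sees the same hunks in a different random order.
    orderings = [rng.sample(hunks, k=len(hunks)) for _ in range(num_passes)]
    with ThreadPoolExecutor(max_workers=num_passes) as pool:
        results = list(pool.map(find_bugs, orderings))
    votes: Counter = Counter()
    for found in results:
        votes.update(found)
    return votes

def toy_finder(ordered_hunks: list[str]) -> set[str]:
    # Toy behaviour: always spots the null-check bug, but spots the
    # off-by-one only when loop.py happens to be the first hunk.
    bugs = {"missing-null-check"}
    if ordered_hunks[0] == "loop.py":
        bugs.add("off-by-one")
    return bugs

votes = run_passes(["auth.py", "loop.py", "db.py"], toy_finder)
# Keep only bugs that at least two passes flagged independently.
confirmed = {bug for bug, n in votes.items() if n >= 2}
```

Here `missing-null-check` is flagged by every pass, while the count for `off-by-one` depends on how often the shuffle places `loop.py` first, which is the kind of order sensitivity that the voting step is meant to smooth out.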

The Launch Workflow

After several weeks of internal qualitative iterations, we developed a version of Bugbot that outperformed other code review tools on the market. That version ran the following steps:

Parallel Passes: Run eight parallel passes with randomized diff order.
Bug Bucketing: Combine similar bugs into one bucket.
Majority Voting: Filter out bugs found during only one pass.
Clear Descriptions: Merge each bucket into a single, clear description.
Category Filtering: Remove unwanted categories like compiler warnings or documentation errors.
False Positive Validation: Run results through a validator model to catch false positives.
Deduplication: Dedupe against bugs posted from previous runs.
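The steps after the parallel passes can be sketched as a small filtering pipeline. Everything below is illustrative and assumed rather than taken from Bugbot: the `Finding` type, the rule that findings on the same file, line, and category count as "similar", the use of the longest description as a stand-in for model-written merging, and the `validate` callable standing in for the validator model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Finding:  # hypothetical shape of one reported bug
    file: str
    line: int
    category: str
    description: str

Key = tuple[str, int, str]
UNWANTED = {"compiler-warning", "documentation"}

def bucket_key(f: Finding) -> Key:
    # Assumption: same file, line, and category means "similar bug".
    return (f.file, f.line, f.category)

def review(passes: list[list[Finding]], already_posted: set[Key],
           validate: Callable[[Finding], bool] = lambda f: True) -> list[Finding]:
    buckets: dict[Key, list[Finding]] = defaultdict(list)
    seen_in: dict[Key, set[int]] = defaultdict(set)
    # Bug bucketing: group similar findings and record which passes saw them.
    for i, findings in enumerate(passes):
        for f in findings:
            key = bucket_key(f)
            buckets[key].append(f)
            seen_in[key].add(i)
    out = []
    for key, group in buckets.items():
        if len(seen_in[key]) < 2:        # voting: drop bugs from only one pass
            continue
        # Stand-in for merging into one clear description: keep the longest.
        merged = max(group, key=lambda f: len(f.description))
        if merged.category in UNWANTED:  # category filtering
            continue
        if not validate(merged):         # false positive validation
            continue
        if key in already_posted:        # dedupe against previous runs
            continue
        out.append(merged)
    return out

a = Finding("auth.py", 10, "logic", "Token expiry is never checked")
b = Finding("auth.py", 10, "logic", "Token expiry is never checked before use")
c = Finding("db.py", 5, "logic", "Possible race")          # one pass only
d = Finding("README.md", 1, "documentation", "Typo")       # unwanted category
e = Finding("api.py", 3, "security", "Unescaped input")    # posted earlier
result = review([[a, d, e], [b, d, c], [e]],
                already_posted={("api.py", 3, "security")})
```

In this toy run only the `auth.py` finding survives: the `db.py` finding appears in a single pass, the `README.md` finding falls in a filtered category, and the `api.py` finding was already posted.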

From Prototype to Production

To make Bugbot practical for real-world use, we had to develop several foundational systems alongside the core review logic:

Repository Access: We rebuilt our Git integration in Rust to ensure fast and reliable access to repositories.
Scalability: We optimized the system to handle high volumes of PRs efficiently.
User Interface: We created an interface that integrates into existing review workflows.

Version Evolution

We released Bugbot's first version in July 2025 and the eleventh version in January 2026. Each new version improved bug detection without a corresponding rise in false positives, making it increasingly reliable and effective.

Conclusion

Bugbot has evolved from an initial prototype into a code review tool that flags more bugs per run and sees a larger share of them resolved, without a corresponding rise in false positives. Those gains came from pairing an AI-driven metric with continuous experimentation rather than relying on qualitative tuning alone. As development practices continue to evolve, we expect Bugbot to remain an important part of keeping code quality high.