Semantic Search Boosts Coding Agent Accuracy by 12.5% on Average

Tools & Engineering

The Engineer

6 Nov 2025 · 3 min read

Semantic search lets coding agents find code by meaning rather than by exact text, raising their accuracy in answering questions about a codebase by 12.5% on average.

When coding agents receive a prompt, they need to understand the codebase well enough to respond accurately. This involves reading files and searching for relevant information. One tool that significantly improves this process is semantic search, which retrieves code segments that match a natural language query such as “where do we handle authentication?”. It complements, rather than replaces, the regex-based search provided by tools like grep.
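To make the contrast concrete, here is a toy sketch, not Cursor's implementation. The two-file codebase, the `CONCEPTS` table, and the `embed` function are all illustrative assumptions; in particular, `embed` counts hand-picked keywords where a real system would call a trained embedding model.

```python
import math
import re

# Hypothetical mini-codebase: path -> source text.
CHUNKS = {
    "auth/session.py": "def verify_token(token): return check_signature(token)",
    "billing/invoice.py": "def total(items): return sum(i.price for i in items)",
}

# Stand-in for a learned embedding (assumption): one dimension per concept,
# each defined by a hand-written keyword list.
CONCEPTS = {
    "login": ["authentication", "login", "token", "signature", "session"],
    "money": ["billing", "invoice", "price", "total", "payment"],
}

def embed(text):
    """Map text to a vector of per-concept keyword counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return [sum(words.count(w) for w in vocab) for vocab in CONCEPTS.values()]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def grep_search(pattern):
    """Regex search over source text, as grep would do."""
    return [path for path, src in CHUNKS.items() if re.search(pattern, src)]

def semantic_search(query):
    """Return the path whose embedding is closest to the query embedding."""
    q = embed(query)
    return max(CHUNKS, key=lambda path: cosine(q, embed(path + " " + CHUNKS[path])))
```

In this sketch, `grep_search("authentication")` finds nothing because the word never appears in the source text, while `semantic_search("where do we handle authentication?")` still lands on `auth/session.py`. A regex search remains the better tool once the agent knows an exact identifier such as `verify_token`.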

Cursor’s coding agents leverage semantic search to improve their performance over large codebases. Here’s a breakdown of how they achieve this:

Technical Implementation

Custom Embedding Model: Cursor trained its own embedding model specifically for codebase navigation. This model captures the semantic meaning of code segments, making it easier to match natural language queries.
Indexing Pipelines: Efficient indexing pipelines ensure fast retrieval of relevant code snippets. These pipelines are optimized for large-scale codebases.
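The article does not describe how these pipelines work internally. As a rough sketch of one common approach, assuming fixed-size line chunks and content hashing (both my assumptions, not confirmed details of Cursor's system), an index can skip re-embedding any chunk it has already seen. The `Index` class, `chunk_lines`, and the `embed_fn` callable are hypothetical names.

```python
import hashlib

def chunk_lines(source, size=20):
    """Split a file into consecutive windows of at most `size` lines."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

class Index:
    """Toy index that embeds a chunk only if its content hash is new."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn   # hypothetical embedding callable
        self.vectors = {}          # content hash -> vector (acts as a cache)
        self.locations = {}        # (path, chunk number) -> content hash

    def update_file(self, path, source):
        """Index one file and return how many chunks had to be embedded."""
        # Forget where this file's chunks used to live; cached vectors stay.
        for key in [k for k in self.locations if k[0] == path]:
            del self.locations[key]
        embedded = 0
        for n, chunk in enumerate(chunk_lines(source)):
            digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            if digest not in self.vectors:
                self.vectors[digest] = self.embed_fn(chunk)
                embedded += 1
            self.locations[(path, n)] = digest
        return embedded
```

Re-indexing an unchanged file embeds nothing, and appending a line to a 50-line file re-embeds only the final chunk. That property is what keeps indexing cheap on large codebases where most files do not change between runs.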

Performance Improvements

Using semantic search, Cursor’s agents exhibit several notable improvements:

Higher Accuracy: On average, agents achieve 12.5% higher accuracy in answering questions (ranging from 6.5% to 23.5% depending on the model).
Better Code Retention: The code changes produced by agents are more likely to be retained in user codebases.
Fewer Iterations: Users require fewer iterations to arrive at a correct solution, reducing development time and effort.
Consistent Gains Across Models: All tested models, including frontier coding models, show improved accuracy with semantic search.

Offline Evaluations

Cursor maintains an evaluation dataset called Cursor Context Bench, which focuses on retrieving information in codebases with known correct answers. This dataset is used to evaluate the most-used models in Cursor, including Cursor's own model, Composer.

Performance Comparison: The evaluations compare each model's performance with and without semantic search. In every configuration, accuracy is higher with semantic search available.
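A with/without comparison of this kind can be sketched as a small harness. Everything below is illustrative: the four questions are made up rather than taken from Cursor Context Bench, and the two "agents" are placeholders (one that only succeeds when the question names an identifier, one backed by a lookup table standing in for embedding retrieval).

```python
def accuracy(answer_fn, dataset):
    """Fraction of questions whose answer matches the known correct answer."""
    correct = sum(1 for question, expected in dataset if answer_fn(question) == expected)
    return correct / len(dataset)

def compare(configs, dataset):
    """Score every named configuration on the same set of questions."""
    return {name: accuracy(fn, dataset) for name, fn in configs.items()}

# Made-up questions paired with the file that holds the answer.
DATASET = [
    ("where do we handle authentication?", "auth/session.py"),
    ("where are invoices totalled?", "billing/invoice.py"),
    ("where is verify_token defined?", "auth/session.py"),
    ("where is retry backoff configured?", "net/retry.py"),
]

# Lookup table standing in for what semantic retrieval would surface.
SEMANTIC_HITS = {
    "where do we handle authentication?": "auth/session.py",
    "where are invoices totalled?": "billing/invoice.py",
    "where is verify_token defined?": "auth/session.py",
}

def grep_only_agent(question):
    """Placeholder: succeeds only when the question contains an identifier."""
    return "auth/session.py" if "verify_token" in question else None

def semantic_agent(question):
    """Placeholder: answers from the lookup table, else gives up."""
    return SEMANTIC_HITS.get(question)

RESULTS = compare(
    {"grep only": grep_only_agent, "grep + semantic": semantic_agent},
    DATASET,
)
```

The scores in `RESULTS` are artifacts of the toy data and say nothing about real models. The point is the structure: both configurations answer the same questions, and each answer is checked against a known correct one.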

Online A/B Tests

To understand the impact on end-user experience, Cursor conducted an A/B test where both groups used the same model, but one group's agent had access to semantic search while the other relied solely on traditional search tools like grep. The results were telling:

Code Retention: Code written by agents with access to semantic search is more likely to remain in user codebases. There was a 0.3% increase in code retention, which jumps to 2.6% for large codebases with 1,000 files or more.
Dissatisfied User Requests: Agents without semantic search required more follow-ups and corrections. The test showed a 2.2% increase in dissatisfied follow-up user requests when semantic search was not available.

The effect sizes are smaller in the A/B tests than in the offline evaluations because the A/B tests cover all agent queries, many of which do not require search.
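The dilution argument can be written as a weighted average. The sketch below assumes, for simplicity, that semantic search has zero effect on queries that never need search; the function name and the example values are hypothetical and are not figures from the A/B test.

```python
def overall_effect(search_fraction, effect_when_search_needed):
    """Average effect over all queries when only some of them can benefit."""
    if not 0.0 <= search_fraction <= 1.0:
        raise ValueError("search_fraction must be between 0 and 1")
    # Queries that never search contribute an effect of zero (assumption).
    no_search_effect = 0.0
    return (search_fraction * effect_when_search_needed
            + (1.0 - search_fraction) * no_search_effect)
```

For instance, if a hypothetical 20% of queries needed search and the improvement on those queries were 10 percentage points, `overall_effect(0.2, 10.0)` would come out at roughly 2 points across all queries.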

Custom Retrieval Models

A key factor enabling these results is Cursor’s custom embedding model. This model is trained on agent sessions, where each session involves multiple searches and file openings before finding the right code. By analyzing these traces, the model learns to identify relevant code segments more effectively:

Training Data: Agent sessions provide rich training data that captures the context in which searches are performed.
Retrospective Analysis: Post-session analysis helps refine the model by identifying patterns in successful and unsuccessful searches.
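The article does not spell out how traces become training examples. One plausible reading, offered here purely as an assumption, is to pair each query in a session with the file that eventually held the right code and to treat files opened along the way as negatives. The `Session` type and `training_examples` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Session:
    queries: list   # search queries issued during the session
    opened: list    # files opened along the way, in order
    target: str     # file where the right code was finally found

def training_examples(session):
    """Build one (query, positive, negatives) record per query in the session.

    The positive is the file that turned out to be right; the negatives are
    the other files the agent opened, de-duplicated in first-seen order.
    """
    negatives = [f for f in dict.fromkeys(session.opened) if f != session.target]
    return [
        {"query": q, "positive": session.target, "negatives": negatives}
        for q in session.queries
    ]
```

Records in this shape could feed a contrastive objective that pulls a query's embedding toward its positive and away from its negatives, although the source does not say which training objective Cursor uses.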

Conclusion

Semantic search is a powerful tool for enhancing coding agent performance. By integrating custom embedding models and efficient indexing pipelines, Cursor’s agents can navigate large codebases more accurately and efficiently, leading to better code retention and fewer user follow-ups. This approach not only improves the developer experience but also accelerates development cycles.