Cursor's Fast Apply Model: Revolutionizing Code Edits with Speculative Decoding

The Engineer

17 May 2024 · 3 min read

Cursor's fast apply model splits code editing into two stages, planning and applying, to cut latency and improve accuracy on full-file edits.

Frontier models like GPT-4 and GPT-4o have made significant strides in natural language processing, but they still struggle with large code edits. They can be lazy (omitting code they should have written), inaccurate, and slow, all of which break a programmer's flow. The problem is especially acute for coding agents, where even small, isolated edits can introduce bugs or send the agent into an infinite loop.

To address these challenges, Cursor has developed a specialized model called fast apply, built for the full-file code edit task. The approach splits a difficult edit into two stages: planning and applying.

Planning and Applying

In Cursor, the planning phase involves using a powerful frontier model in a chat interface to outline the necessary changes. The applying phase, where the actual changes are made to the current file, should be straightforward and instantaneous. This separation ensures that complex edits can be executed efficiently without disrupting the programmer's workflow.

Fast Apply Model Performance

Our fast apply model outperforms GPT-4 and GPT-4o in both accuracy and latency. On our 70B model we reach approximately 1000 tokens per second (around 3500 characters per second) using a speculative-decoding variant tailored for code edits, called speculative edits. This is a ~13x speedup over vanilla inference using Llama-3-70B and a ~9x speedup over our previous GPT-4 speculative edits deployment.
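The article does not spell out how speculative edits work, so the following is only a toy sketch of the general idea under stated assumptions: most of a rewritten file is unchanged, so the original file can serve as the draft, and the model verifies a chunk of draft tokens per step instead of producing one token per step. The names speculative_edit and make_oracle, the chunk size, and the re-alignment heuristic are all hypothetical, and the per-token loop stands in for what would be a single batched forward pass in a real system.

```python
from typing import Callable, List, Tuple

Token = str
NextToken = Callable[[List[Token]], Token]  # greedy model: prefix -> next token
EOS = "<eos>"


def make_oracle(target: List[Token]) -> NextToken:
    """Hypothetical stand-in for a model that deterministically emits `target`."""
    def next_token(prefix: List[Token]) -> Token:
        return target[len(prefix)] if len(prefix) < len(target) else EOS
    return next_token


def speculative_edit(next_token: NextToken, draft: List[Token],
                     chunk: int = 8) -> Tuple[List[Token], int]:
    """Greedy decoding that speculates with `draft` (the original file).

    Returns the output tokens and the number of verification steps.
    """
    out: List[Token] = []
    pos = 0    # cursor into the draft
    steps = 0
    while True:
        steps += 1  # assumption: one step corresponds to one batched model pass
        # Accept draft tokens for as long as the model agrees with them.
        for tok in draft[pos:pos + chunk]:
            if next_token(out) != tok:
                break
            out.append(tok)
            pos += 1
        # The same pass also yields the model's own token at the first
        # disagreement (or right after a fully accepted chunk).
        tok = next_token(out)
        if tok == EOS:
            return out, steps
        out.append(tok)
        # Heuristic re-alignment (assumption): if the model's token occurs
        # shortly ahead in the draft, resume speculating just after it.
        window = draft[pos:pos + chunk]
        if tok in window:
            pos += window.index(tok) + 1


original = "def add ( a , b ) : return a + b".split()
edited = "def add ( a , b , c ) : return a + b + c".split()
output, steps = speculative_edit(make_oracle(edited), original)
```

Because every emitted token is either confirmed or produced by the model, the output in this sketch matches plain greedy decoding even when the draft is wrong; a poor draft only costs speed. Here the edited function is produced in fewer verification steps than it has tokens, since the unchanged stretches are accepted a chunk at a time.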

Training and Evaluation

To train the fast apply model, we focused on full-file code edit tasks. We constructed an evaluation set of approximately 450 full-file edits, each under 400 lines. The performance of several prompted models was measured using Claude-3 Opus as a grader.
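As a rough illustration of such an evaluation loop, the sketch below filters examples by length and averages a grader's scores. Everything in it is an assumption made for the example: the EditExample fields, the evaluate function, a numeric score between 0 and 1, and reading "under 400 lines" as a limit on the current file. The grader is passed in as a plain callable; in the setup described above that role is played by Claude-3 Opus, which this sketch does not call.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable


@dataclass
class EditExample:
    current_file: str   # file contents before the edit
    instructions: str   # the planned change to apply


ApplyModel = Callable[[EditExample], str]      # returns the rewritten file
Grader = Callable[[EditExample, str], float]   # hypothetical score in [0, 1]


def evaluate(examples: Iterable[EditExample], apply_model: ApplyModel,
             grader: Grader, max_lines: int = 400) -> float:
    """Mean grader score over examples whose file is under `max_lines` lines."""
    kept = [ex for ex in examples
            if len(ex.current_file.splitlines()) < max_lines]
    if not kept:
        raise ValueError("no examples under the line limit")
    return mean([grader(ex, apply_model(ex)) for ex in kept])
```

Keeping the model and the grader as injected callables lets the same loop compare several prompted models against one grader, or several graders against human ratings.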

Grading Guidelines

Across tens of curated examples, Claude-3 Opus agreed with our ratings more often than GPT-4-Turbo or GPT-4o did. This alignment is likely due to post-training: Claude models are better at generating large amounts of code in assistant messages, while GPT-4 tends to omit code or to mark skipped regions with "..." or comments.

Surprising Results

Surprisingly, Claude-3 Sonnet outperformed GPT-4-Turbo, and GPT-4o performed similarly to GPT-4-Turbo. We hypothesize that Claude's superior performance is an artifact of its post-training process, which allows it to handle large code outputs more effectively.

Speed Measurements

We measured speed as the time our fast apply model takes to generate a fully rewritten file, conditioned on the current file, the conversation history, and the current code block. By this measure, speculative edits are significantly faster than standard inference.
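A measurement of this kind can be sketched as follows. The exact metric is not defined in the text, so this assumes throughput means rewritten characters divided by the wall-clock latency of one rewrite; chars_per_second and its injectable clock parameter are hypothetical names introduced for the example.

```python
import time
from typing import Callable


def chars_per_second(rewrite: Callable[[str], str], current_file: str,
                     clock: Callable[[], float] = time.perf_counter) -> float:
    """Rewritten characters divided by wall-clock latency of one rewrite."""
    start = clock()
    rewritten = rewrite(current_file)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return len(rewritten) / elapsed
```

Counting characters rather than tokens keeps the number comparable across models with different tokenizers. The two figures quoted earlier, about 1000 tokens and about 3500 characters per second, imply roughly 3.5 characters per token.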

Why Rewrite the File Instead of Using Diffs?

Rewriting the entire file ensures consistency and completeness, avoiding issues like partial updates or missed changes. This approach is particularly useful for large-scale edits where diff-based methods might introduce errors or inconsistencies.
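The text does not say which diff failures it has in mind. One plausible failure mode, assumed here purely for illustration, is an edit addressed by line number that lands one line off. The helper apply_line_edits and the tiny example file are hypothetical.

```python
from typing import List, Tuple

LineEdit = Tuple[int, str]  # (1-indexed line number, replacement text)


def apply_line_edits(source: str, edits: List[LineEdit]) -> str:
    """Replace whole lines by number, as a simple diff-style edit would."""
    lines = source.splitlines()
    for number, text in edits:
        if not 1 <= number <= len(lines):
            raise IndexError(f"line {number} is out of range")
        lines[number - 1] = text
    return "\n".join(lines)


original = "def area(w, h):\n    return w + h\n\nprint(area(2, 3))"
intended = "def area(w, h):\n    return w * h\n\nprint(area(2, 3))"

# With the right line number the edit matches the intended file.
correct = apply_line_edits(original, [(2, "    return w * h")])
# One line off, the buggy line survives and the fix lands on a blank line.
off_by_one = apply_line_edits(original, [(3, "    return w * h")])
```

In the off-by-one case the result is still valid Python, so nothing fails loudly: the old return statement stays in place and the corrected one is unreachable. A full-file rewrite has no addresses to get wrong, at the cost of regenerating the unchanged lines.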

Conclusion

Cursor's fast apply model pairs frontier-model planning with near-instant application. With speculative edits, it reaches roughly 1000 tokens per second on a 70B model while outperforming GPT-4 and GPT-4o on accuracy, which makes large code edits more reliable. The result helps keep developers in flow and reduces frustration.