
Cursor's fast apply model tackles coding headaches with a two-stage approach that plans and applies changes, reducing latency and boosting accuracy for seamless code editing.
Frontier models such as GPT-4 and GPT-4o still struggle with large code edits. They can be lazy (omitting code they were asked to write), inaccurate, and slow, all of which break a programmer's flow. The problem is sharper for coding agents, where even small, isolated edits can introduce bugs or send the agent into an infinite loop.
To address these challenges, Cursor has developed a specialized model called fast apply, trained for the full-file code edit task. Cursor splits a difficult edit into two stages, planning and applying, and fast apply handles the second.
In Cursor, the planning phase involves using a powerful frontier model in a chat interface to outline the necessary changes. The applying phase, where the actual changes are made to the current file, should be straightforward and instantaneous. This separation ensures that complex edits can be executed efficiently without disrupting the programmer's workflow.
Our fast apply model outperforms GPT-4 and GPT-4o in both accuracy and latency. We achieve speeds of approximately 1000 tokens per second (around 3500 characters per second) on our 70B model using a speculative-decoding variant tailored for code edits, known as speculative edits. This is a ~13x speedup over vanilla inference using Llama-3-70B and a ~9x speedup over our previous GPT-4 speculative edits deployment.
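The article does not spell out how speculative edits work internally. As a rough illustration of the general idea only, the toy sketch below assumes the draft tokens are taken from the original file (most of a rewritten file is unchanged) and that the model checks a chunk of draft tokens per pass. The names `speculative_edit`, `greedy_next`, `chunk`, and `lookahead` are hypothetical, and the "model" is a stand-in function, not Cursor's implementation.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker


def speculative_edit(
    original: List[str],
    greedy_next: Callable[[List[str]], str],
    chunk: int = 8,
    lookahead: int = 16,
) -> Tuple[List[str], int]:
    """Toy decoder that drafts from the original file.

    Returns the generated tokens and the number of simulated model passes.
    The output always equals plain greedy decoding; the draft only changes
    how many passes are needed.
    """
    out: List[str] = []
    cursor = 0  # next position in `original` to draft from
    passes = 0
    while True:
        draft = original[cursor:cursor + chunk]
        passes += 1  # a real model would score the whole draft in one pass
        accepted = 0
        for tok in draft:
            if greedy_next(out) != tok:
                break  # first disagreement: discard the rest of the draft
            out.append(tok)
            accepted += 1
        cursor += accepted
        # The same pass also yields the model's own next token.
        nxt = greedy_next(out)
        if nxt == EOS:
            return out, passes
        out.append(nxt)
        # Re-align: if the new token occurs shortly ahead in the original,
        # resume drafting just after it; otherwise treat it as an insertion.
        window = original[cursor:cursor + lookahead]
        if nxt in window:
            cursor += window.index(nxt) + 1


# Toy "model": always reproduces a fixed target (the file with one token changed).
original = [f"t{i}" for i in range(40)]
target = original.copy()
target[20] = "NEW"


def greedy_next(prefix: List[str]) -> str:
    return target[len(prefix)] if len(prefix) < len(target) else EOS


tokens, passes = speculative_edit(original, greedy_next)
```

In this toy run the 40-token edited file is reproduced in 7 simulated passes, where token-by-token decoding would need one pass per token. Real speedups depend on how much of the file changes and on the cost of scoring a chunk, which this sketch does not model.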
To train the fast apply model, we focused on full-file code edit tasks. We constructed an evaluation set of approximately 450 full-file edits, each under 400 lines. The performance of several prompted models was measured using Claude-3 Opus as a grader.
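A minimal sketch of what such an evaluation loop could look like, under stated assumptions: `EditExample`, `evaluate`, `apply_edit`, and `grade` are hypothetical names, the 400-line cap is applied to the current file (the article does not say exactly what is counted), and the grader is an injected function rather than a real call to any model API.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_LINES = 400  # the article caps each edit at under 400 lines


@dataclass
class EditExample:
    current_file: str
    planned_change: str  # hypothetical: the change outlined in the planning stage


def evaluate(
    examples: List[EditExample],
    apply_edit: Callable[[EditExample], str],
    grade: Callable[[EditExample, str], float],
) -> float:
    """Mean grader score over examples whose file is under MAX_LINES lines."""
    kept = [ex for ex in examples if len(ex.current_file.splitlines()) < MAX_LINES]
    if not kept:
        return 0.0
    return sum(grade(ex, apply_edit(ex)) for ex in kept) / len(kept)


# Stand-ins so the sketch runs without any model access.
def echo_apply(ex: EditExample) -> str:
    return ex.planned_change  # pretends the plan is already the full new file


def exact_match_grade(ex: EditExample, output: str) -> float:
    return 1.0 if output == ex.planned_change else 0.0


examples = [
    EditExample("a = 1\n", "a = 2\n"),
    EditExample("x\n" * 500, "y\n"),  # filtered out: 500 lines
]
score = evaluate(examples, echo_apply, exact_match_grade)
```

Passing the grader in as a function keeps the harness independent of which model does the grading, so the same loop can compare several candidate graders against human ratings.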

Claude-3 Opus agreed with our own ratings more often than GPT-4-Turbo or GPT-4o did across tens of curated examples. This is likely a product of post-training: Claude models tend to generate large amounts of code in assistant messages, while GPT-4 tends to omit code or mark missing regions with an ellipsis (...) or placeholder comments.
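The omission pattern described above can be spotted with a crude text check. The sketch below is a heuristic written for illustration, not something the article describes: `elided_lines` and both patterns are hypothetical, and it will produce false positives (a bare `...` is valid Python, for instance).

```python
import re
from typing import List

# A line that is nothing but an ellipsis.
BARE_ELLIPSIS = re.compile(r"^\s*(\.\.\.|…)\s*$")

# A comment that contains an ellipsis or a typical placeholder phrase.
ELIDING_COMMENT = re.compile(
    r"^\s*(#|//|/\*|<!--).*(\.\.\.|…|\b(rest of|existing code|unchanged|remaining code)\b)",
    re.IGNORECASE,
)


def elided_lines(text: str) -> List[int]:
    """Return 1-based numbers of lines that look like elided code."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if BARE_ELLIPSIS.match(line) or ELIDING_COMMENT.match(line)
    ]


lazy = "def f():\n    x = 1\n    # ... rest of the function unchanged\n"
full = "def f():\n    x = 1\n    return x\n"
lazy_hits = elided_lines(lazy)
full_hits = elided_lines(full)
```

A check like this can only flag suspicious output; deciding whether a rewrite is actually complete still needs a grader or a human.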
Surprisingly, Claude-3 Sonnet outperformed GPT-4-Turbo, and GPT-4o performed similarly to GPT-4-Turbo. We hypothesize that Claude's superior performance is an artifact of its post-training process, which allows it to handle large code outputs more effectively.
We measured speed as the time our fast apply model takes to generate a fully rewritten file, conditioned on the current file, the conversation history, and the current code block. Speculative edits are substantially faster than standard inference: roughly 13x faster than vanilla Llama-3-70B inference.
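As a sketch of how such a measurement might be taken, the helper below times one rewrite and reports characters and tokens per second. The metric (output characters divided by wall-clock latency), the names `time_rewrite`, `Throughput`, and `implied_baseline`, and the fixed characters-per-token ratio are assumptions; the ratio is simply the one implied by the article's figures of about 1000 tokens and about 3500 characters per second.

```python
import time
from dataclasses import dataclass
from typing import Callable

CHARS_PER_TOKEN = 3500 / 1000  # ratio implied by the figures quoted in the article


@dataclass
class Throughput:
    chars: int
    seconds: float

    @property
    def chars_per_second(self) -> float:
        return self.chars / self.seconds

    @property
    def tokens_per_second(self) -> float:
        # Approximation: converts characters to tokens with a fixed ratio.
        return self.chars_per_second / CHARS_PER_TOKEN


def time_rewrite(rewrite: Callable[[], str]) -> Throughput:
    """Time one full-file rewrite; `rewrite` stands in for a model call."""
    start = time.perf_counter()
    text = rewrite()
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return Throughput(chars=len(text), seconds=elapsed)


def implied_baseline(fast_tokens_per_second: float, speedup: float) -> float:
    """Baseline speed implied by a quoted speedup factor."""
    return fast_tokens_per_second / speedup


sample = time_rewrite(lambda: "x" * 350)
baseline = implied_baseline(1000, 13)  # about 77 tokens per second
```

The baseline figure is derived from the quoted numbers rather than reported in the article, so it should be read as a back-of-the-envelope estimate.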
Rewriting the entire file ensures consistency and completeness, avoiding issues like partial updates or missed changes. This approach is particularly useful for large-scale edits where diff-based methods might introduce errors or inconsistencies.
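One way a diff-style edit can fail, sketched under assumptions: the hypothetical `apply_hunk` below applies a line-numbered replacement only if the stated old lines match the file at that position. It is a simplified stand-in, not the behavior of any real patch tool.

```python
from typing import List


def apply_hunk(lines: List[str], start: int, old: List[str], new: List[str]) -> List[str]:
    """Replace `old` with `new` at 1-based line `start`, or fail on a mismatch."""
    i = start - 1
    if i < 0 or lines[i:i + len(old)] != old:
        raise ValueError(f"hunk does not match file at line {start}")
    return lines[:i] + new + lines[i + len(old):]


file_lines = ["def area(r):", "    pi = 3.14", "    return pi * r * r"]
old = ["    pi = 3.14"]
new = ["    pi = 3.14159"]

patched = apply_hunk(file_lines, 2, old, new)  # correct line number

try:
    apply_hunk(file_lines, 3, old, new)  # line number off by one
    off_by_one_failed = False
except ValueError:
    off_by_one_failed = True
```

A model emitting a diff has to get line numbers and context exactly right, and a single off-by-one makes the hunk unusable. A full-file rewrite has no such positional bookkeeping, at the cost of generating more tokens.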
Cursor's fast apply model advances code editing by pairing a frontier-model planning stage with near-instant application. With speculative edits, it is both faster and more accurate than GPT-4 and GPT-4o on full-file edits, which makes large code changes more reliable. That has the potential to change how developers work, keeping them in flow and reducing frustration.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
17 May 2024