Optimize Your Prompts for New LLMs to Avoid Performance Pitfalls

The Engineer

15 Sept 2025 · 3 min read

Discover how adapting your prompts to new language models can unlock their full potential and help you avoid common performance traps, illustrated with real-world examples.

When a new large language model (LLM) is released, it's tempting to plug in your existing prompts and expect immediate improvements. However, this often leads to disappointing results. The key to getting the most out of a newer model lies in rewriting your prompts for it. This article explains why prompt optimization matters and gives three specific reasons, each backed by a practical example.

Why Rewriting Prompts Matters

Prompt Format: Different LLMs are trained on different data formats, which can significantly impact performance.
Position Bias: Models may weigh the beginning or end of a prompt more heavily, affecting output quality.
Model Biases: Each model has its own set of biases that you should work with rather than against.

Reason #1: Prompt Format

One of the most straightforward differences between models is their handling of different data formats, such as Markdown vs. XML.

OpenAI Models and Markdown: Older OpenAI models were particularly adept at processing Markdown. This makes sense given how prevalent Markdown is on the internet and how few extra tokens its syntax requires.
Claude 3.5 and XML: When Anthropic released Claude 3.5, they introduced an XML-based system prompt. According to Zack Witten, an Anthropic employee, this decision was driven by the fact that Claude's training data included a lot of XML content:
- "Claude was trained with a lot of XML in its training data and so it's sort of seen more of that than it's seen of other formats, so it just works a little bit better."

While OpenAI hasn't explicitly stated why they favor Markdown, their system prompts and tutorials consistently use this format. This suggests that sticking to Markdown for OpenAI models is likely to yield the best results.
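To make the format point concrete, here is a minimal Python sketch that renders the same prompt sections either as Markdown headings or as XML-style tags. The function names, the section names, and the family-to-format mapping are all hypothetical and only mirror the observations above; they are not taken from any vendor's API or documentation.

```python
def render_markdown(sections: dict[str, str]) -> str:
    """Render each section as a Markdown heading followed by its body."""
    parts = [
        f"## {name.replace('_', ' ').title()}\n{body.strip()}"
        for name, body in sections.items()
    ]
    return "\n\n".join(parts)


def render_xml(sections: dict[str, str]) -> str:
    """Wrap each section body in an XML-style tag named after the section.

    The output is tag-delimited text, not validated XML.
    """
    parts = [
        f"<{name}>\n{body.strip()}\n</{name}>"
        for name, body in sections.items()
    ]
    return "\n\n".join(parts)


# Assumption: an illustrative mapping from model family to format.
RENDERERS = {"openai": render_markdown, "anthropic": render_xml}


def build_prompt(sections: dict[str, str], family: str) -> str:
    """Pick a renderer for the model family, falling back to Markdown."""
    renderer = RENDERERS.get(family, render_markdown)
    return renderer(sections)


if __name__ == "__main__":
    sections = {
        "instructions": "Summarize the text.",
        "output_format": "One sentence.",
    }
    print(build_prompt(sections, "anthropic"))
```

Keeping the content in a plain dictionary and the formatting in a renderer means a new model only requires a new renderer, not a rewrite of every prompt.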

Reason #2: Position Bias

Position bias refers to how different parts of a prompt are weighted by the model. Some models give more importance to the beginning of the prompt, while others prioritize the end. Understanding and leveraging this can significantly improve performance.

Example with Fine-Tuned Models: In September 2023, I noticed that a fine-tuned open-source model performed best when the most relevant examples were placed at the end of the list. This contrasts with OpenAI and Anthropic models, which generally perform better when the most relevant examples are at the beginning.
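One way to act on this is to make the placement of the most relevant examples a parameter instead of hard-coding it. The sketch below assumes each example already carries a relevance score (for instance from a retrieval step); the function name and the scores are hypothetical.

```python
from typing import Literal


def order_examples(
    scored_examples: list[tuple[str, float]],
    most_relevant: Literal["first", "last"],
) -> list[str]:
    """Order few-shot examples so the highest-scoring one is first or last."""
    if most_relevant not in ("first", "last"):
        raise ValueError("most_relevant must be 'first' or 'last'")
    # Descending order puts the best example first; ascending puts it last.
    ranked = sorted(
        scored_examples,
        key=lambda pair: pair[1],
        reverse=(most_relevant == "first"),
    )
    return [text for text, _score in ranked]


if __name__ == "__main__":
    examples = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
    print(order_examples(examples, "first"))  # ['b', 'c', 'a']
    print(order_examples(examples, "last"))   # ['a', 'c', 'b']
```

Which setting works better for a given model is an empirical question, so it is worth testing both orderings on your own evaluation set.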

Reason #3: Model Biases

Each LLM has its own set of biases that can either enhance or hinder performance. Working with these biases is crucial for optimizing prompts.

Understanding Model Biases: For instance, if a model is biased towards certain types of input data (e.g., XML vs. Markdown), aligning your prompt format with these biases can lead to better results.
Practical Example: GPT-5 in Cursor: When GPT-5 was initially released in Cursor, users were disappointed with its performance. However, OpenAI and Cursor later identified and addressed specific issues, as detailed in the OpenAI GPT-5 cookbook. This highlights the importance of understanding and adapting to model-specific biases.
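A simple way to work with model-specific preferences is to record them in one place and look them up by model name. The following is a rough sketch: the `PromptProfile` class, the name prefixes, and the stored values are assumptions for illustration, not documented vendor settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptProfile:
    """Prompt preferences observed for one model family (hypothetical)."""
    prompt_format: str          # "markdown" or "xml"
    most_relevant_example: str  # "first" or "last"


# Assumption: illustrative values; replace them with your own findings.
PROFILES = {
    "gpt-": PromptProfile("markdown", "first"),
    "claude-": PromptProfile("xml", "first"),
}
DEFAULT_PROFILE = PromptProfile("markdown", "first")


def profile_for(model_name: str) -> PromptProfile:
    """Return the profile whose prefix matches the model name, else the default."""
    name = model_name.lower()
    for prefix, profile in PROFILES.items():
        if name.startswith(prefix):
            return profile
    return DEFAULT_PROFILE


if __name__ == "__main__":
    print(profile_for("claude-3-5-sonnet"))
    print(profile_for("my-finetuned-model"))
```

When a new model arrives, adding or updating one profile entry becomes the explicit step where you revisit its preferences, instead of silently reusing the old prompt.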

Conclusion

When a new LLM is released, don't assume that your existing prompts will work as well as they did on the previous model. Take the time to rewrite and optimize them around the new model's preferred format, position bias, and other biases. By doing so, you can avoid performance pitfalls and fully leverage the capabilities of the latest models.