GPT-4's Reign Challenged by New LLMs from Google, Mistral, Anthropic, and Inflection

The Engineer

26 Mar 2024

GPT-4's reign faces its first serious challenge. New models from Google, Mistral, Anthropic, and Inflection are vying for the top spot, with Google's Gemini 1.5 bringing an unprecedented one-million-token context window and video processing capabilities.

Until mid-February, GPT-4 was the undisputed leader in the large language model (LLM) space. It consistently topped benchmarks and was widely regarded as the most capable default model for a wide range of tasks. That dominance has since been shaken by four new models from different vendors, released within a few weeks of each other, each pushing the boundaries of what LLMs can achieve.

New Entrants to the LLM Race

Google Gemini 1.5 (Released: February 15th)
- Key Feature: A one-million-token context length, nearly 8 times GPT-4 Turbo's 128,000 tokens.
- Video Processing: Samples video at one frame per second, with each frame represented by 258 tokens, so long videos fit within its context window.
- Impact: While not outperforming GPT-4 in every benchmark, Gemini 1.5's unique features make it a strong contender for specific use cases.
Mistral Large (Released: February 26th)
- Openly Licensed Models: Mistral has built a reputation for exceptional openly licensed models, such as Mistral 7B, which can run on mobile devices, and Mixtral-8x7B, which performs well on laptops.
- Performance: The hosted Large model is in the same performance class as GPT-4, though it may not outperform it across all metrics.
- Future Potential: Although Large itself is available only as a hosted model, Mistral's track record with openly licensed releases makes them a vendor to watch.
Claude 3 Opus (Released: March 4th)
- Vibes: Strong positive feedback from early users, many of whom rate it as the first clear GPT-4 beater.
- Use Case: Particularly strong at code generation. In some reported cases, complex prompts that produced broken JavaScript in GPT-4 yielded working answers from Claude 3 Opus.
- System Prompt: Anthropic's Amanda Askell published and explained the system prompt used for Claude 3 in the claude.ai chat interface, offering insight into its design.
Inflection-2.5 (Released: March 7th)
- Surprise Entry: Inflection, known for their conversation-focused chat assistant Pi, has introduced a new model that, according to Inflection's own benchmarks, comes close to GPT-4.
- Performance: Pi was initially seen as something of a gimmick, but Inflection-2.5's benchmark results have caught the attention of the LLM community.

Technical Details and Benchmarks

Each of these models brings unique technical advancements to the table:

Gemini 1.5:
- Context Length: One million tokens.
- Video Processing: Handles video by sampling one frame per second, with each frame represented by 258 tokens.
- Use Case: Ideal for applications requiring extensive context or multimedia processing.
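To make the Gemini 1.5 figures concrete, here is a minimal Python sketch that computes the context-window ratio and how many seconds of video fit in one million tokens. It uses the numbers quoted above (one million tokens, one frame per second, 258 tokens per frame) plus GPT-4 Turbo's 128,000-token window. The helper names are hypothetical, and the sketch counts visual frame tokens only; it assumes no audio tokens and reserves nothing for the text prompt or the model's reply unless you ask it to.

```python
# Hypothetical helpers for back-of-the-envelope context budgeting.
# Gemini 1.5 figures (1M tokens, 1 frame/s, 258 tokens/frame) come from the article;
# GPT-4 Turbo's 128,000-token window is OpenAI's published limit.

GEMINI_15_CONTEXT_TOKENS = 1_000_000
GPT4_TURBO_CONTEXT_TOKENS = 128_000
TOKENS_PER_FRAME = 258
FRAMES_PER_SECOND = 1


def context_ratio(larger: int, smaller: int) -> float:
    """Return how many times larger one context window is than another."""
    return larger / smaller


def max_video_seconds(context_tokens: int,
                      tokens_per_frame: int = TOKENS_PER_FRAME,
                      fps: int = FRAMES_PER_SECOND,
                      reserved_tokens: int = 0) -> int:
    """Return the whole seconds of video whose frames fit in the window.

    Only visual frame tokens are counted (assumption: no audio track).
    reserved_tokens leaves room for the text prompt and the answer.
    """
    available = max(context_tokens - reserved_tokens, 0)
    tokens_per_second = tokens_per_frame * fps
    return available // tokens_per_second


if __name__ == "__main__":
    ratio = context_ratio(GEMINI_15_CONTEXT_TOKENS, GPT4_TURBO_CONTEXT_TOKENS)
    seconds = max_video_seconds(GEMINI_15_CONTEXT_TOKENS)
    print(f"Context ratio: {ratio:.2f}x")
    print(f"Video budget: {seconds} s (~{seconds / 60:.1f} min)")
```

With the defaults, the sketch reports a ratio of about 7.81 and 3875 seconds of video, a little over an hour. Reserving 10,000 tokens for the prompt and answer lowers that to 3837 seconds.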

Mistral Large:
- Open Licensing: Mistral's smaller models, such as Mistral 7B and Mixtral-8x7B, are openly licensed and run on a wide range of devices; Large itself is hosted only.
- Near-Parity: Competes with GPT-4 on general capabilities, making it a viable alternative.

Claude 3 Opus:
- Code Generation: Strong at producing working code, a common pain point with other models.
- System Prompt Design: A detailed, publicly explained system prompt shapes its behavior in Anthropic's chat interface.

Inflection-2.5:
- Conversation Focus: Powers Pi, Inflection's chat assistant, which emphasizes natural conversational interaction.
- Benchmark Performance: Competitive with GPT-4 on several of Inflection's reported benchmarks, indicating strong general capabilities.

Impact on Practitioners

The emergence of these new models has significant implications for practitioners:

- Diverse Options: More choices mean developers can select the model best suited to their specific needs, whether that is context length, video processing, or code generation.
- Innovation Drive: Competition is likely to accelerate innovation, pushing vendors to continually improve their models.
- Open Licensing: Openly licensed models, like Mistral's, encourage community-driven development and broaden access.

Conclusion

The landscape of large