
Google unveils Gemini 2.5 Flash, a hybrid reasoning model that builds on Gemini 2.0 Flash and gives developers fine-grained control over how much the model reasons, and therefore over what each request costs.
Today, Google is rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. This new iteration builds on the foundation of Gemini 2.0 Flash, offering a significant upgrade in reasoning capabilities while maintaining a focus on speed and cost efficiency.
Gemini 2.5 Flash is Google’s first fully hybrid reasoning model: developers can toggle the "thinking" process on or off, trading quality against cost and latency. The two modes behave as follows.
Even with "thinking off," Gemini 2.5 Flash retains the fast speeds of its predecessor, making it an excellent choice for scenarios where speed is critical.
When "thinking" is enabled, Gemini 2.5 Flash is aimed at complex tasks that require multi-step reasoning.
The model performs strongly on Hard Prompts in LMArena, ranking second only to Gemini 2.5 Pro. This demonstrates its ability to handle complex reasoning tasks effectively.
According to Google, Gemini 2.5 Flash continues to lead as the model with the best price-to-performance ratio.
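The cost side of that trade-off comes down to simple token arithmetic. The sketch below is illustrative only: the function name, the idea of a separate rate for thinking tokens, and every price in the example are placeholders chosen for easy mental math, not Google's published pricing. Check the official pricing page for real rates and for how thinking tokens are actually billed.

```python
def estimate_cost(input_tokens: int, output_tokens: int, thinking_tokens: int,
                  price_in: float, price_out: float, price_thinking: float) -> float:
    """Estimate the cost of one request.

    All prices are expressed per one million tokens and are supplied by the
    caller; nothing here encodes a real price list.
    """
    total = (input_tokens * price_in
             + output_tokens * price_out
             + thinking_tokens * price_thinking)
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder prices (per million tokens), NOT Google's actual rates.
    rates = dict(price_in=1.0, price_out=2.0, price_thinking=4.0)

    thinking_off = estimate_cost(2000, 500, 0, **rates)
    thinking_on = estimate_cost(2000, 500, 1500, **rates)
    print(f"thinking off: {thinking_off:.4f}")
    print(f"thinking on:  {thinking_on:.4f}")
```

With these made-up rates, the same prompt costs three times as much once 1,500 thinking tokens are added, which is the kind of gap the on/off toggle is meant to let developers manage.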

Developers can access the Gemini 2.5 Flash preview through the Gemini API, via Google AI Studio and Vertex AI.
While specific benchmarks are not provided, the model’s performance on Hard Prompts in LMArena suggests it can handle complex tasks efficiently. The ability to toggle thinking on or off provides a versatile tool for optimizing different use cases.
One of the standout features of Gemini 2.5 Flash is its fine-grained control over the thinking process.
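As a rough illustration of what that control might look like in a request, the sketch below builds a JSON body for the Gemini API's `generateContent` method. The article does not name the control, so the `thinkingConfig.thinkingBudget` field, and the reading that a budget of 0 turns thinking off, are assumptions to verify against the current API reference. The helper `build_body` is hypothetical and makes no network call.

```python
import json
from typing import Optional


def build_body(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a generateContent request body.

    thinking_budget=None omits the setting and leaves the model's default.
    thinking_budget=0 is assumed to disable thinking.
    A positive value is assumed to cap the tokens spent on thinking.
    """
    if thinking_budget is not None and thinking_budget < 0:
        raise ValueError("thinking_budget must be >= 0")

    body: dict = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Assumed field names; confirm against the Gemini API documentation.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


if __name__ == "__main__":
    # Latency-sensitive call: thinking switched off.
    print(json.dumps(build_body("Classify this ticket as bug or feature.", 0), indent=2))
    # Harder task: allow a bounded amount of thinking.
    print(json.dumps(build_body("Plan a three-step database migration.", 1024), indent=2))
```

Keeping the budget as an explicit per-request argument makes it easy to pick a different quality, cost, and latency point for each call site rather than for the whole application.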
To start building with Gemini 2.5 Flash, developers can try the preview in Google AI Studio or Vertex AI and call it through the Gemini API.
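A first call can be sketched with nothing but the Python standard library. Several details here are assumptions rather than anything stated in the article: the preview model id `gemini-2.5-flash-preview-04-17`, the `v1beta` REST endpoint, the `x-goog-api-key` header, and the `GEMINI_API_KEY` environment variable name. Treat it as a starting point and confirm each against the official documentation.

```python
import json
import os
import urllib.request

# Assumed values; check the current model list and API reference.
MODEL = "gemini-2.5-flash-preview-04-17"
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt: str, api_key: str, model: str = MODEL) -> urllib.request.Request:
    """Prepare (but do not send) a generateContent POST request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the text of the first candidate."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=60) as resp:
        payload = json.load(resp)
    return payload["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # assumed variable name
    if key:
        print(generate("Explain hybrid reasoning in one sentence.", key))
    else:
        print("Set GEMINI_API_KEY to send a live request.")
```

Splitting request construction from sending keeps the network call in one small function, so the payload and headers can be inspected or unit-tested without an API key.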
Gemini 2.5 Flash represents a significant step forward in hybrid reasoning models, offering developers the flexibility to balance quality, cost, and speed. Whether you're working on complex research tasks or need fast, efficient responses, this model provides a powerful tool for your AI toolkit.
Original Sources
↗ https://developers.googleblog.com/en/start-building-with-gemini-25-flash/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
21 April 2025