Google on Tuesday (March 25) unveiled Gemini 2.5, a new family of artificial intelligence (AI) models that aim to bring reasoning capabilities closer to human-level cognition.
At the heart of the launch is Gemini 2.5 Pro Experimental, a multimodal model that the company describes as its most capable yet.
The new model is being made available from Tuesday through Google AI Studio, the firm’s developer platform, and via the Gemini app for users of Gemini Advanced – a premium service costing $20 per month, according to TechCrunch.
Gemini 2.5 Pro is the latest entrant in a fast-growing race among major AI players to develop models that can not only generate text or images but also pause to reason and fact-check before responding.
It follows OpenAI’s launch of its o1 model in September 2024, widely seen as the first to introduce AI reasoning to the mainstream.
Since then, companies including Anthropic, DeepSeek, Google, and Elon Musk’s xAI have all rolled out their own reasoning-based systems, which use additional computing power and time to solve complex problems – particularly in mathematics and coding – with greater accuracy.
Google says all of its AI models going forward will incorporate these reasoning techniques by default.
While the company has experimented with such features in earlier versions of Gemini, this latest release is being touted as a significant leap forward. “This is our most serious effort yet to challenge the frontier,” said a Google spokesperson, referring to OpenAI’s leading “o” series models.
Initial benchmarks suggest the company may have reason to be confident. On Aider Polyglot, a coding evaluation focused on code editing tasks, Gemini 2.5 Pro achieved a score of 68.6 per cent – outperforming top models from OpenAI, Anthropic, and China’s DeepSeek. However, on SWE-bench Verified, which assesses software engineering capabilities, it fell short of Anthropic’s Claude 3.7 Sonnet, scoring 63.8 per cent to its rival’s 70.3 per cent.
A cut above Gemini 2.0, Gemini 2.5 Pro also performed well on “Humanity’s Last Exam” – a wide-ranging, crowdsourced multimodal test covering maths, humanities, and the natural sciences – where it achieved a score of 18.8 per cent, ahead of most competing models.
One of the most notable features of the model is its context window: the AI can process up to 1 million tokens in a single go – equivalent to roughly 750,000 words, or more than the entire Lord of the Rings trilogy. Google says it plans to double that capacity to 2 million tokens in the near future.
Although Gemini 2.5 Pro has been released, Google has yet to disclose pricing for API access, promising more information “in the coming weeks.”