Google's Gemini now the top AI model?
- snitzoid
- 2 days ago
- 2 min read
Excerpt from the WSJ this morning. There's a new Sheriff in town. At least for the moment.
I decided to ask ClaudeAi for its opinion on the matter. To my surprise, it appears to be pretty unbiased. Unlike most of the editorial staff here at the Report.
Me: Which AI model performs best in bench tests? Gemini, ClaudeAi or ChatGPT?
Claude: There's no single "best" model - each excels in different areas based on recent 2024-2025 benchmarks:
General Knowledge & Reasoning: ChatGPT (GPT-4o) leads slightly on MMLU tests, scoring around 88-90% versus Claude and Gemini at roughly 85-86%.
Coding: Claude 4 currently leads many coding benchmarks as of mid-2025, with Claude 4 Opus achieving about 72.5% on SWE-Bench. Many developers find Claude particularly strong for complex, multi-step programming tasks.
Mathematics: The models show competitive performance on mathematical benchmarks, with tools-enabled versions performing best.
Context & Data Analysis: Gemini 1.5 Pro handles 1-2 million tokens in a single session, making it ideal for analyzing extremely large documents or datasets that other models can't handle.
Key Strengths:
ChatGPT: Most versatile overall, best conversational AI, strong ecosystem integration
Claude: Top coding performance, careful reasoning with fewer hallucinations, excellent long-form writing
Gemini: Massive context windows, superior multimodal capabilities, real-time data access through Google integration
The "best" choice depends on your specific use case - coding projects favor Claude, broad general tasks favor ChatGPT, and working with huge documents or needing Google integration favors Gemini.
Call it America’s next top model.
With the release of its third version this week, Google’s Gemini large language model surged past ChatGPT and other competitors to become the most capable AI chatbot, as determined by consensus industry-benchmark tests.
The results represent public validation for Google employees who, for months, have been conducting their own, personal tests of the model—asking it for jokes, trying to stump it with math problems—and coming away convinced they had something that would finally tilt the LLM field in the company’s favor.
For one of her “vibe checks,” Tulsee Doshi, Gemini’s senior director of product management, asked the model to write in Gujarati, a language that is spoken widely in India but isn’t especially prevalent on the internet. The results were far better than what she had gotten from earlier models.
“I call it signs of life, right?” she said. “People were coming back and saying, ‘I feel it, I think we’ve hit on something.’”