Google Claims the Crown
Google's Gemini 2.5 Pro has reached the #1 position on the LMSYS Chatbot Arena, the most widely respected benchmark for large language model quality. The model achieved an Elo rating of 1380, statistically tied with Anthropic's Claude Sonnet 4 (also 1380) and narrowly ahead of OpenAI's o1 (1370).
The LMSYS Arena works by having real humans compare anonymized model outputs side by side and vote for the better response. With over 2 million votes cast, it's considered the gold standard for measuring real-world AI capability, in part because live human preferences are harder to game than static synthetic benchmarks.
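To make the mechanism concrete, here is a minimal sketch of how pairwise votes turn into ratings under the classic Elo update rule. This is a simplification: the Arena's published leaderboard is computed with more robust statistics (a Bradley-Terry model fit over all votes), and the K step size here is an assumption.

```python
# Simplified sketch: each human vote nudges the winner's rating up
# and the loser's down, proportional to how surprising the result was.
# The real Arena leaderboard uses a Bradley-Terry fit over all votes;
# K is an assumed small step size, since ratings stabilize over millions of votes.

K = 4

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_winner: float, r_loser: float, k: float = K) -> tuple[float, float]:
    """Apply one vote: winner gains what the loser gives up."""
    delta = k * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + delta, r_loser - delta

# Example: a 1380-rated model wins a single vote against a 1370-rated one.
a, b = update(1380.0, 1370.0)
print(f"{a:.2f}, {b:.2f}")  # ratings barely move: ~1381.94, ~1368.06
```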
Where Gemini 2.5 Pro Excels
The model shows particular strength in coding tasks, mathematical reasoning, and long-context understanding. Its 1 million token context window — the largest among top-tier models — gives it a significant advantage for document analysis.
At $1.25 per million input tokens and $10.00 per million output tokens, Gemini 2.5 Pro sits in the mid-range of frontier pricing. It's significantly cheaper than OpenAI's o1 ($15 input / $60 output per million tokens) while delivering comparable performance on most tasks.
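As a rough illustration of how that gap compounds, here is the arithmetic for a hypothetical workload. The per-token prices are the list rates quoted above; the monthly token counts are made-up assumptions.

```python
# Rough cost comparison using the list prices quoted above (USD per 1M tokens).
# The 50M-input / 5M-output monthly workload is an assumption for illustration.
PRICES = {
    "gemini-2.5-pro": {"input": 1.25, "output": 10.00},
    "o1":             {"input": 15.00, "output": 60.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 50_000_000, 5_000_000):,.2f}")
# gemini-2.5-pro: $112.50
# o1: $1,050.00
```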
The Benchmark Race Heats Up
The top of the leaderboard is incredibly competitive, with just 20 Elo points separating the top 5 models. A gap that small implies the higher-rated model wins only about 53% of head-to-head comparisons (see the quick calculation below), so the choice between models should be driven by pricing and context-window needs rather than raw benchmark scores alone.
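That 53% figure comes from the standard Elo expected-score formula (again, a simplification of the Arena's actual statistics):

```python
# Win probability implied by a 20-point Elo gap under the standard Elo formula.
gap = 20
p_win = 1 / (1 + 10 ** (-gap / 400))
print(f"{p_win:.1%}")  # ~52.9%: nearly a coin flip
```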
Use our cost calculator to compare Gemini 2.5 Pro costs for your workload.