Friday, February 20, 2026

Google's "Small" Update Just Made Gemini the Smartest AI You Can Buy


DEEP DIVE:
Everything you need to know about Gemini 3.1 Pro.

Google released Gemini 3.1 Pro today, and the ".1" is doing some serious heavy lifting. According to Artificial Analysis, an independent benchmarking firm, 3.1 Pro now sits at #1 on their overall Intelligence Index (basically a giant benchmark that rolls the other major benchmarks into one score), ahead of Claude Opus 4.6 and GPT-5.2.

From: https://www.theneurondaily.com/p/googles-sharpest-brain-yet?


Here's how the Big Three stack up right now:

  • Overall intelligence: Gemini 3.1 Pro (57) > Claude Opus 4.6 (53) > GPT-5.2 (51)

  • Coding: Gemini 3.1 Pro (56) > Claude Sonnet 4.6 (51) > GPT-5.2 (49)

  • Agentic tasks: Claude Opus 4.6 (68) > GPT-5.2 (60) > Gemini 3.1 Pro (59)

  • Hallucination resistance: Gemini 3.1 Pro (30) blows everyone away; the next best score is 13

  • The exact numbers here don't matter, but the ORDER does.

Translation: Google now has the smartest and most factually reliable model. Claude still dominates agentic work (complex multi-step tasks), and GPT-5.2 sits comfortably in between. Price-wise, Gemini 3.1 Pro costs $4.50 per million tokens, which is cheaper than GPT-5.2 ($4.80) and roughly half the price of Claude Opus 4.6 ($10).
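To make the price gap concrete, here is a quick back-of-the-envelope sketch in Python. It uses only the three per-million-token prices quoted above and treats each as a single blended rate, which is an assumption (providers usually bill input and output tokens at different rates). The 50-million-token monthly volume is a made-up figure for illustration.

```python
# Prices in USD per million tokens, as quoted in the paragraph above.
# Assumption: each figure is one blended rate for input and output tokens.
PRICES = {
    "Gemini 3.1 Pro": 4.50,
    "GPT-5.2": 4.80,
    "Claude Opus 4.6": 10.00,
}


def cost(model: str, tokens: int) -> float:
    """Estimated cost in USD for processing `tokens` tokens with `model`."""
    return PRICES[model] * tokens / 1_000_000


def relative_price(model: str, baseline: str) -> float:
    """Price of `model` as a fraction of the price of `baseline`."""
    return PRICES[model] / PRICES[baseline]


if __name__ == "__main__":
    monthly_tokens = 50_000_000  # hypothetical workload, not from the article
    for name in PRICES:
        print(f"{name}: ${cost(name, monthly_tokens):,.2f} per month")
    ratio = relative_price("Gemini 3.1 Pro", "Claude Opus 4.6")
    print(f"Gemini 3.1 Pro costs {ratio:.0%} of Claude Opus 4.6's price")
```

On these numbers, Gemini comes in at a bit under half of Opus's price and only slightly below GPT-5.2, so the savings matter most if you are switching from Claude.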

Now here's what's actually new under the hood:

  • A "medium" thinking mode. Gemini 3 Pro only had "low" and "high." The new middle setting gives you solid reasoning without waiting minutes for an answer. On "high," the model now acts like a mini version of Deep Think, Google's advanced reasoning system.

  • Fewer hallucinations, by a lot. The model card shows meaningful improvement, and the Artificial Analysis numbers confirm it. Gemini 3.1 Pro's factual accuracy is in a league of its own right now.

  • Better coding, with a caveat. Benchmarks show 3.1 Pro leading on coding. But developers on Reddit note that while it's great at one-shot problem solving, it's less great at extended back-and-forth sessions, where Claude still has the edge.

  • AI Studio is now full-stack. It supports servers, databases, and multiplayer apps (this is huge). Also, Google's Antigravity agent is now built in.

It's rolling out to the Gemini app, GitHub Copilot, NotebookLM, Vertex AI, Gemini CLI, and more. Harvey is apparently already testing it for legal research (as are others, we’re sure).

Want to try it yourself? AI Studio is free. Here are three things worth testing:

  • The thinking levels: Run the same hard question on low, medium, and high. Ask it to solve a tricky word problem or logic puzzle and watch the quality difference.

  • Hallucination stress test: Ask it for specific stats from a real report (e.g., "What were the key findings from Stanford's 2024 AI Index?"). See if it hedges when it should, or confidently makes things up.

  • Head-to-head: Take your most-used prompt / workflow and run it in Gemini, Claude, and ChatGPT side by side. That'll tell you more than any benchmark.
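If you'd rather script these tests than copy-paste between tabs, here is a minimal Python sketch of a side-by-side runner. It makes no real API calls: `ModelFn`, `side_by_side`, and the stand-in functions are all hypothetical names invented for this example, and you would swap each stand-in for a call to whichever client library you actually use. The same runner works for comparing thinking levels (one function per level) or for comparing providers (one function per model).

```python
from typing import Callable, Dict

# Hypothetical type: any function that takes a prompt and returns a reply.
ModelFn = Callable[[str], str]


def side_by_side(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the same prompt to every model and collect the replies by label."""
    results: Dict[str, str] = {}
    for label, ask in models.items():
        try:
            results[label] = ask(prompt)
        except Exception as exc:
            # Record the failure and keep going so one outage
            # does not abort the whole comparison.
            results[label] = f"[error: {exc}]"
    return results


def print_report(prompt: str, results: Dict[str, str]) -> None:
    """Print the prompt followed by each labelled reply."""
    print(f"PROMPT: {prompt}")
    for label, reply in results.items():
        print(f"--- {label} ---")
        print(reply)


# Stand-ins so the sketch runs offline. Replace each with a real client call.
def fake_gemini(prompt: str) -> str:
    return "placeholder reply from Gemini"


def fake_claude(prompt: str) -> str:
    return "placeholder reply from Claude"


def fake_chatgpt(prompt: str) -> str:
    return "placeholder reply from ChatGPT"


if __name__ == "__main__":
    question = "What were the key findings from Stanford's 2024 AI Index?"
    models = {
        "Gemini": fake_gemini,
        "Claude": fake_claude,
        "ChatGPT": fake_chatgpt,
    }
    print_report(question, side_by_side(question, models))
```

Keeping the model calls behind plain functions means the comparison logic never changes when you add a provider or a thinking level. You just add one more entry to the dictionary.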
